
This page pertains to UD version 2.

UD for Old Turkish

UD Old Turkish is an effort to digitize and annotate Old Turkic script texts, or to annotate existing digitizations of them. The texts may be either existing ones or texts structurally constructed to be coherent and well-formed. Having the entire corpus in Old Turkic script is a precondition for this language. This document is intended to be rough rather than precise, because the annotation approach may change drastically over time.

Tokenization and Word Segmentation



The current corpora use only 13 of the universal POS tags. The plan is to use 16 of them, i.e. all 17 UD tags except X.
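The tag plan can be checked mechanically against a treebank file. The sketch below is a hypothetical helper, not part of any UD tooling: it assumes the data is stored in the standard CoNLL-U format (UPOS in the fourth tab-separated column) and reports which of the planned tags a file does not yet use, and which tags it uses outside the plan. The names `used_upos` and `coverage_report` are illustrative, and the sample sentence uses placeholder forms rather than real Old Turkic data.

```python
# The 17 universal POS tags defined in UD version 2.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})

# Planned inventory for UD Old Turkish: every UPOS tag except X (16 tags).
PLANNED_TAGS = UPOS_TAGS - {"X"}


def used_upos(conllu_text: str) -> set:
    """Collect the UPOS values (column 4) of syntactic words in CoNLL-U text."""
    used = set()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence boundaries and comment lines
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes (UPOS is "_")
        used.add(cols[3])
    return used


def coverage_report(conllu_text: str) -> tuple:
    """Return (planned tags not yet used, used tags outside the plan)."""
    used = used_upos(conllu_text)
    missing = PLANNED_TAGS - used
    unexpected = used - PLANNED_TAGS
    return missing, unexpected


# Hypothetical three-word sentence with placeholder forms.
sample = "\n".join([
    "# text = tok1 tok2 tok3",
    "1\ttok1\t_\tADJ\t_\t_\t2\tamod\t_\t_",
    "2\ttok2\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\ttok3\t_\tVERB\t_\t_\t0\troot\t_\t_",
])

missing, unexpected = coverage_report(sample)
print(len(UPOS_TAGS), len(PLANNED_TAGS), len(missing))  # 17 16 13
print(sorted(unexpected))  # []
```

Keeping `PLANNED_TAGS` as a set difference from the full UPOS inventory makes the "all except X" decision explicit, so a later change of plan only touches one line.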




There is one Old Turkish UD treebank: