UD for Uyghur 
Tokenization and Word Segmentation
- In general, words are delimited by white spaces or punctuation. Punctuation may appear in some abbreviations or numeric expressions.
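
The rule above can be illustrated with a rough regex-based splitter. This is only a sketch and not the tokenizer used to build the treebank: the function name `rough_tokenize` and the pattern are assumptions made for illustration. It keeps numeric expressions with internal punctuation together and splits off other punctuation; abbreviations containing punctuation are not handled and would need a word list.

```python
import re

# Hypothetical pattern, for illustration only:
#   1. digits with internal '.' or ',' stay together (e.g. 3.14),
#   2. runs of word characters form a word (Python's \w is Unicode-aware),
#   3. any other non-space character is a separate punctuation token.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")

def rough_tokenize(text: str) -> list:
    """Split text on whitespace and punctuation, keeping numbers intact."""
    return TOKEN_RE.findall(text)

if __name__ == "__main__":
    print(rough_tokenize("kel, 3.14 qal."))
```

Abbreviations with internal punctuation would be split by this pattern, so a real pipeline would have to treat them separately.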
Morphology
Tags
- Uyghur uses all 17 universal POS categories, including particles (PART), although the current data contain no examples of the SYM category.
- Uyghur has a limited number of auxiliary verbs (AUX). In most cases, however, a combination of
a semantically weak verb with a non-finite form of a semantically prominent verb is analyzed
as an xcomp construction, not as an auxiliary construction. The following verbs are currently
treated as auxiliaries, at least in some contexts:
- The copula بول bol (“to be”) is used with non-verbal predicates wherever the suffixed copula cannot be used. It is also used in some periphrastic constructions with other verbs.
- The suffixed copula ئى i supplies only some forms of the paradigm.
- كەت ket (“to go”) is used in periphrastic perfective constructions.
- كەل kel (“to come”) is used in irrealis mood.
- قال qal (“to stay”) is used in periphrastic progressive constructions.
- تۇر tur (“to stand”) is used in periphrastic progressive constructions.
- Verbs with modal meaning are not considered auxiliary in Uyghur.
- There are five main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Nominal Features
- There is no grammatically relevant gender in the Turkic languages. However, the Gender feature is sporadically used with personal proper nouns (PROPN) and distinguishes two values: Masc and Fem.
- The two main values of the Number feature are Sing and Plur. For nouns (and adjectives), only Plur is used when the plural suffix is present; singular is left unannotated. For pronouns (PRON), both singular and plural are explicitly annotated where relevant.
- Case has 6 possible values: Nom, Acc, Gen, Dat, Loc, Abl. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM. It also occurs with some non-finite forms of VERB and AUX.
- Possession is marked morphologically on the possessed NOUN, which cross-references the person and number of the possessor, using the layered features Number[psor] and Person[psor].
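
As a rough illustration of the nominal features listed above, the following sketch parses the FEATS column of a CoNLL-U token and reports values that fall outside those lists. The function names `parse_feats` and `nominal_problems` are hypothetical and the checks are a simplified reading of the guidelines, not an official validator.

```python
from typing import Dict, List

# Value sets taken from the guidelines above.
CASE_VALUES = {"Nom", "Acc", "Gen", "Dat", "Loc", "Abl"}
GENDER_VALUES = {"Masc", "Fem"}

def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Gen|Number=Plur'."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def nominal_problems(upos: str, feats: str) -> List[str]:
    """Return messages for nominal feature values the guidelines do not list."""
    f = parse_feats(feats)
    problems = []
    if "Case" in f and f["Case"] not in CASE_VALUES:
        problems.append("unexpected Case value: " + f["Case"])
    if "Gender" in f and f["Gender"] not in GENDER_VALUES:
        problems.append("unexpected Gender value: " + f["Gender"])
    # Singular is left unannotated on nouns and adjectives.
    if upos in {"NOUN", "ADJ"} and f.get("Number") == "Sing":
        problems.append("Number=Sing is not annotated on " + upos)
    return problems

if __name__ == "__main__":
    # A possessed plural noun in the genitive with a 3rd person singular possessor.
    print(nominal_problems("NOUN", "Case=Gen|Number=Plur|Number[psor]=Sing|Person[psor]=3"))
```

The layered features are kept as ordinary dictionary keys (`Number[psor]`, `Person[psor]`), so they never collide with the plain `Number` feature of the possessed noun.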
Verbal Features
- Finite verbs always have one of four values of Mood: Ind, Imp, Int or Cnd. The interrogative mood (Int) is a language-specific value and applies to verbs with the interrogative suffix -mu.
- Verbs in the indicative mood always have one of two values of Tense: Past or Pres.
- There are also two values of Aspect: Hab and Perf.
- As for Voice, only the passive forms (Pass) are explicitly annotated; Voice=Act is not used.
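
The constraints on verbal features can be sketched as a small consistency check. This is an illustration under stated assumptions rather than an official validator: the function name `verbal_problems` is hypothetical, and the sketch assumes that finite verbs carry the universal value VerbForm=Fin, which the text above does not state explicitly.

```python
from typing import Dict, List

# Value sets taken from the guidelines above.
MOOD_VALUES = {"Ind", "Imp", "Int", "Cnd"}
TENSE_VALUES = {"Past", "Pres"}
ASPECT_VALUES = {"Hab", "Perf"}

def verbal_problems(feats: Dict[str, str]) -> List[str]:
    """Return messages for verbal features that conflict with the guidelines."""
    problems = []
    # Assumption: finite verbs are marked with VerbForm=Fin.
    if feats.get("VerbForm") == "Fin" and feats.get("Mood") not in MOOD_VALUES:
        problems.append("finite verb needs Mood in Ind, Imp, Int or Cnd")
    # Indicative verbs always have Tense=Past or Tense=Pres.
    if feats.get("Mood") == "Ind" and feats.get("Tense") not in TENSE_VALUES:
        problems.append("indicative verb needs Tense=Past or Tense=Pres")
    if "Aspect" in feats and feats["Aspect"] not in ASPECT_VALUES:
        problems.append("unexpected Aspect value: " + feats["Aspect"])
    # Only passive forms are annotated for Voice.
    if feats.get("Voice") == "Act":
        problems.append("Voice=Act is not used")
    return problems

if __name__ == "__main__":
    print(verbal_problems({"VerbForm": "Fin", "Mood": "Ind", "Tense": "Past"}))
    print(verbal_problems({"VerbForm": "Fin", "Mood": "Ind"}))
```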
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV). Currently only the values Prs, Dem and Int are used.
- NumType is used with numerals (NUM), determiners (DET), and adjectives (ADJ).
- The Reflex feature marks reflexive pronouns ئۆز (öz).
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as 3rd person.
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- The dominant word order in Uyghur is subject-object-verb, although other word orders are possible, too.
- Nominal subject (nsubj) is a noun phrase in the nominative case, without an adposition.
- A subordinate clause may serve as the subject and is labeled csubj.
- Nominal object (obj) is a noun phrase in the accusative or nominative case, without an adposition.
- A subordinate clause may serve as the object and is labeled ccomp.
Relations Overview
- The following relation subtypes are used in Uyghur:
- advcl:cond for conditional adverbial clauses
- advmod:emph for adverbs or particles that modify noun phrases and emphasize or negate them
- compound:lvc for light-verb constructions
- compound:redup for reduplicated compounds
- nmod:cau for the causee of a causative predicate (this should be obl:cau in future releases)
- nmod:comp for comparative modifiers of adjectives or adverbs (this should be obl:comp in future releases)
- nmod:part for part-whole relations
- nmod:poss for possessive and genitive relations
- nmod:tmod for temporal modifiers and for relations within temporal expressions such as dates
- obl:tmod for temporal modifiers
- The following relation types are not used in Uyghur at all: iobj, expl, clf
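
The relation inventory above can be turned into a simple check on the DEPREL column. The sketch below is illustrative only: the function name `deprel_problem` is hypothetical, and it flags any subtype that is not in the list above, which may be stricter than the actual data.

```python
from typing import Optional

# Relation subtypes and unused relations as listed above.
SUBTYPES = {
    "advcl:cond", "advmod:emph", "compound:lvc", "compound:redup",
    "nmod:cau", "nmod:comp", "nmod:part", "nmod:poss", "nmod:tmod",
    "obl:tmod",
}
UNUSED_RELATIONS = {"iobj", "expl", "clf"}

def deprel_problem(deprel: str) -> Optional[str]:
    """Return a message if the relation conflicts with the lists above, else None."""
    base = deprel.split(":", 1)[0]
    if base in UNUSED_RELATIONS:
        return base + " is not used in Uyghur"
    if ":" in deprel and deprel not in SUBTYPES:
        return deprel + " is not a listed subtype"
    return None

if __name__ == "__main__":
    for rel in ["nsubj", "obl:tmod", "iobj", "nmod:foo"]:
        print(rel, "->", deprel_problem(rel))
```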
Treebanks
There is 1 Uyghur UD treebank: