UD for Uyghur
Tokenization and Word Segmentation
- In general, words are delimited by white spaces or punctuation. Punctuation may appear in some abbreviations or numeric expressions.
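The segmentation rule above can be approximated with a short regular expression. This is only a minimal sketch, not the tokenizer used for the treebank: the pattern `TOKEN_RE` and the function `tokenize` are hypothetical names, and the set of punctuation marks allowed inside numeric expressions is an assumption. Abbreviations containing punctuation are not handled and would need a language-specific list.

```python
import re

# Assumed token pattern (illustrative only):
#   1. numeric expressions that may contain internal '.', ',' or ':' (e.g. 3.5, 12:30)
#   2. runs of word characters (Unicode-aware, so Arabic-script letters match)
#   3. any single character that is neither a word character nor whitespace
TOKEN_RE = re.compile(r"\d+(?:[.,:]\d+)*|\w+|[^\w\s]")


def tokenize(text: str) -> list:
    """Split text on white space and separate punctuation from words."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("bu 3.5 kitab."))
```

Numeric expressions are matched before plain words so that internal punctuation stays inside the token instead of being split off.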
Morphology
Tags
- Uyghur uses all 17 universal POS categories, including particles (PART), although the current data contain no examples of the SYM category.
- Uyghur has a limited number of auxiliary verbs (AUX). In most cases, however, a combination of a semantically weak verb with a non-finite form of a semantically prominent verb is analyzed as an xcomp construction, not as an auxiliary construction. The following verbs are currently treated as auxiliaries in at least some contexts:
- The copula بول bol (“to be”) is used with non-verbal predicates wherever the suffixed copula cannot be used. It is also used in some periphrastic constructions with other verbs.
- The suffixed copula ئى i can provide only some forms.
- كەت ket (“to go”) is used in periphrastic perfective constructions.
- كەل kel (“to come”) is used in irrealis mood.
- قال qal (“to stay”) is used in periphrastic progressive constructions.
- تۇر tur (“to stand”) is used in periphrastic progressive constructions.
- Verbs with modal meaning are not considered auxiliaries in Uyghur.
- There are five main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Nominal Features
- There is no grammatically relevant gender in the Turkic languages. However, the Gender feature is sporadically used with personal proper nouns (PROPN) and distinguishes two values: Masc and Fem.
- The two main values of the Number feature are Sing and Plur. For nouns (and adjectives), only Plur is used when the plural suffix is present; singular is left unannotated. For pronouns (PRON), both singular and plural are explicitly annotated where relevant.
- Case has 6 possible values: Nom, Acc, Gen, Dat, Loc, Abl. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM. It also occurs with some non-finite forms of VERB and AUX.
- Possession is marked morphologically on the possessed NOUN, which cross-references the person and number of the possessor, using the layered features Number[psor] and Person[psor].
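The nominal feature conventions above can be checked mechanically against the FEATS column of a CoNLL-U file. The following is a minimal sketch, assuming the standard `Name=Value|Name=Value` FEATS syntax with `_` for an empty value; `parse_feats` and `check_nominal` are hypothetical helpers, not part of any UD tool, and they cover only the constraints stated in this section.

```python
# Value inventories taken from the description above.
CASE_VALUES = {"Nom", "Acc", "Gen", "Dat", "Loc", "Abl"}
GENDER_VALUES = {"Masc", "Fem"}


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_nominal(upos: str, feats: str) -> list:
    """Return a list of problems found (hypothetical helper)."""
    f = parse_feats(feats)
    problems = []
    if "Case" in f and f["Case"] not in CASE_VALUES:
        problems.append(f"unexpected Case value: {f['Case']}")
    if "Gender" in f:
        if upos != "PROPN":
            problems.append("Gender is expected only on PROPN")
        if f["Gender"] not in GENDER_VALUES:
            problems.append(f"unexpected Gender value: {f['Gender']}")
    # Singular is left unannotated on nouns and adjectives.
    if upos in {"NOUN", "ADJ"} and f.get("Number") == "Sing":
        problems.append("Number=Sing is not annotated on nouns and adjectives")
    return problems


if __name__ == "__main__":
    # Layered possessor features are ordinary keys after parsing.
    print(parse_feats("Case=Gen|Number[psor]=Sing|Person[psor]=3"))
```

Because layered features such as Number[psor] keep their bracketed name as the dictionary key, they never collide with the plain Number feature of the possessed noun.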
Verbal Features
- Finite verbs always have one of four values of Mood: Ind, Imp, Int or Cnd. The interrogative mood (Int) is a language-specific value and applies to verbs with the interrogative suffix -mu.
- Verbs in the indicative mood always have one of two values of Tense: Past or Pres.
- There are also two values of Aspect: Hab and Perf.
- As for Voice, only the passive forms (Pass) are explicitly annotated; Voice=Act is not used.
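The dependencies between the verbal features can be expressed as a small consistency check. This is a sketch under one assumption that the text does not state explicitly: finite verbs are identified here by VerbForm=Fin. The function `check_verbal` is a hypothetical name, and it takes features that are already parsed into a dictionary.

```python
# Value inventories taken from the description above.
MOODS = {"Ind", "Imp", "Int", "Cnd"}
TENSES = {"Past", "Pres"}
ASPECTS = {"Hab", "Perf"}


def check_verbal(feats: dict) -> list:
    """Return a list of problems found in a verb's features (hypothetical helper)."""
    problems = []
    # Assumption: finite verbs carry VerbForm=Fin.
    if feats.get("VerbForm") == "Fin" and feats.get("Mood") not in MOODS:
        problems.append("finite verb lacks a valid Mood")
    # Indicative verbs always have Tense=Past or Tense=Pres.
    if feats.get("Mood") == "Ind" and feats.get("Tense") not in TENSES:
        problems.append("indicative verb lacks a valid Tense")
    if "Aspect" in feats and feats["Aspect"] not in ASPECTS:
        problems.append(f"unexpected Aspect value: {feats['Aspect']}")
    # Only passive forms are annotated; active voice is left implicit.
    if feats.get("Voice") == "Act":
        problems.append("Voice=Act is not used")
    return problems


if __name__ == "__main__":
    print(check_verbal({"VerbForm": "Fin", "Mood": "Ind"}))
```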
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV). Currently only the values Prs, Dem and Int are used.
- NumType is used with numerals (NUM), determiners (DET), and adjectives (ADJ).
- The Reflex feature marks the reflexive pronoun ئۆز (öz).
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as the 3rd person.
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- The dominant word order in Uyghur is subject-object-verb, although other word orders are possible, too.
- Nominal subject (nsubj) is a noun phrase in the nominative case, without an adposition.
- A subordinate clause may serve as the subject and is labeled csubj.
- Nominal object (obj) is a noun phrase in the accusative or nominative case, without an adposition.
- A subordinate clause may serve as the object and is labeled ccomp.
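The case expectations for nominal core arguments can be turned into a simple heuristic for spotting suspicious annotations. This is only a sketch: `EXPECTED_CASE` and `has_unexpected_case` are hypothetical names, tokens without a Case feature are skipped rather than flagged, and a flagged token is a candidate for manual review, not necessarily an error.

```python
from typing import Optional

# Expected Case values per core relation, as described above.
EXPECTED_CASE = {"nsubj": {"Nom"}, "obj": {"Acc", "Nom"}}


def has_unexpected_case(deprel: str, case: Optional[str]) -> bool:
    """Return True if a nominal core argument carries an unexpected Case."""
    base = deprel.split(":", 1)[0]  # ignore any relation subtype
    if base not in EXPECTED_CASE or case is None:
        return False
    return case not in EXPECTED_CASE[base]


if __name__ == "__main__":
    print(has_unexpected_case("obj", "Dat"))
```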
Relations Overview
- The following relation subtypes are used in Uyghur:
- advcl:cond for conditional adverbial clauses
- advmod:emph for adverbs or particles that modify noun phrases and emphasize or negate them
- compound:lvc for light-verb constructions
- compound:redup for reduplicated compounds
- nmod:cau for the causee of a causative predicate (this should be obl:cau in future releases)
- nmod:comp for comparative modifiers of adjectives or adverbs (this should be obl:comp in future releases)
- nmod:part for part-whole relations
- nmod:poss for possessive and genitive relations
- nmod:tmod for temporal modifiers and for relations within temporal expressions such as dates
- obl:tmod for temporal modifiers
- The following relation types are not used in Uyghur at all: iobj, expl, clf
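The relation inventory above lends itself to a lookup-based check of the DEPREL column. The sketch below is illustrative: the names `SUBTYPES`, `UNUSED`, `PLANNED_RENAMES` and `classify_deprel` are hypothetical, and the category labels returned are chosen for this example only.

```python
# Relation subtypes listed above as used in Uyghur.
SUBTYPES = {
    "advcl:cond", "advmod:emph", "compound:lvc", "compound:redup",
    "nmod:cau", "nmod:comp", "nmod:part", "nmod:poss", "nmod:tmod",
    "obl:tmod",
}
# Relation types that are not used at all.
UNUSED = {"iobj", "expl", "clf"}
# Subtypes that should change in future releases, per the list above.
PLANNED_RENAMES = {"nmod:cau": "obl:cau", "nmod:comp": "obl:comp"}


def classify_deprel(deprel: str) -> str:
    """Classify a relation label against the inventory above."""
    base = deprel.split(":", 1)[0]
    if base in UNUSED:
        return "unused"
    if ":" in deprel:
        return "listed-subtype" if deprel in SUBTYPES else "unlisted-subtype"
    return "plain"


if __name__ == "__main__":
    for label in ("nmod:poss", "iobj", "obl:cau", "nsubj"):
        print(label, classify_deprel(label))
```

Under this sketch, obl:cau and obl:comp are reported as unlisted subtypes until the planned renaming takes place; `PLANNED_RENAMES` records the mapping so that a migration script could apply it.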
Treebanks
There is 1 Uyghur UD treebank: