UD for Kurmanji
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. Exceptions are described below.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We tokenize them as separate tokens (words), except in the following cases:
- The period marking an abbreviation: Dr. “doctor” is one token.
- The apostrophe (or occasionally a hyphen) is not treated as punctuation when it occurs between a number and its morphological suffix, as in 15’ê, 1932’an.
- There is a small class of words that may contain spaces in writing.
- There are several closed classes of contractions that are treated as multi-word tokens and segmented to individual syntactic words. The most prominent type is a pronoun fused with the future auxiliary: ezê = ez + dê “I will”.
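The segmentation rules above can be illustrated with a minimal sketch. It is not the treebank's actual tokenizer: the names `ABBREVIATIONS`, `CONTRACTIONS`, `tokenize` and `segment` are hypothetical, and the two lexicons contain only the examples given on this page (Dr. and ezê). Punctuation is split off only at the edges of a whitespace-delimited chunk, which has the side effect of keeping forms such as 15’ê intact.

```python
import re

# Hypothetical, illustrative lexicons: only the examples named in the
# guidelines are listed; real lists would be much longer.
ABBREVIATIONS = {"Dr."}
CONTRACTIONS = {"ezê": ["ez", "dê"]}

# Leading punctuation, a core, and trailing punctuation of one chunk.
EDGE_PUNCT = re.compile(r"^(\W*)(.*?)(\W*)$")


def tokenize(text):
    """Split on whitespace, then detach punctuation at chunk edges."""
    tokens = []
    for chunk in text.split():
        lead, core, trail = EDGE_PUNCT.match(chunk).groups()
        # Keep the period together with a known abbreviation.
        if trail.startswith(".") and core + "." in ABBREVIATIONS:
            core, trail = core + ".", trail[1:]
        tokens.extend(lead)   # one token per punctuation character
        if core:
            # Chunk-internal apostrophes and hyphens are left untouched,
            # so number + suffix forms such as 15’ê stay one token.
            tokens.append(core)
        tokens.extend(trail)
    return tokens


def segment(tokens):
    """Pair each surface token with the syntactic words it contains."""
    return [(tok, CONTRACTIONS.get(tok.lower(), [tok])) for tok in tokens]
```

Applied to an artificial string built from the forms above, `tokenize("Dr. ezê, 1932’an.")` yields five tokens, and `segment` expands only ezê into the two syntactic words ez and dê; every other token maps to itself.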
Morphology
Tags
- Kurmanji uses all 17 universal POS categories, including particles (PART). Only 2 word types are tagged PART: jî “also”, ma.
- Kurmanji has four auxiliaries; three of them inflect like verbs (and can act as full verbs depending on context), while dê is an uninflected particle:
- The copula bûn “to be”.
- The future tense marker dê.
- The passive auxiliary hatin “to come” (it combines with an infinitive of the lexical verb).
- The causative auxiliary dan “to give” (it combines with an infinitive of the lexical verb).
- Verbs with modal meaning are not considered auxiliary in Kurmanji.
- There are four main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Nominal Features
- Nominal words (NOUN, PROPN) have an inherent Gender feature with one of two values: Masc or Fem. The gender of the referent is reflected by PRON and DET.
- The two values of the Number feature are Sing and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, DET, VERB, AUX, marginally NUM.
- Case has 4 possible values: Nom, Acc, Con, Voc. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM.
Degree and Polarity
- Degree applies to adjectives (ADJ) and has one of three possible values: Pos, Cmp, Sup. For example, zêde “a lot of”, zêdetir “more”, zêdetirîn “most”.
- Polarity has one value, Neg (while Pos is not marked explicitly), and applies primarily to verbs (VERB, AUX), determiners (DET) and adverbs (ADV).
Verbal Features
- Aspect has two values, Perf (perfective) and Prog (progressive); it can also be unmarked.
- Finite verbs always have one of four values of Mood: Ind, Imp, Opt or Sub.
- Verbs in the indicative mood always have one of four values of Tense: Pqp, Past, Pres or Fut.
- Evident (evidentiality) has only one value, Nfh (non-first-hand).
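The two "always" constraints on Mood and Tense lend themselves to a mechanical check. The following is a minimal sketch, not part of any official validator: the function names are hypothetical, and it assumes that finite verbs carry VerbForm=Fin, as is usual in UD, which this page does not state explicitly.

```python
MOODS = {"Ind", "Imp", "Opt", "Sub"}
TENSES = {"Pqp", "Past", "Pres", "Fut"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Mood=Ind|Tense=Past' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_verb(feats):
    """Return a list of problems found in one verb's FEATS string."""
    f = parse_feats(feats)
    problems = []
    # Assumption: finite verbs are marked VerbForm=Fin.
    if f.get("VerbForm") != "Fin":
        return problems
    if f.get("Mood") not in MOODS:
        problems.append("finite verb lacks a valid Mood")
    elif f["Mood"] == "Ind" and f.get("Tense") not in TENSES:
        problems.append("indicative verb lacks a valid Tense")
    return problems
```

For instance, `check_verb("Mood=Ind|VerbForm=Fin")` reports the missing Tense, while a subjunctive form without Tense passes. Aspect and Evident are optional and therefore not checked.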
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV).
- NumType is used with numerals (NUM, only Card).
- The Reflex feature marks reflexive pronouns (xwe).
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as the 3rd person.
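The value inventories listed in the Morphology section can be collected into a lookup table and used to flag unexpected values in a FEATS column. This is a sketch only: `FEATURE_VALUES` and `unknown_values` are hypothetical names, and features whose values are not enumerated on this page (PronType, Reflex, VerbForm) are deliberately skipped rather than guessed.

```python
# Feature values exactly as enumerated in the Morphology section.
FEATURE_VALUES = {
    "Gender": {"Masc", "Fem"},
    "Number": {"Sing", "Plur"},
    "Case": {"Nom", "Acc", "Con", "Voc"},
    "Degree": {"Pos", "Cmp", "Sup"},
    "Polarity": {"Neg"},  # Pos is not marked explicitly
    "Aspect": {"Perf", "Prog"},
    "Mood": {"Ind", "Imp", "Opt", "Sub"},
    "Tense": {"Pqp", "Past", "Pres", "Fut"},
    "Evident": {"Nfh"},
    "NumType": {"Card"},
    "Person": {"1", "2", "3"},
}


def unknown_values(feats):
    """Return (feature, value) pairs that are not in the inventory above."""
    bad = []
    if feats == "_":
        return bad
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        allowed = FEATURE_VALUES.get(name)
        if allowed is None:
            continue  # values for this feature are not listed here
        # CoNLL-U allows several comma-separated values for one feature.
        bad.extend((name, v) for v in values.split(",") if v not in allowed)
    return bad
```

As a usage example, `unknown_values("Case=Gen|Number=Sing")` returns the single pair `("Case", "Gen")`, while a string using only listed values returns an empty list.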
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase in the nominative case, without preposition.
- Objects may be bare noun phrases in the accusative (oblique) case.
Non-verbal Clauses
- The copula verb bûn “to be” is used in equational, attributional, locative, possessive and benefactive nonverbal clauses.
Treebanks
There is 1 Kurmanji UD treebank: