UD Central Kurdish Mukri
Language: Central Kurdish (code: ckb)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.17 release.
The following people have contributed to making this treebank part of UD: Hiwa Asadpour, Luigi Talamo, Annemarie Verkerk.
Repository: UD_Central_Kurdish-Mukri
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Central Kurdish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [asadpourhiwa (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
This treebank contains manually annotated data for Mukri Kurdish, an Indo-European variety belonging to the Kurdish language family, following the Universal Dependencies (UD) guidelines. It aims to offer a syntactically and morphologically consistent dataset that supports Kurdish language processing and cross-linguistic studies. The current release includes texts in the Kurdish Roman alphabet and provides dependency annotation at the word, phrase, and sentence levels.
The data is stored in .conllu format, which is standard in UD. Each sentence is tokenized, lemmatized, POS-tagged, and labeled with syntactic dependencies.
A typical entry looks like this:
1 Lew l+ew ADP _ _ 2 case _ _
2 mal mal NOUN _ Number=Sing|Definite=Def 0 root _ _
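As a rough illustration of how such an entry can be read, the sketch below splits each token line into the ten standard CoNLL-U columns and unpacks the feature column. The helper name `parse_token_line` is hypothetical, and splitting on any whitespace is an assumption made because the entry above is shown with spaces; real .conllu files separate columns with tabs.

```python
# The ten CoNLL-U columns, in order, as defined by the UD format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_token_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name.

    Hypothetical helper for illustration only. It splits on any
    whitespace (an assumption); real files are tab-separated.
    """
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    token = dict(zip(FIELDS, values))
    # Unpack "Name=Value|Name=Value" into a dict; "_" means no features.
    feats = token["feats"]
    token["feats"] = (
        {} if feats == "_"
        else dict(pair.split("=", 1) for pair in feats.split("|"))
    )
    return token


# The two token lines from the entry shown above.
sample = [
    "1 Lew l+ew ADP _ _ 2 case _ _",
    "2 mal mal NOUN _ Number=Sing|Definite=Def 0 root _ _",
]
tokens = [parse_token_line(line) for line in sample]
for tok in tokens:
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"], tok["feats"])
```

For anything beyond a quick inspection, a dedicated CoNLL-U reader is probably a better choice than hand-written splitting.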
Acknowledgments
- Lead Maintainer: Hiwa Asadpour — overall coordination, main annotation, and releases
- Contributors: Luigi Talamo; Annemarie Verkerk
- Email: [asadpourhiwa@gmail.com]
- License: [CC-BY 4.0]
You are free to share and adapt this work with proper attribution.
References
If you use this dataset, please cite the work as:
@misc{centralkurdishud2025,
  title = {Central Kurdish-Mukri Universal Dependencies Treebank},
  author = {Asadpour, Hiwa},
  year = {2025},
  howpublished = {\url{https://universaldependencies.org/treebanks/ckb_mukri/}},
  note = {Universal Dependencies v2}
}
- 2022a Asadpour, Hiwa. Parts of Speech and the placement of Targets in the corpus of languages in northwestern Iran. Corpus Linguistics and Linguistic Theory. De Gruyter Mouton.
- 2022b Asadpour, Hiwa. Word order in Mukri Kurdish – the case of incorporated Targets. In Hiwa Asadpour and Thomas Jügel (eds.), Word Order Variation: Semitic, Turkic, and Indo-European Languages in contact, Studia Typologica [STTYP] 31. 63-88. Berlin & Boston: De Gruyter Mouton.
- 2022c Asadpour, Hiwa, Shene Othoman and Manfred Sailer. Non-“wh” relatives in English and Kurdish: Constraints on grammar and use. HPSG 2022 (29th International Conference on Head-Driven Phrase Structure Grammar). Online Event, JP.
- 2021 Asadpour, Hiwa. Cross-Dialect Diversity in Mukri Kurdish I: Phonological and Phonetic variation at Linguistic Geography, Cambridge University Press.
- 2016 Öpengin, Ergin. The Mukri Variety of Central Kurdish: Grammar, Texts, and Lexicon. Beiträge zur Iranistik 40. Wiesbaden: Reichert.
Statistics of UD Central Kurdish Mukri
POS Tags
ADJ – ADP – ADV – AUX – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – VERB
Features
Aspect – Definite – ExtPos – Mood – Number – Person – Reflex – Tense – VerbForm – Voice
Relations
acl:relcl – advmod – aux – case – cc – compound:lvc – conj – cop – det – discourse – dislocated – iobj – nmod – nmod:poss – nsubj – nummod – obj – obl – parataxis – punct – root – vocative
Tokenization and Word Segmentation
- This corpus contains 138 sentences, 660 tokens and 765 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus contains 105 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 63 types of multi-word tokens. Examples: kutī, lew, bew, norey, temāšāy, Kēy, biřēkī, cake, dīyāre, gīrīān, hersēkyān, kīhāyān, nīgābānīyekey, te, wextī, xewī, xoy, yēkmān, řuħī, Būne, Duruste, Hēnāy, Lēyān, Nejāřēkīšī, Wānēkēkyān, Xewim, Xeyātēkīšī, Yēkyān, bergī, berī, bewey, bote, durustī, dīkeyān, dūmān, ewey, fikrī, fāydeyī, gājūtī, hemāne, jānyān, kesī, kolkedārēke, kolkedārēkī, melāye, meseleyēkit, neferīn, pēyān, pēšīnīyāne, tekbīrey.
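The figures above are mutually consistent: each multi-word token counts once as a token but contributes two syntactic words, so 660 tokens plus 105 extra words gives 765 words. The sketch below shows one way such counts could be derived from CoNLL-U text. The function name `count_units` is hypothetical, and the three-word sample is an invented segmentation for illustration, not a sentence taken from the treebank.

```python
def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens.

    Hypothetical helper: columns are split on whitespace (an assumption;
    real .conllu files are tab-separated).
    """
    sentences = words = mwts = covered = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends the sentence
            in_sentence = False
            continue
        if line.startswith("#"):     # sentence-level comment
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tok_id = line.split()[0]
        if "-" in tok_id:            # range line such as "1-2": a multi-word token
            start, end = map(int, tok_id.split("-"))
            mwts += 1
            covered += end - start + 1
        elif "." in tok_id:          # empty node (enhanced dependencies), not a word
            continue
        else:
            words += 1
    # Words covered by a multi-word token surface as a single token.
    tokens = words - covered + mwts
    return {"sentences": sentences, "tokens": tokens, "words": words, "mwts": mwts}


# Invented example: one multi-word token covering words 1 and 2.
sample = """\
# text = lew mal
1-2 lew _ _ _ _ _ _ _ _
1 l l ADP _ _ 3 case _ _
2 ew ew DET _ _ 3 det _ _
3 mal mal NOUN _ _ 0 root _ _
"""
counts = count_units(sample)
print(counts)

# Consistency check on the reported corpus figures.
assert 660 + 105 * (2 - 1) == 765
```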
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, VERB
- This corpus does not use the following tags: SCONJ, CCONJ, SYM, X
- This corpus contains 15 word types tagged as particles (PART): Bełē, Dā, Wiłłāhī, Wānēkēk, Yān, bā, ke, kāk, ne, w, xo, yānī, Āyā, ā, ū
- This corpus contains 19 lemmas tagged as pronouns (PRON): Bo, Engo, Ew, Ewāne, Kē, eme, emin, eto, ewān, im, it, kes, kīhā, mān, t, to, y, yān, ī
- This corpus contains 5 lemmas tagged as determiners (DET): ew, ewe, ewey, ho, Āwā
- This corpus contains 1 lemma tagged as an auxiliary (AUX): bûn
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: bûn
- There is 1 (de)verbal form:
  - Fin
    - AUX: e, te, ye, yā, čīye
    - VERB: kut, bū, e, kird, Deydey, bē, bikeyn, hāt, būn, dekem
Nominal Features
- Number
  - Plur
    - NOUN: pēšīnīyān, sāʕetān, bergāney, pārčāne
    - PRON: yān, mān, Eme, Engo, Ewāne, ewān
    - VERB-Fin: bikeyn, būn, Dełēn, Hestān, Hātin, bikēšīn, binūn, binūīn, bičīn, denēyn
  - Sing
    - ADV: qāčāxčīyātīye, kinēk
    - AUX-Fin: e, te, ye, yā, čīye
    - DET: ew, ewe, Ewī, ewey, eweydā, Āwā
    - NOUN: nejāř, xew, ře, Nore, čāw, gīrī, kārī, nefer, wextī, xeter
    - NUM: yēk, hersēk, čwār, yek
    - PART: Wānēkēk
    - PRON: ī, y, emin, eto, Kē, Ewīš, Eminīš, eme, im, t
    - PROPN: Süleymān, Xeyāt, gājūt, miħemedī, qurʕān, xudā, xudāy, xwā, xudāyey
    - VERB: kut, bū, kird, e, Deydey, bē, hāt, dekem, dezānī, dečī
    - VERB-Fin: kut, bū, e, kird, Deydey, bē, hāt, dekem, dezānī, dečī
- Definite
  - Def
    - ADJ: spīyāe
    - DET: ew, ewe, Ewī, ewey, Āwā
    - NOUN: kuře, Nejāřekeš, Xeyāteke, berāte, berātekey, huqūqekey, kulkedāre, meseleyī, nejāřey, tekbīre
  - Ind
    - ADV: kinēk
    - NOUN: dizēk, kolkedārēk, Nejāřēkīš, Spīyāyēk, Xeyātēkīš, dēyēkī, köredēyekī, melāyēk, meseleyēk, māłēkī
    - NUM: yēk, hersēk, yek
    - PART: Wānēkēk
  - Spec
    - ADV: pēšē
    - DET: eweydā
    - NOUN: bergey, bergāney, dārey, dēy, kesī, köredē, köredēyey, melāyī, nejāřī, pārčāney
    - PROPN: xudāy, xudāyey
Degree and Polarity
Verbal Features
- Aspect
  - Imp
    - VERB-Fin: dekirdewe
- Mood
  - Imp
    - VERB-Fin: heste
  - Ind
    - VERB-Fin: Deydey, bū, dekem, dezānī, dečī, dełē, dāyā, nebū, nedečū, Dełēn
  - Sub
    - VERB-Fin: bē, bikeyn, bigēřimewe, bikem, bikā, bikēšīn, binūn, binūīn, bičīn, biškē
- Tense
  - Past
    - VERB-Fin: kut, bū, kird, hāt, būn, hestā, kewit, kirdūwe, nebū, nedečū
  - Pres
    - AUX-Fin: e, te, ye, yā, čīye
    - VERB-Fin: e, Deydey, bē, bikeyn, dekem, dezānī, dečī, dełē, dāyā, heste
- Voice
  - Act
    - AUX-Fin: e, te, ye, yā, čīye
    - VERB-Fin: kut, bū, e, kird, Deydey, bē, bikeyn, hāt, būn, dekem
Pronouns, Determiners, Quantifiers
- Reflex
  - Yes
    - PART: xo
- Person
  - 1
    - PRON: emin, mān, eme, Eminīš, im
    - VERB-Fin: bikeyn, dekem, bigēřimewe, bikem, bikēšīn, binūīn, bičīn, dekēšim, denēyn, depirsim
  - 2
    - PRON: eto, t, Engo, etož, it, to
    - VERB-Fin: Deydey, dezānī, dečī, heste, y, binūn, bī, hestēne, nādeyewe, ī
  - 3
    - AUX-Fin: e, te, ye, yā, čīye
    - NOUN: melāyēk, pārčāne
    - PRON: ī, y, yān, Ewīš, Bo, Ew, Ewāne, ewān, kes, kīhā
    - VERB: kut, bū, kird, e, bē, hāt, būn, dełē, dāyā, hestā
    - VERB-Fin: kut, bū, e, kird, bē, hāt, būn, dełē, dāyā, hestā
Other Features
- ExtPos
  - ADP
    - ADP: be, lē, le, l, we, b, bo, de, pē, d
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: bûn.
- This corpus uses 1 lemma as an auxiliary (aux). Example: bûn.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB-Fin--NOUN (17)
  - VERB-Fin--PRON (24)
- obj
  - VERB-Fin--NOUN (30)
- iobj
  - VERB-Fin--NOUN (1)
  - VERB-Fin--NOUN-ADP(be) (4)
  - VERB-Fin--NOUN-ADP(e) (1)
  - VERB-Fin--NOUN-ADP(le) (1)
  - VERB-Fin--PRON (4)
  - VERB-Fin--PRON-ADP(Le) (1)
  - VERB-Fin--PRON-ADP(be) (2)
  - VERB-Fin--PRON-ADP(bo) (1)
  - VERB-Fin--PRON-ADP(lē) (5)
  - VERB-Fin--PRON-ADP(pē) (2)
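The pattern labels above combine the parent's tag and verb form, the child's tag, and, where present, the adposition attached to the child. The sketch below shows one plausible way to produce such labels and counts; it is not the script that generated these statistics. The function `core_patterns`, the row layout, and the four-word tree are all hypothetical and were invented for illustration.

```python
from collections import Counter, defaultdict

CORE = {"nsubj", "obj", "iobj"}


def core_patterns(rows):
    """Count parent--child patterns for core relations in one sentence.

    Hypothetical helper. `rows` is a list of dicts with the keys
    id, lemma, upos, verbform, head and deprel (an assumed layout).
    """
    by_id = {r["id"]: r for r in rows}
    # Lemma of the adposition attached to each word through `case`.
    case_of = {}
    for r in rows:
        if r["deprel"] == "case" and r["upos"] == "ADP":
            case_of.setdefault(r["head"], r["lemma"])
    counts = defaultdict(Counter)
    for r in rows:
        parent = by_id.get(r["head"])
        if r["deprel"] not in CORE or parent is None:
            continue
        # Only verb parents with noun or pronoun children are considered.
        if parent["upos"] != "VERB" or r["upos"] not in {"NOUN", "PRON"}:
            continue
        label = parent["upos"]
        if parent.get("verbform"):
            label += "-" + parent["verbform"]
        label += "--" + r["upos"]
        if r["id"] in case_of:
            label += f"-ADP({case_of[r['id']]})"
        counts[r["deprel"]][label] += 1
    return counts


# Invented tree, not a treebank sentence: a subject pronoun, an
# adposition-marked pronoun analysed as iobj, and a finite verb.
rows = [
    {"id": 1, "lemma": "emin", "upos": "PRON", "verbform": None, "head": 4, "deprel": "nsubj"},
    {"id": 2, "lemma": "be", "upos": "ADP", "verbform": None, "head": 3, "deprel": "case"},
    {"id": 3, "lemma": "kes", "upos": "PRON", "verbform": None, "head": 4, "deprel": "iobj"},
    {"id": 4, "lemma": "kirdin", "upos": "VERB", "verbform": "Fin", "head": 0, "deprel": "root"},
]
patterns = core_patterns(rows)
for deprel, counter in patterns.items():
    for label, n in counter.items():
        print(deprel, label, n)
```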
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: kirdin xo
Relations Overview
- This corpus uses 3 relation subtypes: acl:relcl, compound:lvc, nmod:poss
- The following 2 main types are never used alone; they are always subtyped: acl, compound
- The following 16 relation types are not used in this corpus at all: csubj, ccomp, xcomp, expl, advcl, mark, appos, amod, clf, fixed, flat, list, orphan, goeswith, reparandum, dep
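The three statements above follow from the relation inventory listed earlier on this page together with the 37 universal relations of UD v2. The sketch below recomputes them with plain set operations; the variable names are illustrative only.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = """acl advcl advmod amod appos aux case cc ccomp clf compound
conj cop csubj dep det discourse dislocated expl fixed flat goeswith iobj
list mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root
vocative xcomp""".split()

# Relations used in this treebank, copied from the Relations list on this page.
USED = """acl:relcl advmod aux case cc compound:lvc conj cop det discourse
dislocated iobj nmod nmod:poss nsubj nummod obj obl parataxis punct root
vocative""".split()

# Subtyped relations carry a colon.
subtypes = sorted(r for r in USED if ":" in r)
# Main types that occur at all, with or without a subtype.
main_used = {r.split(":")[0] for r in USED}
# Main types that occur without a subtype.
bare_used = {r for r in USED if ":" not in r}
# Main types that only ever occur with a subtype.
always_subtyped = sorted(main_used - bare_used)
# Universal relations that do not occur in any form.
unused = [r for r in UNIVERSAL if r not in main_used]

print(len(subtypes), subtypes)
print(len(always_subtyped), always_subtyped)
print(len(unused), unused)
```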