
This page pertains to UD version 2.


UD Central Kurdish Mukri

Language: Central Kurdish (code: ckb)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.17 release.

The following people have contributed to making this treebank part of UD: Hiwa Asadpour, Luigi Talamo, Annemarie Verkerk.

Repository: UD_Central_Kurdish-Mukri
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: grammar-examples

Questions, comments? General annotation questions (either Central Kurdish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [asadpourhiwa (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation	Source
Lemmas	annotated manually
UPOS	annotated manually, natively in UD style
XPOS	not available
Features	annotated manually, natively in UD style
Relations	annotated manually, natively in UD style

Description

This treebank contains manually annotated data for Mukri Kurdish, a variety of Central Kurdish (Indo-European), following the Universal Dependencies (UD) guidelines. It aims to offer a syntactically and morphologically consistent dataset for Kurdish language processing and cross-linguistic studies. The current release contains texts written in the Kurdish Latin (Roman) alphabet and provides dependency annotation at the word, phrase, and sentence levels.

The data is stored in the CoNLL-U format (.conllu), the standard format for UD treebanks. Each sentence is tokenized, lemmatized, POS-tagged, and annotated with syntactic dependencies.

A typical entry looks like this:

1	Lew	l+ew	ADP	_	_	2	case	_	_
2	mal	mal	NOUN	_	Definite=Def|Number=Sing	0	root	_	_
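Each line of the entry is one word with ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), where _ marks an empty field. The sketch below reads such lines into dictionaries; it is illustrative only, not the official UD validator, and it assumes plain word lines (no comments, multi-word ranges or empty nodes). The helper names parse_word_line, parse_feats and feats_in_canonical_order are hypothetical. The FEATS check reflects the CoNLL-U rule that features are listed alphabetically by name.

```python
# Minimal CoNLL-U word-line parser: an illustrative sketch, not the official
# UD validator. It assumes plain word lines (no comments, ranges, empty nodes).
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_feats(field):
    """Turn 'Definite=Def|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def feats_in_canonical_order(field):
    """CoNLL-U lists features alphabetically by name (compared case-insensitively here)."""
    names = list(parse_feats(field))
    return names == sorted(names, key=str.lower)


def parse_word_line(line):
    """Split one word line on tabs into the ten CoNLL-U columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["feats"] = parse_feats(row["feats"])
    return row


example = [
    "1\tLew\tl+ew\tADP\t_\t_\t2\tcase\t_\t_",
    "2\tmal\tmal\tNOUN\t_\tDefinite=Def|Number=Sing\t0\troot\t_\t_",
]
rows = [parse_word_line(line) for line in example]
for row in rows:
    print(row["id"], row["form"], row["upos"], "->", row["head"], row["deprel"], row["feats"])
```

HEAD holds the ID of the governing word (0 for the root), so "Lew" attaches to "mal" as case, and "mal" is the root of the phrase.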

Acknowledgments

Lead Maintainer: Hiwa Asadpour — overall coordination, main annotation, and releases
Contributors: Luigi Talamo; Annemarie Verkerk
Email: asadpourhiwa@gmail.com
License: CC BY-SA 4.0

You are free to share and adapt this work with proper attribution, provided that adaptations are distributed under the same license.

References

If you use this dataset, please cite the work as:

@misc{centralkurdishud2025,
  title        = {Central Kurdish-Mukri Universal Dependencies Treebank},
  author       = {Asadpour, Hiwa},
  year         = {2025},
  howpublished = {\url{https://universaldependencies.org/treebanks/ckb_mukri/}},
  note         = {Universal Dependencies v2}
}

2022a Asadpour, Hiwa. Parts of Speech and the placement of Targets in the corpus of languages in northwestern Iran. Corpus Linguistics and Linguistic Theory. De Gruyter Mouton.
2022b Asadpour, Hiwa. Word order in Mukri Kurdish – the case of incorporated Targets. In Hiwa Asadpour and Thomas Jügel (eds.), Word Order Variation: Semitic, Turkic, and Indo-European Languages in contact, Studia Typologica [STTYP] 31. 63-88. Berlin & Boston: De Gruyter Mouton.
2022c Asadpour, Hiwa, Shene Othoman and Manfred Sailer. Non-“wh” relatives in English and Kurdish: Constraints on grammar and use. HPSG 2022 (29th International Conference on Head-Driven Phrase Structure Grammar). Online Event, JP.
2021 Asadpour, Hiwa. Cross-Dialect Diversity in Mukri Kurdish I: Phonological and Phonetic Variation. Journal of Linguistic Geography. Cambridge University Press.
2016 Öpengin, Ergin. The Mukri Variety of Central Kurdish: Grammar, Texts, and Lexicon. Beiträge zur Iranistik 40. Wiesbaden: Reichert.

Statistics of UD Central Kurdish Mukri

POS Tags

ADJ – ADP – ADV – AUX – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – VERB

Features

Aspect – Definite – ExtPos – Mood – Number – Person – Reflex – Tense – VerbForm – Voice

Relations

acl:relcl – advmod – aux – case – cc – compound:lvc – conj – cop – det – discourse – dislocated – iobj – nmod – nmod:poss – nsubj – nummod – obj – obl – parataxis – punct – root – vocative

Tokenization and Word Segmentation

This corpus contains 138 sentences, 660 tokens and 765 syntactic words.

All tokens in this corpus are followed by a space.

This corpus does not contain words with spaces.

This corpus does not contain words that contain both letters and punctuation.

This corpus contains 105 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
There are 63 types of multi-word tokens. Examples: kutī, lew, bew, norey, temāšāy, Kēy, biřēkī, cake, dīyāre, gīrīān, hersēkyān, kīhāyān, nīgābānīyekey, te, wextī, xewī, xoy, yēkmān, řuħī, Būne, Duruste, Hēnāy, Lēyān, Nejāřēkīšī, Wānēkēkyān, Xewim, Xeyātēkīšī, Yēkyān, bergī, berī, bewey, bote, durustī, dīkeyān, dūmān, ewey, fikrī, fāydeyī, gājūtī, hemāne, jānyān, kesī, kolkedārēke, kolkedārēkī, melāye, meseleyēkit, neferīn, pēyān, pēšīnīyāne, tekbīrey.
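These figures follow from how CoNLL-U encodes multi-word tokens (MWTs): a range line such as 1-2 stands for one surface token, and the syntactic words it contains follow on their own lines. With 660 tokens, 105 of which are two-word MWTs, the word count is 660 - 105 + 2 × 105 = 765, matching the total above. The sketch below counts these units; count_units is a hypothetical helper, and the toy sentence uses placeholder forms rather than treebank data.

```python
# Sketch: count sentences, surface tokens, syntactic words and multi-word tokens
# (MWTs) in CoNLL-U text. An MWT is a range line such as "1-2" whose words
# follow on their own lines; only the range line counts as a surface token.
def count_units(conllu_text):
    sentences = tokens = words = mwts = 0
    covered_until = 0      # last word ID covered by the current MWT range
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():           # blank line ends a sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):       # sentence-level comments
            continue
        in_sentence = True
        word_id = line.split("\t", 1)[0]
        if "-" in word_id:             # MWT range line, e.g. "1-2"
            _, end = word_id.split("-")
            tokens += 1
            mwts += 1
            covered_until = int(end)
        elif "." in word_id:           # empty nodes (e.g. "3.1") are not words
            continue
        else:
            words += 1
            if int(word_id) > covered_until:   # word is not inside an MWT
                tokens += 1
    if in_sentence:                    # text may lack a final blank line
        sentences += 1
    return sentences, tokens, words, mwts


# Toy sentence with placeholder forms: one MWT "ab" = "a" + "b", then "c".
blank = "\t_" * 8
sample = "\n".join([
    "# sent_id = toy-1",
    "1-2\tab" + blank,
    "1\ta" + blank,
    "2\tb" + blank,
    "3\tc" + blank,
    "",
])
print(count_units(sample))

# The reported totals agree: each two-word MWT adds one word beyond its token.
print(660 - 105 + 105 * 2)
```

The same logic explains why the average MWT length is exactly 2.00: every MWT in this corpus covers two syntactic words.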

Morphology

Nominal Features

Number

Plur
- NOUN: pēšīnīyān, sāʕetān, bergāney, pārčāne
- PRON: yān, mān, Eme, Engo, Ewāne, ewān
- VERB-Fin: bikeyn, būn, Dełēn, Hestān, Hātin, bikēšīn, binūn, binūīn, bičīn, denēyn

Sing
- ADV: qāčāxčīyātīye, kinēk
- AUX-Fin: e, te, ye, yā, čīye
- DET: ew, ewe, Ewī, ewey, eweydā, Āwā
- NOUN: nejāř, xew, ře, Nore, čāw, gīrī, kārī, nefer, wextī, xeter
- NUM: yēk, hersēk, čwār, yek
- PART: Wānēkēk
- PRON: ī, y, emin, eto, Kē, Ewīš, Eminīš, eme, im, t
- PROPN: Süleymān, Xeyāt, gājūt, miħemedī, qurʕān, xudā, xudāy, xwā, xudāyey
- VERB: kut, bū, kird, e, Deydey, bē, hāt, dekem, dezānī, dečī
- VERB-Fin: kut, bū, e, kird, Deydey, bē, hāt, dekem, dezānī, dečī

Definite

Def
- ADJ: spīyāe
- DET: ew, ewe, Ewī, ewey, Āwā
- NOUN: kuře, Nejāřekeš, Xeyāteke, berāte, berātekey, huqūqekey, kulkedāre, meseleyī, nejāřey, tekbīre

Ind
- ADV: kinēk
- NOUN: dizēk, kolkedārēk, Nejāřēkīš, Spīyāyēk, Xeyātēkīš, dēyēkī, köredēyekī, melāyēk, meseleyēk, māłēkī
- NUM: yēk, hersēk, yek
- PART: Wānēkēk

Spec
- ADV: pēšē
- DET: eweydā
- NOUN: bergey, bergāney, dārey, dēy, kesī, köredē, köredēyey, melāyī, nejāřī, pārčāney
- PROPN: xudāy, xudāyey
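Listings like the ones above group word forms by feature, value and part of speech. The sketch below shows one way such an index could be compiled from CoNLL-U word lines. The label format (UPOS followed by the VerbForm value, as in VERB-Fin or AUX-Fin) is an assumption about the page's notation, and feature_index and pos_label are hypothetical names. The feature bundle given to "bū" is assembled for illustration from the lists on this page, not copied from the treebank.

```python
from collections import defaultdict

# Sketch of compiling feature listings from CoNLL-U word lines. The
# "UPOS-VerbForm" label (e.g. "VERB-Fin") is an assumed reading of the notation.


def parse_feats(field):
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def pos_label(upos, feats):
    """Append the VerbForm value, if any: VERB + Fin -> 'VERB-Fin'."""
    return f"{upos}-{feats['VerbForm']}" if "VerbForm" in feats else upos


def feature_index(word_lines):
    """Map feature -> value -> POS label -> distinct forms, in corpus order."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in word_lines:
        cols = line.split("\t")
        form, upos, feats = cols[1], cols[3], parse_feats(cols[5])
        label = pos_label(upos, feats)
        for name, value in feats.items():
            if name == "VerbForm":     # this sketch folds VerbForm into the label
                continue
            forms = index[name][value][label]
            if form not in forms:
                forms.append(form)
    return index


# Illustrative word lines (hypothetical annotation for "bū").
lines = [
    "2\tmal\tmal\tNOUN\t_\tDefinite=Def|Number=Sing\t0\troot\t_\t_",
    "3\tbū\t_\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin|Voice=Act\t0\troot\t_\t_",
]
index = feature_index(lines)
print(dict(index["Number"]["Sing"]))
```

Keeping forms in first-seen order, rather than sorting them, mirrors how the lists above mix capitalized and lowercase forms.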

Degree and Polarity

This corpus does not use the Degree or Polarity features.

Verbal Features

Aspect

Imp
- VERB-Fin: dekirdewe

Mood

Imp
- VERB-Fin: heste

Ind
- VERB-Fin: Deydey, bū, dekem, dezānī, dečī, dełē, dāyā, nebū, nedečū, Dełēn

Sub
- VERB-Fin: bē, bikeyn, bigēřimewe, bikem, bikā, bikēšīn, binūn, binūīn, bičīn, biškē

Tense

Past
- VERB-Fin: kut, bū, kird, hāt, būn, hestā, kewit, kirdūwe, nebū, nedečū

Pres
- AUX-Fin: e, te, ye, yā, čīye
- VERB-Fin: e, Deydey, bē, bikeyn, dekem, dezānī, dečī, dełē, dāyā, heste

Voice

Act
- AUX-Fin: e, te, ye, yā, čīye
- VERB-Fin: kut, bū, e, kird, Deydey, bē, bikeyn, hāt, būn, dekem

Pronouns, Determiners, Quantifiers

Reflex

Yes
- PART: xo

Person

1
- PRON: emin, mān, eme, Eminīš, im
- VERB-Fin: bikeyn, dekem, bigēřimewe, bikem, bikēšīn, binūīn, bičīn, dekēšim, denēyn, depirsim

2
- PRON: eto, t, Engo, etož, it, to
- VERB-Fin: Deydey, dezānī, dečī, heste, y, binūn, bī, hestēne, nādeyewe, ī

3
- AUX-Fin: e, te, ye, yā, čīye
- NOUN: melāyēk, pārčāne
- PRON: ī, y, yān, Ewīš, Bo, Ew, Ewāne, ewān, kes, kīhā
- VERB: kut, bū, kird, e, bē, hāt, būn, dełē, dāyā, hestā
- VERB-Fin: kut, bū, e, kird, bē, hāt, būn, dełē, dāyā, hestā

Other Features

ExtPos
- ADP
  - ADP: be, lē, le, l, we, b, bo, de, pē, d

Syntax

Auxiliary Verbs and Copula

This corpus uses 1 lemma as a copula (cop). Example: bûn.

This corpus uses 1 lemma as an auxiliary (aux). Example: bûn.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB-Fin--NOUN (17)
- VERB-Fin--PRON (24)

obj
- VERB-Fin--NOUN (30)

iobj
- VERB-Fin--NOUN (1)
- VERB-Fin--NOUN-ADP(be) (4)
- VERB-Fin--NOUN-ADP(e) (1)
- VERB-Fin--NOUN-ADP(le) (1)
- VERB-Fin--PRON (4)
- VERB-Fin--PRON-ADP(Le) (1)
- VERB-Fin--PRON-ADP(be) (2)
- VERB-Fin--PRON-ADP(bo) (1)
- VERB-Fin--PRON-ADP(lē) (5)
- VERB-Fin--PRON-ADP(pē) (2)
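Each pattern above reads as parent label, then "--", then the child's UPOS, with "-ADP(…)" appended for an adposition attached to the child as case. The list contains both ADP(Le) and ADP(le), which suggests the adposition's surface form, not its lemma, is shown. The sketch below derives such labels under that assumed reading; argument_patterns is a hypothetical helper, and the three-word toy sentence is made up for illustration.

```python
from collections import Counter

# Sketch: derive pattern labels such as "VERB-Fin--PRON-ADP(bo)" for
# VERB parents and NOUN/PRON children. The label format is an assumed reading.
NOMINALS = {"NOUN", "PRON"}


def parse_feats(field):
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def argument_patterns(sentence_lines, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs in one CoNLL-U sentence."""
    words = {}
    for line in sentence_lines:
        cols = line.split("\t")
        if cols[0].isdigit():          # skip MWT ranges and empty nodes
            words[cols[0]] = cols
    counts = Counter()
    for cols in words.values():
        head = words.get(cols[6])
        if cols[7] not in relations or head is None:
            continue
        if head[3] != "VERB" or cols[3] not in NOMINALS:
            continue
        head_feats = parse_feats(head[5])
        label = head[3]
        if "VerbForm" in head_feats:
            label += "-" + head_feats["VerbForm"]
        label += "--" + cols[3]
        for dep in words.values():     # case-marking adpositions, in word order
            if dep[6] == cols[0] and dep[7] == "case" and dep[3] == "ADP":
                label += f"-ADP({dep[1]})"
        counts[(cols[7], label)] += 1
    return counts


# Made-up sentence: adposition "bo" marks a pronoun that is the iobj of a
# finite verb (lemmas left as "_").
toy = [
    "1\tbo\t_\tADP\t_\t_\t2\tcase\t_\t_",
    "2\tew\t_\tPRON\t_\tPerson=3\t3\tiobj\t_\t_",
    "3\tdełē\t_\tVERB\t_\tMood=Ind|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_",
]
print(argument_patterns(toy))
```

Summing these counters over all sentences would give per-pattern totals comparable to the numbers in parentheses above.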

Verbs with Reflexive Core Objects

This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: kirdin xo.
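The example "kirdin xo" pairs a verb lemma with the lemma of its reflexive dependent. A minimal sketch of such a search, assuming a reflexive object is any word with Reflex=Yes attached to its head as obj or iobj (subtypes ignored), is shown below. reflexive_core_objects is a hypothetical helper, and the two-word toy sentence is constructed for illustration only.

```python
# Sketch: list "head-lemma reflexive-lemma" pairs where a word marked
# Reflex=Yes attaches to its head as obj or iobj.


def reflexive_core_objects(sentence_lines):
    words = {}
    for line in sentence_lines:
        cols = line.split("\t")
        if cols[0].isdigit():              # skip MWT ranges and empty nodes
            words[cols[0]] = cols
    pairs = set()
    for cols in words.values():
        relation = cols[7].split(":")[0]   # ignore any relation subtype
        feats = cols[5].split("|")
        if relation in ("obj", "iobj") and "Reflex=Yes" in feats:
            head = words.get(cols[6])
            if head is not None:
                pairs.add(f"{head[2]} {cols[2]}")
    return sorted(pairs)


# Hypothetical toy sentence: reflexive "xo" as obj of a form of "kirdin".
toy = [
    "1\txo\txo\tPART\t_\tReflex=Yes\t2\tobj\t_\t_",
    "2\tkird\tkirdin\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_",
]
print(reflexive_core_objects(toy))
```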

Relations Overview

This corpus uses 3 relation subtypes: acl:relcl, compound:lvc, nmod:poss
The following 2 main types are never used alone; they are always subtyped: acl, compound
The following 16 relation types are not used in this corpus at all: csubj, ccomp, xcomp, expl, advcl, mark, appos, amod, clf, fixed, flat, list, orphan, goeswith, reparandum, dep
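These three statements can be checked against the Relations list earlier on this page and the 37 universal relations of UD v2. The sketch below derives the subtypes, the main types used only with a subtype, and the unused universal relations; relation_overview is a hypothetical helper name.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels attested in this treebank (the Relations list on this page).
USED = [
    "acl:relcl", "advmod", "aux", "case", "cc", "compound:lvc", "conj", "cop",
    "det", "discourse", "dislocated", "iobj", "nmod", "nmod:poss", "nsubj",
    "nummod", "obj", "obl", "parataxis", "punct", "root", "vocative",
]


def relation_overview(used_labels):
    """Return (subtypes, main types only used subtyped, unused universal types)."""
    used = set(used_labels)
    subtypes = sorted(label for label in used if ":" in label)
    bases = {label.split(":")[0] for label in used}
    only_subtyped = sorted(base for base in bases if base not in used)
    unused = sorted(UNIVERSAL_RELATIONS - bases)
    return subtypes, only_subtyped, unused


subtypes, only_subtyped, unused = relation_overview(USED)
print(len(subtypes), subtypes)
print(len(only_subtyped), only_subtyped)
print(len(unused), unused)
```

nmod appears both bare and as nmod:poss, so it is not counted among the always-subtyped types, while acl and compound occur only as acl:relcl and compound:lvc.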