UD Brahui Kholum
Language: Brahui (code: brh)
Family: Dravidian
This treebank has been part of Universal Dependencies since the UD v2.18 release.
The following people have contributed to making this treebank part of UD: Muhammad Afzal, Luigi Talamo, Helena Vaz, Annemarie Verkerk.
Repository: UD_Brahui-Kholum
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: fiction, news
Questions, comments? General annotation questions (either Brahui-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [annemarie • verkerk (æt) uni-saarland • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
The Kholum treebank is a manually annotated corpus in Brahui.
It contains 52 sentences from a short story called “Grains of Wheat” from the book “Brahui Texts” by Liaquat Ali (Latin script) and 12 sentences from a news article in the Balochistan Post (Arabic script: https://tbpbrahui.com/2025/10/72017/). The data has been annotated according to Universal Dependencies guidelines.
The corpus is too small to be divided into multiple splits, so all sentences are placed in the test set:
| Split | Number of sentences |
|---|---|
| Test | 12 (Insaf na Khon) + 52 (Grains of Wheat) |
Annotation follows the Universal Dependencies v2 guidelines for tokenization, part-of-speech tags, and dependency relations.
The news text was collected manually from the article linked above.
Acknowledgments
The treebank was annotated by Muhammad Afzal. Supervision and revision by Luigi Talamo, Helena Vaz and Annemarie Verkerk.
References
In preparation
Statistics of UD Brahui Kholum
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Mood – Number – NumType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – det – discourse – flat – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – obl:tmod – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 64 sentences and 819 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 11 types of words that contain both letters and punctuation. Examples: بسنے۔, خننگپک۔, متور۔, مسنے،, منتو۔, کتو۔, کرو۔, کرینے۔, کرے،, کسفنگانو۔, کسفیسس۔
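The sentence and token counts above can be checked against the treebank's CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the official UD statistics script: the function name, the `row` helper and the toy `SAMPLE` are hypothetical, and the sample rows are made-up placeholders rather than treebank data. It counts only lines whose ID is a plain integer, so multiword-token ranges and empty nodes are skipped; the official script may count differently.

```python
def count_sentences_and_words(conllu_text):
    """Return (sentences, words) for a string in CoNLL-U format."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        token_id = line.split("\t")[0]
        if not token_id.isdigit():
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


def row(*cols):
    """Hypothetical helper: join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)


# Toy input with placeholder forms; NOT taken from the treebank.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1", "w1", "l1", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "w2", "l2", "VERB", "_", "_", "0", "root", "_", "_"),
    "",
    "# sent_id = toy-2",
    row("1", "w3", "l3", "VERB", "_", "_", "0", "root", "_", "_"),
    "",
])

print(count_sentences_and_words(SAMPLE))  # (2, 3)
```

To apply it to real data, read the treebank file as UTF-8 text and pass the resulting string to `count_sentences_and_words`.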
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PART, INTJ, SYM, X
- This corpus contains 14 lemmas tagged as pronouns (PRON): Asiṭ, Aṛdosar, arāṛe, asielo, asit, elo, jind, o, otā, ta, tenā, tā, الس, ہرا
- This corpus contains 24 lemmas tagged as determiners (DET): Aṛtomā, Dā, asi, aṛdosar, aṛtom, don, har, haṛtom, o, pen, te, آ, اس, اندا, او, اوفتا, ایلو, تینا, تے, دا, دنانگا, نا, ہر, ہچ
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: o
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): as, mare
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: mare
- There are 3 (de)verbal forms:
  - Conv
    - VERB: ɣurcisa, karesa, suriffisa, tirisa, xalisa
  - Fin
    - AUX: as, assaka, assur, mare
    - VERB: kare, mas, tammār, karer, massur, xalk, مننگ, کریسہ, کسفسہ, Kān
  - Inf
    - VERB: sāmbingaṭī, tiningaṭī, conḍokā, damkaššingkin, dudengaṭ, hunningaṭī, kanningaṭī, suriffingaṭī, vāj, xanington
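The tag inventory and the lemmas that occur with two different tags (such as PRON and DET) can be derived from the LEMMA and UPOS columns of the CoNLL-U file. The sketch below is one possible way to do this and is not the script that produced the figures above. The function name, the `row` helper and the toy `SAMPLE` are hypothetical; the annotation of the sample rows is invented for illustration. Lemmas are compared case-sensitively, which appears to match the listing above (for example, "Asiṭ" and "asit" are listed separately).

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def tag_report(conllu_text, tag_a="PRON", tag_b="DET"):
    """Return (used tags, unused tags, lemmas seen with both tag_a and tag_b)."""
    used = set()
    tags_by_lemma = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            # Skip multiword-token ranges and empty nodes.
            continue
        lemma, upos = cols[2], cols[3]
        used.add(upos)
        tags_by_lemma[lemma].add(upos)
    unused = UPOS_TAGS - used
    both = sorted(
        lemma for lemma, tags in tags_by_lemma.items()
        if {tag_a, tag_b} <= tags
    )
    return sorted(used), sorted(unused), both


def row(*cols):
    """Hypothetical helper: join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)


# Toy input; the rows are invented and NOT taken from the treebank.
SAMPLE = "\n".join([
    row("1", "O", "o", "PRON", "_", "_", "3", "nsubj", "_", "_"),
    row("2", "o", "o", "DET", "_", "_", "3", "det", "_", "_"),
    row("3", "w3", "l3", "NOUN", "_", "_", "0", "root", "_", "_"),
    "",
])

used, unused, both = tag_report(SAMPLE)
print(used, len(unused), both)
```

Calling `tag_report(text, "AUX", "VERB")` would give the corresponding AUX/VERB overlap.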
Nominal Features
- Number
  - Plur
    - AUX-Fin: assur
    - DET: aṛtomā, o, Aṛtomāk, aṛdosarā, har, haṛtomā, te, اوفتا, تینا, تے
    - NOUN: cunāk, bāngok, kucakkāk, kucakkātā, bāngote, cunātā, Cunātāe, cāpe, deto, dīdaɣāte
    - PRON: O, otā, tā, Aṛdosar, asieloṛā, eloṛā, ote, ta, الس
    - VERB-Conv: xalisa
    - VERB-Fin: tammār, karer, massur, کسفسہ, Kān, aler, alkur, biṭera, carrefer, gidrefesur
    - VERB-Inf: hunningaṭī, kanningaṭī, xanington, ɣurringaṭī
  - Sing
    - AUX-Fin: as, assaka, mare
    - DET: asi, o, pen, نا, Dā, don, آ, اس, اونا, دا
    - NOUN: bāngo, bāngoe, عدالت, cunās, iraɣe, iraɣnā, jang, kucak, xolumnā, bā
    - PRON: tenā, o, ta, ہرا, Asiṭ, Onā, arāṛe, asiṭnā, elonā, jind
    - PROPN: بلوچستان, حیات, اللہ, بلوچ, بلوچءِ, تربت, شادی, karīm, jān, آبسر
    - VERB-Conv: karesa, suriffisa, tirisa
    - VERB-Fin: kare, mas, xalk, bas, bing, biṭe, canḍā, cāe, el, es
    - VERB-Inf: tiningaṭī, conḍokā, damkaššingkin, dudengaṭ, suriffingaṭī, vāj, ḍalingaṭī
- Case
  - Acc
    - NOUN: baniānas, bāe, bāngoā, dušmane, kucakkas, lixe, parraɣāte, sīnae, tāje, xumbaṭī
    - PRON: jind
    - PROPN: xudā
  - Gen
    - NOUN: cunātā, raīsnā, ballanā, iraɣnā, kukuṛātā, Kucakkātā, Lixanā, bāngotā, cunānā, duppanā
    - PRON: tenā, Onā, asiṭnā, elonā, otā, ta, tā
    - PROPN: karīm, xānnā
Degree and Polarity
- Polarity
  - Neg
    - VERB-Fin: خننگپک۔, متور۔, منتو۔, کتو۔
Verbal Features
- Aspect
  - Imp
    - VERB-Conv: karesa, suriffisa, tirisa, xalisa
    - VERB-Fin: مننگ, کسفسہ, Kān, aler, cāe, kareka, karesus, sāmbin, tissura, بدل
    - VERB-Inf: tiningaṭī, conḍokā, damkaššingkin, dudengaṭ, hunningaṭī, kanningaṭī, suriffingaṭī, xanington, ɣurringaṭī, ḍalingaṭī
  - Perf
    - AUX-Fin: as, assaka, assur, mare
    - VERB-Fin: kare, mas, tammār, karer, massur, xalk, alkur, bas, bing, biṭe
    - VERB-Inf: vāj
- Mood
  - Imp
    - VERB-Fin: Kān, sāmbin, xalk
  - Ind
    - AUX-Fin: as, assaka, assur, mare
    - VERB-Conv: karesa, suriffisa, tirisa, xalisa
    - VERB-Fin: kare, mas, tammār, karer, massur, مننگ, کریسہ, کسفسہ, aler, alkur
    - VERB-Inf: tiningaṭī, conḍokā, damkaššingkin, dudengaṭ, hunningaṭī, kanningaṭī, suriffingaṭī, vāj, xanington, ɣurringaṭī
- Tense
  - Past
    - AUX: as, assaka, assur, mare
    - AUX-Fin: as, assaka, assur, mare
    - VERB: kare, bas, mas, tammār, avār, bašxā, karer, massur, sāmbokā, xalk
    - VERB-Conv: karesa, suriffisa, tirisa
    - VERB-Fin: kare, mas, tammār, karer, massur, xalk, aler, alkur, bas, bing
    - VERB-Inf: conḍokā, damkaššingkin, hunningaṭī, kanningaṭī, suriffingaṭī, tiningaṭī, vāj, xanington, ḍalingaṭī
  - Pres
    - VERB-Conv: xalisa
    - VERB-Fin: مننگ, کریسہ, کسفسہ, Kān, cāe, sāmbin, بدل, برجاءِ, خلیسہ, خننگپک۔
    - VERB-Inf: dudengaṭ, tiningaṭī, ɣurringaṭī
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: asi, o, نا, te, آ, اس, تے
  - Dem
    - DET: اندا, ایلو, Dā, O, don, دا, دنانگا
  - Ind
    - DET: pen
    - PRON: الس
  - Prs
    - DET: o, Aṛtomā, اوفتا, اونا, تینا
    - PRON: o, tenā, otā, ta, tā, Asiṭ, Onā, asiṭnā, elonā, jind
  - Rel
    - PRON: ہرا, arāṛe
  - Tot
    - DET: aṛtomā, ہچو, Aṛtomāk, asi, aṛdosarā, har, haṛtomā, ہر
    - PRON: Aṛdosar, O, asieloṛā, eloṛā
- NumType
  - Card
    - NUM: asi, musi, 13, 2020, Asiṭ, irā, اسہ, دو, سدآ, مُسہ
  - Ord
    - NUM: musiṭṭamīko
- Poss
  - Yes
    - PRON: tenā, Onā, asiṭnā, elonā, otā, ta, tā
- Reflex
  - Yes
    - PRON: jind
- Person
  - 1
    - VERB-Fin: Kān, sāmbin
  - 3
    - AUX-Fin: as, assaka, assur, mare
    - PRON: o, tenā, ta, otā, tā, Asiṭ, Aṛdosar, Onā, arāṛe, asieloṛā
    - VERB-Conv: karesa, suriffisa, tirisa, xalisa
    - VERB-Fin: kare, mas, tammār, karer, massur, xalk, مننگ, کریسہ, کسفسہ, aler
    - VERB-Inf: tiningaṭī, conḍokā, damkaššingkin, dudengaṭ, hunningaṭī, kanningaṭī, suriffingaṭī, vāj, xanington, ɣurringaṭī
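The feature listings in the preceding sections group word forms by feature value and by a label such as NOUN or VERB-Fin. A grouping of this kind could be reproduced from the FORM, UPOS and FEATS columns roughly as follows. This is a sketch under assumptions: the labelling rule (UPOS, plus the VerbForm value when one is present) is inferred from the labels shown above and may not match the official statistics script exactly, and the function names, the `row` helper and the toy `SAMPLE` are hypothetical.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn 'Case=Acc|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_feature(conllu_text):
    """Map (feature, value, label) to the set of word forms carrying it."""
    table = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            # Skip multiword-token ranges and empty nodes.
            continue
        form, upos, feats = cols[1], cols[3], parse_feats(cols[5])
        # Assumed labelling: append the VerbForm value when it is annotated.
        label = upos + ("-" + feats["VerbForm"] if "VerbForm" in feats else "")
        for name, value in feats.items():
            # Multi-valued features such as "Case=Acc,Gen" are kept as one value.
            table[(name, value, label)].add(form)
    return table


def row(*cols):
    """Hypothetical helper: join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)


# Toy input with placeholder forms; NOT taken from the treebank.
SAMPLE = "\n".join([
    row("1", "f1", "l1", "VERB", "_", "Aspect=Perf|Tense=Past|VerbForm=Fin",
        "0", "root", "_", "_"),
    row("2", "f2", "l2", "NOUN", "_", "Case=Acc|Number=Sing",
        "1", "obj", "_", "_"),
    "",
])

table = forms_by_feature(SAMPLE)
for key in sorted(table):
    print(key, sorted(table[key]))
```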
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 2 lemmas as auxiliaries (aux). Examples: as, mare.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB-Conv--NOUN (2)
  - VERB-Fin--NOUN (44)
  - VERB-Fin--NOUN-ADP(آتا) (1)
  - VERB-Fin--PRON (10)
  - VERB-Inf--NOUN (4)
- obj
  - VERB-Conv--NOUN (2)
  - VERB-Conv--NOUN-Acc (2)
  - VERB-Fin--NOUN (53)
  - VERB-Fin--NOUN-ADP(آتا) (1)
  - VERB-Fin--NOUN-Acc (5)
  - VERB-Fin--PRON (1)
  - VERB-Inf--NOUN (6)
- iobj
  - VERB-Fin--NOUN (2)
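Counts of this kind pair each nominal dependent with its verbal head. The sketch below shows one way to compute similar patterns from the HEAD and DEPREL columns. It is an approximation under stated assumptions: parent labels are built from UPOS plus VerbForm and child labels from UPOS plus Case, while the adposition part of patterns such as NOUN-ADP(...) is deliberately left out. All function names, the `row` helper and the toy `SAMPLE` are hypothetical, and the sample rows are not treebank data.

```python
from collections import Counter


def parse_feats(feats):
    """Turn 'Case=Acc|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def sentences(conllu_text):
    """Yield each sentence as a dict mapping word ID to its column list."""
    rows = {}
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
            rows = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():
            rows[cols[0]] = cols
    if rows:
        yield rows


def verb_argument_patterns(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, 'VERB-Form--CHILD-Case') pairs for NOUN/PRON children."""
    counts = Counter()
    for rows in sentences(conllu_text):
        for cols in rows.values():
            upos, head, deprel = cols[3], cols[6], cols[7]
            if deprel not in relations or upos not in ("NOUN", "PRON"):
                continue
            parent = rows.get(head)
            if parent is None or parent[3] != "VERB":
                continue
            pf, cf = parse_feats(parent[5]), parse_feats(cols[5])
            parent_label = "VERB" + ("-" + pf["VerbForm"] if "VerbForm" in pf else "")
            child_label = upos + ("-" + cf["Case"] if "Case" in cf else "")
            counts[(deprel, parent_label + "--" + child_label)] += 1
    return counts


def row(*cols):
    """Hypothetical helper: join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)


# Toy input with placeholder forms; NOT taken from the treebank.
SAMPLE = "\n".join([
    row("1", "w1", "l1", "NOUN", "_", "Number=Sing", "3", "nsubj", "_", "_"),
    row("2", "w2", "l2", "NOUN", "_", "Case=Acc", "3", "obj", "_", "_"),
    row("3", "w3", "l3", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"),
    "",
])

for (relation, pattern), n in sorted(verb_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```

Relation subtypes such as obl:tmod are matched only if they are listed explicitly in `relations`.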