
This page pertains to UD version 2.

UD for Balochi

Balochi is a dialect continuum, and until recently the language was rarely written, so there does not seem to be a written standard with enough prestige to prevail across the vast territory where Balochi is spoken. One standard was proposed by Jahani (2019); note, however, that the texts in our data follow a different orthography. Unless specified otherwise, our data represent southeastern (Pakistani) Balochi.

Tokenization and Word Segmentation


Some morphemes that are treated as bound morphemes in the literature are in fact written as separate words under the orthography employed in our data. This applies both to the case suffixes of nouns and to the conjugation suffixes of verbs.

Nominal Features

There is no grammatically relevant gender.

According to Jahani and Korn (2009, p. 652), Balochi nouns have five cases, termed direct, oblique, object, genitive, and vocative. We map the first three cases to other names in the UD terminology. Under the orthography used in our data, case suffixes are written as separate words; they are thus analyzed as postpositions (ADP). The Case feature is annotated on the postposition that contributes the case, not on the noun itself.

The direct case roughly corresponds to the nominative in UD. It is used for the subject of all intransitive verbs and of transitive verbs in the present and future. Balochi has split ergativity like Indian languages: transitive verbs in the past tense follow the ergative alignment, meaning that the object rather than the subject appears in the direct case there. The direct form is the simple uninflected noun. Under our orthography this means that there is no postposition, hence Case=Nom is not annotated anywhere.

The oblique case is marked by the postposition ءَ ‘a. It is used as the accusative in the present and future, and as the ergative subject in the past tense. It is also placed between the noun and some more specific postpositions. We annotate it Case=Erg.

In ditransitive clauses, the object case marks the recipient, i.e., it corresponds to the dative (Case=Dat). Its morpheme is را and it may be combined with the oblique morpheme ءَ ‘a.

The genitive morpheme is ءِ ‘i and it is also written separately. We annotate it Case=Gen.
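
The mapping described above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the function name `annotate_nominal`, the (FORM, UPOS, FEATS) triple layout, and the placeholder noun are hypothetical and not part of any treebank tooling; plural markers and the combination of را with ءَ are not modelled.

```python
# Case postpositions as described in the text: traditional case name
# mapped to (postposition, UD Case value). Singular markers only.
CASE_MARKERS = {
    "oblique": ("ءَ", "Erg"),
    "object": ("را", "Dat"),
    "genitive": ("ءِ", "Gen"),
}


def annotate_nominal(noun_form, case=None):
    """Return (FORM, UPOS, FEATS) triples for a noun and its optional case marker.

    case=None stands for the direct case: no postposition is written,
    so no Case feature is annotated anywhere.
    """
    rows = [(noun_form, "NOUN", "_")]  # the noun itself never carries Case
    if case is not None:
        marker, ud_case = CASE_MARKERS[case]
        rows.append((marker, "ADP", f"Case={ud_case}"))
    return rows


if __name__ == "__main__":
    # "NOUN_FORM" is a placeholder, not a Balochi word.
    for row in annotate_nominal("NOUN_FORM", "genitive"):
        print("\t".join(row))
```

Keeping Case on the ADP row rather than on the noun mirrors the decision stated above that the feature belongs to the word that contributes the case.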

The vocative is unmarked in the singular.

Nominal words can appear in two Number forms, singular (Sing) and plural (Plur). However, the number inflection is fused with the case inflection, that is, plural marking would be part of the case postposition, and there is no number distinction in the direct (nominative) case.


Personal pronouns exist in the first and the second person. Distal demonstratives are used instead of personal pronouns in the third person. The reflexive pronoun is wat.


Possessive pronouns are generally the personal pronouns with the genitive suffix -ī; but unlike nouns, they are written together with the suffix as one word. We treat them as distinct lexemes with their own lemma and with the Poss=Yes feature, not as genitive forms of the non-possessive personal pronouns. The forms ending in -ī are used attributively; there are also predicative forms with an additional -g. TODO: Is there a feature we can use to distinguish the predicative form?

At least for the distal demonstrative, the genitive/possessive form is also used before the oblique case marker. For example, آئی ءَ ā’ī ‘a (áiá) is the oblique/accusative/ergative case; آئی ءَ را ā’ī ‘a rā (áiárá) is the object/dative case of “that”.

آئی ā'ī “his/her/its/their/of that/of those” (Deixis=Remt|Poss=Yes|PronType=Dem)
وتی watī “one's own” (Poss=Yes|PronType=Prs|Reflex=Yes)
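
As a rough illustration of treating these possessives as lexemes of their own, the two forms listed above can be stored with their feature strings and queried for Poss=Yes. The names `POSSESSIVES`, `parse_feats` and `is_possessive` are hypothetical helpers, not part of any UD tool; the feature strings are copied from the entries above.

```python
# Possessive forms and their UD feature strings, as listed above.
POSSESSIVES = {
    "آئی": "Deixis=Remt|Poss=Yes|PronType=Dem",
    "وتی": "Poss=Yes|PronType=Prs|Reflex=Yes",
}


def parse_feats(feats):
    """Split a UD FEATS string such as 'A=x|B=y' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_possessive(form):
    """True if the form is listed here with Poss=Yes."""
    return parse_feats(POSSESSIVES.get(form, "_")).get("Poss") == "Yes"
```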

As in English, the reciprocal pronoun consists of two words. TODO: What to do with it? Do the words also occur independently?

یکے دومی yakē dōmī “each other” (PronType=Rcp)

Interrogative pronouns.


یے: possibly an indefinite article (at least that was the gloss assigned by the Balochi teacher). It would apply to the preceding nominal.

Verbal Features

The conjugation suffixes of Balochi verbs are written as separate words in our data and are therefore analyzed as auxiliaries that follow the main verb. Example:

کندگ kandag “to laugh / laughing” (VerbForm=Inf)
من کندگا آں man kandagā āñ “I am laughing”. The auxiliary is Number=Sing|Person=1. The main verb should probably be some non-finite form, maybe a participle, and maybe a progressive participle (I saw a similar form glossed as “progressive aspect of verb”).
تو کندگا ئے tō kandagā ay “you (Sing) are laughing”
آ کندگا اِنت ā kandagā int “he is laughing”
ما کندگا اِن mā kandagā in “we are laughing”
شُما کندگا اِت šumā kandagā it “you (Plur) are laughing”
آ کندگا اَنت ā kandagā ant “they are laughing”
تو کند اِت tō kand it “you (Sing) laughed”
تو کند اِتگ tō kand itag “you (Sing) have laughed”
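
The present-tense paradigm above can be summarized as a lookup from the transliterated auxiliary to Person and Number. Only the first-person singular features are stated explicitly in the example; the remaining values are read off the English glosses and should be taken as an assumption. The names `PRESENT_AUX` and `aux_feats` are hypothetical.

```python
# Present-tense auxiliaries from the paradigm above (transliteration only).
# Values other than the 1st person singular are inferred from the glosses.
PRESENT_AUX = {
    "āñ": {"Person": "1", "Number": "Sing"},
    "ay": {"Person": "2", "Number": "Sing"},
    "int": {"Person": "3", "Number": "Sing"},
    "in": {"Person": "1", "Number": "Plur"},
    "it": {"Person": "2", "Number": "Plur"},
    "ant": {"Person": "3", "Number": "Plur"},
}


def aux_feats(translit):
    """Return the UD FEATS string for a present-tense auxiliary, sorted by feature name."""
    feats = PRESENT_AUX[translit]
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))


if __name__ == "__main__":
    print(aux_feats("āñ"))
```

The table covers only the present forms; the past-tense examples reuse the form it with a different subject, so a lookup by form alone would not be enough there.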

The infinitive can be used and inflected as a verbal noun.

The auxiliary forms are similar or identical to the copula which would be used with non-verbal predicates.

Present-tense auxiliaries from the data: اِنت int (3rd person Sing; this form is also the copula “is”); اَنت ant (3rd person Plur; but the context could also have been past rather than present).

Past-tense auxiliaries from the data: اِت it (3rd person Sing); کت kt (3rd person Sing); کُت kut (is it the same as kt or not?); جت jt (perhaps this is not an auxiliary? It occurred in the causative sentence.)


Besides the three case morphemes that were mentioned above (and that are considered mere suffixes by some authors), there are also “ordinary” adpositions.

چہ čh “from” (Case=Abl)?

Coordinating Conjunctions



The negative particle is نہ nah.

Particle at the end of the sentence: ئے ē (perhaps a question particle? Or is it in fact the auxiliary ay, 2nd person singular, see above?); ئِے yie (the same or not?)



Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.



Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.



Instruction: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relations definitions if any.


There are N Balochi UD treebanks:

Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.