UD Zaar Autogramm
Language: Zaar (code: say)
Family: Afro-Asiatic, West Chadic
This treebank has been part of Universal Dependencies since the UD v2.11 release.
The following people have contributed to making this treebank part of UD: Sylvain Kahane, Bruno Guillaume, Bernard Caron, Katharine Jiang.
Repository: UD_Zaar-Autogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.12
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Zaar-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [bruno • guillaume (æt) inria • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
A Universal Dependencies corpus for Zaar (also known as Sayanci), a member of the Chadic branch of the Afro-Asiatic phylum. The language is spoken by about 200,000 people, mainly in the Bogoro and Tafawa Balewa local government areas of Bauchi State, Nigeria.
The treebank is an automatic conversion of the SUD_Zaar-Autogramm treebank, which was extracted from Bernard Caron’s corpus in Elan format (https://corpafroas.huma-num.fr/Archives/corpus.php).
Sentences are annotated with the following metadata:
- sent_id (which indicates the source file and the segmentation identifier in the source file)
- speaker_id (which identifies the turn of speech)
- sound_url (which enables playback of the audio recording)
- sent_timecode (which enables playback of the sentence)
- text (lexical tokenization)
- text_ortho (original transcription of the audio recording)
- text_en (English interpretation)
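As a rough illustration of how this sentence-level metadata could be read, the sketch below assumes the treebank is distributed in the usual CoNLL-U format, where each metadata field appears as a `# key = value` comment line above the sentence. The function name `read_sentence_metadata` and all sample values are hypothetical and are not taken from the corpus.

```python
def read_sentence_metadata(conllu_text):
    """Return one dict of `# key = value` comments per sentence."""
    sentences = []
    current = {}
    has_tokens = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence block.
            if current or has_tokens:
                sentences.append(current)
            current, has_tokens = {}, False
        elif line.startswith("#"):
            # Split on the first "=" only, so values such as URLs stay intact.
            key, sep, value = line[1:].partition("=")
            if sep:
                current[key.strip()] = value.strip()
        else:
            has_tokens = True
    if current or has_tokens:
        sentences.append(current)
    return sentences


# Invented placeholder sentence; identifiers and words are NOT from the corpus.
SAMPLE = (
    "# sent_id = EXAMPLE_FILE_01\n"
    "# speaker_id = SP1\n"
    "# text = a b\n"
    "# text_en = placeholder translation\n"
    "1\ta\t_\tX\t_\t_\t0\troot\t_\t_\n"
    "2\tb\t_\tX\t_\t_\t1\tdep\t_\t_\n"
    "\n"
)

metadata = read_sentence_metadata(SAMPLE)
print(metadata[0]["sent_id"], metadata[0]["text_en"])
```

Any of the keys listed above could be looked up in the same way, for example to pair `text` with `text_en` for each sentence.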
Acknowledgments
This treebank was produced as part of the Autogramm ANR project. Special thanks go to Bruno Guillaume for the conversion from SUD to UD, and to Sylvain Kahane and Christian Chanard. A special tribute must be paid to Marvellous S. Davan, who transcribed and translated the Zaar Corpus, and met an untimely death in Bauchi at the age of 40.
Statistics of UD Zaar Autogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Definite – Deixis – Foreign – Mood – Number – PartType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – VerbType – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – cc:preconj – ccomp – compound – compound:prt – compound:redup – conj – csubj – dep – det – discourse – dislocated – fixed – flat – flat:foreign – flat:name – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – obl:agent – obl:arg – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 817 sentences, 7618 tokens and 7625 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 9 types of words that contain both letters and punctuation. Examples: Hap#, Ndà#, Sau#, d#, kú#, lyá:, m#, màː#, ám#
- This corpus contains 7 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 7 types of multi-word tokens. Examples: =tə+n, kafa, kap, mí, àngwa, ʧin, ʧíː.
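The figures above are mutually consistent: each of the 7 multi-word tokens spans 2 syntactic words, so 7618 tokens + 7 × (2 − 1) = 7625 syntactic words. The following sketch shows one way such counts might be derived from a CoNLL-U file, assuming the standard ID conventions (a range such as `2-3` marks a multi-word token, a plain integer marks a syntactic word, and a decimal ID marks an empty node). The function name `count_units` and the toy data are hypothetical.

```python
def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    sentences = tokens = words = mwts = 0
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            # Range line: one surface token covering several syntactic words.
            end = token_id.split("-")[1]
            mwts += 1
            tokens += 1
            covered_until = int(end)
        elif "." in token_id:
            continue       # empty nodes are neither tokens nor words
        else:
            words += 1
            if int(token_id) > covered_until:
                tokens += 1
    return {"sentences": sentences, "tokens": tokens,
            "words": words, "multiword_tokens": mwts}


BLANK = "\t_" * 8  # remaining eight CoNLL-U columns, left unspecified
# Toy data, not from the corpus: "bc" is a multi-word token made of "b" and "c".
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ta" + BLANK + "\n"
    "2-3\tbc" + BLANK + "\n"
    "2\tb" + BLANK + "\n"
    "3\tc" + BLANK + "\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\td" + BLANK + "\n"
    "\n"
)

print(count_units(SAMPLE))
```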
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 85 word types tagged as particles (PART): XX, aː, baːbù, bàː, báː, daː, deː, dàː, eː, fáː, fâː, gáːrá, gən, gəndí, gəní, gənín, gə̂n, hôː, hŋ́, hə́ŋ, hḿ, kweː, kwêy, kwǎː, kàm, káɗá, kóː, kóːdàː, kúm, kúmá, kən, kəndí, kənín, máː, məníː, mə́n, n, ngən, ni, nì, nə, oː, tá, tòː, tóː, tôː, w, wéy, yôː, àː, âː, òː, ěːn, ń, ŋ, ŋaː, ŋâː, ŋǎː, ŋǎːn, ŋəndí, ŋ́, ǎ, ǎn, ǎː, ǎːn, ǐn, ǒːy, ɗ, ɗa, ɗi, ɗà, ɗì, ɗǎ, ɗǐ, ən, ə̌n, ə̌níː, ə̌ŋ, ɣən, ɣəndá, ɣəndí, ɣəní, ɣənín, ɣə́n, ɣə̂n
- This corpus contains 49 lemmas tagged as pronouns (PRON): =gə̀tn, =kí, =kə, =mí, =mə, =tə, =waː, =waːn, =waːsən, =wopm, =wos, =wàːsə̀ŋ, =wòs, =wôs, =âtn, =ʃí, dàːsóːɗa, dàːʃì, gwàː, gwàːm, gwàːsə̀n, gyòː, gyóːɗan, gyôː, gín, gíː, gòpm, gòs, gón, gə̀tn, ki, kyáːni, káy, kì, kóːníː, kóːnúː, mi, myàːní, myáːni, mì, níː, núː, yàːʃí, yáːn, yáːni, yóːɗan, ɗan, ʧi, ʧì
- This corpus contains 11 lemmas tagged as determiners (DET): XX, dúk, gyaː, gyôː, gíː, gòn, gón, kotá, sú, wannan, wón
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: gyôː, gíː, gón
- This corpus contains 23 lemmas tagged as auxiliaries (AUX): tə̀, wò, wòyi, yiː, yáː, yí, àː, àːtá, àːyí, á, ánáː, ánáːyáː, átáyiká, átâ, átâyi, átâyáː, áyí, áyǎː, ʧiká, ʧáː, ʧáːnaː, ʧáːyi, ʧíta
- Out of the above, 2 lemmas occurred sometimes as AUX and sometimes as VERB: yáː, ʧiká
- There are 3 (de)verbal forms:
- Fin
- VERB: ɬə́, mán, fi, súː, wul, yel, tu, yi, ɬə, fuː
- Inf
- VERB: ndáy, ʃíni
- Vnoun
- VERB: ŋálɣə́nì, yélɣə́nì, kápkə̂n, nátkə́nì, súːɣə̂n, tsə́tngə́n, ɗùɣə̀nì, ɬə́ɣə̂n, ʧíɣə̂n, bàɬkə̀nì
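The tag statistics above (tags used, tags unused, lemmas shared between PRON and DET) could be approximated with a small script like the one below. It is only a sketch: it assumes CoNLL-U input with the lemma in column 3 and the UPOS tag in column 4, the function name `lemmas_by_upos` is hypothetical, and the sample lemmas are invented placeholders.

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD.
UNIVERSAL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas that carry it."""
    lemmas = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only syntactic words: 10 columns and a plain integer ID.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        lemmas[cols[3]].add(cols[2])
    return lemmas


# Invented placeholder words; "g" is tagged once as PRON and once as DET.
SAMPLE = (
    "1\tg\tg\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tv\tv\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tg\tg\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tn\tn\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)

inventory = lemmas_by_upos(SAMPLE)
unused_tags = sorted(UNIVERSAL_UPOS - set(inventory))
pron_and_det = inventory["PRON"] & inventory["DET"]
print(len(inventory), "tags used; lemmas tagged both PRON and DET:", pron_and_det)
```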
Nominal Features
- Plur
- ADJ: mərə́, məːri
- AUX: mə́, mə̀, má, myǎː, tə́, tá, myàː, tə̀, yǎː, tàː
- DET: gyaː, gyáː, gyǎː, sú
- NOUN: mə́n, guɗi, gerʃí, zàrsə̀, gút, kaɗanʃí, mərə́, mərə́m, mərə́n, məːrə́ŋsə
- PRON: =mí, =ʃí, ʧì, gòpm, =wôpm, mì, myàːní, =kí, =wàːsə̀n, =wòpm
- PROPN: Bàtùràːye
- VERB-Fin: kǐːr, kǐːríː, kǔːp, náːt, sǔn, kwáːn, kúːp, màːn, máːn, mə́ːʃíː
- VERB-Vnoun: yàːlɣə̀nì, ríːngə̂n, yáːlɣə̂n, yáːlɣə̂nín
- Sing
- AUX: wò, mə, á, àː, myáː, yáː, kə, ma, ʧáː, tə̀
- PRON: =tə, ɣáy, =m, =tə̀, gòs, myâːn, =ɣə, =tíː, =kə, =əm
- Cons
- ADJ: mərə́, ŋǎː
- NOUN: lǎː, məːrí, zǎːr, ɮǐː, dondə́, kətə́r, gút, awré, gàrí, kafá
- Ind
- NOUN: də̀nì, màːʃíni, námʧi, náɣɗi, sə̀kéːɗi, gə̀ɗə̌, tə́pi, vwàːy, vìːnì, ŋây
- Spec
- DET: wón, gón, gòn
- PRON: gón
Degree and Polarity
- Neg
- PART: bàː, hŋ́, n, ə̌n, ǐn, báː, baːbù, ŋ, ŋ́, ǎːn
Verbal Features
- Aor
- AUX: mə́, mə, á, kə, tə́, kə́, mə̀
- Conc
- AUX: myáːnaː
- Imp
- AUX: myáː, ʧáː, myǎː, átâyáː, kə̀tàyáː, mətáyáː, áyǎː, mənáːyáː, mə̀tàyáː, tə̀tàyáː
- ImpIter
- AUX: myàːyi, myáːyi
- Inch
- PART: ni, nì, n
- VERB-Fin: fin, fín, fîn, yin, yìn, ɬyan
- Iter
- AUX: yi, mayi, miː, mə̀yi, átâyi
- Perf
- AUX: àː, káː, máː, tàː, màː, àːtá, máːtá, máːyí, âː, màːtá
- Prog
- AUX: ʧǎː, myǎː, ʧàː, ʧìɣá, miɣá, kiɣá, myaː, mìɣá, mə̀tàyiɣá
- Res
- ADJ: vàrèʃíː
- ADV: ʧiɣə́y, ʧíɣə́y, ɗúːníː
- NOUN: gə̀ɗíː, lə́ɓíː, bóːlǐː, kuríː, lǎːy, kàlàːʧíː, kə́lâːʃíː, ləpíː, náɣɗêʃíː, náɣɗíː
- NUM: nàmbóŋə́y, nàmbóɲíː
- PART: məníː, kwêy, ə̌níː
- PRON: =tíː, =míː, =wôpíː, =gə̀tíː, =kə́y, =tə́y, =wàːsə̀ŋə́y, =wòpíː, =âtíː, =ʃíː
- PROPN: Kullây
- VERB-Fin: ɬǐː, ɬíː, ɬǎːy, laːtsə́y, ndáy, ndǎy, ɓâníː, mbûɗíː, fíː, kaɓíː
- VERB-Vnoun: mbútkə̂níː, ɬəɣə̂níː, ɬə̀ɣə̀níː
- Cnd
- AUX: yáː, kyáː, myàː, myáː, yǎː, kyàː, mǎː, myǎː
- Imp
- VERB-Fin: sǔn, fi, kaɓíː, kon, máːn, tá, ɲangás, ʤáː
- Irr
- AUX: mìː, mîː, kîː, tíː, kìː, míyí, mîtà, tíyí, ʧí, ʧíyí
- PART: dàː, kóːdàː
- Qot
- PART: wéy
- SCONJ: tu
- Sub
- AUX: mə̀, tə̀, àː
- Fut
- AUX: wò, má, ma, ka, tá, ká, mayi
- Imm
- AUX: áyǎː, máːyí, áyí, kíː, mìyǎː, míyàː, tìyǎː, àːyí
- Rec
- AUX: mənáːyáː, kənáː, mənáyáː, mənáː
- Rem
- AUX: átâ, mətá, kətá, tà, tə̀tà, mə̀tà, átâyáː, kə̀tàyáː, mətáyáː, tâ
- Cau
- VERB-Fin: ɬə̌ːr, ɬə̌ːríː, ɬə́ːr
Pronouns, Determiners, Quantifiers
- Dem
- ADV: ɗangəní, yáːwón, ɗáni, ɗân, ʤǎːn, ɗaɗân, ɗangənín, ɗán, ɗúːni, ɗûːn
- DET: gíː, XX
- PRON: gíː, gín
- VERB-Vnoun: mbútkə̂nín
- Int
- ADV: wuriː, gyóː, wuríː, wúr, yǎː, ìnáː, ɗòː, ɗôː
- DET: gyòː
- PART: kwǎː
- PRON: níː, nîː, gyòː, núː, nǐː
- Prs
- PRON: =tə, =mí, =ʃí, ɣáy, =m, =tə̀, gòs, myâːn, =ɣə, =tíː
- Rel
- ADV: yáddiyóːɗan, yandìyóːɗan, yándiyóːɗan, yə́ddà, ɗan
- PRON: yóːɗan, gyóːɗan, dàːsóːɗa
- SCONJ: ɗan, ɗa
- Poss=Yes
- PRON: gòs, =tn, gə̀tn, =wàːsə̀ŋ, =wôpm, =wôpíː, =wôs, gòʃíː, =tíː, =wàːsə̀n
- Reflex=Yes
- PRON: =kí, =mí, =tə, =ɣə, =ʃí
- 1
- AUX: mə́, mə̀, mə, myáː, má, ma, myǎː, mətá, myàː, máː
- PRON: =mí, =m, myâːn, =əm, gòpm, myáːni, =wôpm, mì, myàːní, =tn
- 2
- AUX: kə, ka, káː, kyáː, kətá, kə́, àː, ká, kyàː, kə̀tàyáː
- PRON: =ɣə, =kə, =ɣə̀, =kí, gwàː, =ɣə́, =ɣə́n, kyáːni, =gə, =kə̀
- 3
- AUX: wò, á, tə̀, àː, yáː, ʧáː, tə́, tá, átâ, yǎː
- PRON: =tə, =ʃí, ɣáy, =tə̀, gòs, ʧì, =tíː, =wôs, gáy, gə̀tn
Other Features
- Deixis
- Med
- NOUN: ngə́ʃês, lǎːs, kúnês, dàlíːlês, dàtə́pês, màːʃínês, ɮěːs, awrês, dàːmuwâs, kaːyâs
- PRON: dàːʃès, =wôpès, gòpès
- PROPN: Dʒòʃès, Kaɗə́mês, Maláːrês
- VERB-Fin: kaɓêʃíː, làːtsêʃíː, yàːlès, ɗôːʃíː, ɬə̂ʃíː
- Prox
- ADV: ɗangənín
- NOUN: múrín, Bàtúːrén, dzàŋə́n, kàːsuwǎn, lǎːn, mərə́n, ngəʃín, yaːɬə́n, zàrsə́n
- PART: ɣənín, gənín, kənín, ěːn
- PRON: =ɣə́n, =wòpə́n, =wòpə̌n, =wòsə́n, =ètə́n, gín, gòpə́n, ɣáyín
- VERB-Fin: mánín
- VERB-Vnoun: yáːlɣə̂nín
- Remt
- ADJ: ngómdíː
- ADV: ɗáníː, gíː
- DET: gíː
- NOUN: Tákwâːrày, də̀níː, kúníː, màːʃíníː, ráːnábáwɗíː, ríːʤiyáy, turíː, wurɓéy, məːrêy, məːríː
- PRON: gíː, gòʃíː, =êtíː, =wôpíː, yáːníː
- PROPN: Abuʤéː, Kìmsə́y, Kímsə́y, Kímsə̂y, Súléy, Súːmíː
- VERB-Vnoun: ɬəɣə̂níː, ʧáːɣə̂níː
- Foreign
- Yes
- X: nan, shi, ba, a, kafin, OK, ke, tunda, wannan, ɗaya
- PartType
- Emp
- PART: oː, òː, kwǎː, hôː, yôː, ǒːy
- Int
- PART: aː, ŋâː, ŋǎː, àː, eː, ŋaː, ǎː, ŋǎːn, ǎːn, ěːn
- VerbType
- Cop
- PART: nə, ɣən, ɣəndí, ɣəní, kən, ɗa, gəndí, kəndí, gəní, ɣəndá
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 23 lemmas as auxiliaries (aux). Examples: á, wò, tə̀, àː, yáː, ʧáː, átâ, yí, átâyáː, ʧiká, àːtá, áyǎː, yiː, ánáːyáː, wòyi, àːyí, ánáː, áyí, ʧáːyi, átáyiká, átâyi, ʧáːnaː, ʧíta.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (66)
- VERB-Fin--PRON (2)
- VERB-Vnoun--PRON (1)
- obj
- VERB-Fin--NOUN (197)
- VERB-Fin--NOUN-ADP(ká)-ADP(teː) (1)
- VERB-Fin--NOUN-ADP(tu) (1)
- VERB-Fin--PRON (144)
- VERB-Fin--PRON-ADP(tu) (2)
- iobj
- VERB-Fin--PRON (34)
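Counts of this kind can be sketched as follows, assuming CoNLL-U input in which column 7 holds the head ID, column 8 the relation, and the `VerbForm` feature of the parent verb is used to build labels such as `VERB-Fin--NOUN`. The helper names are hypothetical, the sample sentence is invented, and the sketch does not reproduce the adposition suffixes (for example `NOUN-ADP(tu)`) shown in the list above.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats):
    """Turn a FEATS string such as 'VerbForm=Fin|Number=Plur' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def core_argument_patterns(conllu_text):
    """Count (relation, 'VERB-<VerbForm>--<child UPOS>') pairs."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Skip comments, multi-word token ranges and empty nodes.
            if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
                continue
            words[cols[0]] = cols
        for cols in words.values():
            relation = cols[7].split(":")[0]   # ignore relation subtypes
            parent = words.get(cols[6])
            if relation not in CORE_RELATIONS or parent is None:
                continue
            if parent[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            verb_form = parse_feats(parent[5]).get("VerbForm")
            parent_label = f"VERB-{verb_form}" if verb_form else "VERB"
            counts[(relation, f"{parent_label}--{cols[3]}")] += 1
    return counts


# Invented placeholder sentence, not taken from the corpus.
SAMPLE = (
    "1\tx\tx\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\ty\ty\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_\n"
    "3\tz\tz\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)

patterns = core_argument_patterns(SAMPLE)
for (relation, pattern), n in sorted(patterns.items()):
    print(relation, pattern, n)
```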
Verbs with Reflexive Core Objects
- This corpus contains 4 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: su =kí, su =mí, ɬiː =ʃí, ʤaː =ɣə
Relations Overview
- This corpus uses 9 relation subtypes: acl:relcl, cc:preconj, compound:prt, compound:redup, flat:foreign, flat:name, nmod:poss, obl:agent, obl:arg
- The following 6 relation types are not used in this corpus at all: expl, cop, clf, list, orphan, goeswith
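The overview above can be cross-checked against the relation list given earlier: the corpus uses 31 of the 37 universal relations, leaving the 6 listed as unused. A minimal sketch of how such an overview might be computed from CoNLL-U input is given below; the function name `relation_overview` is hypothetical and the sample words are invented.

```python
# The 37 universal dependency relations defined by UD.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}


def relation_overview(conllu_text):
    """Return (sorted relation subtypes, sorted unused universal relations)."""
    used = set()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only syntactic words: 10 columns and a plain integer ID.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        used.add(cols[7])
    subtypes = sorted(rel for rel in used if ":" in rel)
    base_relations = {rel.split(":")[0] for rel in used}
    unused = sorted(UNIVERSAL_RELATIONS - base_relations)
    return subtypes, unused


# Invented placeholder words, not taken from the corpus.
SAMPLE = (
    "1\ta\ta\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tb\tb\tPRON\t_\t_\t1\tnmod:poss\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)

subtypes, unused = relation_overview(SAMPLE)
print("subtypes:", subtypes)
print("unused universal relations:", len(unused))
```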