UD Hausa WesternAutogramm
Language: Hausa (code: ha)
Family: Afro-Asiatic
This treebank has been part of Universal Dependencies since the UD v2.17 release.
The following people have contributed to making this treebank part of UD: Bernard Caron.
Repository: UD_Hausa-WesternAutogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: fiction, nonfiction, spoken
Questions, comments? General annotation questions (either Hausa-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [bernard • l • caron (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
This treebank contains data of Southern Autogramm, for the (Tibiri) Gobir dialect of Niger Republic (Western Hausa).
The Gobir dialect is transitional between Standard (Kano) Hausa and the Sokoto dialect (see SUD_Hausa-NorthernAutogramm).
The treebank contains 775 sentences, 14,663 tokens and 12,007 words.
It is maintained in the SUD framework (SUD_Hausa-WesternAutogramm) and converted automatically to UD.
Acknowledgments
The texts annotated in this treebank were dictated to Claude Gouffé in 1968 in Tibiri (Gobir, Niger Republic). The translation and morphosyntactic annotations are by Bernard Caron.
References
Baldi, Sergio, Nina Pawlak & Jibril Shuaibu Adamu. 2024. Hausa texts from Maradi (Niger) collected by Claude Gouffé in 1968 (with annotations in French). Studies in African Languages and Cultures, Special Issue.
Statistics of UD Hausa WesternAutogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Definite – Deixis – ExtPos – Gender – Number – PartType – Person – Polarity – PronType – Tense
Relations
acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – flat:name – iobj – mark – nmod – nsubj – nummod – obj – obl – obl:arg – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 775 sentences, 13862 tokens and 13888 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 111 types of words that contain both letters and punctuation. Examples: s'oːhuwaː, s'àːrìnceː, ’yam, doːkì:, wa’àndà, zâː'a, ya’, bâː’a, iːdì:, duːc'ìː, kic'èn, moːc'èː, ta’, wa’ànnan, bà’à, bìː-ta-zâizâi, daːc'èː, du', s'àkiː, s'àːrìnci, koː’ìnaː, wuc'c'iyàl, du’, hac'iː, kàma’, tàbiː’àː, 'yaƙ, as', barì:, bàː-shèːkaràː-s'àye-ba, ha', hac'în, lì:mân, bàː-ni-bàː-ni, bâː'a, c'ìntoː, c’inkèː, duwàːs'uː, hùːde-hùːdeː, iyàːka', iyàːkas', lì:maːmìn, mas'oː, màntà-’uwa, màːmaːkì:, s'akad, s'awontà, s'aːbàh, s'aːmiyaː, s'uyèn
- This corpus contains 26 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 19 types of multi-word tokens. Examples: gàrai, sukài, akài, ankài, bakkì, bannì, gàrînga, kakài, kài, màccênga, sai, shikài, sunkà, sunkài, takài, wuri, yam, zâː, ɗèːbai.
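The token and word counts above differ because each multi-word token is one surface token that covers several syntactic words. The following is a minimal sketch of how such counts can be derived from CoNLL-U text, assuming the standard ID conventions (integer IDs for words, `a-b` ranges for multi-word tokens, decimal IDs for empty nodes). The function names `count_units` and `row` and the toy sample are hypothetical and are not taken from the treebank or its tooling.

```python
def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():          # a blank line ends the sentence
            in_sentence = False
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        if not in_sentence:
            in_sentence = True
            covered_until = 0
            counts["sentences"] += 1
        token_id = line.split("\t")[0]
        if "-" in token_id:           # range ID such as 2-3: one surface token
            counts["mwts"] += 1
            counts["tokens"] += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:         # empty node: neither token nor word
            continue
        else:                         # plain integer ID: one syntactic word
            counts["words"] += 1
            if int(token_id) > covered_until:
                counts["tokens"] += 1
    return counts


def row(token_id, form):
    """Build a 10-column CoNLL-U line, leaving the other columns empty."""
    return "\t".join([token_id, form] + ["_"] * 8)


# Toy input with placeholder forms (not taken from the treebank).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1", "aa"), row("2-3", "bbcc"), row("2", "bb"), row("3", "cc"), row("4", "."),
    "",
    "# sent_id = toy-2",
    row("1", "dd"), row("2", "."),
    "",
])

toy_counts = count_units(SAMPLE)
print(toy_counts)

# Consistency check on the reported figures: 26 multi-word tokens of
# 2 words each turn 13888 syntactic words into 13862 surface tokens.
reported_tokens = 13888 - 26 * (2 - 1)
print(reported_tokens)
```

On the toy sample the sketch finds 2 sentences, 5 tokens, 6 words and 1 multi-word token, and the final line reproduces the 13862 tokens reported above.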
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 25 word types tagged as particles (PART): ba, baːbù, bàː, bâː, dai, gàː, hwa, koː, kuma, kàu, kâ', kâk, kâm, kâs, kâsh, kât, kâu, kâɓ, kâɗ, kèːnan, kòː, maː, ta, zâː, àkwai
- This corpus contains 64 lemmas tagged as pronouns (PRON): a, dukà, indà, ita, ka, kai, keː, ki, koːmiː, koːmì:, koːwacè, koːwanè, koːwaː, koː’ìnaː, kuː, kà, kì, kù, mai, makà, makì, matà, mikì, minì, mishì, miː, mukà, munà, musù, mutà, muː, mâː, mì:, mù, naːmù, naːsù, naːtà, ni, niː, shi, shiː, shiːkèːnan, shì, su, suː, sù, ta, taːshì, taːsù, taːtà, tà, waddà, wandà, wani, waːnè, wa’àndà, wa’ànnàn, wàddà, wàdà, wàː, wâncân, wânnan, yaddà, à
- This corpus contains 20 lemmas tagged as determiners (DET): can, duk, dukà, ga, ganiː, koːdàwane, koːdàwanè, koːmiː, koːwacè, koːwanè, koːwàcè, nan, wani, wannàn, wata, waɗànnan, wa’ànga, wa’ànnan, wânga, wânnan
- Out of the above, 6 lemmas occurred sometimes as PRON and sometimes as DET: dukà, koːmiː, koːwacè, koːwanè, wani, wânnan
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
- This corpus does not use the VerbForm feature.
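Two of the statements above are simple set operations: the unused UPOS tags are the universal tag set minus the tags observed, and the lemmas that occur as both PRON and DET are the intersection of two lemma sets. The sketch below illustrates this under the assumption that the input is a list of (lemma, UPOS) pairs. The helper `lemmas_by_upos` is hypothetical, and the sample pairs are a small subset of the lemmas listed above rather than the full corpus.

```python
from collections import defaultdict

# The 17 universal POS tags defined by UD.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags reported as used in this corpus.
used_tags = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split()
)
unused_tags = sorted(UPOS_TAGS - used_tags)


def lemmas_by_upos(pairs):
    """Group lemmas by UPOS tag (hypothetical helper for illustration)."""
    table = defaultdict(set)
    for lemma, upos in pairs:
        table[upos].add(lemma)
    return table


# A small subset of the (lemma, UPOS) pairs implied by the lists above.
sample_pairs = [
    ("wani", "PRON"), ("wani", "DET"),
    ("dukà", "PRON"), ("dukà", "DET"),
    ("ita", "PRON"),
    ("wata", "DET"),
]
table = lemmas_by_upos(sample_pairs)
both = sorted(table["PRON"] & table["DET"])

print(unused_tags)
print(both)
```

Run over the full corpus, the same intersection would be expected to yield the 6 lemmas listed above; on this subset it yields only `dukà` and `wani`.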
Nominal Features
- Fem
  - ADJ: 'yaƙ, màccè, kwikwiyàː, ƙaramaː, ƙàramaː, ’yak, ’yam, hwarab, hwaram, màccên
  - ADP: s'àkaːnintà, wajentà, wurinkì, wurintà
  - AUX: tà, taː, tac, bâːta, tanàː, taz, tat, bàtà, tag, tay
  - DET: wata, koːwacè, koːwàcè, wani
  - NOUN: bùdurwaː, màccè, s'oːhuwaː, dàudawaː, màːtam, màːtaːtai, jìkintà, hiːr̃a, kwalbaː, dùbaːr̃àː
  - NUM: dubuː
  - PRON: ita, ta, tà, mutà, waddà, kì, matà, mikì, naːtà, wancè
  - VERB: tàhiyàː, ràbuwaː, shìgaː, bugàːwaː, huːdèːwaː, tàhiyàttà, yìwuwaː, ɗiːbàm, cêːwaː, ɗiːbàttà
- Masc
  - ADJ: ɗanyen, baƙiː, hwarin, hwariː, jàː, mùlmùlalleː, saːboː, ɗanyeː, bàbbam, saːbon
  - ADP: mài, gàr̃ai, wuriːnai, s'akaːninkà, wurinkà
  - AUX: shì, yaː, yac, bâːshi, shinàː, yat, neː, yay, kà, yab
  - DET: wani, wânnam, koːdàwane, wânga, wânnan, wannàn
  - NOUN: mùtun, sarkiː, maːlàm, maːlàmiː, maːgàniː, gidaː, maːgànîn, ƙarhèː, gàːriː, doːkìː
  - NUM: ɗàrîn
  - PRON: shiː, shi, wandà, shì, mai, kai, kà, makà, wani, taːshì
  - PROPN: bàhillaːcèː, ùbangijìː, bàhillaːcèn
  - VERB: kwaːnaː, sôː, yîn, yîː, sôn, tàhiyàːtai, sôːnai, zuwàː, cîː, kiràn
- Plur
  - ADJ: ’yam, mayyaː, ƙanaːnàː, hwarhwarun, hwarhwaruː, saːbiː, ’yan
  - ADP: màːsu, dabràssù, s'àkaːninsù
  - AUX: sù, sunkà, neː, sunàː, sukà, bàsù, nèː, sun, kun, munkà
  - DET: wasu, wa’ànnan, wa’ànga
  - NOUN: maːtaː, ruwaː, sàmàːriː, hannuwàː, ruwan, ɗiyan, mutàːneː, abuːbuwàː, hwàːdàːwaː, gidansù
  - PRON: suː, wa’àndà, sù, musù, muː, kù, mukà, taːsù, kuː, munà
  - PROPN: hillàːniː, hàusàːwaː, abzinaːwaː, bar̃ar̃oːjì, buːzàːyeː, bàhaushèː, kyakkyataːwaː, tagaːmaːwaː, ùːddawaː, kac'inaːwan
  - VERB: s'àis'àye, cêːwaːssù, cînsù, gùrguràssù, kirànsù, kwancìnsù, zamansù
- Gen
  - ADP: wurinkà
  - NOUN: jìkinkà, sàràkkuwakkà, bàːkinkà, gidankà, màːtakkà, bùtuːnai, gòːshinkà, kânkà, zaːzakkà, àkàihunkà
- Cons
  - ADJ: sauran, ɗanyen, hwarhwarun, hwarin, jàd, 'yaƙ, bàbbam, hwarab, hwaram, màccên
  - ADP: wurin, wurim, gàban, kàmaɗ, s'akad, dabràssù, wajentà
  - ADV: nân
  - NOUN: ɗan, màːtam, irìn, jìkintà, gidam, gidan, àbun, loːkàcin, maːgànin, ruwan
  - NUM: biyun, ɗàrîn
  - PROPN: bàhillaːcèn, kac'inaːwan
  - VERB: yîn, sôn, kiràn, sôntà, yîm, zaman, jîn, shân, sôm, tàhiyàttà
- Def
  - ADV: nan
  - DET: nan
  - NOUN: maːgànîn, gàːrîn, saːyèn, hwaːrìn, kàr̃hôn, lì:mân, lìːmân, ruwân, ƙarhèn, ɗiyân
  - VERB: dakàm
- Ind
  - NOUN: sarkiː, mùtun, maːlàm, maːlàmiː, maːgàniː, maːtaː, bùdurwaː, zuːgàl, gidaː, ƙarhèː
  - PRON: wancè, waːnè
- Spec
  - DET: wani
  - PRON: wani
Degree and Polarity
- Neg
  - AUX: bâːshi, bâːta, bài, bàtà, bâːsu, bâː’a, bà’à, bàn, bàsù, bàkà
  - PART: ba, bâː, baːbù, kât, bàː, kâsh, kâ', kâk, kâm, kâs
Verbal Features
- Aor
  - AUX: à, shì, tà, sù, kà, ìn, kì, kù, mù, ìm
- Iter
  - PART: ta
- Perf
  - AUX: yaː, an, taː, kaː, kin, naː, sun, am, kun, mun
- PerfBkg
  - AUX: ankà, sunkà, yac, tac, yat, yay, yab, yak, yag, yaz
- PerfNeg
  - AUX: bài, bàtà, bà’à, bàn, bàsù, bàkà
- Prog
  - AUX: sunàː, shinàː, anàː, nàː, tanàː, yanàː, kanàː, inàː, kinàː
- ProgBkg
  - AUX: kà, akà, shikà, takà, sukà, kakà, kikà, yakèː
- ProgNeg
  - AUX: bâːshi, bâːta, bâːsu, bâː’a, bâːka, bâːki, bâː'a, bâːshì
- Fut
  - AUX: zâːshi, zâː'a, zâːta, zâː, zâːka, zâːki, zâːsu, zâːmu, zâm, zâːa
- Pred
  - AUX: sûː, nîː, shîː
Pronouns, Determiners, Quantifiers
- Dem
  - ADV: nan, can, nân, nam
  - DET: nan, wânnam, wa’ànnan, ga, wânga, wânnan, wa’ànga, can, wannàn
  - PRON: wânnan, wa’ànnàn, wâncân
- Ind
  - PRON: koːmiː, koːwanè
- Rel
  - PRON: wandà, waddà, wa’àndà, indà, yaddà
- Tot
  - PRON: dum
- 1
  - ADP: wuriːna
  - AUX: ìn, naː, bàn, inàː, munkà, mun, munàː, mù, nay, nis
  - NOUN: gidaːna, kissàmmù, kâina, màːtaːta, sàːƙoːna, ìdàːnuːnaː
  - PRON: mun, niː, nì, minì, ni, muː, mukà, munà, mù, naːmù
  - VERB: ganiːna
- 2
  - ADP: s'akaːninkà, wurinkà, wurinkì
  - AUX: kà, kì, kaː, kakà, kin, kanàː, kas, kaɗ, kac, kah
  - NOUN: jìkinkà, sàràkkuwakkà, bàːkinkà, gidankà, kânkà, màːtakkà, gòːshinkà, kissàkkù, laːdakkì, zaːzakkà
  - PRON: kai, kì, kà, makà, mikì, keː, kèː, kù, ka, kuː
  - VERB: aikìnkì, ganinkì, kirànkì, sônkà, sônkì
- 3
  - ADP: dabràssù, gàr̃ai, wuriːnai, s'àkaːninsù, s'àkaːnintà, wajentà, wurintà
  - AUX: shì, tà, sù, sunkà, yaː, yac, bâːshi, sunàː, shinàː, taː
  - NOUN: màːtaːtai, jìkintà, wuriːnai, doːkìːnai, mijìntà, gidaːnai, uwattà, àboːkiːnai, sàràkkuwaːtai, gidansù
  - PRON: shiː, shi, su, ita, ta, tà, mutà, suː, shì, mai
  - VERB: tàhiyàːtai, sôːnai, sôntà, ɗiːbàːtai, tàhiyàttà, ganai, ɗiːbàttà, ganiːnai, kiràntà, rèːnontà
- 4
  - AUX: à, ankà, an, anàː, akà, zâː'a, bâː’a, bà’à, am, anà
  - PRON: a, à
Other Features
- Deixis
  - Prox
    - ADV: nân
    - DET: wânnam, ga, wânga, wânnan, wa’ànga, wannàn
    - PRON: wânnan, wa’ànnàn
  - Remt
    - ADV: can, nam
    - DET: wa’ànnan, can, wânnam
    - PRON: wâncân
- ExtPos
  - ADV
    - ADV: nan
    - NOUN: hwaːrìn
  - NOUN
    - VERB: kwaːnaː, sôː, yîn, yîː, tàhiyàː, ràbuwaː, sôn, tàhiyàːtai, sôːnai, zuwàː
  - PRON
    - DET: dum
- PartType
  - Adv
    - PART: ta
  - Disc
    - PART: kuma
  - Foc
    - PART: kèːnan
  - Neg
    - PART: ba, bâː, baːbù, kât, bàː, kâsh, kâ', kâk, kâm, kâs
  - Pred
    - PART: àkwai, zâː, gàː
  - Top
    - PART: kâu, kuma, dai, maː, hwa, koː, kàu, kòː, àkwai
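Listings like the ones above group word forms by feature value and then by UPOS tag. The sketch below shows one way to build such an index from the FEATS column of CoNLL-U, assuming the standard `Name=Value|Name=Value` syntax with `_` for an empty value. The helpers `parse_feats` and `index_forms` are hypothetical. The sample forms and their individual feature values come from the lists above, but the way features are bundled on a single token is an assumption made for illustration.

```python
from collections import defaultdict


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into {feature name: [values]}."""
    if not feats or feats == "_":
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        parsed[name] = value.split(",")  # a feature may carry several values
    return parsed


def index_forms(rows):
    """Build feature -> value -> UPOS -> set of forms (hypothetical helper)."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for form, upos, feats in rows:
        for name, values in parse_feats(feats).items():
            for value in values:
                index[name][value][upos].add(form)
    return index


# (form, UPOS, FEATS) triples; the feature bundles are illustrative assumptions.
sample_rows = [
    ("nân", "ADV", "Deixis=Prox|PronType=Dem"),
    ("can", "ADV", "Deixis=Remt|PronType=Dem"),
    ("kèːnan", "PART", "PartType=Foc"),
    ("ita", "PRON", "Gender=Fem|Person=3"),
]
index = index_forms(sample_rows)

for value, by_upos in sorted(index["Deixis"].items()):
    for upos, forms in sorted(by_upos.items()):
        print(value, upos, ", ".join(sorted(forms)))
```

The nested mapping mirrors the layout of the listings: the outer key is the feature, the next is its value, and the innermost groups example forms by UPOS.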
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Examples: _.
- This corpus uses 1 lemma as an auxiliary (aux). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (256)
  - VERB--NOUN-ADP(mài) (2)
  - VERB--NOUN-Gen (2)
  - VERB--PRON (17)
- obj
  - VERB--NOUN (538)
  - VERB--NOUN-ADP(na/ta) (2)
  - VERB--NOUN-ADP(mài) (4)
  - VERB--NOUN-ADP(sai) (1)
  - VERB--NOUN-Gen (7)
  - VERB--PRON (280)
- iobj
  - VERB--NOUN (19)
  - VERB--PRON (136)
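The counts above tally dependency edges whose parent is a verb and whose child is a noun or pronoun. As a rough sketch of that tally, assuming each sentence is available as a list of dictionaries with `id`, `upos`, `head` and `deprel` keys (a hypothetical representation, not the format of any particular library), the following counts the basic `VERB--NOUN` and `VERB--PRON` patterns. It deliberately ignores the adposition and case refinements such as `ADP(mài)` or `Gen`, and the toy sentence is invented.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def count_core_arguments(sentences):
    """Count (relation, parent--child UPOS) patterns for verbal parents."""
    counts = Counter()
    for sentence in sentences:
        upos_by_id = {word["id"]: word["upos"] for word in sentence}
        for word in sentence:
            if (
                word["deprel"] in CORE_RELATIONS
                and upos_by_id.get(word["head"]) == "VERB"
                and word["upos"] in {"NOUN", "PRON"}
            ):
                counts[(word["deprel"], "VERB--" + word["upos"])] += 1
    return counts


# Invented toy sentence: only the columns needed for the tally.
toy_sentence = [
    {"id": 1, "upos": "NOUN", "head": 2, "deprel": "nsubj"},
    {"id": 2, "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "upos": "PRON", "head": 2, "deprel": "iobj"},
    {"id": 4, "upos": "NOUN", "head": 2, "deprel": "obj"},
    {"id": 5, "upos": "NOUN", "head": 4, "deprel": "nmod"},
]
toy_counts = count_core_arguments([toy_sentence])
print(dict(toy_counts))

# Per-relation totals obtained by summing the counts reported above.
reported = {
    "nsubj": [256, 2, 2, 17],
    "obj": [538, 4, 2, 1, 7, 280],
    "iobj": [19, 136],
}
totals = {relation: sum(values) for relation, values in reported.items()}
print(totals)
```

Summing the reported figures gives 277 `nsubj`, 832 `obj` and 155 `iobj` edges between verbs and nouns or pronouns.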
Relations Overview
- This corpus uses 6 relation subtypes: acl:relcl, advcl:cleft, cc:preconj, compound:prt, flat:name, obl:arg
- The following 5 relation types are not used in this corpus at all: expl, clf, list, orphan, goeswith
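Both statements above can be checked against the relation inventory listed earlier on this page. The sketch below treats any label containing a colon as a subtype and compares the base labels with the 37 universal relations of UD v2. The variable names are illustrative only.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed for this corpus in the Relations section of this page.
used_relations = """
acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc cc:preconj
ccomp compound compound:prt conj cop csubj dep det discourse dislocated
fixed flat flat:name iobj mark nmod nsubj nummod obj obl obl:arg parataxis
punct reparandum root vocative xcomp
""".split()

# A subtype is written as base:subtype.
subtypes = sorted(rel for rel in used_relations if ":" in rel)

# A universal relation counts as used if it appears bare or as the base of a subtype.
used_bases = {rel.split(":")[0] for rel in used_relations}
unused = sorted(UNIVERSAL_RELATIONS - used_bases)

print(len(subtypes), subtypes)
print(len(unused), unused)
```

This reproduces the 6 subtypes and the 5 unused relation types given above.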