UD Hausa WesternAutogramm
Language: Hausa (code: ha)
Family: Afro-Asiatic
This treebank has been part of Universal Dependencies since the UD v2.17 release.
The following people have contributed to making this treebank part of UD: Bernard Caron.
Repository: UD_Hausa-WesternAutogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: fiction, nonfiction
Questions, comments? General annotation questions (either Hausa-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [bernard • l • caron (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
This treebank contains Western Autogramm data for the Gobir dialect of Tibiri, Niger Republic (Western Hausa).
The Gobir dialect is transitional between Standard (Kano) Hausa and the Sokoto dialect (see SUD_Hausa-NorthernAutogramm).
The treebank contains 775 sentences, 14,663 tokens and 12,007 words.
It is maintained in the SUD framework (SUD_Hausa-WesternAutogramm) and converted automatically to UD.
Acknowledgments
The texts annotated in this treebank were dictated to Claude Gouffé in 1968 in Tibiri (Gobir, Niger Republic). The translation and morphosyntactic annotations are by Bernard Caron.
References
Baldi, Sergio, Nina Pawlak & Jibril Shuaibu Adamu. 2024. Hausa texts from Maradi (Niger) collected by Claude Gouffé in 1968 (with annotations in French). Studies in African Languages and Cultures, Special Issue.
Statistics of UD Hausa WesternAutogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Definite – Deixis – ExtPos – Gender – Mood – Number – PartType – Person – Polarity – PronType – Reflex – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – flat:name – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – obl:arg – obl:mod – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 775 sentences, 13862 tokens and 13903 syntactic words.
- This corpus contains 2564 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 112 types of words that contain both letters and punctuation. Examples: s'oːhuwaː, s'àːrìnceː, ’yam, doːkì:, [yi], wa’àndà, zâː'a, ya’, bâː’a, iːdì:, duːc'ìː, kic'èn, moːc'èː, ta’, wa’ànnan, bà’à, bìː-ta-zâizâi, daːc'èː, du', s'àkiː, s'àːrìnci, koː’ìnaː, wuc'c'iyàl, du’, hac'iː, kàma’, tàbiː’àː, 'yaƙ, as', barì:, bàː-shèːkaràː-s'àye-ba, ha', hac'în, lì:mân, bàː-ni-bàː-ni, bâː'a, c'ìntoː, c’inkèː, duwàːs'uː, hùːde-hùːdeː, iyàːka', iyàːkas', lì:maːmìn, mas'oː, màntà-’uwa, màːmaːkì:, s'akad, s'awontà, s'aːbàh, s'aːmiyaː
- This corpus contains 41 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 27 types of multi-word tokens. Examples: gàrai, yam, sukài, akài, ankài, kà, sunkà, yaː, à, bakkì, bannì, bà’à, gàrînga, kakài, kaː, kài, màccênga, sai, shikài, shì, sunkài, takài, taː, wuri, zâː, ìm, ɗèːbai.
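The figures in this list can be recomputed from the CoNLL-U files. The following is a minimal sketch, not the script that generated this page: it assumes the standard 10-column CoNLL-U layout, in which a multi-word token is a line whose ID is a range such as `2-3`, empty nodes have decimal IDs, and `SpaceAfter=No` in the MISC column marks a token not followed by a space. The function names and the two-sentence toy input are hypothetical, and reading "letters and punctuation" as the Unicode categories L* and P* is an assumption.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class TokStats:
    sentences: int = 0
    tokens: int = 0          # surface tokens; a multi-word token counts once
    words: int = 0           # syntactic words; each part of an MWT counts
    no_space_after: int = 0  # tokens whose MISC column has SpaceAfter=No
    mwt: int = 0             # multi-word tokens (ID ranges such as 2-3)
    mwt_words: int = 0       # syntactic words covered by those ranges


def has_letter_and_punct(form: str) -> bool:
    """True if a form mixes Unicode letters (L*) and punctuation (P*)."""
    cats = [unicodedata.category(ch) for ch in form]
    return any(c[0] == "L" for c in cats) and any(c[0] == "P" for c in cats)


def tokenization_stats(conllu: str) -> tuple[TokStats, set[str]]:
    stats, mixed = TokStats(), set()
    for block in re.split(r"\n\s*\n", conllu.strip()):
        rows = [ln.split("\t") for ln in block.splitlines() if ln and not ln.startswith("#")]
        if not rows:
            continue
        stats.sentences += 1
        covered_until = 0  # last word ID spanned by the most recent MWT
        for cols in rows:
            tid, form, misc = cols[0], cols[1], cols[9]
            no_space = int("SpaceAfter=No" in misc.split("|"))
            if "." in tid:  # empty node: neither a token nor a word
                continue
            if "-" in tid:  # multi-word token line
                start, end = map(int, tid.split("-"))
                stats.tokens += 1
                stats.mwt += 1
                stats.mwt_words += end - start + 1
                stats.no_space_after += no_space
                covered_until = end
                continue
            stats.words += 1
            if has_letter_and_punct(form):
                mixed.add(form)
            if int(tid) > covered_until:  # a word that is also its own token
                stats.tokens += 1
                stats.no_space_after += no_space
    return stats, mixed


def toy_conllu() -> str:
    """Hypothetical two-sentence sample (not treebank data). Only ID, FORM
    and MISC are filled in; the other CoNLL-U columns are '_'."""
    sents = [
        [("1", "aa", "_"), ("2-3", "bbcc", "_"), ("2", "bb", "_"), ("3", "cc", "_"),
         ("4", "d'd", "SpaceAfter=No"), ("5", ".", "_")],
        [("1-2", "eeff", "SpaceAfter=No"), ("1", "ee", "_"), ("2", "ff", "_"), ("3", "?", "_")],
    ]
    blocks = []
    for i, sent in enumerate(sents, 1):
        lines = [f"# sent_id = toy-{i}"]
        lines += ["\t".join([tid, form] + ["_"] * 7 + [misc]) for tid, form, misc in sent]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


stats, mixed = tokenization_stats(toy_conllu())
print(stats, sorted(mixed))
```

For the released data, the same bookkeeping means that words = tokens + (words inside multi-word tokens - number of multi-word tokens). With the figures above, 41 multi-word tokens of 2.00 words each give 13862 + (82 - 41) = 13903 syntactic words, and 2564 / 13862 is about 18.5%, reported as 18%.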
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 29 word types tagged as particles (PART): ba, baːbù, bàː, bâː, dai, gàː, hwa, koː, kuma, kàu, kâ', kâk, kâm, kâs, kâsh, kât, kâu, kâɓ, kâɗ, kèːnan, kòː, maː, mài, màːsu, na, nèː, ta, zâː, àkwai
- This corpus contains 65 lemmas tagged as pronouns (PRON): a, cân, dukà, indà, ita, ka, kai, keː, ki, koːmiː, koːmì:, koːwacè, koːwanè, koːwaː, koː’ìnaː, kuː, kà, kânkà, kì, kù, mai, makà, makì, matà, mikì, minì, mishì, miː, mukà, munà, musù, mutà, muː, mâː, mì:, mù, nan, naːmù, naːsù, naːtà, ni, niː, nân, shi, shiː, shiːkèːnan, shì, su, suː, sù, ta, taːshì, taːsù, taːtà, tà, waddà, wandà, wani, waːnè, wa’àndà, wàddà, wàdà, wàː, yaddà, à
- This corpus contains 13 lemmas tagged as determiners (DET): can, dukà, koːdàwane, koːdàwanè, koːmiː, koːwacè, koːwanè, koːwàcè, nan, nân, wani, wata, waɗànnan
- Out of the above, 7 lemmas occurred sometimes as PRON and sometimes as DET: dukà, koːmiː, koːwacè, koːwanè, nan, nân, wani
- This corpus contains 6 lemmas tagged as auxiliaries (AUX): neː, nàː, yaː, yà, yâː, zâi
- There are 2 (de)verbal forms:
  - Part
    - VERB: zàmne, gàme, kwànce, làɓe, s'àis'àye, s'àye
  - Vnoun
    - VERB: yîː, tàhiyàː, zuwàː, tàhiyàːtai, zaman, shìgaː, sôntà, yîn, cêːwaː, ganiː
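The PRON/DET overlap stated in the list above is a set intersection of the two lemma lists. A minimal sketch with the lemmas copied from those bullets (the variable names are illustrative; it assumes both lists use the same Unicode normalization form, since otherwise visually identical lemmas would not match):

```python
# Lemma lists copied from the PRON and DET bullets of this page.
PRON_LEMMAS = {
    "a", "cân", "dukà", "indà", "ita", "ka", "kai", "keː", "ki", "koːmiː",
    "koːmì:", "koːwacè", "koːwanè", "koːwaː", "koː’ìnaː", "kuː", "kà", "kânkà", "kì", "kù",
    "mai", "makà", "makì", "matà", "mikì", "minì", "mishì", "miː", "mukà", "munà",
    "musù", "mutà", "muː", "mâː", "mì:", "mù", "nan", "naːmù", "naːsù", "naːtà",
    "ni", "niː", "nân", "shi", "shiː", "shiːkèːnan", "shì", "su", "suː", "sù",
    "ta", "taːshì", "taːsù", "taːtà", "tà", "waddà", "wandà", "wani", "waːnè", "wa’àndà",
    "wàddà", "wàdà", "wàː", "yaddà", "à",
}
DET_LEMMAS = {
    "can", "dukà", "koːdàwane", "koːdàwanè", "koːmiː", "koːwacè", "koːwanè",
    "koːwàcè", "nan", "nân", "wani", "wata", "waɗànnan",
}

# Lemmas attested with both UPOS tags, sorted for stable output.
PRON_AND_DET = sorted(PRON_LEMMAS & DET_LEMMAS)
print(len(PRON_AND_DET), ", ".join(PRON_AND_DET))
```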
Nominal Features
- Gender
  - Fem
    - ADJ: 'yaƙ, màccè, kwikwiyàː, ƙaramaː, ƙàramaː, ’yak, ’yam, hwarab, hwaram, màccên
    - ADP: s'àkaːnintà, wajentà, wurintà
    - AUX: tà, taː, tac, bâːta, tanàː, taz, tat, bàtà, tag, tay
    - DET: wata, koːwacè, koːwàcè, wani
    - NOUN: bùdurwaː, màccè, s'oːhuwaː, dàudawaː, màːtaːtai, hiːr̃a, màːtam, kwalbaː, dùbaːr̃àː, gòːdiyaː
    - NUM: dubuː
    - PART: ta
    - PRON: ita, ta, tà, mutà, waddà, kì, matà, wancè, mikì, taːshì
    - VERB: tàhiyàː, ràbuwaː, shìgaː, bugàːwaː, huːdèːwaː, tàhiyàttà, yìwuwaː, ɗiːbàm, cêːwaː, ɗiːbàttà
    - VERB-Vnoun: tàhiyàː, shìgaː, cêːwaː, zôwwaː, tahoːwattà, tàhiyàttà, yôwwaː, zoːwaː, ƙaːraː
  - Masc
    - ADJ: ɗanyen, baƙiː, hwarin, hwariː, jàː, mùlmùlalleː, sauran, saːboː, ɗanyeː, bàbbam
    - ADP: wuriːnai, gàr̃ai, wurinkà, wuriːnaː, s'akaːninkà
    - AUX: shì, yaː, yac, bâːshi, shinàː, yat, neː, yay, kà, yab
    - DET: wani, wânnam, koːdàwane, wânga, wânnan, wannàn
    - NOUN: sarkiː, mùtun, maːlàm, maːlàmiː, maːgàniː, gidaː, ƙarhèː, gàːriː, doːkìː, sarmàyiː
    - NUM: ɗàrîn
    - PRON: shiː, shi, wandà, shì, mai, kai, waːnè, kà, makà, naːtà
    - PROPN: bàhillaːcèː, ùbangijìː, bàhillaːcèn
    - VERB: kwaːnaː, sôː, yîn, yîː, sôn, tàhiyàːtai, sôːnai, zuwàː, cîː, kiràn
    - VERB-Vnoun: yîː, zuwàː, tàhiyàːtai, zaman, sôntà, yîn, ganiː, kwaːnaː, rèːnontà, sàyen
- Number
  - Plur
    - ADJ: ’yam, mayyaː, ƙanaːnàː, hwarhwarun, hwarhwaruː, saːbiː, ’yan
    - AUX: sù, sunkà, neː, sunàː, sukà, bàsù, sun, kun, munkà, nèː
    - DET: wa’ànnan, wasu, wa’ànga
    - NOUN: maːtaː, ruwaː, sàmàːriː, hannuwàː, mutàːneː, abuːbuwàː, hwàːdàːwaː, ruwan, ɗiyan, ɗiyaː
    - PART: màːsu, na
    - PRON: suː, wa’àndà, sù, musù, muː, kù, mukà, mù, kuː, munà
    - PROPN: hillàːniː, hàusàːwaː, abzinaːwaː, bar̃ar̃oːjì, buːzàːyeː, bàhaushèː, kyakkyataːwaː, tagaːmaːwaː, ùːddawaː, kac'inaːwan
    - VERB-Part: s'àis'àye
  - Sing
    - AUX: nîː
    - PART: mài
- Case
  - Acc
    - PRON: mù
  - Gen
    - PART: na, ta
- Definite
  - Cons
    - ADJ: sauran, ɗanyen, hwarhwarun, hwarin, jàd, 'yaƙ, bàbbam, hwarab, hwaram, màccên
    - NOUN: ɗan, màːtam, jìkintà, gidam, gidan, àbun, irìn, loːkàcin, maːgànin, ruwan
    - NUM: biyun, ɗàrîn
    - PART: na
    - PROPN: bàhillaːcèn, kac'inaːwan
    - VERB: yîn, sôn, kiràn, sôntà, yîm, zaman, jîn, shân, sôm, tàhiyàttà
    - VERB-Vnoun: zaman, sôntà, yîn, rèːnontà, sàyen, sôn, aikìn, aikìnkì, aikìntà, bugùntà
  - Def
    - NOUN: maːgànîn, gàːrîn, saːyèn, hwaːrìn, irìn, ruwân, kàr̃hôn, lì:mân, lìːmân, ƙarhèn
    - VERB: dakàm
  - Spec
    - DET: wani, wasu, wata
    - PRON: wani
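Lists such as the ones above can be built by grouping words by feature value and part of speech. A minimal sketch, assuming input as (form, UPOS, FEATS) triples and at most ten forms per line ordered by frequency; labelling VerbForm-bearing words as `UPOS-VerbForm` (e.g. `VERB-Vnoun`) is inferred from the lists on this page, not taken from the official statistics script, and the toy words `w1` to `w5` are hypothetical:

```python
from collections import Counter, defaultdict


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Gender=Fem|Number=Sing'."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def group_by_feature(words, feature: str, limit: int = 10) -> dict[str, dict[str, list[str]]]:
    """Map each value of `feature` to {label: most frequent forms}.

    The label is the UPOS, extended to UPOS-VerbForm when VerbForm is set
    (an assumption about how labels like VERB-Vnoun arise).
    """
    table: dict[str, dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for form, upos, feats in words:
        fd = parse_feats(feats)
        if feature not in fd:
            continue
        label = f"{upos}-{fd['VerbForm']}" if "VerbForm" in fd else upos
        for value in fd[feature].split(","):  # UD allows multi-values, e.g. Dem,Ind
            table[value][label][form] += 1
    return {
        value: {label: [f for f, _ in c.most_common(limit)] for label, c in labels.items()}
        for value, labels in table.items()
    }


TOY = [  # hypothetical (form, UPOS, FEATS) triples, not treebank data
    ("w1", "NOUN", "Gender=Fem|Number=Sing"),
    ("w4", "NOUN", "Gender=Fem"),
    ("w1", "NOUN", "Gender=Fem"),
    ("w2", "VERB", "Gender=Fem|VerbForm=Vnoun"),
    ("w3", "ADJ", "Number=Plur"),
    ("w5", "PRON", "PronType=Dem,Ind"),
]

for value, labels in group_by_feature(TOY, "Gender").items():
    print(f"- {value}")
    for label, forms in labels.items():
        print(f"  - {label}: {', '.join(forms)}")
```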
Degree and Polarity
- Polarity
  - Neg
    - AUX: bâːshi, bâːta, bài, bàtà, bâːsu, bâː’a, bà’à, bàn, bàsù, bàkà
    - PART: ba, bâː, baːbù, kât, bàː, kâsh, kâ', kâk, kâm, kâs
Verbal Features
- Aspect
  - Iter
    - PART: ta
  - Perf
    - AUX: ankà, sunkà, yaː, yac, an, taː, tac, yat, yay, yab
  - Prog
    - AUX: bâːshi, sunàː, shinàː, bâːta, kà, nàː, tanàː, akà, anàː, bâːsu
- Mood
  - Pot
    - AUX: sûː, nîː, shîː
  - Sub
    - AUX: à, shì, tà, sù, kà, ìn, kì, kù, mù, yà
- Tense
  - Fut
    - AUX: zâːshi, zâː'a, zâːta, zâː, zâːka, zâːki, zâːsu, zâːmu, zâm, zâːa
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - ADV: nan, can, nân, nam
    - DET: nan, wânnam, wa’ànnan, ga, wânga, wânnan, wa’ànga, can, wannàn
    - PRON: wânnan, wa’ànnàn, wâncân
  - Ind
    - DET: wani, wasu, wata
    - PRON: wani, koːmiː, koːwanè
  - Prs
    - PRON: naːtà, taːshì, taːsù, naːmù, naːsù, taːtà
  - Rel
    - PRON: wandà, waddà, wa’àndà, indà, yaddà
  - Tot
    - ADV: dus, duk
    - DET: du', du’, dum, duk, dul, duw, duy, dus, dut, duz
    - PRON: dus, dud, dum
- Reflex
  - Yes
    - PRON: kânkà
- Person
  - 1
    - AUX: ìn, naː, bàn, inàː, munkà, mun, munàː, nay, nis, zâːmu
    - PRON: mun, niː, nì, minì, ni, muː, mukà, mù, munà
  - 2
    - ADP: s'akaːninkà
    - AUX: kà, kì, kaː, kakà, kin, kanàː, kas, kaɗ, kac, kah
    - PRON: kai, kì, kà, makà, mikì, keː, kèː, kù, ka, kuː
    - VERB: sônkà
  - 3
    - ADP: gàr̃ai
    - AUX: shì, tà, sù, sunkà, yaː, yac, bâːshi, sunàː, taː, shinàː
    - NOUN: sàràkkunnànshì
    - PRON: shiː, shi, su, ita, ta, tà, mutà, suː, shì, mai
    - VERB: ganai, bùgai, shìgai, sàːmai, ɗàukai, ɗèːbai
  - 4
    - AUX: à, ankà, an, akà, anàː, zâː'a, bâː’a, bà’à, am, bâː'a
    - PRON: a, à
Other Features
- Deixis
  - Med
    - PRON: wâncân
  - ProxH
    - ADV: nan, nam
    - DET: nan, wânnam, wa’ànnan, wânnan
    - PRON: wânnan
  - ProxS
    - ADV: nân
    - DET: ga, wânga, wa’ànga, wannàn
    - PRON: wa’ànnàn
  - Remt
    - ADV: can
    - DET: can
- ExtPos
  - ADV
    - ADV: nan
    - NOUN: hwaːrìn
    - VERB-Part: zàmne, gàme, kwànce, làɓe, s'àis'àye, s'àye
  - NOUN
    - VERB: kwaːnaː, sôː, yîn, yîː, tàhiyàː, ràbuwaː, sôn, tàhiyàːtai, sôːnai, zuwàː
    - VERB-Vnoun: yîː, tàhiyàː, zuwàː, tàhiyàːtai, zaman, shìgaː, sôntà, yîn, cêːwaː, ganiː
  - PRON
    - DET: dum
- PartType
  - Aspect
    - PART: ta
  - Case
    - PART: na, ta
  - Der
    - PART: mài, màːsu
  - Disc
    - PART: kuma
  - Foc
    - PART: kèːnan, nèː
  - Neg
    - PART: ba, kât, bàː, kâsh, kâ', kâk, kâm, kâs, kâɓ, kâɗ
  - Pred
    - PART: àkwai, bâː, zâː, baːbù, gàː
  - Top
    - PART: kâu, kuma, dai, maː, hwa, koː, kàu, kòː, àkwai
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as copula (cop): neː.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: yaː, yà, nàː, zâi, yâː.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (253)
  - VERB--PRON (17)
  - VERB-Part--NOUN (1)
  - VERB-Vnoun--NOUN (6)
- obj
  - VERB--NOUN (548)
  - VERB--NOUN-ADP(na) (2)
  - VERB--NOUN-ADP(sai) (1)
  - VERB--PRON (275)
  - VERB-Vnoun--PRON (4)
- iobj
  - VERB--NOUN (19)
  - VERB--PRON (151)
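The counts above pair a verbal head with a nominal or pronominal dependent for the three core relations. A minimal sketch of that tally, assuming words given as (id, UPOS, FEATS, head, deprel) tuples with integer ids, and the same `UPOS-VerbForm` labelling that appears on this page; folding relation subtypes into their base label is a choice made here, the finer labels such as `NOUN-ADP(na)` (which appear to record a case-marking dependent) are not modelled, and the toy sentence is hypothetical:

```python
from collections import Counter

CORE = ("nsubj", "obj", "iobj")


def label(upos: str, feats: str) -> str:
    """UPOS, extended to UPOS-VerbForm (e.g. VERB-Vnoun) when VerbForm is set."""
    for item in feats.split("|"):
        if item.startswith("VerbForm="):
            return f"{upos}-{item.split('=', 1)[1]}"
    return upos


def core_argument_counts(sentences) -> Counter:
    """Count (relation, 'PARENT--CHILD') pairs for VERB heads and NOUN/PRON dependents.

    Each sentence is a list of (id, upos, feats, head, deprel) tuples for
    syntactic words; head 0 marks the root.
    """
    counts: Counter = Counter()
    for words in sentences:
        by_id = {w[0]: w for w in words}
        for _wid, upos, feats, head, deprel in words:
            rel = deprel.split(":")[0]  # fold subtypes such as nsubj:pass
            if rel not in CORE or head == 0 or upos not in ("NOUN", "PRON"):
                continue
            _, p_upos, p_feats, _, _ = by_id[head]
            if p_upos == "VERB":
                counts[(rel, f"{label(p_upos, p_feats)}--{label(upos, feats)}")] += 1
    return counts


def render(counts: Counter) -> str:
    """Lay the counts out as a nested list, one block per relation."""
    lines = []
    for rel in CORE:
        pairs = sorted((pair, n) for (r, pair), n in counts.items() if r == rel)
        if pairs:
            lines.append(f"- {rel}")
            lines += [f"  - {pair} ({n})" for pair, n in pairs]
    return "\n".join(lines)


TOY = [[  # one hypothetical sentence, not treebank data
    (1, "NOUN", "_", 2, "nsubj"),
    (2, "VERB", "_", 0, "root"),
    (3, "PRON", "_", 2, "iobj"),
    (4, "NOUN", "_", 2, "obj"),
    (5, "VERB", "VerbForm=Vnoun", 2, "xcomp"),
    (6, "PRON", "_", 5, "obj"),
]]
print(render(core_argument_counts(TOY)))
```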
Relations Overview
- This corpus uses 8 relation subtypes: acl:relcl, advcl:cleft, cc:preconj, compound:prt, flat:name, nmod:poss, obl:arg, obl:mod
- The following 5 relation types are not used in this corpus at all: expl, clf, list, orphan, goeswith
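Both lines follow from the Relations list near the top of this page. A short sketch, assuming the 37 universal relation labels of UD v2 (typed in by hand here, not read from an official file):

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels copied from the Relations list of this page.
USED = """acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc cc:preconj
ccomp compound compound:prt conj cop csubj dep det discourse dislocated fixed
flat flat:name iobj mark nmod nmod:poss nsubj nummod obj obl obl:arg obl:mod
parataxis punct reparandum root vocative xcomp""".split()

# Subtypes carry a colon; unused types are universal labels whose base never occurs.
subtypes = sorted(r for r in USED if ":" in r)
unused = sorted(UD_RELATIONS - {r.split(":")[0] for r in USED})
print(len(subtypes), ", ".join(subtypes))
print(len(unused), ", ".join(unused))
```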