UD Hausa EasternAutogramm
Language: Hausa (code: ha)
Family: Afro-Asiatic
This treebank has been part of Universal Dependencies since the UD v2.18 release.
The following people have contributed to making this treebank part of UD: Bernard Caron.
Repository: UD_Hausa-EasternAutogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: news
Questions, comments? General annotation questions (either Hausa-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [bernard • l • caron (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
This treebank contains data from the Autogramm project for the Eastern (Kano) dialect of Hausa, spoken in Nigeria.
These samples of Eastern Hausa are transcripts of Hausa news broadcasts of the BBC World Service.
The treebank is maintained in the SUD framework (SUD_Hausa-EasternAutogramm) and converted automatically to UD.
The treebank contains 18 samples, 335 trees, 9,820 tokens and 9,032 words.
Acknowledgments
The samples are extracted from Jaggar (1992), a Hausa textbook published by SOAS. The translations and annotations are by B. Caron.
References
Jaggar, Philip J. 1992. An advanced Hausa reader with grammatical notes and exercises. London: School of Oriental and African Studies, University of London.
Statistics of UD Hausa EasternAutogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Definite – Deixis – Evident – ExtPos – Foreign – Gender – Mood – Number – PartType – Person – Polarity – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – flat:foreign – flat:name – iobj – mark – nmod – nmod:appos – nmod:poss – nsubj – nsubj:outer – nummod – obj – obl – obl:arg – obl:mod – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 335 sentences and 9485 tokens.
- This corpus contains 719 tokens (8%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 85 types of words that contain both letters and punctuation. Examples: 'yan, zaː'à, zàngà-zangàr̃, G.F., zàngà-zangà, sàbà'in, farar̃-hùːlaː, jàma'àː, ma'àikàtan, irìː-irìː, jaːmi'àr̃, jàm'iyyàr̃, sàːce-sàːcên, bà'à, fàsà-ƙwàurin, kàɗe-kàɗe, r̃a'àyin, shàr̃i'àː, tàːshe-tàːshen, àr̃bà'in, 'yar̃, baːya-baːyan, gìne-gìne, jàm'ìyyuː, jàːmì'an, ma'àikàtaː, màce-màce, saː'oː'iː, sha'ànin, sàuye-sàuyên, wàːƙe-wàːƙe, wàːƙe-wàːƙensà, àl'amàr̃iː, àl'ummàr̃, 'yancìn, -sà, Al-Merrekh, Dove-Edwin, Ransome-Kuti, al'amur̃ànsù, al'amur̃àː, baː'à, baː'àː, baːya-baːya, bàlaː'ìn, bìl-Adamà, coːcì-coːcì, cànje-cànje, cê:waː, duku-duku
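Counts like the ones above can be derived directly from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page. It assumes that a "token" is a surface token (a multiword-token range counts once and the words it spans are not counted again) and that a "word" is any line with an integer ID. The `SAMPLE` text is toy data with placeholder forms, not taken from the treebank, and `tokenization_stats` is a hypothetical helper name.

```python
# Toy CoNLL-U data (placeholder forms, NOT from the treebank).
SAMPLE = """\
# sent_id = toy-1
# text = aa bb, cc.
1\taa\taa\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tbb\tbb\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
3\t,\t,\tPUNCT\t_\t_\t2\tpunct\t_\t_
4\tcc\tcc\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# sent_id = toy-2
1-2\tddee\t_\t_\t_\t_\t_\t_\t_\t_
1\tdd\tdd\tNOUN\t_\t_\t0\troot\t_\t_
2\tee\tee\tPRON\t_\t_\t1\tnmod:poss\t_\t_
"""


def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, words and tokens without a following space."""
    sentences = tokens = words = no_space = 0
    covered_until = 0    # last word ID spanned by the current multiword token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():             # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):         # sentence-level comment
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:                # empty node: neither token nor word
            continue
        space_after_no = "SpaceAfter=No" in misc.split("|")
        if "-" in tok_id:                # multiword token range, e.g. 1-2
            tokens += 1
            no_space += space_after_no
            covered_until = int(tok_id.split("-")[1])
            continue
        words += 1
        if int(tok_id) > covered_until:  # word is its own surface token
            tokens += 1
            no_space += space_after_no
    return {"sentences": sentences, "tokens": tokens,
            "words": words, "no_space_after": no_space}


print(tokenization_stats(SAMPLE))
# {'sentences': 2, 'tokens': 6, 'words': 7, 'no_space_after': 2}
```

To apply the same function to the treebank, read one of its `.conllu` files into a string and pass it in place of `SAMPLE`.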
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 27 word types tagged as particles (PART): ba, baːbù, bà, bàː, bâː, cèː, dai, fa, gàː, kadà, kuma, kàm, kàːfìn, kèːnan, kùwa, mafìː, maràsaː, maː, mài, màːsu, na, neː, nèː, ta, tòː, wai, àkwai
- This corpus contains 41 lemmas tagged as pronouns (PRON): -sà, dukà, hakàn, indà, ita, ka, kai, koːmeː, koːwaː, koːwànneː, kânsu, kânsà, kânsù, makà, masà, matà, musù, mèː, mîn, naːkù, naːsà, naːsù, naːtà, ni, nân, shi, shidà, shiː, su, suː, sù, ta, taːsà, taːsù, wandà, wani, wasu, waːƙàː, wutaː, yàːyiː, yàːyîː
- This corpus contains 12 lemmas tagged as determiners (DET): can, dukà, koːwànè, koːwàɗànnè, nan, nân, wani, wasu, wata, yankìː, ɗin, ɗîn
- Out of the above, 4 lemmas occurred sometimes as PRON and sometimes as DET: dukà, nân, wani, wasu
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): kàn, neː, nàː, yaː, yà, yâː, zâi
- There are 2 (de)verbal forms:
- Part
- VERB: fìye, gàme, ɗàuke, hàɗe, nànnàɗe, sàːle, sàːne, tsàre, tsàye, zàune
- Vnoun
- VERB: yîn, cêːwaː, rashìn, yîː, ganin, jîn, neːman, saːmùn, sôn, cîː
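Two of the figures in this section follow from simple set operations on the lists given above: the unused UPOS tag and the lemmas that occur both as PRON and as DET. The sketch below copies the lists from this page; only the 17-tag UPOS inventory is added, and the variable names are illustrative.

```python
# The 17 universal POS tags defined by UD.
ALL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed on this page as used in the corpus.
USED_UPOS = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split()
)

# Lemma lists copied from the PRON and DET bullets above.
PRON_LEMMAS = [s.strip() for s in (
    "-sà, dukà, hakàn, indà, ita, ka, kai, koːmeː, koːwaː, koːwànneː, kânsu, "
    "kânsà, kânsù, makà, masà, matà, musù, mèː, mîn, naːkù, naːsà, naːsù, "
    "naːtà, ni, nân, shi, shidà, shiː, su, suː, sù, ta, taːsà, taːsù, wandà, "
    "wani, wasu, waːƙàː, wutaː, yàːyiː, yàːyîː"
).split(",")]
DET_LEMMAS = [s.strip() for s in (
    "can, dukà, koːwànè, koːwàɗànnè, nan, nân, wani, wasu, wata, yankìː, ɗin, ɗîn"
).split(",")]

unused_tags = sorted(ALL_UPOS - USED_UPOS)
pron_and_det = sorted(set(PRON_LEMMAS) & set(DET_LEMMAS))

print(unused_tags)        # the tags never used in the corpus
print(len(PRON_LEMMAS), len(DET_LEMMAS), pron_and_det)
```

With the lists as copied here, the script reports `SYM` as the only unused tag, 41 PRON lemmas, 12 DET lemmas, and four lemmas shared by both classes.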
Nominal Features
- Fem
- ADJ: bàbbar̃, 'yar̃, hàɗaɗɗiyar̃, ƙwàːyaː, matuƙar̃, mayankar̃, muːgùwar̃, saːbuwar̃, shirgeːgìyar̃, ìsasshiyar̃
- ADP: cikinsà, jihàr̃
- AUX: ta, taː, tanàː, tà, ceː, cèː, takèː, zaːtà, baːtàː, bàtà
- DET: wata, wannàn, koːwàcè, wàccan
- NOUN: ƙasâr̃, duːniyàː, ƙasar̃, shèːkaràr̃, dòːkaː, gwamnatìn, ƙasaː, daːmaː, laːfiyàː, gwamnatì
- NUM: ɗàriː, miliyàn, ɗayantà
- PART: ta, cèː
- PRON: ita, waddà, wàddà, waːƙàː, matà, taːsù, ta, taːsà, wàccè, tà
- PROPN: Nìːjâr̃, Dòːkaː, Laːfiyàː, Dòːkâr̃, Mr, Nàːjeːr̃iyàr̃, Reinhard, Tèːkun, Zàyâr̃, Ƙasâr̃
- VERB: cèː, cêːwaː, mutuwàː, fàːruwaː, ɗaukàr̃, jìtuwar̃, shìga, taːràːwaː, amìncêwaː, bar̃
- VERB-Vnoun: cêːwaː, mutuwàː, fàːruwaː, ɗaukàr̃, jìtuwar̃, shìga, taːràːwaː, amìncêwaː, buːsàːwaː, fitôːwaː
- Masc
- ADJ: bàbban, irìː-irìː, yawàn, mùmmuːnan, tsoːhon, namijì, ìsasshen, ƙànƙanèː, ɗan, aman
- ADP: tsàkaːnin, kân
- AUX: ya, yaː, yanàː, yà, zâi, bài, yakè, yakèː, kaː, kà
- DET: wani, wannàn, koːwànè, dukkàn, wàncan, wànnan, yankìn
- NOUN: irìn, àmfàːniː, loːkàcîn, aikìː, bàːkiː, mùtûm, àbù, loːkàciː, suːnaː, sàndaː
- NUM: gùdaː, kashìː, sìttin, tàmàːnin
- PART: na
- PRON: shiː, wandà, shi, shì, masà, wani, wàndà, kânsà, koːwaː, koːwànneː
- PROPN: r̃àhoːtòn, Gwamnàn, Abengourou, Japananciː, Shùːgàbaː, Àgustàː, Landàn, Ministàːn, Yaːƙìn, Yàr̃iːmàn
- VERB: yîn, rashìn, yîː, ganin, jîn, neːman, saːmùn, sôn, cîː, goːyon
- VERB-Vnoun: yîn, rashìn, yîː, ganin, jîn, neːman, saːmùn, sôn, cîː, goːyon
- Plur
- ADJ: miyàːgun, maːtaː, ƙanaːnàː, mânyan, 'yan, mânyaː, ƙanaːnàn, ƙasàːshen, ꞌyan, bàbban
- ADP: tsàkaːnin, ƙàr̃ƙashin
- AUX: sukà, sun, sunàː, sù, sukèː, zaːsù, bàsù, sukàn, baːsàː, sukè
- DET: wasu, waɗànnân, wani, koːwàɗànnè
- NOUN: mutàːneː, mùsùlmiː, 'yan, ƙwaːyoːyiː, maːtaː, shèːkàruː, ƙasàːshen, Kir̃istoːciː, yâːraː, mutàneː
- PART: màːsu, na, maràsaː
- PRON: suː, waɗàndà, musù, kânsù, su, wasu, wàɗàndà, sù, naːtà, wani
- PROPN: Nàːjeːr̃iyàː, BBC, Japananciː, Mùsùlmiː, Nàjeːr̃iyàː, Aːyoːyin, Ogunbufunmi, Sudân, hukuːmoːmin, jàːmì'ân
- VERB: bayyànà, gìne-gìne, manèːmaː, amìncêːwar̃sù, goːyon, jìtuwar̃, kwaːɗunàː, masànaː, nànnàɗe
- VERB-Part: nànnàɗe
- VERB-Vnoun: amìncêːwar̃sù, goːyon, jìtuwar̃, kwaːɗunàː, masànaː
- Sing
- ADJ: Kir̃istàː
- AUX: na, inàː, naː, zaːkà, zân, ìn
- DET: wannàn
- PART: mài
- PRON: wannàn, mîn, ni
- Acc
- PRON: shi, shì, su, shîn, sù, ta, ni, tà
- Dat
- ADP: wà
- PRON: masà, musù, matà, makà, mîn
- Gen
- PART: na, ta
- PRON: naːsù, naːtà, taːsà, taːsù, -sà, naːkù, naːsà
- Nom
- PRON: shiː, suː, ita, kai
- Cons
- ADJ: bàbban, miyàːgun, yawàn, bàbbar̃, mânyan, 'yan, mùmmuːnan, cìkakken, hàɗaɗɗiyar̃, ìsasshen
- DET: dukkàn
- NOUN: 'yan, irìn, ƙasar̃, shèːkaràr̃, bir̃nin, ƙasàːshen, ƙungìyar̃, gidan, yawàn, gwamnatìn
- NUM: ɗayansà, ɗayansù, sìttin, tàmàːnin, ɗayantà
- PART: na
- PRON: dukkànsù, dukànsù, wasunsù
- PROPN: Nìːjâr̃, BBC, Aːyoːyin, Dòːkâr̃, Gwamnàn, Ministàːn, Mr, Ogunbufunmi, Reinhard, Sudân
- VERB: yîn, rashìn, ganin, jîn, neːman, saːmùn, sôn, goːyon, shirìn, cîn
- VERB-Vnoun: yîn, rashìn, ganin, jîn, neːman, saːmùn, sôn, goːyon, shirìn, cîn
- Def
- DET: ɗîn, yankìn
- NOUN: ƙasâr̃, loːkàcîn, àbîn, cùːtâr̃, mutàːnên, hùkuːmàr̃, mùtumìn, gwamnatìn, tafkìn, zàngà-zangàr̃
- NUM: ukùn, biyûn, huɗûn, shidàn
- PRON: shîn
- PROPN: r̃àhoːtòn, Baucîn, Kyanadàn, Landàn, Nàːjeːr̃iyàr̃, Yàr̃iːmàn, jàːmì'ân, Ƙasâr̃
- VERB: jiràn, kiràn, tsiːrân, ƙasâr̃, dafàn, hadìyêːwâr̃, jinîn, maːmàyêːwâr̃
- VERB-Vnoun: jiràn, kiràn, dafàn, hadìyêːwâr̃, maːmàyêːwâr̃
- Ind
- NOUN: hàr̃aːbàr̃, jàr̃iːdàː, laːfiyàː, mar̃tàniː, tsoːhuwaː
- Spec
- DET: wani, wasu, wata
- PRON: wani, wasu
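The listings above group word forms by feature value and then by UPOS tag. A grouping of this kind can be sketched from the FEATS column of a CoNLL-U file as follows. This is an illustration under stated assumptions, not the tool that produced this page: `forms_by_feature` is a hypothetical helper, and `SAMPLE` is toy data with placeholder forms rather than Hausa words.

```python
from collections import defaultdict

# Toy CoNLL-U data (placeholder forms, NOT from the treebank).
SAMPLE = """\
1\taa\taa\tNOUN\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_
2\tbb\tbb\tADJ\t_\tGender=Fem\t1\tamod\t_\t_
3\tcc\tcc\tNOUN\t_\tGender=Masc\t1\tconj\t_\t_
4\tdd\tdd\tVERB\t_\t_\t1\tacl\t_\t_
"""


def forms_by_feature(conllu_text, feature):
    """Return {value: {UPOS: sorted word forms}} for one morphological feature."""
    table = defaultdict(lambda: defaultdict(set))
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep syntactic words only (integer ID) that carry some features.
        if not cols[0].isdigit() or cols[5] == "_":
            continue
        feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        if feature in feats:
            # A feature may carry several comma-separated values.
            for value in feats[feature].split(","):
                table[value][cols[3]].add(cols[1])
    return {value: {upos: sorted(forms) for upos, forms in by_upos.items()}
            for value, by_upos in table.items()}


print(forms_by_feature(SAMPLE, "Gender"))
# {'Fem': {'NOUN': ['aa'], 'ADJ': ['bb']}, 'Masc': {'NOUN': ['cc']}}
```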
Degree and Polarity
- Neg
- AUX: bài, bàsù, bà'à, baːsàː, baːtàː, bàtà, baː'à, baː'àː, baːkàː, baːyàː
- PART: ba, bàː, baːbù, bâː, bà, maràsaː, kadà
Verbal Features
- Hab
- AUX: sukàn, akàn, yakàn
- Iter
- PART: ta
- Perf
- AUX: ya, sukà, yaː, akà, sun, ta, an, taː, bài, bàsù
- Prog
- AUX: kèː, yanàː, sunàː, akèː, tanàː, anàː, yakè, yakèː, nàː, sukèː
- Pot
- AUX: tâː
- Sub
- AUX: à, sù, yà, tà, kà, ìn
- Fut
- AUX: zâi, zaː'à, zaːsù, zaːtà, zaːkà, zaːmù, zân
- Cau
- VERB: baːyar̃, gur̃faːnar̃, tabbatar̃, zubar̃, gudaːnar̃, tsai, waːyar̃, ƙaddamar̃, daːkatar̃, fid
- Nfh
- PART: wai
- SCONJ: wai
Pronouns, Determiners, Quantifiers
- Dem
- ADV: nan, can, nân
- DET: wannàn, nan, ɗîn, waɗànnân, nàn, nân, wàccan, wàncan, wànnan
- PRON: wannàn
- Ind
- DET: wani, wasu, wata
- PRON: wani, wasu, koːwànneː, koːmeː, koːwaː
- Int
- ADV: yàːyàː, ꞌyàːꞌyàː
- PRON: mèː, koːwànneː
- Prs
- PRON: shiː, suː, ita, shi, shì, masà, musù, kânsà, kânsù, su
- Rel
- ADV: yaddà, indà, yàːyîn, ìndà
- PRON: indà, waɗàndà, wandà, waddà, wàndà, wàddà, yàːyîn, wàɗàndà, wàccè
- Tot
- ADV: duk
- DET: duk, dukkàn
- PRON: duk, dukkànsù, dukànsù
- Yes
- PRON: kânsà, kânsù
- 1
- AUX: na, inàː, mukè, naː, zaːmù, zân, ìn
- PRON: mîn, ni
- 2
- AUX: kaː, kà, kanàː, kèː, zaːkà, baːkàː, kukèː
- PRON: ka, kai, makà
- 3
- AUX: ya, sukà, yaː, sun, ta, yanàː, sunàː, sù, taː, yà
- PRON: shiː, suː, ita, shi, shì, masà, musù, kânsà, kânsù, su
- 4
- AUX: akà, an, à, akèː, zaː'à, anàː, akè, bà'à, akàn, baː'à
Other Features
- Deixis
- ProxH
- ADV: nan
- DET: nan, wànnan
- ProxS
- ADV: nân
- DET: wannàn, waɗànnân, nàn, nân
- PRON: wannàn
- Remt
- ADV: can
- DET: wàccan, wàncan
- ExtPos
- ADP
- ADP: à, dàgà
- ADV: duk
- NOUN: danganèː
- ADV
- ADP: har̃, dà, koː, à
- ADV: duk, nan
- VERB: gàme, ɗàuke, jìm, hàɗe, nànnàɗe, sàːle, sàːne, tsàre, tsàye, zàune
- VERB-Part: gàme, ɗàuke, hàɗe, nànnàɗe, sàːle, sàːne, tsàre, tsàye, zàune
- NOUN
- VERB-Vnoun: yîn, cêːwaː, rashìn, yîː, ganin, jîn, neːman, saːmùn, sôn, cîː
- SCONJ
- ADP: baːyân, dà
- ADV: baːya, duk
- NOUN: danganèː
- SCONJ: koː, baːyân, tun
- Foreign
- Yes
- X: carbon, dioxide, ìlaː
- PartType
- Aspect
- PART: ta
- Case
- PART: na, ta
- Der
- PART: mài, màːsu, mafìː, maràsaː
- Evident
- PART: wai
- Foc
- PART: neː, nèː, kèːnan, cèː
- Neg
- PART: ba, bàː, bà, kadà
- Pred
- PART: àkwai, baːbù, bâː, gàː
- Top
- PART: kuma, dai, maː, kùwa, kàm, fa
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): neː.
- This corpus uses 6 lemmas as auxiliaries (aux). Examples: yaː, nàː, yà, zâi, kàn, yâː.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (199)
- VERB--PRON (12)
- VERB--PRON-Nom (10)
- VERB-Part--NOUN (1)
- VERB-Vnoun--NOUN (22)
- VERB-Vnoun--PRON (6)
- VERB-Vnoun--PRON-Nom (2)
- obj
- VERB--NOUN (272)
- VERB--NOUN-ADP(baz-) (1)
- VERB--NOUN-ADP(cêːwaː) (1)
- VERB--PRON (12)
- VERB--PRON-Acc (27)
- VERB--PRON-Nom (1)
- VERB-Part--NOUN (6)
- VERB-Vnoun--NOUN (70)
- VERB-Vnoun--PRON (4)
- iobj
- VERB--PRON (1)
- VERB--PRON-Acc (6)
- VERB--PRON-Dat (24)
- VERB-Vnoun--NOUN (3)
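The parent/child patterns above can be approximated by walking the dependency edges of each sentence. The sketch below is one possible reading of how the labels are built, and that reading is an assumption: the parent label is `VERB` plus its `VerbForm` when that is `Part` or `Vnoun`, the child label is the UPOS plus its `Case` value if present, and only exact `nsubj`, `obj` and `iobj` relations are counted. The function names are hypothetical and `SAMPLE` is toy data, not treebank content.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}

# Toy CoNLL-U data (placeholder forms, NOT from the treebank).
SAMPLE = """\
1\taa\taa\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tbb\tbb\tVERB\t_\t_\t0\troot\t_\t_
3\tcc\tcc\tPRON\t_\tCase=Acc\t2\tobj\t_\t_
4\tdd\tdd\tVERB\t_\tVerbForm=Vnoun\t2\txcomp\t_\t_
5\tee\tee\tNOUN\t_\t_\t4\tobj\t_\t_
6\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_feats(column):
    """Turn a FEATS string such as 'Case=Acc|Number=Sing' into a dict."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def sentences(conllu_text):
    """Yield each sentence as a dict {word ID: columns}."""
    current = {}
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            if current:
                yield current
            current = {}
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():        # skip multiword ranges and empty nodes
                current[cols[0]] = cols


def core_argument_patterns(conllu_text):
    """Count (relation, 'PARENT--CHILD') patterns for verb-headed core arguments."""
    counts = Counter()
    for sent in sentences(conllu_text):
        for cols in sent.values():
            head = sent.get(cols[6])     # None for the root (head ID 0)
            if cols[7] not in CORE or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            parent = "VERB"
            verb_form = parse_feats(head[5]).get("VerbForm")
            if verb_form in {"Part", "Vnoun"}:
                parent += "-" + verb_form
            child = cols[3]
            case = parse_feats(cols[5]).get("Case")
            if case:
                child += "-" + case
            counts[(cols[7], f"{parent}--{child}")] += 1
    return counts


for (relation, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```

On the toy sample this reports one `nsubj` pattern (`VERB--NOUN`) and two `obj` patterns (`VERB--PRON-Acc` and `VERB-Vnoun--NOUN`).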
Verbs with Reflexive Core Objects
- This corpus contains 4 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: bayyan- kânsà, kai kânsù, kaːr- kânsù, taimak- kânsù
Relations Overview
- This corpus uses 11 relation subtypes: acl:relcl, advcl:cleft, cc:preconj, compound:prt, flat:foreign, flat:name, nmod:appos, nmod:poss, nsubj:outer, obl:arg, obl:mod
- The following 5 relation types are not used in this corpus at all: expl, clf, list, orphan, goeswith
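The two figures in this overview can be cross-checked against the relation list given earlier on this page. The sketch below copies that list, adds the inventory of 37 universal relations defined by UD, and derives the subtypes and the unused relation types. The names `summarize`, `USED` and `UNIVERSAL` are illustrative.

```python
# The 37 universal dependency relations defined by UD.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed under "Relations" on this page.
USED = """
acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc cc:preconj
ccomp compound compound:prt conj cop csubj dep det discourse dislocated
fixed flat flat:foreign flat:name iobj mark nmod nmod:appos nmod:poss nsubj
nsubj:outer nummod obj obl obl:arg obl:mod parataxis punct reparandum root
vocative xcomp
""".split()


def summarize(relations):
    """Return (sorted subtypes, sorted universal relations that never occur)."""
    subtypes = sorted(r for r in relations if ":" in r)
    base_types = {r.split(":")[0] for r in relations}
    unused = sorted(UNIVERSAL - base_types)
    return subtypes, unused


subtypes, unused = summarize(USED)
print(len(subtypes), subtypes)
print(len(unused), unused)
```

Run on the list above, this yields 11 subtypes and 5 unused relation types (clf, expl, goeswith, list, orphan), in agreement with the figures stated in this section.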