UD Yoruba YTB
Language: Yoruba (code: yo)
Family: Niger-Congo
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Adédayọ̀ Olúòkun, Daniel Zeman, Seyi Williams, Ọlájídé Ishola.
Repository: UD_Yoruba-YTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: bible, wiki
Questions, comments? General annotation questions (either Yoruba-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
Parts of the Yoruba Bible and of the Yoruba edition of Wikipedia, hand-annotated natively in Universal Dependencies.
…
Acknowledgments
…
References
- (citation)
Statistics of UD Yoruba YTB
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Case – Number – NumType – Person – PronType – Typo
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – compound:prt – compound:svc – conj – cop – csubj – det – discourse – expl – fixed – flat – goeswith – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 318 sentences, 8198 tokens and 8243 syntactic words.
- This corpus contains 1156 tokens (14%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 23 types of words that contain both letters and punctuation. Examples: Commons., irocks.com, Hip-, Lavinsky,, Lát', Nollywood,, OYIN,, Premier), R&B, St.Judes, T', engineer), gán-án-ní, jọ-, kárùn-ún, kìn-ín-ní, mindat.org, mẹ́sàn-án, níhìn-ín, níhín-ín, ÀṢÀ,, àgbárí-, ṣ'
- This corpus contains 43 multi-word tokens. On average, one multi-word token consists of 2.05 syntactic words.
- There are 23 types of multi-word tokens. Examples: orúkọ, lórúkọ, sílẹ̀, lọ́wọ́, ìṣọmọlórúkọ, lára, pànìyàn, sílé, fọwọ́, gbàágbọ́, láradá, lógo, lókúta, lóró, lẹ́nu, lẹ́rẹ̀kẹ́, mọ́lẹ̀, nìyìí, sewọn, sára, síta, ìsọmọlórúkọ, ẹsì.
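The counts above can be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: `corpus_counts`, `average_mwt_length` and the toy sentence `TOY` (with placeholder forms `w1`, `w2`, ...) are hypothetical names introduced for illustration. It assumes the standard CoNLL-U layout: ten tab-separated columns, range IDs such as `3-4` for multi-word tokens, decimal IDs for empty nodes, and `SpaceAfter=No` in the MISC column.

```python
def corpus_counts(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0,
              "multiword_tokens": 0, "no_space_after": 0}
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():              # a blank line ends the sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):          # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        word_id, misc = cols[0], cols[9]
        if "." in word_id:                # empty node, not a token or a word
            continue
        if "-" in word_id:                # range ID: one multi-word token
            covered_until = int(word_id.split("-")[1])
            counts["multiword_tokens"] += 1
            is_token = True
        else:                             # plain integer ID: one syntactic word
            counts["words"] += 1
            is_token = int(word_id) > covered_until
        if is_token:
            counts["tokens"] += 1
            if "SpaceAfter=No" in misc.split("|"):
                counts["no_space_after"] += 1
    return counts


def average_mwt_length(tokens, words, multiword_tokens):
    """Mean number of words per multi-word token, derived from corpus totals."""
    # Each multi-word token is 1 token but k words, so words - tokens = sum(k - 1).
    words_inside_mwts = words - tokens + multiword_tokens
    return words_inside_mwts / multiword_tokens


# Toy sentence with placeholder forms (not taken from the treebank).
_rows = [
    ["1", "w1", "w1", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "w2", "w2", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3-4", "w3w4", "_", "_", "_", "_", "_", "_", "_", "SpaceAfter=No"],
    ["3", "w3", "w3", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "w4", "w4", "NOUN", "_", "_", "2", "obl", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
TOY = "\n".join("\t".join(row) for row in _rows) + "\n"

print(corpus_counts(TOY))
print(round(average_mwt_length(8198, 8243, 43), 2))
```

The totals reported above are mutually consistent under this reading: 8243 words minus 8198 tokens leaves 45 extra words, so the 43 multi-word tokens contain 88 words, or about 2.05 each, and 1156 of 8198 tokens is about 14%.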
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 15 word types tagged as particles (PART): Bí, Kíni, Olorì, hàn, kì, kìí, kò, kọ́, ni, nì, ní, Ìmúlò, ìkejì, í, Ǹjẹ́
- This corpus contains 87 lemmas tagged as pronouns (PRON): Alámùójútó, Nnaji, Ta, a, alàtúnkọ, ara, awa, bi, bíi, e, fun, i, ibi, jọ, kí, kòsí, lati, lówó, maa, mi, mo, mu, méje, mí, míi, mọ, naà, náà, o, ohunkóhun, pọn, ri, rẹ, rẹ̀, ti, tilẹ, tirẹ̀, tiwọn, tàbí, tèmi, tìpa, tìrẹ, tí, tíkòsi´, tó, u, un, wa, wo, wọn, wọ́n, yin, yín, Áfríkà, Èkó, à, àpẹẹrẹ, àwa, àworan, àwọn, á, èmi, èmí, èwo, èyí, èéṣe, é, ìtọ́nisọ́nà, ìwọ, ìwọ́, í, òun, ó, ú, ún, ṣọrẹ, Ẹyin, ẹ, ẹni, ẹnikẹni, ẹnikẹ́ni, ẹnì, ẹnìkan, ẹ̀yin, ẹ́, ọ, ọ́
- This corpus contains 21 lemmas tagged as determiners (DET): Bákanáà, Imo, Orísìírísìí, Oríṣìíríṣìí, báyìí, gbogbo, kọn, lo, náà, o, wọnni, wọ̀nyí, yìí, Àwon, à, àbo, àwón, àwọn, èyí, ìwọ̀nyí, ṣùgbọn
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: náà, o, à, àwọn, èyí
- This corpus contains 16 lemmas tagged as auxiliaries (AUX): a, bá, gbọdọ̀, jẹ́, kí, lè, má, máa, ní, ti, tí, yió, yóò, ì, ó, ń
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: bá, jẹ́, kí, lè, ní
- This corpus does not use the VerbForm feature.
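The overlap lists above (lemmas seen under two different tags) amount to a set intersection over the LEMMA and UPOS columns. The sketch below shows one way to compute it; `lemmas_by_upos`, `shared_lemmas` and the toy rows are hypothetical, and the NFC normalization step is an assumption on my part, added because Yoruba tone marks can be encoded either precomposed or with combining characters.

```python
import unicodedata
from collections import defaultdict


def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas that occur with it."""
    lemmas = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multi-word token ranges and empty nodes
            continue
        # Normalize so that differently encoded diacritics compare equal.
        lemmas[cols[3]].add(unicodedata.normalize("NFC", cols[2]))
    return lemmas


def shared_lemmas(lemmas, tag_a, tag_b):
    """Lemmas that occur at least once under each of the two tags."""
    return sorted(lemmas.get(tag_a, set()) & lemmas.get(tag_b, set()))


# Toy rows with placeholder lemmas (not taken from the treebank).
# Lemma column: "x" under both tags; "\u00e9" and "e\u0301" are the same
# letter in precomposed and combining form.
_rows = [
    ["1", "w1", "x", "PRON", "_", "_", "0", "root", "_", "_"],
    ["2", "w2", "x", "DET", "_", "_", "1", "det", "_", "_"],
    ["3", "w3", "y", "PRON", "_", "_", "1", "conj", "_", "_"],
    ["4", "w4", "\u00e9", "PRON", "_", "_", "1", "conj", "_", "_"],
    ["5", "w5", "e\u0301", "DET", "_", "_", "4", "det", "_", "_"],
]
TOY_LEMMAS = "\n".join("\t".join(row) for row in _rows) + "\n"

print(shared_lemmas(lemmas_by_upos(TOY_LEMMAS), "PRON", "DET"))
```

Without the normalization step, the last two toy rows would count as different lemmas, which could undercount overlaps of the kind listed above.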
Nominal Features
- Number
- Plur
- AUX: maa
- DET: àwọn
- PRON: wọn, ẹ, a, wọ́n, yín, ẹ̀yin, wa, tiwọn, àwa, àwọn
- PROPN: Naijiria
- Sing
- ADJ: gidi, Bìrìtìkó, tíkòsi´
- ADP: Ni, bíi
- AUX: ó, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: ó, rẹ̀, ìwọ, mi, èmi, rẹ, un, i, mo, òun
- PROPN: Alhaji
- SCONJ: bíi
- Case
- Acc
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- CCONJ: tàbi
- NOUN: Nnaji
- NUM: 2i
- PRON: wọn, mi, ẹ, i, wọ́n, ọ́, ọ, mí, wa, bíi
- PROPN: Alhaji
- SCONJ: bíi
- Gen
- PRON: rẹ̀, yín, rẹ, un, tirẹ̀, á, òun, tiwọn, ọ, ú
- Nom
- ADJ: Bìrìtìkó
- AUX: ó, maa, ti
- NOUN: Omi
- PRON: ó, a, wọn, wọ́n, ìwọ, ẹ, èmi, ẹ̀yin, mo, òun
- PROPN: Naijiria
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- DET: àwọn
- PRON: èyí
- Emp
- PRON: ara
- Ind
- ADV: nìkan
- CCONJ: Àti
- PRON: ẹni, ẹnikẹ́ni, àwọn, ẹnì, Ibi, pọn, àworan, ẹnikẹni, ẹnìkan
- Int
- PRON: kí, Ta, méje, èwo, èéṣe, kín
- Prs
- ADJ: gidi, Bìrìtìkó, tíkòsi´
- ADP: Ni, bíi
- AUX: ó, maa, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: ó, rẹ̀, wọn, a, ẹ, wọ́n, yín, ìwọ, mi, èmi
- PROPN: Alhaji, Naijiria
- SCONJ: bíi
- Rel
- PRON: tí, ti, ohunkóhun, bi
- NumType
- Card
- ADP: láàárín
- NUM: kan, méjì, 3, 000, 2004, 2005, mẹ́rin, Ọ̀kan, 10, 1520
- Ord
- ADJ: kẹrin, Benin, kejì, kárùn-ún, kìn-ín-ní, lóbìnrin, ún
- Person
- 1
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- AUX: maa, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: a, mi, èmi, mo, mí, wa, bíi, àwa, Nnaji, à
- PROPN: Alhaji, Naijiria
- SCONJ: bíi
- 2
- PRON: ẹ, yín, ìwọ, rẹ, ẹ̀yin, ọ, ọ́, alàtúnkọ, jọ, mọ
- 3
- ADJ: Bìrìtìkó
- AUX: ó
- PRON: ó, rẹ̀, wọn, wọ́n, un, i, òun, a, tirẹ̀, á
Other Features
- Typo
- Yes
- ADJ: ẹṣẹ
- ADP: kòsí, ni, si, sí, ẹba
- ADV: sá, tẹ̀lẹ̀, ṣá
- AUX: má, tí, le
- CCONJ: si, sí, ti
- DET: ná
- NOUN: ayọ, bẹ́, iye, lọ́wọ́, osọ́nà, ẹṣẹ̀
- PART: ní
- PRON: rẹ, kín, o, ti
- PROPN: Bétanì
- SCONJ: bi
- VERB: ba, bà, gbẹ́
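The per-feature listings above group words by feature value and then by UPOS tag. A possible way to build such an index from the FEATS column is sketched below. The names `feature_value_index` and `top_forms` are hypothetical, and two details are assumptions: that the listings show word forms rather than lemmas, and that each list is capped at the ten most frequent items.

```python
from collections import Counter, defaultdict


def feature_value_index(conllu_text):
    """Index word forms by (feature, value) and then by UPOS tag."""
    index = defaultdict(lambda: defaultdict(Counter))
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multi-word token ranges and empty nodes
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":           # no morphological features on this word
            continue
        for pair in feats.split("|"):
            name, _, values = pair.partition("=")
            for value in values.split(","):  # a feature may carry several values
                index[(name, value)][upos][form] += 1
    return index


def top_forms(index, name, value, upos, n=10):
    """Most frequent forms for one feature value and tag (cap of n is assumed)."""
    return [form for form, _ in index[(name, value)][upos].most_common(n)]


# Toy rows with placeholder forms (not taken from the treebank).
_rows = [
    ["1", "w1", "w1", "PRON", "_", "Case=Nom|Number=Sing|Person=3", "2", "nsubj", "_", "_"],
    ["2", "w2", "w2", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "w5", "w5", "PRON", "_", "Case=Acc|Number=Sing", "2", "obj", "_", "_"],
    ["4", "w1", "w1", "PRON", "_", "Case=Nom|Number=Sing|Person=3", "2", "conj", "_", "_"],
]
TOY_FEATS = "\n".join("\t".join(row) for row in _rows) + "\n"

_index = feature_value_index(TOY_FEATS)
print(top_forms(_index, "Number", "Sing", "PRON"))
```

An index of this shape also makes suspicious entries easy to spot, such as a feature value attached to a tag where it is not expected.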
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: jẹ́, ní.
- This corpus uses 16 lemmas as auxiliaries (aux). Examples: ń, ti, kí, lè, yóò, jẹ́, máa, má, ó, gbọdọ̀, yió, ní, a, bá, tí, ì.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (213)
- VERB--NOUN-ADP(Láyé) (1)
- VERB--NOUN-ADP(bá) (1)
- VERB--NOUN-ADP(bí) (2)
- VERB--NOUN-ADP(irú) (1)
- VERB--NOUN-ADP(ni) (6)
- VERB--NOUN-ADP(ní) (1)
- VERB--NOUN-ADP(nínú) (1)
- VERB--NOUN-ADP(ojú) (1)
- VERB--NOUN-ADP(sí) (1)
- VERB--NOUN-ADP(Ìṣọmọlí) (1)
- VERB--NOUN-ADP(òní) (1)
- VERB--NOUN-Acc (1)
- VERB--PRON (147)
- VERB--PRON-ADP(Bí) (1)
- VERB--PRON-Acc (42)
- VERB--PRON-Gen (7)
- VERB--PRON-Gen-ADP(lẹ́yìn) (2)
- VERB--PRON-Nom (327)
- VERB--PRON-Nom-ADP(bá) (1)
- VERB--PRON-Nom-ADP(ni) (7)
- VERB--PRON-Nom-ADP(ní) (2)
- VERB--PRON-Nom-ADP(nítorí) (1)
- obj
- VERB--NOUN (295)
- VERB--NOUN-ADP(bí) (1)
- VERB--NOUN-ADP(di) (1)
- VERB--NOUN-ADP(fún) (2)
- VERB--NOUN-ADP(inú) (1)
- VERB--NOUN-ADP(jáde) (1)
- VERB--NOUN-ADP(lẹ́yìn) (1)
- VERB--NOUN-ADP(ni) (4)
- VERB--NOUN-ADP(ní) (4)
- VERB--NOUN-ADP(nínú) (1)
- VERB--NOUN-ADP(pẹ̀lú) (1)
- VERB--NOUN-ADP(sí) (5)
- VERB--NOUN-ADP(sí)-ADP(abẹ́) (1)
- VERB--NOUN-ADP(sí)-ADP(àárin) (1)
- VERB--PRON (27)
- VERB--PRON-ADP(ni) (1)
- VERB--PRON-Acc (78)
- VERB--PRON-Acc-ADP(gẹ́gẹ́) (1)
- VERB--PRON-Acc-ADP(inú) (1)
- VERB--PRON-Acc-ADP(lé) (1)
- VERB--PRON-Acc-ADP(lọ́wọ́) (1)
- VERB--PRON-Gen (40)
- VERB--PRON-Gen-ADP(fún) (1)
- VERB--PRON-Gen-ADP(ni) (1)
- VERB--PRON-Nom (11)
- iobj
- VERB--PRON-Gen-ADP(ní) (2)
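The pattern labels above can be read as: parent tag, child tag, the child's Case value if it has one, and one `ADP(...)` suffix per adposition attached to the child. The sketch below computes labels in that shape. It is an illustration under that reading, not the generator used for this page; `argument_patterns` and its helpers are hypothetical names, and using the adposition's lemma (rather than its form) inside the parentheses is an assumption.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_sentences(conllu_text):
    """Yield each sentence as a list of column lists (plain word lines only)."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multi-word token ranges and empty nodes
                sentence.append(cols)
    if sentence:
        yield sentence


def case_value(feats):
    """Return the value of the Case feature, or None if it is absent."""
    for pair in feats.split("|"):
        if pair.startswith("Case="):
            return pair[len("Case="):]
    return None


def argument_patterns(conllu_text):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    patterns = Counter()
    for sent in parse_sentences(conllu_text):
        by_id = {cols[0]: cols for cols in sent}
        for child in sent:
            head = by_id.get(child[6])
            relation = child[7].split(":")[0]  # ignore relation subtypes
            if head is None or head[3] != "VERB":
                continue
            if child[3] not in ("NOUN", "PRON") or relation not in CORE_RELATIONS:
                continue
            label = f"{head[3]}--{child[3]}"
            case = case_value(child[5])
            if case:
                label += f"-{case}"
            for dep in sent:                   # adpositions attached to the child
                if dep[6] == child[0] and dep[3] == "ADP":
                    label += f"-ADP({dep[2]})"
            patterns[(relation, label)] += 1
    return patterns


# Toy sentence with placeholder forms and lemmas (not taken from the treebank).
_rows = [
    ["1", "w1", "p", "PRON", "_", "Case=Nom", "2", "nsubj", "_", "_"],
    ["2", "w2", "v", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "w3", "a", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "w4", "n", "NOUN", "_", "_", "2", "obj", "_", "_"],
]
TOY_ARGS = "\n".join("\t".join(row) for row in _rows) + "\n"

for (relation, label), count in sorted(argument_patterns(TOY_ARGS).items()):
    print(relation, label, count)
```

Counts such as a core argument carrying an adposition, or a subject marked Acc, are the kind of output that this table makes visible for manual review.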
Relations Overview
- This corpus uses 2 relation subtypes: compound:prt, compound:svc
- The following 5 relation types are not used in this corpus at all: dislocated, clf, list, reparandum, dep
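Both statements above can be checked against the relation list given earlier on this page. The sketch below does so with plain set operations; `relation_inventory` is a hypothetical helper, `UNIVERSAL_RELATIONS` is the set of 37 universal relations of UD v2, and `PAGE_RELATIONS` is copied from the Relations list above.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels listed in the Relations section of this page.
PAGE_RELATIONS = """
acl advcl advmod amod appos aux case cc ccomp compound compound:prt
compound:svc conj cop csubj det discourse expl fixed flat goeswith iobj
mark nmod nsubj nummod obj obl orphan parataxis punct root vocative xcomp
""".split()


def relation_inventory(deprels):
    """Return (subtyped labels in use, universal relations never used)."""
    used = set(deprels)
    subtypes = sorted(label for label in used if ":" in label)
    base_relations = {label.split(":")[0] for label in used}
    unused = sorted(UNIVERSAL_RELATIONS - base_relations)
    return subtypes, unused


subtypes, unused = relation_inventory(PAGE_RELATIONS)
print(subtypes)
print(unused)
```

On a real file, the same function could be fed the DEPREL column instead of the hand-copied list.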