This page pertains to UD version 2.

UD Yoruba YTB

Language: Yoruba (code: yo)
Family: Niger-Congo

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Adédayọ̀ Olúòkun, Daniel Zeman, Seyi Williams, Ọlájídé Ishola.

Repository: UD_Yoruba-YTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: bible, wiki

Questions, comments? General annotation questions (either Yoruba-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation	Source
Lemmas	annotated manually
UPOS	annotated manually, natively in UD style
XPOS	not available
Features	annotated manually, natively in UD style
Relations	annotated manually, natively in UD style

Description

Parts of the Yoruba Bible and of the Yoruba edition of Wikipedia, hand-annotated natively in Universal Dependencies.

…

Acknowledgments

…

References

@inproceedings{ishola-zeman-2020-yoruba,
title = "{Y}or{\`u}b{\'a} Dependency Treebank ({YTB})",
author = "Ishola, Ol{\'a}j{\'i}d{\'e} and Zeman, Daniel",
editor = "Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios",
booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2020.lrec-1.637/",
pages = "5178--5186",
language = "eng",
ISBN = "979-10-95546-34-4"
}

Statistics of UD Yoruba YTB

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X

Features

Case – Number – NumType – Person – PronType – Typo

Relations

acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – compound:prt – compound:svc – conj – cop – csubj – det – discourse – expl – fixed – flat – goeswith – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 317 sentences, 8198 tokens and 8243 syntactic words.
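Counts of this kind follow the CoNLL-U conventions: a multi-word token line (an ID range such as 1-2) counts as one token, while every syntactic word line counts as one word. The following is a minimal sketch of such a count, assuming tab-separated CoNLL-U input. The `SAMPLE` fragment and the `count_units` helper are illustrative; the fragment uses placeholder forms and is not taken from this treebank.

```python
# Toy CoNLL-U fragment with placeholder forms (NOT data from UD_Yoruba-YTB).
SAMPLE = """\
# sent_id = toy-1
1-2\tab\t_\t_\t_\t_\t_\t_\t_\t_
1\ta\ta\tADP\t_\t_\t2\tcase\t_\t_
2\tb\tb\tNOUN\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# sent_id = toy-2
1\tc\tc\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def count_units(conllu_text):
    """Return (sentences, tokens, syntactic words) for CoNLL-U text."""
    sentences = tokens = words = 0
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():  # a blank line ends the sentence
            in_sentence = False
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
            covered_until = 0
        token_id = line.split("\t")[0]
        if "-" in token_id:  # multi-word token range, e.g. "1-2"
            tokens += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:  # empty node: neither token nor word
            continue
        else:
            words += 1
            if int(token_id) > covered_until:  # word is its own token
                tokens += 1
    return sentences, tokens, words


print(count_units(SAMPLE))
```

On the toy fragment the multi-word token contributes one token and two words, which is why the word count exceeds the token count.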

This corpus contains 1156 tokens (14%) that are not followed by a space.

This corpus does not contain words with spaces.

This corpus contains 23 types of words that contain both letters and punctuation. Examples: Commons., irocks.com, Hip-, Lavinsky,, Lát', Nollywood,, OYIN,, Premier), R&B, St.Judes, T', engineer), gán-án-ní, jọ-, kárùn-ún, kìn-ín-ní, mindat.org, mẹ́sàn-án, níhìn-ín, níhín-ín, ÀṢÀ,, àgbárí-, ṣ'
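One way to flag such word forms is to look at Unicode general categories. The sketch below assumes that "letter" means any category starting with `L` and "punctuation" any category starting with `P`; the exact criterion used to produce the figure above is not stated on this page, and `has_letter_and_punct` is a hypothetical helper name.

```python
import unicodedata


def has_letter_and_punct(form):
    """True if the form contains at least one letter and one punctuation mark.

    Assumption: letters are Unicode categories L*, punctuation is P*.
    Symbols such as "$" (category Sc) do not count as punctuation here.
    """
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


# A few forms from the example list, plus forms that should not match.
for form in ["R&B", "St.Judes", "mindat.org", "kárùn-ún", "Nollywood", "2004", ","]:
    print(form, has_letter_and_punct(form))
```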

This corpus contains 43 multi-word tokens. On average, one multi-word token consists of 2.05 syntactic words.
There are 23 types of multi-word tokens. Examples: orúkọ, lórúkọ, sílẹ̀, lọ́wọ́, ìṣọmọlórúkọ, lára, pànìyàn, sílé, fọwọ́, gbàágbọ́, láradá, lógo, lókúta, lóró, lẹ́nu, lẹ́rẹ̀kẹ́, mọ́lẹ̀, nìyìí, sewọn, sára, síta, ìsọmọlórúkọ, ẹsì.
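The figures in this section can be cross-checked with simple arithmetic. Every token that is not a multi-word token corresponds to exactly one syntactic word, so the surplus of words over tokens (8243 - 8198 = 45) comes entirely from the 43 multi-word tokens, which therefore cover 43 + 45 = 88 words. The sketch below recomputes the two derived figures from the counts reported above; the function names are illustrative.

```python
def no_space_share(no_space_tokens, tokens):
    """Fraction of tokens that are not followed by a space."""
    return no_space_tokens / tokens


def mwt_average(tokens, words, mwt_count):
    """Average number of syntactic words per multi-word token.

    Each multi-word token is one token but several words, so
    words - tokens = (words inside MWTs) - mwt_count.
    """
    words_in_mwts = (words - tokens) + mwt_count
    return words_in_mwts / mwt_count


# Counts reported for this corpus.
TOKENS, WORDS, MWTS, NO_SPACE = 8198, 8243, 43, 1156

print(f"{100 * no_space_share(NO_SPACE, TOKENS):.0f}%")  # share of tokens without a following space
print(f"{mwt_average(TOKENS, WORDS, MWTS):.2f}")         # words per multi-word token
```

Both results agree with the reported 14% and 2.05.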

Morphology

Nominal Features

Number

Plur
- AUX: maa
- DET: àwọn
- PRON: wọn, ẹ, a, wọ́n, yín, ẹ̀yin, wa, tiwọn, àwa, àwọn
- PROPN: Naijiria

Sing
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- AUX: ó, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: ó, rẹ̀, ìwọ, mi, èmi, rẹ, un, i, mo, òun
- PROPN: Alhaji
- SCONJ: bíi

Case

Acc
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- CCONJ: tàbi
- NOUN: Nnaji
- NUM: 2i
- PRON: wọn, mi, ẹ, i, wọ́n, ọ́, ọ, mí, wa, bíi
- PROPN: Alhaji
- SCONJ: bíi

Gen
- PRON: rẹ̀, yín, rẹ, un, tirẹ̀, á, òun, tiwọn, ọ, ú

Nom
- AUX: ó, maa, ti
- NOUN: Omi
- PRON: ó, a, wọn, wọ́n, ìwọ, ẹ, èmi, ẹ̀yin, mo, òun
- PROPN: Naijiria

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

PronType

Dem
- DET: àwọn
- PRON: èyí

Emp
- PRON: ara

Ind
- ADV: nìkan
- CCONJ: Àti
- PRON: ẹni, ẹnikẹ́ni, àwọn, ẹnì, Ibi, pọn, àworan, ẹnikẹni, ẹnìkan

Int
- PRON: kí, Ta, èwo, èéṣe, kín

Prs
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- AUX: ó, maa, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: ó, rẹ̀, wọn, a, ẹ, wọ́n, yín, ìwọ, mi, èmi
- PROPN: Alhaji, Naijiria
- SCONJ: bíi

Rel
- PRON: tí, ti, ohunkóhun, bi

NumType

Card
- ADP: láàárín
- NUM: kan, méjì, 3, 000, 2004, 2005, mẹ́rin, Ọ̀kan, 10, 1520

Ord
- ADJ: kẹrin, Benin, kejì, kárùn-ún, kìn-ín-ní, lóbìnrin, ún

Person

1
- ADJ: gidi, tíkòsi´
- ADP: Ni, bíi
- AUX: maa, ti
- CCONJ: tàbi
- NOUN: Nnaji, Omi
- NUM: 2i
- PRON: a, mi, èmi, mo, mí, wa, bíi, àwa, à, Nnaji
- PROPN: Alhaji, Naijiria
- SCONJ: bíi

2
- PRON: ẹ, yín, ìwọ, rẹ, ẹ̀yin, ọ, ọ́, alàtúnkọ, jọ, mọ

3
- AUX: ó
- PRON: ó, rẹ̀, wọn, wọ́n, un, i, òun, a, tirẹ̀, á

Other Features

Typo
- Yes
  - ADJ: ẹṣẹ
  - ADP: kòsí, ni, si, sí, ẹba
  - ADV: sá, tẹ̀lẹ̀, ṣá
  - AUX: má, tí, le
  - CCONJ: si, sí, ti
  - DET: ná
  - NOUN: ayọ, bẹ́, iye, lọ́wọ́, osọ́nà, ẹṣẹ̀
  - PART: ní
  - PRON: rẹ, kín, o, ti
  - PROPN: Bétanì
  - SCONJ: bi
  - VERB: ba, bà, gbẹ́

Syntax

Auxiliary Verbs and Copula

This corpus uses 2 lemmas as copulas (cop). Examples: jẹ́, ní.

This corpus uses 16 lemmas as auxiliaries (aux). Examples: ń, ti, kí, lè, yóò, máa, jẹ́, má, ó, gbọdọ̀, yió, ní, a, bá, tí, ì.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (206)
- VERB--NOUN-ADP(bá) (1)
- VERB--NOUN-ADP(bí) (2)
- VERB--NOUN-ADP(irú) (1)
- VERB--NOUN-ADP(láyé) (1)
- VERB--NOUN-ADP(ni) (5)
- VERB--NOUN-ADP(ní) (1)
- VERB--NOUN-ADP(nínú) (1)
- VERB--NOUN-ADP(ojú) (1)
- VERB--NOUN-ADP(sí) (1)
- VERB--NOUN-ADP(ìṣọmọlí) (1)
- VERB--NOUN-ADP(òní) (1)
- VERB--NOUN-Acc (1)
- VERB--PRON (143)
- VERB--PRON-ADP(Bí) (1)
- VERB--PRON-Acc (39)
- VERB--PRON-Gen (6)
- VERB--PRON-Gen-ADP(lẹ́yìn) (2)
- VERB--PRON-Nom (310)
- VERB--PRON-Nom-ADP(bá) (1)
- VERB--PRON-Nom-ADP(ni) (7)
- VERB--PRON-Nom-ADP(ní) (1)
- VERB--PRON-Nom-ADP(nítorí) (1)

obj
- VERB--NOUN (295)
- VERB--NOUN-ADP(bí) (1)
- VERB--NOUN-ADP(di) (1)
- VERB--NOUN-ADP(fún) (2)
- VERB--NOUN-ADP(inú) (1)
- VERB--NOUN-ADP(jáde) (1)
- VERB--NOUN-ADP(lẹ́yìn) (1)
- VERB--NOUN-ADP(ni) (4)
- VERB--NOUN-ADP(ní) (4)
- VERB--NOUN-ADP(nínú) (1)
- VERB--NOUN-ADP(pẹ̀lú) (1)
- VERB--NOUN-ADP(sí) (5)
- VERB--NOUN-ADP(sí)-ADP(abẹ́) (1)
- VERB--NOUN-ADP(sí)-ADP(àárin) (1)
- VERB--PRON (27)
- VERB--PRON-ADP(ni) (1)
- VERB--PRON-Acc (77)
- VERB--PRON-Acc-ADP(gẹ́gẹ́) (1)
- VERB--PRON-Acc-ADP(inú) (1)
- VERB--PRON-Acc-ADP(lé) (1)
- VERB--PRON-Acc-ADP(lọ́wọ́) (1)
- VERB--PRON-Gen (40)
- VERB--PRON-Gen-ADP(fún) (1)
- VERB--PRON-Gen-ADP(ni) (1)
- VERB--PRON-Nom (11)

iobj
- VERB--PRON-Gen-ADP(ní) (2)
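The pattern labels in the lists above, such as VERB--PRON-Nom or VERB--NOUN-ADP(sí), can be approximated with a short script. This is a sketch under stated assumptions, not the procedure actually used to generate this page: it assumes the label is built from the parent's UPOS, the child's UPOS, the child's `Case` feature if present, and the lemma of each adposition attached to the child. The `Word` tuple, the `argument_patterns` helper, and the toy sentence with placeholder lemmas are all hypothetical.

```python
from collections import Counter, namedtuple

# Minimal stand-in for one CoNLL-U word line (hypothetical structure).
Word = namedtuple("Word", "id lemma upos feats head deprel")


def argument_patterns(sentence, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for verb parents with NOUN/PRON children."""
    by_id = {w.id: w for w in sentence}
    counts = Counter()
    for child in sentence:
        parent = by_id.get(child.head)  # None for the root (head 0)
        if parent is None or parent.upos != "VERB":
            continue
        if child.upos not in ("NOUN", "PRON") or child.deprel not in relations:
            continue
        label = f"{parent.upos}--{child.upos}"
        case = child.feats.get("Case")
        if case:
            label += f"-{case}"
        # Assumption: every ADP dependent of the child is appended by lemma.
        for w in sentence:
            if w.head == child.id and w.upos == "ADP":
                label += f"-ADP({w.lemma})"
        counts[(child.deprel, label)] += 1
    return counts


# Toy sentence with placeholder lemmas (not from the treebank).
toy = [
    Word(1, "s", "PRON", {"Case": "Nom"}, 2, "nsubj"),
    Word(2, "v", "VERB", {}, 0, "root"),
    Word(3, "p", "ADP", {}, 4, "case"),
    Word(4, "o", "NOUN", {}, 2, "obj"),
]
for (relation, pattern), n in sorted(argument_patterns(toy).items()):
    print(relation, pattern, n)
```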

Relations Overview

This corpus uses 2 relation subtypes: compound:prt, compound:svc.
The following 5 relation types are not used in this corpus at all: dislocated, clf, list, reparandum, dep.
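Both statements can be reproduced by comparing the relation list in the Relations section above against the 37 universal relations of UD v2. A minimal sketch; the variable and function names are illustrative.

```python
# The 37 universal dependency relations defined in UD v2.
UNIVERSAL_RELATIONS = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relations listed for this corpus in the Relations section.
USED_RELATIONS = (
    "acl advcl advmod amod appos aux case cc ccomp compound compound:prt "
    "compound:svc conj cop csubj det discourse expl fixed flat goeswith iobj "
    "mark nmod nsubj nummod obj obl orphan parataxis punct root vocative xcomp"
).split()


def summarize_relations(used):
    """Return (sorted subtypes, sorted universal relations that never occur)."""
    subtypes = sorted(rel for rel in used if ":" in rel)
    base_types = {rel.split(":")[0] for rel in used}
    unused = sorted(UNIVERSAL_RELATIONS - base_types)
    return subtypes, unused


subtypes, unused = summarize_relations(USED_RELATIONS)
print(subtypes)
print(unused)
```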