UD for Yorùbá
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation, except:
- Hyphenated words that cannot be correctly annotated when split (níhìn-ín) are not split.
- Multiword tokens are not used in Yorùbá; “what you see is what you tokenize”. Contractions are not undone for words that can be correctly assigned a category.
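The rules above can be sketched as a small whitespace-and-punctuation tokenizer. This is only an illustration, not the tool used to build the treebank: the name `tokenize` is hypothetical, and the sketch makes the simplifying assumption that every hyphenated word stays intact, whereas the guideline only protects hyphenated words that cannot be annotated correctly when split.

```python
import re
import unicodedata

# A "word" is a run of letters/digits plus combining diacritics (U+0300-U+036F),
# because some Yorùbá letters have no single precomposed code point.
_WORD = r"[\w\u0300-\u036f]+"
# Hyphen-joined words stay together; any other non-space symbol is its own token.
_TOKEN = re.compile(rf"{_WORD}(?:-{_WORD})*|[^\w\s]")


def tokenize(text):
    """Split on whitespace and punctuation; never split at a hyphen (assumption)."""
    return _TOKEN.findall(unicodedata.normalize("NFC", text))


print(tokenize("Olú ra aga."))
print(tokenize("níhìn-ín"))
```

Normalizing to NFC first keeps composed and decomposed spellings of the same word from producing different tokens.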
Morphology
Tags
- Yorùbá uses 15 universal tags (SYM and INTJ do not occur in the corpus at present).
- The only word tagged as PART is the negation marker kò.
- Auxiliary verbs (AUX) fall into the following groups:
- jẹ́ (copular “be”)
- ní (copular “be”)
- kí follows jẹ́ in constructions that can be understood as third-person existential imperatives (“let there be light”); it is semantically and syntactically redundant, but currently we tag it as auxiliary, together with jẹ́
- kìí – negative habitual (“usually be/do not”)
- ń – imperfective or progressive aspect
- ti – perfective aspect (“have”)
- yóò, máa, á, a, ó, yió – future tense (“shall, will”)
- ìbá, ì bá – conditional (“would”); written as one word in old texts, modern spelling is ì bá
- lè – modal “can, may”
- gbọdọ̀ – modal “must”
- má, máà – negative “do not” (cf. the negative particle kò)
- The tag DET is used for articles and pronominal words used with a determiner function; they can precede or follow the noun they quantify. The tag PRON is used for subjects or objects of a noun phrase and to show possession.
- A word can belong to both categories DET and PRON: àwọn is tagged DET when used as a pluralizer of a nominal.
- Gender of a pronoun is only determined by the antecedent (ó can be “he”, “she” or “it”). It is not explicitly stated.
- Polysemy is an important phenomenon in Yorùbá; to correctly categorize a word, the context where it occurs is the determining factor. Tone can distinguish meaning, but a word with the same tone can mean different things in different circumstances (e.g. bí – “procreate”, “if”, “not”).
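The auxiliary groups listed above can be written down as a lookup table, for instance to flag AUX candidates during annotation. This is a minimal sketch: the names `AUX_FUNCTION` and `aux_candidate` are hypothetical, the short labels are paraphrases of the glosses in the list, and the two-word spelling ì bá is left out because it spans two tokens.

```python
import unicodedata


def _nfc(s):
    # Normalize so that composed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", s)


# Forms copied from the auxiliary list; labels paraphrase its glosses.
_AUX_GROUPS = [
    (("jẹ́", "ní"), "copula"),
    (("kí",), "follows jẹ́ in existential imperatives"),
    (("kìí",), "negative habitual"),
    (("ń",), "imperfective/progressive"),
    (("ti",), "perfective"),
    (("yóò", "máa", "á", "a", "ó", "yió"), "future"),
    (("ìbá",), "conditional"),
    (("lè",), "modal: can, may"),
    (("gbọdọ̀",), "modal: must"),
    (("má", "máà"), "negative: do not"),
]

AUX_FUNCTION = {_nfc(form): label for forms, label in _AUX_GROUPS for form in forms}


def aux_candidate(form):
    """Return the auxiliary function a form may have, or None.

    A hit only means the form can be AUX; context decides the actual tag.
    """
    return AUX_FUNCTION.get(_nfc(form).lower())
```

Because of polysemy, a match is not a tagging decision: ó is also a pronoun and ní is also an adposition, so the table can only propose candidates.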
Features
- The pluralizing determiner àwọn is tagged Number=Plur|PronType=Dem.
- The numeral tag NUM is used for quantity. Number words follow the words they quantify.
- Adjectives agree with nouns (in attributive position) and they only have the positive Degree.
- There is no morphological Case; instead, adpositions (ADP) are used as case markers and specify the role of a noun in a phrase.
- Verb forms are past/perfect (ti), continuous (preverbal particle ń), future (yóò). Verb serialization is prominent in Yorùbá; the first verb marks tense while the second indicates the direction of an action.
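Feature strings such as the one given for àwọn follow the CoNLL-U FEATS syntax: `Name=Value` pairs joined by `|`, with `_` for an empty set. A minimal sketch of reading and writing that column is shown here; the function names `parse_feats` and `format_feats` are hypothetical helpers, not part of any UD tool.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; "_" means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(features):
    """Serialize a dict back to FEATS, sorted by feature name (case-insensitive)."""
    if not features:
        return "_"
    ordered = sorted(features.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


awon = parse_feats("Number=Plur|PronType=Dem")
print(awon["Number"], awon["PronType"])
print(format_feats(awon))
```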
Syntax
- Yorùbá is an SVO language with a strict word order.
Subjects
- Nominal subjects are in initial position, followed by adjectives, demonstratives and relative clauses.
- Nominal subject (nsubj) is a noun phrase without preposition.
- A finite subordinate clause may serve as the subject and is labeled csubj.
Objects
- Objects follow the main verb.
- When a verb has two objects, the second one is preceded by a preposition and therefore labeled as oblique (obl).
- Verb phrases and prepositional phrases are also head initial.
Yorùbá uses 2 relation subtypes:
- compound:prt to attach verbal particles to verbs
- compound:svc to connect verbs in a serial verb construction
Treebanks
There is only one Yorùbá UD treebank at present:
More
This section will probably be moved to a separate page. Examples are taken from the Language Gulper.
The default interpretation of the bare verb stem is the past tense.
Olú ra aga \n Olú buy chair
nsubj(ra, Olú)
obj(ra, aga)
“Olú bought a chair.”
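To show how relations written as deprel(head, dependent) correspond to the HEAD and DEPREL columns of a CoNLL-U file, the example above can be converted with a short script. This is a sketch under stated assumptions: `to_conllu` is a hypothetical helper, every form is assumed to occur only once in the sentence, all other columns are left as `_`, and any token without a listed head is treated as the root.

```python
import unicodedata


def _nfc(s):
    # Normalize so that the same word typed in different ways still matches.
    return unicodedata.normalize("NFC", s)


def to_conllu(tokens, relations):
    """Build ten-column CoNLL-U rows from forms and (deprel, head, dependent) triples."""
    forms = [_nfc(t) for t in tokens]
    index = {form: i for i, form in enumerate(forms, start=1)}  # assumes unique forms
    attachment = {}
    for deprel, head, dep in relations:
        attachment[index[_nfc(dep)]] = (index[_nfc(head)], deprel)
    rows = []
    for i, form in enumerate(forms, start=1):
        head_id, deprel = attachment.get(i, (0, "root"))
        rows.append("\t".join([str(i), form, "_", "_", "_", "_", str(head_id), deprel, "_", "_"]))
    return "\n".join(rows)


print(to_conllu(["Olú", "ra", "aga"], [("nsubj", "ra", "Olú"), ("obj", "ra", "aga")]))
```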
The imperfective auxiliary ń is used to refer to an action in progress in the past or present, or to a habitual action.
Wọ́n ń jó \n They IMPF play
nsubj(jó, Wọ́n)
aux(jó, ń)
“They are (were) playing.”
The perfective auxiliary ti denotes a completed action.
Ó ti lọ \n He/she PERF go
nsubj(lọ, Ó)
aux(lọ, ti)
“He/she has gone.”
The auxiliaries á/ó/yió denote the future tense.
ọ̀rẹ́ mi á lọ \n friend my FUT go
nmod(ọ̀rẹ́, mi)
nsubj(lọ, ọ̀rẹ́)
aux(lọ, á)
“My friend will go.”
A combination of the imperfective/progressive and perfective auxiliaries indicates the beginning of an action in the past (progressive perfect).
Mo ti ń gba lẹ́tà rẹ \n I PERF IMPF receive letter your
nsubj(gba, Mo)
aux(gba, ti)
aux(gba, ń)
obj(gba, lẹ́tà)
nmod(lẹ́tà, rẹ)
“I have started to receive your letters.”
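The tense and aspect readings described so far can be summarized as a mapping from the sequence of auxiliaries before the verb to an interpretation. The sketch below is only a restatement of those examples: `describe_tam` is a hypothetical name, the future forms are the ones given in this section (á, ó, yió), and any other combination is reported as unknown rather than guessed.

```python
import unicodedata


def _nfc(s):
    # Normalize so that composed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", s)


IMPF = _nfc("ń")
PERF = "ti"
FUTURE = {_nfc(form) for form in ("á", "ó", "yió")}


def describe_tam(aux_forms):
    """Map the auxiliaries preceding a verb to the reading described in the text."""
    forms = [_nfc(f) for f in aux_forms]
    if not forms:
        return "past (default reading of the bare stem)"
    if forms == [PERF, IMPF]:
        return "progressive perfect"
    if forms == [IMPF]:
        return "imperfective"
    if forms == [PERF]:
        return "perfective"
    if len(forms) == 1 and forms[0] in FUTURE:
        return "future"
    return "unknown"


print(describe_tam([]))
print(describe_tam(["ti", "ń"]))
```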
If the verb has two objects, the second one is preceded by the preposition ní. Therefore the second object is treated as an oblique argument in UD.
Ó kọ wa ní Yorùbá \n He teach us to Yoruba
nsubj(kọ, Ó)
obj(kọ, wa)
obl(kọ, Yorùbá)
case(Yorùbá, ní)
“He taught us Yoruba.”
There are serial verb constructions, in which several verbs appear in a sequence
without any intervening coordinator or subordinator. They share tense-aspect markers
if any, and they may share arguments, although an argument may have different roles
with respect to different verbs in the chain.
Some of these constructions could be annotated as either compound:svc or xcomp. Precise criteria have yet to be formulated.
Ó gbé e wá \n He/she carry it come
nsubj(gbé, Ó)
obj(gbé, e)
compound:svc(gbé, wá)
“He/she brought it.”
Ó tì mí ṣubú \n He/she push me fall
nsubj(tì, Ó)
obj(tì, mí)
compound:svc(tì, ṣubú)
“He/she pushed me and I fell.”
Two transitive verbs combined may each have their own object.
Ó pọn omi kún kete \n He/she draw water fill pot
nsubj(pọn, Ó)
obj(pọn, omi)
compound:svc(pọn, kún)
obj(kún, kete)
“He/she drew water and filled the pot.”
But there can also be one shared object:
Ade ń ra ẹran jẹ \n Ade IMPF buy meat eat
nsubj(ra, Ade)
aux(ra, ń)
obj(ra, ẹran)
compound:svc(ra, jẹ)
“Ade is buying meat and eating it.”
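In the serial verb examples above, the first verb is the head and each later verb attaches to it via compound:svc, while objects attach to whichever verb governs them. A minimal sketch of recovering the verb chain and each verb's own objects from such relations follows; `svc_chain` and `objects_of` are hypothetical helpers, and the relations are given as (deprel, head, dependent) triples.

```python
def svc_chain(relations, first_verb):
    """Return the first verb followed by the verbs attached to it via compound:svc."""
    chain = [first_verb]
    for deprel, head, dep in relations:
        if deprel == "compound:svc" and head == first_verb:
            chain.append(dep)
    return chain


def objects_of(relations, verb):
    """Return the obj dependents attached directly to the given verb."""
    return [dep for deprel, head, dep in relations if deprel == "obj" and head == verb]


# Relations of "Ó pọn omi kún kete" as annotated above.
draw, fill = "pọn", "kún"
rels = [
    ("nsubj", draw, "Ó"),
    ("obj", draw, "omi"),
    ("compound:svc", draw, fill),
    ("obj", fill, "kete"),
]
for verb in svc_chain(rels, draw):
    print(verb, objects_of(rels, verb))
```

With a shared object, as in the last example, the second verb has no obj dependent of its own, so `objects_of` returns an empty list for it.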
In focus constructions, a constituent is placed at the front and marked by the morpheme ni. Normal sentence without focus:
Olú ra ìwé \n Olú buy book
nsubj(ra, Olú)
obj(ra, ìwé)
“Olú bought a book.”
Object focus:
Ìwé ni Olú rà \n Book FOC Olú buy
nsubj(rà, Olú)
obj(rà, Ìwé)
case(Ìwé, ni)
“It was a book that Olú bought.”
If the subject is focused, there must be a pronoun at the subject position. We treat this as an instance of clitic doubling: the fronted noun phrase is analyzed as the subject, and the pronoun is attached as an expletive:
Olú ni ó ra ìwé \n Olú FOC he buy book
nsubj(ra, Olú)
case(Olú, ni)
expl(ra, ó)
obj(ra, ìwé)
“It was Olú who bought the book.”
Oblique dependent focus:
Ní ilé ni ó ti bẹ̀rẹ̀ \n At house FOC it PERF start
case(ilé, Ní)
case(ilé, ni)
obl(bẹ̀rẹ̀, ilé)
nsubj(bẹ̀rẹ̀, ó)
aux(bẹ̀rẹ̀, ti)
“It was in the house that it started.”
The verb can be fronted in its nominalized form. It must then be repeated as a verb.
Rírà ni bàbá ra bàtà \n Buying FOC father buy shoes
case(Rírà, ni)
dislocated(ra, Rírà)
nsubj(ra, bàbá)
obj(ra, bàtà)
“Father bought shoes.”
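In all the focus examples above, the focus marker ni is attached to the fronted constituent as a case dependent, so the focused element can be read off the relations. The sketch below assumes (deprel, head, dependent) triples and a hypothetical helper name `focused_constituents`; it compares the exact form, so the toneless focus marker ni is kept apart from the high-tone preposition ní.

```python
def focused_constituents(relations):
    """Return heads that carry the focus marker ni as a case dependent."""
    return [
        head
        for deprel, head, dep in relations
        if deprel == "case" and dep.lower() == "ni"  # "ní" (preposition) does not match
    ]


# Relations of "Ní ilé ni ó ti bẹ̀rẹ̀" as annotated above.
house, start = "ilé", "bẹ̀rẹ̀"
rels = [
    ("case", house, "Ní"),
    ("case", house, "ni"),
    ("obl", start, house),
    ("nsubj", start, "ó"),
    ("aux", start, "ti"),
]
print(focused_constituents(rels))
```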