
This page pertains to UD version 2.


UD Naija NSC

Language: Naija (code: pcm)
Family: Creole

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Bernard Caron, Emmett Strickland, Marine Courtin, Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Chika Kennedy Ajede, Emeka Onwuegbuzia, Samson Tella.

Repository: UD_Naija-NSC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Naija-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [kim (æt) gerdes • fr]. Development of the treebank happens in the UD repository, but not directly in the final CoNLL-U files. You may submit bug fixes as pull requests against the dev branch, but the changes must be made to the source files in the folder called not-to-release. Contact the treebank maintainers if in doubt.

Annotation	Source
Lemmas	assigned by a program, not checked manually
UPOS	annotated manually, natively in UD style
XPOS	not available
Features	annotated manually, natively in UD style
Relations	annotated manually in non-UD style, automatically converted to UD

Description

A Universal Dependencies corpus for spoken Naija (Nigerian Pidgin).

The corpus is based on dialogues and monologues and comprises 9,242 sentences and 140,729 tokens.

Sentences are annotated with the following metadata:

sent_id (which also indicates the sample file)
text
text_en (English translation)
text_ortho (a simplified version of text in which macrosyntactic annotation has been replaced by standard punctuation)
speaker_id (from the NaijaSynCor Metadata)
sound_url (link to the corresponding sound file; the AlignBegin and AlignEnd features give the millisecond offsets used to locate a position in the sound file)
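
The metadata fields listed above can be read with a few lines of Python. The following is a minimal sketch, not the project's own tooling: it assumes the fields appear as standard CoNLL-U comment lines of the form `# key = value`, and that AlignBegin and AlignEnd sit in the MISC column (the tenth column). The function names and all sample values are invented for illustration.

```python
def parse_sentence_metadata(conllu_block):
    """Collect `# key = value` comment lines from one CoNLL-U sentence block."""
    metadata = {}
    for line in conllu_block.splitlines():
        if not line.startswith("#"):
            continue  # token lines are ignored here
        body = line[1:].strip()
        if "=" not in body:
            continue  # comment without a key/value pair
        key, value = body.split("=", 1)
        metadata[key.strip()] = value.strip()
    return metadata


def alignment_ms(misc):
    """Return (AlignBegin, AlignEnd) in milliseconds from a MISC value, or None."""
    fields = dict(item.split("=", 1) for item in misc.split("|") if "=" in item)
    if "AlignBegin" in fields and "AlignEnd" in fields:
        return int(fields["AlignBegin"]), int(fields["AlignEnd"])
    return None


# Invented sample block; every value is a placeholder, not real treebank data.
SAMPLE = "\n".join([
    "# sent_id = SAMPLE_FILE_1",
    "# text = placeholder",
    "# text_en = placeholder",
    "# speaker_id = Sp1",
    "# sound_url = http://example.org/sample.wav",
    "1\tplaceholder\t_\tX\t_\t_\t0\troot\t_\tAlignBegin=100|AlignEnd=250",
])

meta = parse_sentence_metadata(SAMPLE)
print(meta["sent_id"], meta["sound_url"])
print(alignment_ms(SAMPLE.splitlines()[-1].split("\t")[9]))
```

Dividing the two offsets by 1000 gives start and end times in seconds, which is the unit most audio players expect when seeking within the file behind sound_url.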

Acknowledgments

The treebank was created within the NaijaSynCor project, directed by Bernard Caron and funded by the ANR, the French National Research Agency.

This corpus is a pilot for the larger corpus developed as part of the NaijaSynCor Project (Projet-ANR-16-CE27-0007). Its main aim is to develop and test the annotation and procedures that are used in the ANR project. It will be part of a larger 500,000-word corpus that will be projected onto prosodic and information structures and analysed for sociolinguistic variation (http://naijasyncor.huma-num.fr/).

The pilot corpus was recorded in various locations in Ibadan (Nigeria) by Bukola Babalola and Opeyemi Lewis. It was transcribed, translated and tagged manually using ELAN-CorpA (http://llacan.vjf.cnrs.fr/res_ELAN-CorpA_en.php) by Folakemi Ladoja, Emeka Onwuegbuzia, Biola Oyelere and Samson Tella under the supervision of Bernard Caron. It was converted to CoNLL by Mourad Aouini. First annotations were done by Marine Courtin and Sandra Bellato, who developed the guidelines under the supervision of Sylvain Kahane, Bernard Caron, and Kim Gerdes. The final Universal Dependencies annotations were manually checked by Chika Kennedy Ajede, Emeka Onwuegbuzia, and Samson Tella under the supervision of Bernard Caron, using the processing chain developed by Kim Gerdes and Bruno Guillaume, based on the Arborator (https://arborator.ilpga.fr) and Grew (http://grew.fr). Marine Courtin, Kim Gerdes, Bruno Guillaume, Kirian Guillier, Sylvain Kahane, Mariam Nakhlé, Yuchen Song, Emmett Strickland, and Manying Zhang helped in the correction process.

Statistics of UD Naija NSC

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Aspect – Case – Definite – Degree – ExtPos – Gender – Mood – Number – NumType – PartType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – VerbType – Voice

Relations

acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – ccomp – compound – compound:prt – compound:redup – compound:svc – conj – cop – csubj – csubj:outer – dep – det – discourse – dislocated – expl:subj – fixed – flat – flat:foreign – iobj – mark – nmod – nmod:poss – nsubj – nsubj:outer – nummod – obj – obj:lvc – obl:agent – obl:arg – obl:mod – orphan – parataxis – parataxis:conj – parataxis:discourse – parataxis:dislocated – parataxis:mod – parataxis:parenth – punct – reparandum – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 9241 sentences and 140837 tokens.

This corpus contains 57 tokens (0%) that are not followed by a space.
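
A count like the one above can be approximated from the CoNLL-U files, where a token that is not followed by a space carries SpaceAfter=No in the MISC column. The sketch below is an illustration under assumptions: it counts only lines with a plain integer ID (skipping multiword-token ranges and empty nodes), which may or may not match how the UD statistics script counts. The function name and sample rows are invented.

```python
def count_no_space_after(conllu_text):
    """Return (tokens_without_following_space, total_tokens)."""
    total = 0
    no_space = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        total += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return no_space, total


# Invented sample rows; the forms w1..w3 are placeholders.
SAMPLE = "\n".join([
    "# sent_id = SAMPLE_FILE_1",
    "1-2\tw1w2\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tw1\t_\tX\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\tw2\t_\tX\t_\t_\t1\tdep\t_\t_",
    "3\tw3\t_\tX\t_\t_\t1\tdep\t_\t_",
])

no_space, total = count_no_space_after(SAMPLE)
print(no_space, total, f"{100 * no_space / total:.0f}%")
```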

This corpus does not contain words with spaces.

This corpus contains 42 types of words that contain both letters and punctuation. Examples: n't, dat's, 's, o'clock, 'm, a'ah, D-Morris, it's, p.m., a.m., Port-Harcourt, billionaire's, ex-soldier, pre-degree, 'll, 're, Africa's, Champions', Co-commander, God's, John's, Momo's, O'neill, O.A., O.D.S., S., Zimbabwe's, admin's, e-services, guy's, hm'm, ma-akara, ma-firewood, ninety-six, o'oh, p-man, people's, pro-European, self-sufficient, twenty-fourth, un-African, voter's
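
The exact test behind "both letters and punctuation" is not stated on this page. One plausible reading, sketched below as an assumption, uses Unicode general categories: a form qualifies if it has at least one character in a letter category (L*) and at least one in a punctuation category (P*). The function name is invented.

```python
import unicodedata


def has_letter_and_punct(form):
    """True if the form mixes at least one letter with at least one punctuation mark."""
    has_letter = any(unicodedata.category(ch).startswith("L") for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct


# Forms taken from the examples above, plus two that should not qualify.
for form in ["n't", "o'clock", "D-Morris", "p.m.", "dey", "..."]:
    print(form, has_letter_and_punct(form))
```

Under this reading the apostrophe, hyphen and full stop all count as punctuation, so the first four forms qualify, while a plain word or a punctuation-only token does not.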

Morphology

Nominal Features

Gender

Fem
- PRON: she, her, herself, hersef

Masc
- PRON: imsef, im, himsef, him, his, imself, himself

Neut
- PRON: it, itself

Number

Plur
- ADJ: your, sleepless, deir
- ADV: students, den
- AUX: Don
- DET: dose, dese, those
- NOUN: people, tings, women, things, children, years, men, tins, girls, months
- PART: dem
- PRON: we, de, dem, your, us, una, our, deir, dose, oursef
- PROPN: Nigerians, Americans, Corinthians, Fridays, Mondays, Saturdays, Sundays, Wednesdays
- X: de

Sing
- ADJ: its
- AUX: is, was, 's, 'm, am, be, does
- AUX-Fin: is, was, 's, 'm, am, does
- DET: dat, dis, that, da, this
- PART: masef
- PRON: I, e, am, me, my, im, dat, she, her, dis
- PRON-Fin: I
- SCONJ: dat, sey
- VERB-Fin: is, means, was, comes, has, begins, goes, am, depends, abounds

Case

Acc
- PART: masef
- PRON: am, me, dem, us, una, her, yourself, mysef, oursef, yoursef

Gen
- NOUN: childs, guy, guy's, people's, Champions'
- PROPN: Africa's, God's, John's, Momo's, Zimbabwe's

Nom
- AUX: 'm, Don, be
- AUX-Fin: 'm
- PRON: I, you, e, we, de, im, dem, me, she, una
- PRON-Fin: I
- SCONJ: sey
- X: de

Definite

Def
- DET: di, the

Ind
- DET: a, an

Spec
- DET: one

Degree and Polarity

Degree

Cmp
- ADJ: better, more, later, younger, less, elder, higher, Lighter, beta, earlier
- ADV: more

Sup
- ADJ: best, worst, highest, biggest, baddest, hardest, richest, latest, oldest, youngest

Polarity

Neg
- AUX: no, never, not
- DET: no
- INTJ: no
- PART: no, not, n't

Verbal Features

Aspect

Cons
- AUX: con, come

Imp
- AUX: dey
- VERB: dey

Perf
- AUX: don, never, dey, done
- PRON: We

Prosp
- AUX: go

Mood

Cnd
- AUX: for

Ind
- AUX: is, are, do, was, 's, 'm, were, have, am, did
- AUX-Fin: is, are, do, was, 's, 'm, were, have, am, did
- PRON-Fin: I
- VERB: is, means, was, said, told, comes, has, are, begins, gave
- VERB-Fin: is, means, was, said, told, comes, has, are, begins, gave

Nec
- AUX: gats, gast

Opt
- AUX: make, meh, mah, moh, mey
- VERB: make

Pot
- AUX: fit

Tense

Past
- AUX: bin, be, was, were, did
- AUX-Fin: was, were, did
- VERB: born, done, was, said, told, cheating, boiled, gave, grounded, made
- VERB-Fin: was, said, told, gave, got, had, recommended, used, balanced, came
- VERB-Part: born, done, cheating, boiled, grounded, made, accepted, called, closed, seen

Pres
- AUX-Fin: is, are, do, 's, 'm, have, am, does
- AUX-Part: being
- PRON-Fin: I
- VERB-Fin: is, means, comes, has, are, begins, goes, am, depends, 're
- VERB-Part: according, following, going, making, talking, buying, moving, pedaling, depending, eating

Voice

Pass
- VERB-Part: called, exposed, frustrated, inbuilt, pounded, rescued, scattered, tempted

Pronouns, Determiners, Quantifiers

PronType

Art
- DET: di, one, a, the, an

Dem
- DET: dis, dose, dese, those, that, da, this
- PRON: dis, dose, dese, those

Int
- ADV: how, where, why, when
- DET: which
- PART: shey
- PRON: wetin, who, what

Prs
- AUX: 'm, Don, be
- AUX-Fin: 'm
- PART: masef
- PRON: I, you, e, we, am, de, me, dem, im, us
- PRON-Fin: I
- SCONJ: sey
- X: de

Rel
- PRON: which, that

NumType

Card
- NOUN: one, sixteen
- NUM: one, two, five, three, hundred, thousand, twenty, six, seven, fifty
- X: thou~

Ord
- ADJ: first, second, third, fourth, eleventh, tenth, eighteenth, fifth, twenty-fourth

Poss

Yes
- ADJ: your, deir, its
- PRON: my, your, our, deir, her, im, una, we, dem, e

Reflex

Yes
- PART: masef
- PRON: yourself, mysef, oursef, yoursef, myself, imsef, demsef, himsef, mahnsef, ourselves

Person

1
- AUX: was, 'm, am, Don
- AUX-Fin: was, 'm, am
- PART: masef
- PRON: I, we, me, my, us, our, a, mysef, oursef, myself
- PRON-Fin: I
- VERB-Fin: was, am

2
- ADJ: your
- PRON: you, your, una, yourself, yoursef, yousef, youself

3
- ADJ: deir, its
- AUX: is, 's, be, does
- AUX-Fin: is, 's, does
- PART: dem
- PRON: e, am, de, dem, im, she, deir, her, it, imsef
- SCONJ: sey
- VERB-Fin: is, means, comes, has, begins, goes, depends, abounds, becomes, owes
- X: de

Other Features

ExtPos
- ADJ
  - ADJ: empty
  - AUX: no
  - VERB: clean
- ADP
  - ADJ: close, more, due
  - ADP: on, for, as, up, out, inside, at, based, of
  - ADV: apart, instead
  - CCONJ: plus
  - NOUN: sake, courtesy
  - SCONJ: because, cause
  - VERB: base, according, may, based, had
  - VERB-Fin: had
  - VERB-Part: according, based
- ADV
  - ADJ: first, later
  - ADP: at, in, of, as
  - ADV: so, how
  - CCONJ: and
  - NOUN: step, upside
  - VERB: tay
  - X: per
- CCONJ
  - CCONJ: and, but
- INTJ
  - AUX: na
- NOUN
  - ADP: by
- PART
  - AUX: na
- PROPN
  - ADJ: Federal, New, National, Cool, South, Middle, Nigerian, African, Big, Central
  - ADP: On
  - ADV: All
  - INTJ: OK
  - NOUN: Port, Bronze, Minister, Radio, chief, Committee, General, House, Ministry, Senate
  - PRON: We
  - PROPN: Delta, Wazobia, Lagos, Nigeria, Edo, Bayelsa, Manchester, Osun, Etsako, Imo
  - X: Boko, Copa, Cup, La
- SCONJ
  - ADP: like, as, in, of, unto, for, from, instead, on, upon
  - ADV: so, instead, apart, as, far
  - AUX: na, be
  - NOUN: sake
  - SCONJ: wey, so, because, if, dough, sey
  - VERB: base, following
  - VERB-Part: following
- VERB
  - ADJ: good, plenty, sweet, ready, cost, fine, thick, sick, big, hard
  - NOUN: dust
  - VERB: kick

PartType
- Cop
  - AUX: na, be, it's, dat's
  - INTJ: wa
  - PART: naim
  - SCONJ: sey
  - VERB: be, dat's, it's
- Disc
  - ADV: kuma
  - PART: o, sef, sha, ma, ba, self, kwa
  - PROPN: Ma

VerbType
- Cop
  - AUX: is, are, am, dey, was, it's, na, 'm, were
  - AUX-Fin: is, are, am, was, 'm, were
  - VERB: dey, was, am, becoming, is
  - VERB-Fin: was, am, is
  - VERB-Part: becoming

Syntax

Auxiliary Verbs and Copula

This corpus uses 3 lemmas as copulas (cop). Examples: na, be, dey.

This corpus uses 24 lemmas as auxiliaries (aux). Examples: dey, go, no, con, don, make, fit, bin, will, never, be, must, for, do, can, gats, should, have, would, may, shall, might, cannot, could.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (1774)
- VERB--NOUN-ADP(as) (1)
- VERB--NOUN-ADP(on) (2)
- VERB--PRON (348)
- VERB--PRON-Acc (52)
- VERB--PRON-Nom (9312)
- VERB--PRON-Nom-ADP(in) (1)
- VERB-Fin--NOUN (38)
- VERB-Fin--PRON (17)
- VERB-Fin--PRON-Nom (37)
- VERB-Part--NOUN (22)
- VERB-Part--PRON (5)
- VERB-Part--PRON-Nom (48)

obj
- VERB--NOUN (3916)
- VERB--NOUN-ADP(for) (1)
- VERB--NOUN-ADP(if) (1)
- VERB--NOUN-ADP(more) (3)
- VERB--NOUN-ADP(sey) (11)
- VERB--NOUN-Gen (1)
- VERB--PRON (376)
- VERB--PRON-ADP(sey) (13)
- VERB--PRON-Acc (1865)
- VERB--PRON-Nom (325)
- VERB--PRON-Nom-ADP(make) (2)
- VERB--PRON-Nom-ADP(sey) (2)
- VERB-Fin--NOUN (13)
- VERB-Fin--PRON (2)
- VERB-Fin--PRON-ADP(sey) (1)
- VERB-Fin--PRON-Acc (3)
- VERB-Part--NOUN (35)
- VERB-Part--PRON (1)
- VERB-Part--PRON-ADP(sey) (1)
- VERB-Part--PRON-Acc (4)
- VERB-Part--PRON-Nom (7)

iobj
- VERB--NOUN (2)
- VERB--PRON (4)
- VERB--PRON-Acc (398)
- VERB--PRON-Nom (101)
- VERB-Fin--PRON-Acc (2)
- VERB-Fin--PRON-Nom (1)
- VERB-Part--PRON-Acc (1)
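
Pattern labels like those in the lists above can be tallied from CoNLL-U data roughly as follows. This is a hedged sketch rather than the script that produced the page: it assumes the parent label is VERB plus its VerbForm value (if any) and the child label is the UPOS plus its Case value (if any), it matches the relation name exactly (subtypes such as nsubj:outer are not merged in), and it leaves out the ADP(...) suffix for case-marked children. The function names and the sample sentence are invented.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def feature(feats, name):
    """Return the value of one feature from a FEATS string, or None."""
    for item in feats.split("|"):
        if item.startswith(name + "="):
            return item.split("=", 1)[1]
    return None


def core_argument_patterns(conllu_text):
    """Count (relation, 'PARENT--CHILD') patterns for verb parents and noun/pronoun children."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                tokens[cols[0]] = cols  # index tokens by their ID
        for cols in tokens.values():
            parent = tokens.get(cols[6])  # HEAD column; "0" (root) has no parent
            if cols[7] not in CORE_RELATIONS or parent is None:
                continue
            if parent[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            verb_form = feature(parent[5], "VerbForm")
            case = feature(cols[5], "Case")
            parent_label = "VERB" + ("-" + verb_form if verb_form else "")
            child_label = cols[3] + ("-" + case if case else "")
            counts[(cols[7], parent_label + "--" + child_label)] += 1
    return counts


# Invented three-token sentence; the analysis is illustrative only.
SAMPLE = "\n".join([
    "# sent_id = SAMPLE_FILE_1",
    "1\tI\tI\tPRON\t_\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_",
    "2\tsee\tsee\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tam\tam\tPRON\t_\tCase=Acc|PronType=Prs\t2\tobj\t_\t_",
])

for (relation, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```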

Verbs with Reflexive Core Objects

This corpus contains 75 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: ask yoursef, see imsef, compose deirsef, build oursef, carry myself, enjoy yourself, feel herself, get mysef, head oursef, package yourself, protect yourself, tell mysef, advertise demselves, advertise yourself, arrange hersef, arrange mysef, arrange myself, bring yourself, call demsefs, call demself, carry imself, carry yoursef, carry yourself, check yoursef, cloth myself, deprive yoursef, develop demsef, do oursefs, do ourselves, engage imsef, enjoy imsef, enjoy mysef, feed myself, find yourself, finish herself, fool yourself, gather ourself, give myself, hate demsefs, help mysef, help oursef, help yourself, humble himsef, improve myself, kack mysef, kill yoursef, know oursefs, laugh mysef, look mysef, make yoursef

Relations Overview

This corpus uses 19 relation subtypes: acl:relcl, advcl:cleft, compound:prt, compound:redup, compound:svc, csubj:outer, expl:subj, flat:foreign, nmod:poss, nsubj:outer, obj:lvc, obl:agent, obl:arg, obl:mod, parataxis:conj, parataxis:discourse, parataxis:dislocated, parataxis:mod, parataxis:parenth
The following 2 main types are never used alone; they are always subtyped: expl, obl
The following 3 relation types are not used in this corpus at all: clf, list, goeswith
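
The three summaries above follow mechanically from the relation inventory listed under Relations. The sketch below recomputes them from that list together with the 37 universal relations of UD v2; the function and variable names are invented for illustration.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set("""acl advcl advmod amod appos aux case cc ccomp clf compound conj
cop csubj dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp""".split())

# Relation labels used in this corpus, copied from the Relations list on this page.
USED = set("""acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc ccomp
compound compound:prt compound:redup compound:svc conj cop csubj csubj:outer dep det
discourse dislocated expl:subj fixed flat flat:foreign iobj mark nmod nmod:poss nsubj
nsubj:outer nummod obj obj:lvc obl:agent obl:arg obl:mod orphan parataxis
parataxis:conj parataxis:discourse parataxis:dislocated parataxis:mod
parataxis:parenth punct reparandum root vocative xcomp""".split())


def relation_overview(used, universal):
    """Return (subtypes, main types only seen subtyped, universal types never seen)."""
    subtypes = sorted(r for r in used if ":" in r)
    main_types_seen = {r.split(":")[0] for r in used}
    always_subtyped = sorted({r.split(":")[0] for r in subtypes} - used)
    unused = sorted(universal - main_types_seen)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED, UNIVERSAL)
print(len(subtypes), always_subtyped, unused)
# → 19 ['expl', 'obl'] ['clf', 'goeswith', 'list']
```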