UD Beja NSC
Language: Beja (code: bej)
Family: Afro-Asiatic, Cushitic
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Martine Vanhove, Rayan Ziane, Sylvain Kahane, Bruno Guillaume.
Repository: UD_Beja-NSC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.14
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Beja-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [martine • vanhove (æt) cnrs • fr; sylvain (æt) kahane • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | not available |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
A Universal Dependencies corpus for Beja, a language of the North Cushitic branch of the Afro-Asiatic phylum, spoken mainly in Sudan, Egypt and Eritrea.
The treebank is an automatic conversion of the SUD_beja-NSC treebank, which was extracted from Martine Vanhove’s corpus in ELAN format (https://corpafroas.huma-num.fr/Archives/corpus.php).
Sentences are annotated with the following metadata:
- sent_id (the source file and the segmentation identifier within that file)
- text (lexical tokenization)
- text_en (English interpretation)
- text_tokenized (morphological tokenization)
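Assuming the usual CoNLL-U convention, these fields appear in the data files as sentence-level comment lines of the form "# key = value". The following is a minimal Python sketch for collecting them per sentence. read_metadata is a hypothetical helper, and the sample sentence uses placeholder IDs and forms rather than corpus data.

```python
def read_metadata(conllu_text: str) -> list[dict[str, str]]:
    """Return one dict per sentence holding its '# key = value' comment lines."""
    sentences: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in conllu_text.splitlines():
        if line.startswith("#"):
            # Split on the first '=' only, so values may themselves contain '='.
            key, sep, value = line[1:].partition("=")
            if sep:  # skip comments such as '# newpar' that are not key = value pairs
                current[key.strip()] = value.strip()
        elif not line.strip() and current:
            # A blank line ends the sentence; keep the metadata collected so far.
            sentences.append(current)
            current = {}
    if current:  # the file may end without a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sentence: IDs and word forms are placeholders, not corpus data.
sample = (
    "# sent_id = example_file__seg_1\n"
    "# text = w1 w2=m\n"
    "# text_en = (English interpretation)\n"
    "# text_tokenized = w1 w2 =m\n"
    "1\tw1\t_\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t=m\t_\tAUX\t_\t_\t2\taux\t_\t_\n"
    "\n"
)

for meta in read_metadata(sample):
    print(meta["sent_id"], "|", meta["text_tokenized"])
```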
Acknowledgments
This treebank was developed in collaboration by Martine Vanhove, Rayan Ziane and Sylvain Kahane. Thanks to Bruno Guillaume for the conversion to UD and for help in finalizing the treebank.
Statistics of UD Beja NSC
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Definite – Degree – Deixis – ExtPos – Foreign – Gender – Mood – Number – PartType – Person – Polarity – Polite – Poss – PronType – Reflex – VerbClass – VerbType – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound:svc – cop – csubj – dep – dep:comp – dep:conj – dep:redup – det – discourse – dislocated – dislocated:mod – dislocated:obj – dislocated:subj – fixed – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl:arg – obl:mod – parataxis – parataxis:insert – parataxis:parenth – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 380 sentences and 5888 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 18 types of words that contain both letters and punctuation. Examples: aa#, aː#, dh#, firar#, hahadn#, har#, ifi#, igam#, kaː#, rif#, tʔanoː#, uʔeː#, {noise}, əddew#, əgəg#, əl#, ət#, ʔo#
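The page does not define "letters" and "punctuation". One plausible reading, assumed here, is the Unicode general categories L* and P*. Below is a minimal Python sketch using the standard unicodedata module. mixes_letters_and_punct is a hypothetical helper. Under this reading the clitic boundary marker = is a math symbol (Sm), not punctuation, which agrees with forms such as =t and bi= being absent from the list above.

```python
import unicodedata


def mixes_letters_and_punct(form: str) -> bool:
    """True if `form` has at least one letter (L*) and one punctuation mark (P*).

    Letters include the glottal stop ʔ (Lo) and the length mark ː (Lm);
    punctuation includes '#', '{' and '}'. '=' and '~' are Sm, so they do not count.
    """
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


for form in ["aa#", "{noise}", "tʔanoː#", "ani", "=t", "bi="]:
    print(form, mixes_letters_and_punct(form))
```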
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 31 word types tagged as particles (PART): =eːt, =ja, =jaː, =jeːt, =na, =ni, ajwa, akoː, areː, ba=, bak, bass, baː=, bi=, geː, han, handeː, hasara, haːjloː, ja, jaː, ka=, ki=, malia, mhasi, nuːn, ontʔa, ontʔabit, taktak, tʔa, ʃaːwi
- This corpus contains 1 lemma tagged as pronoun (PRON): _ (lemmas are not available in this treebank, so every word carries the placeholder lemma _)
- This corpus contains 1 lemma tagged as determiner (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as auxiliary (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- This corpus does not use the VerbForm feature.
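The tag inventory and the per-tag word-type counts above can be recomputed from the CoNLL-U files. The following is a minimal Python sketch. It assumes the standard 10-column CoNLL-U layout, with FORM in column 2 and UPOS in column 4. word_types_by_upos is a hypothetical helper, and the sample lines use placeholder forms.

```python
from collections import defaultdict

# The 17 universal POS tags of UD v2.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def word_types_by_upos(conllu_text: str) -> dict[str, set[str]]:
    """Map each UPOS tag to the set of distinct word forms carrying it."""
    index: dict[str, set[str]] = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only basic word lines: skip comments, blank lines,
        # multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        index[cols[3]].add(cols[1])  # column 4 = UPOS, column 2 = FORM
    return index


# Hypothetical lines with placeholder forms, not corpus data.
sample = "\n".join([
    "1\tw1\t_\tNOUN\t_\t_\t3\tobj\t_\t_",
    "2\tw2\t_\tPART\t_\t_\t3\tadvmod\t_\t_",
    "3\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tw1\t_\tNOUN\t_\t_\t3\tobl:mod\t_\t_",
])

types = word_types_by_upos(sample)
print({tag: len(forms) for tag, forms in types.items()})  # w1 is counted once
print("unused:", sorted(UD_UPOS - set(types)))
```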
Nominal Features
- Gender
- Fem
- AUX: tirib, tiːkti
- DET: =t, ti=, t=, toː=, tuː=, oːt, toːt, tuːt, taː=, uːt
- NOUN: naː, na, ʔabaː, takat, ʔaba, ʔalba, ʤoːharaaːji, karaːma, tji, ʃabaka
- PRON: =t, ti=
- SCONJ: =eːt, =jeːt, ti=, =t
- VERB: tini, tiːfi, ʔabkin, tifirʔa, tikati, tiki, geːdti, sallamta, tifirʔi, titdʔaːr
- Masc
- AUX: iːkti, =wa, iki, indi, ini, irib, iːha, bʔijan, ihi, iniːn
- DET: i=, oː=, =b, uː=, w=, oːn, uːn, j=, eː=, eːn
- INTJ: jhaː
- NOUN: tak, mhiːn, doːr, jhaːm, mijʔat, jam, bhar, heːlaj, gaw, handi
- PRON: umbaruːk, baruːk, barjoː, baruː, i=, jhaː, wi=, =b, =oːn, baroːk
- SCONJ: =eːb, =jeːb, wi=, w=, =b, i=, ji=, wʔi=
- VERB: ini, indi, iːfi, jʔi, id, ʔeːja, isni, iːbri, idi, ihi
- Number
- Coll
- NOUN: dhaj, waːw
- Plur
- ADJ: naʃʃalama
- ADP: =eːb, =eː, =jeː, =jeːb, =eːt
- AUX: =a, =jaː, idʔana, ikatin, iːktiːn, niki
- DET: eː=, eːn, aːn, aː=, taː=, =eː, baliːnaːj, eːt, teː=
- NOUN: jam, kam, iːjʔaː, ginha, miːmaʃa, ʃartija, doːra, gabal, halaka, hamoː
- PART: malia
- PRON: =eː, =oːn, =uːn, =jeː, =hoːn, hinin, =eːk, =aː, =aːk, =jaː
- SCONJ: ji=
- VERB: eːn, iːfiina, ijajna, iːbrin, jʔeːn, askineːna, eːdn, eːdna, eːfeːn, eːfeːna
- Sing
- ADP: =iːb, =iː, =iːt
- AUX: =u, =i, iːkti, aki, =wa, andi, iki, =ju, ani, arib
- DET: oː=, uː=, w=, oːn, uːn, toː=, tuː=, beːn, oːt, toːt
- INTJ: jhaː
- PRON: =heːb, =i, =oː, ani, =hoːk, =oːk, =joː, =ji, =iji, hoː
- SCONJ: =jeːb, wi=, =eːb, w=, wʔi=
- VERB: ini, indi, jʔi, iːfi, ani, iːbri, manri, rhan, id, sallamaman
- Case
- Abl
- ADP: hoːj, =iː, hoːs, =eː, =jeː
- PRON: =iːsi, =iːsiː, =siːsi, =iːsoː, =saj
- Acc
- DET: oː=, =b, oːn, toː=, eː=, eːn, oːt, toːt, =eː, beːt
- PRON: =oː, =i, =eː, =hoːk, =oːk, =joː, =heːb, =oːn, =jeː, =hoːn
- Com
- ADP: haːj
- Dat
- PRON: hoː
- Gen
- ADP: =i, =ji, =eː, =jeː
- DET: oːnaːj, baliːnaːj
- PRON: =iji, =ihi, =ji, =iheː, =ijoː, =hi, aniː
- Loc
- ADP: =iːb, =eːb, =jeːb
- Nom
- DET: uː=, uːn, tuː=, aːn, beːn, aː=, tuːt, taː=, uːt
- PRON: ani, =i, =uːn, umbaruːk, =ji, baruːk, hinin, =uː, baruː, =aː
- Voc
- ADP: =aj
- PRON: jhaː
- Definite
- Def
- DET: i=, oː=, uː=, w=, ti=, t=, j=, toː=, eː=, tuː=
- PRON: i=, ti=
- SCONJ: ti=, w=, i=
- Ind
- DET: =t, =b
- SCONJ: =t
Degree and Polarity
- Degree
- Cmp
- ADP: =ka
- Dim
- DET: =t
- Equ
- ADP: =eːt
- Polarity
- Neg
- AUX: arib, irib, aki, tirib
- PART: ka=, ki=, baː=
- VERB: aakaj, akaːj, ibarin, tkatiːm, tʔam
Verbal Features
- Aspect
- Aor
- AUX: iːkti, iːha, tiːkti
- VERB: iːfi, iːbri, iːfiina, iːd, tiːfi, iːbrin, iːkti, hiːn, hi, ihikil
- Imp
- AUX: andi, indi, akati, aniːw, iniːn
- PART: ki=, ka=
- VERB: indi, manri, fanrʔi, dannʔi, iniːw, afanrʔi, andiːf, aniːw, ijajna, iniːn
- Perf
- AUX: akajeː, aki, iki, iːkti, arib, ini, irib, adi, ajha, ani
- VERB: ini, eːn, ani, id, ʔeːja, adif, isni, tifirʔa, tini, aba
- Mood
- Imp
- PART: baː=
- Opt
- AUX: ba=
- PART: bi=, ba=
- VERB: aakaj, akaːj, ibarin, tkatiːm, tʔam
- Voice
- Mid
- VERB: tifirʔa, ameːsa~sʔeː, asʔa, ikan, ʔagar, agam, akan, agar, akteːn, aktiːn
- Pass
- VERB: agam
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- DET: oːn, uːn, eːn, aːn, beːn, oːt, toːt, tuːt, oːnaːj, uːt
- PRON: beːn, oːn
- Rel
- PRON: i=, wi=, =b, =t, ji=, ti=
- SCONJ: =eːb, =jeː, =eː, =jeːb, =i, =eːt, =jeːt, wi=, =ji, w=
- Poss
- Yes
- PRON: =i, =oː, =eː, =oːk, =uːn, =ji, =joː, =oːn, =iji, =jeː
- Reflex
- Yes
- PRON: kna, kina, nafs
- Person
- 1
- AUX: =u, =i, =a
- PRON: =heːb, =i, ani, =oː, =ji, =oːn, =uːn, =eː, =iji, hoː
- VERB: ʔagar, dannʔi, hagil, hagit, haːra~riw, manri, ʔanbiːk
- 2
- AUX: =wa
- PRON: =hoːk, =oːk, umbaruːk, baruːk, =eːk, =aːk, =joːk, =juːk, =oːkna, barijoːk
- VERB: danri, fanrʔi, ʃanbiːb
- 3
- AUX: =u, =a, =i, =ju, =jaː, =ji
- PRON: =oː, =eː, =joː, =ihi, =jeː, =iheː, =ijoː, =iːsiː, =uː, barjoː
- VERB: eːn, manri, ʔeːja, dannʔi, fanrʔi, hangiːt, sanni, kʷanri, sangi, danri
- Polite
- Form
- PRON: =uːn, =hoːn, =oːn
Other Features
- Deixis
- Prox
- DET: oːn, uːn, eːn, aːn, oːt, toːt, tuːt, oːnaːj, uːt, eːt
- PRON: oːn
- Remt
- DET: beːn, beːt
- PRON: beːn
- ExtPos
- ADV
- NOUN: doːr
- SCONJ
- SCONJ: =eːt, =jeːt, =eːb
- Foreign
- Yes
- NOUN: bani, gahwat
- PROPN: muːna, ʔaːdam
- PartType
- Int
- ADV: kak, han
- CCONJ: han
- PRON: naːn, naː, ʔaːw
- VerbClass
- 1
- AUX: akaː, nʔati, ani, anʔa, idʔana, ini
- VERB: eːn, ini, indi, iːfi, ʔakraː, akajeː, diːtiːt, iːbri, manri, ahiːt
- 2
- VERB: jʔi, jʔeːtiːt, hiːreːreː, rhan, afirha, gʷʔeː, jhakseːtiːt, tameː, ʔiːbaːbeː, sallamaman
- VerbType
- Cop
- AUX: =u, =a, =i, =wa, =ju, =jaː, =ji
- Light
- AUX: diːtiːt, ani, indi, ini
- VERB: ikatina
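Each listing above groups word forms first by feature value and then by UPOS. The following is a minimal Python sketch that builds the same kind of index from the FEATS column (column 6 of CoNLL-U, written as Feature=Value pairs joined by |). Keying on the (feature, value) pair rather than on the bare value keeps apart values that occur under more than one feature, such as Imp, which appears twice among the verbal features above. feature_index is a hypothetical helper, and the sample lines use placeholder forms with feature values chosen only for illustration.

```python
from collections import defaultdict


def feature_index(conllu_text: str) -> dict[tuple[str, str], dict[str, set[str]]]:
    """Map (feature, value) -> UPOS -> set of word forms, read from the FEATS column."""
    index: dict = defaultdict(lambda: defaultdict(set))
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, multiword ranges, empty nodes
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":
            continue
        for pair in feats.split("|"):  # e.g. "Definite=Def|Gender=Fem"
            name, values = pair.split("=", 1)
            for value in values.split(","):  # UD allows multi-values such as Case=Acc,Nom
                index[(name, value)][upos].add(form)
    return index


# Hypothetical lines: placeholder forms, not corpus data.
sample = "\n".join([
    "1\tw1\t_\tDET\t_\tDefinite=Def|Gender=Fem\t2\tdet\t_\t_",
    "2\tw2\t_\tNOUN\t_\tGender=Fem\t3\tobj\t_\t_",
    "3\tw3\t_\tVERB\t_\tAspect=Imp\t0\troot\t_\t_",
    "4\tw4\t_\tPART\t_\tMood=Imp\t3\tadvmod\t_\t_",
])

idx = feature_index(sample)
for (name, value), by_upos in sorted(idx.items()):
    print(f"{name}={value}", {upos: sorted(forms) for upos, forms in by_upos.items()})
```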
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as copula (cop). Example: _.
- This corpus uses 1 lemma as auxiliary (aux). Example: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (178)
- VERB--NOUN-ADP(_) (1)
- VERB--PRON (7)
- VERB--PRON-Nom (25)
- obj
- VERB--NOUN (288)
- VERB--PRON (72)
- VERB--PRON-Acc (36)
- VERB--PRON-Nom (1)
- iobj
- VERB--PRON (4)
- VERB--PRON-Acc (1)
- VERB--PRON-Dat (2)
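The labels above can be read as "parent UPOS--child UPOS", optionally followed by the child's Case value. This reading of the label format is an assumption. The following is a minimal Python sketch that tallies such patterns for nsubj, obj and iobj. It does not reproduce the variant that records an attached adposition, such as NOUN-ADP(_). core_argument_patterns and parse_feats are hypothetical helpers, and the sample sentence uses placeholder forms.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Acc|Number=Sing' into a dict; '_' means no features."""
    return {} if feats == "_" else dict(pair.split("=", 1) for pair in feats.split("|"))


def core_argument_patterns(conllu_text: str) -> Counter:
    """Count VERB -> NOUN/PRON core-argument edges, keyed by (relation, label)."""
    counts: Counter = Counter()
    for sentence in conllu_text.strip().split("\n\n"):
        words = {}
        for line in sentence.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            relation = cols[7].split(":")[0]  # base relation without subtype
            parent = words.get(cols[6])       # None for the root (HEAD = 0)
            if relation not in CORE_RELATIONS or parent is None:
                continue
            if parent[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            label = f"VERB--{cols[3]}"
            case = parse_feats(cols[5]).get("Case")
            if case:
                label += f"-{case}"
            counts[(relation, label)] += 1
    return counts


# Hypothetical sentence with placeholder forms, not corpus data.
sample = "\n".join([
    "1\tw1\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "2\tw2\t_\tPRON\t_\tCase=Acc\t3\tobj\t_\t_",
    "3\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_",
])

for (relation, label), n in sorted(core_argument_patterns(sample).items()):
    print(relation, label, n)
```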
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: _ (with kna)
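The following is a minimal Python sketch of the reflexive check. It assumes that reflexive pronouns carry Reflex=Yes in FEATS, as kna does in the feature listing above. Because the LEMMA column holds only _, the head lemma reported is always _, which accounts for the shape of the example. reflexive_core_objects is a hypothetical helper, and the two sample lines are a toy illustration rather than a corpus sentence.

```python
def reflexive_core_objects(conllu_text: str) -> set[tuple[str, str]]:
    """Return (head lemma, reflexive form) pairs for obj/iobj dependents marked Reflex=Yes."""
    pairs: set[tuple[str, str]] = set()
    for sentence in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in sentence.splitlines()]
        words = {c[0]: c for c in rows if len(c) == 10 and c[0].isdigit()}
        for cols in words.values():
            if cols[7] not in {"obj", "iobj"}:
                continue
            if "Reflex=Yes" not in cols[5].split("|"):
                continue
            head = words.get(cols[6])
            if head is not None:
                pairs.add((head[2], cols[1]))  # column 3 = LEMMA, column 2 = FORM
    return pairs


# Toy lines: the verb form w2 is a placeholder.
sample = "\n".join([
    "1\tkna\t_\tPRON\t_\tReflex=Yes\t2\tobj\t_\t_",
    "2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_",
])

print(reflexive_core_objects(sample))  # {('_', 'kna')}
```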
Relations Overview
- This corpus uses 13 relation subtypes: acl:relcl, compound:svc, dep:comp, dep:conj, dep:redup, dislocated:mod, dislocated:obj, dislocated:subj, nmod:poss, obl:arg, obl:mod, parataxis:insert, parataxis:parenth
- The following 2 main types are never used alone; they are always subtyped: compound, obl
- The following 7 relation types are not used in this corpus at all: expl, clf, conj, flat, list, orphan, goeswith
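The three counts above follow mechanically from the relation inventory listed under Relations. The following is a minimal Python sketch that rederives them. It treats any label containing a colon as a subtype and compares the base types in use against the 37 universal relations of UD v2. relation_inventory is a hypothetical helper. The input is the relation list given earlier on this page.

```python
# The 37 universal relation types of UD v2.
UD_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())


def relation_inventory(deprels):
    """Return (subtypes, main types only used with a subtype, unused UD relations)."""
    used = set(deprels)
    subtypes = sorted(d for d in used if ":" in d)
    bare = {d for d in used if ":" not in d}
    bases = {d.split(":")[0] for d in used}
    always_subtyped = sorted({d.split(":")[0] for d in subtypes} - bare)
    unused = sorted(UD_RELATIONS - bases)
    return subtypes, always_subtyped, unused


# Relation labels as listed under "Relations" for this treebank.
listed = """
acl acl:relcl advcl advmod amod appos aux case cc ccomp compound:svc cop csubj
dep dep:comp dep:conj dep:redup det discourse dislocated dislocated:mod
dislocated:obj dislocated:subj fixed iobj mark nmod nmod:poss nsubj nummod obj
obl:arg obl:mod parataxis parataxis:insert parataxis:parenth punct reparandum
root vocative xcomp
""".split()

subtypes, always_subtyped, unused = relation_inventory(listed)
print(len(subtypes), "subtypes:", ", ".join(subtypes))
print("always subtyped:", always_subtyped)
print("unused:", unused)
```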