UD Ika ChibErgIS
Language: Ika (code: arh)
Family: Chibchan
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Jana Bajorat, Natalia Cáceres Arandia.
Repository: UD_Ika-ChibErgIS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Ika-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [jana • bajorat (æt) hu-berlin • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
A Universal Dependencies corpus for Ika, a Chibchan language spoken by about 25,000 people in Colombia.
The treebank was converted automatically from the complete trees of SUD_Ika-ChibErgIS. SUD_Ika-ChibErgIS was itself converted automatically from mSUD_Ika-ChibErgIS, which was extracted from an interlinearized corpus in FLEx format.
The original corpus consists of 40 Ika texts (41 recordings), collected during original fieldwork conducted between 2018 and 2022 in Pueblo Bello, Cesar, Colombia.
Acknowledgments
This treebank was created as part of the ChibErgIS project.
Special thanks go to Bruno Guillaume for the SUD-to-UD conversion, and to Sylvain Kahane and Aleksandra Miletic for their support. I am also deeply grateful to Natalia Cáceres Arandia for her assistance with the (S)UD annotation, and to Florian Deichsler for his help with glossing in FLEx and solving a wide range of technical issues.
Finally, I would like to express my heartfelt thanks to the Ika native speakers for allowing me to record them, and especially to Leidy Karina Izquierdo Mejía and Gunyan Giovanny Hernán Izquierdo Mejía for transcribing the Ika data and assisting with the Spanish translations.
Statistics of UD Ika ChibErgIS
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Relations
acl – advcl – advcl:purp – advmod – advmod:emph – amod – appos – aux – case – cc – ccomp – compound – compound:redup – conj – cop – csubj – dep – det – discourse – dislocated – flat – iobj – mark – nmod – nmod:poss – nsubj – nsubj:outer – nummod – obj – obl – obl:arg – obl:lmod – obl:tmod – orphan – parataxis – punct – reparandum – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 628 sentences and 5307 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 272 types of words that contain both letters and punctuation. Examples: =se', zʉ'ʉn, na', i'ngwi, uye', =a'ba', ikʉnha', ka'a, uwe', ánu'gwe, ka'gʉmʉ, umʉ'n, na'me', ʉwe', ne', nane', nuse', za'ka, a'nʉ, a'zari, a'mia, a'zʉna, unige', =a'ba, in'gwi, ma'keywa, manʉnka', nuge', uwa'me', a'chwi, cho', cho'kumʉya, ga'kʉnamʉ, kau', zanu', ʉnka'si, aná'nuga, au', ga', kiwa'na, me'zanʉndi, nu'na, za'ki, =ta', agwaku', aniku', anʉnmi'ri, kizu'na, ko', kʉ'ku
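As an illustration only, the following minimal Python sketch shows how comparable tokenization figures could be derived from a CoNLL-U file. The function names and the toy `SAMPLE` (invented placeholder forms, not Ika data) are hypothetical, and treating "punctuation" as the Unicode P* categories is an assumption, not necessarily the rule used for the counts above.

```python
import unicodedata

def read_conllu(text):
    """Yield each sentence as a list of tab-split rows, skipping comment lines."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence

def surface_tokens(sentence):
    """Multiword-token ranges (e.g. 1-2) count once; empty nodes (e.g. 3.1) are skipped."""
    tokens, covered_until = [], 0
    for row in sentence:
        tid = row[0]
        if "." in tid:
            continue
        if "-" in tid:
            covered_until = int(tid.split("-")[1])
            tokens.append(row)
        elif int(tid) > covered_until:
            tokens.append(row)
    return tokens

def has_letter_and_punct(form):
    """Assumption: 'punctuation' means Unicode category P*, so an apostrophe qualifies."""
    categories = {unicodedata.category(ch)[0] for ch in form}
    return "L" in categories and "P" in categories

def tokenization_stats(text):
    sentences = list(read_conllu(text))
    tokens = [row for s in sentences for row in surface_tokens(s)]
    words = [row for s in sentences for row in s if row[0].isdigit()]  # syntactic words only
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "no_space_after": sum("SpaceAfter=No" in row[9].split("|") for row in tokens),
        "words_with_spaces": len({row[1] for row in words if " " in row[1]}),
        "letter_punct_types": sorted({row[1] for row in words if has_letter_and_punct(row[1])}),
    }

# Hypothetical toy input with invented placeholder forms (not real Ika data).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tab'c\tab'c\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "2\txyz\txyz\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\t=de'\t=de'\tADP\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
)
stats = tokenization_stats(SAMPLE)
```

Passing the full text of the treebank's `.conllu` files instead of `SAMPLE` should give figures close to those listed, although the official counting rules may differ in detail.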
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 28 word types tagged as particles (PART): =di, =gʉma, =ki, =kʉ, =kʉchʉ, =ri, =te, =ɉina, awe', awi, ewey, ey, gu, gwawa, keywʉ, ki, kinka, kinki, neki, nekʉ, ni, pʉn, pʉne', una, zan, zʉ'ʉn, zʉn, ʉwe'
- This corpus contains 22 lemmas tagged as pronouns (PRON): a, be, bekʉ, bema, ema, eyma, eymi, ikʉndi, ikʉnha', in'gweti, ingweti, inʉ, manʉnka, manʉnka', nan, niwi, pinna, sʉmʉ, yow, yów, ʉy, ʉya
- This corpus contains 10 lemmas tagged as determiners (DET): aɉwa, bema, bin, ema, eyma, pinna, yama, yow, yów, ʉya
- Out of the above, 7 lemmas occurred sometimes as PRON and sometimes as DET: bema, ema, eyma, pinna, yow, yów, ʉya
- This corpus contains 9 lemmas tagged as auxiliaries (AUX): _, aw, kaw, nan, nik, niwingwa, nuk, zan, zoy
- Out of the above, 6 lemmas occurred sometimes as AUX and sometimes as VERB: aw, kaw, nan, nik, zan, zoy
- This corpus does not use the VerbForm feature.
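The PRON/DET and AUX/VERB overlaps above amount to collecting, for each lemma, the set of UPOS tags it occurs with. A minimal Python sketch follows; the function names are hypothetical, and the cross-check uses the PRON and DET lemma lists given above rather than the treebank files.

```python
from collections import defaultdict

# The 17 universal POS tags defined by UD.
UD_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

def pairs_from_conllu(text):
    """Yield (lemma, UPOS) for every syntactic word (integer ID) in a CoNLL-U string."""
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            yield cols[2], cols[3]

def upos_by_lemma(pairs):
    """Map each lemma to the set of UPOS tags it was observed with."""
    tags = defaultdict(set)
    for lemma, upos in pairs:
        tags[lemma].add(upos)
    return tags

def shared_lemmas(tags, first, second):
    """Lemmas observed with both tags, e.g. PRON and DET."""
    return sorted(lemma for lemma, seen in tags.items() if {first, second} <= seen)

def unused_upos(tags):
    """Universal tags that never occur."""
    return sorted(UD_UPOS - set().union(*tags.values()))

# Cross-check against the PRON and DET lemma lists given above.
PRON = ["a", "be", "bekʉ", "bema", "ema", "eyma", "eymi", "ikʉndi", "ikʉnha'", "in'gweti",
        "ingweti", "inʉ", "manʉnka", "manʉnka'", "nan", "niwi", "pinna", "sʉmʉ", "yow", "yów",
        "ʉy", "ʉya"]
DET = ["aɉwa", "bema", "bin", "ema", "eyma", "pinna", "yama", "yow", "yów", "ʉya"]
tags = upos_by_lemma([(l, "PRON") for l in PRON] + [(l, "DET") for l in DET])
```

Here `shared_lemmas(tags, "PRON", "DET")` recovers the seven lemmas listed as both PRON and DET. On the real data, the pairs would come from `pairs_from_conllu` applied to the treebank files, and `shared_lemmas(tags, "AUX", "VERB")` would give the AUX/VERB overlap.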
Nominal Features
- Animacy
  - Anim
    - ADP: =se', =si'
    - ADV: umʉ'n
    - NOUN: gʉmʉsinʉ, kawʉ, a'mia, achʉna, ati, azaku, buti, cheyrwa, ikʉ, kakʉ
    - NUM: in'gwi
    - PRON: ikʉnha', eyma, a, eymi, yów, ʉya
    - PROPN: ikʉ, Paw
    - VERB: a'chunha
  - Inan
    - ADP: =a'ba', =zey
    - ADV: gunti
    - AUX: neyka, na, kawa, nanay
    - NOUN: pera, ka'a, ga', ga'kʉnamʉ, goru, in, kʉnzʉwa, sʉmbrenu, za'ka, zamʉ
    - NUM: i'ngwi
    - PRON: inʉ, eyma, eymi
    - PROPN: tenʉ, arwaku
    - VERB: a'chwi, a'kʉ, ana'gusi, awga, kawa, ʉnɉú
- Case
  - Abs
    - ADJ: ati, bʉkana, du, ingeygwi
    - ADP: =ze', =a'ba', =zey
    - ADV: ʉndi, gunti
    - AUX: neyka, nʉnno, kawa, na, nuga, awkweyka, inuga, na'me', niwingwa, nu'na
    - NOUN: pera, bunsi, ikʉ, ɉwa, zamʉ, kʉn, za'ka, in, sʉmbrenu, ichʉ
    - NUM: i'ngwi, ma'keywa
    - PART: ni
    - PRON: eyma, ʉya, inʉ, yow, ikʉnha', eymi, a, pinna, sʉmʉ, yów
    - PROPN: ikʉ, tenʉ, gwirwa, kankwamʉ, kogwi, misakʉ, wiwa, Paw, arwaku
    - VERB: kawa, kwana, kʉnʉna, a'cho'sʉye', a'kusʉya, a'pa, a'zanʉngwa, anaka, anisi, awanʉkwi
  - Dat
    - ADP: =se'
    - AUX: nʉnkwʉra
    - NOUN: gʉmʉsinʉ
  - Erg
    - ADP: =se', =si'
    - ADV: umʉ'n
    - NOUN: gʉmʉsinʉ, kawʉ, a'mia, achʉna, ati, atinkʉnʉ, cheyrwa, ikʉ, ka'a, kakʉ
    - NUM: in'gwi
    - PRON: ikʉnha', eyma, a, eymi, inʉ, ʉya
    - PROPN: ikʉ, tenʉ
    - VERB: kawa
  - Nom
    - ADP: =se'
    - AUX: nʉn
    - NOUN: amipaw
    - PRON: manʉnka', niwi, eyma, manʉnka, nʉn
- Definite
  - Def
    - ADP: =se', =a'ba', =si', =zey
    - ADV: umʉ'n
    - AUX: neyka, kawa, na, nanay
    - NOUN: gʉmʉsinʉ, pera, goru, in, ka'a, kakʉ, kawʉ, sʉmbrenu, zamʉ, a'mia
    - PRON: eyma, ikʉnha', eymi, a, yów, ʉya
    - PROPN: tenʉ, ikʉ, Paw
    - VERB: kawa
  - Ind
    - ADP: =se'
    - AUX: na, neyka
    - NOUN: za'ka, ánu'gwe, aniga, eywʉ, ga'kʉnamʉ, ikʉ, kasta, kunsamʉ, kʉmʉ, sey
    - PRON: inʉ
    - VERB: ʉnɉú
  - Spec
    - ADP: =se'
    - ADV: gunti
    - NOUN: pera, kʉnzʉwa, atinkʉnʉ, cheyrwa, ga', ka'a, kanasta, kwimʉkʉnʉ
    - NUM: i'ngwi, in'gwi
    - PRON: inʉ
    - PROPN: arwaku
    - VERB: a'chunha, a'chwi, a'kʉ, ana'gusi, awga
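The lists above group word forms by feature value and UPOS. A minimal Python sketch of that grouping is shown below, assuming the values belong to the standard UD features Animacy, Case and Definite; `feature_examples` and the toy `SAMPLE` (invented placeholder forms, not Ika data) are hypothetical, and the cap of 10 forms per UPOS simply mirrors the longest lists above.

```python
from collections import Counter, defaultdict

def parse_feats(feats):
    """Turn a FEATS column such as 'Case=Erg|Definite=Def' into a dict."""
    return {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))

def feature_examples(text, feature, limit=10):
    """For each value of `feature`, list the most frequent forms per UPOS (up to `limit`)."""
    counts = defaultdict(Counter)  # (value, UPOS) -> Counter of word forms
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        value = parse_feats(cols[5]).get(feature)
        if value is not None:
            for v in value.split(","):  # multiple values of one feature are comma-separated
                counts[(v, cols[3])][cols[1]] += 1
    examples = defaultdict(dict)
    for (v, upos), forms in sorted(counts.items()):
        examples[v][upos] = [form for form, _ in forms.most_common(limit)]
    return dict(examples)

# Hypothetical toy sentence with invented placeholder forms (not real Ika data).
SAMPLE = (
    "1\tn1\tn1\tNOUN\t_\tCase=Erg|Definite=Def\t4\tnsubj\t_\t_\n"
    "2\t=k'\t=k'\tADP\t_\tCase=Erg\t1\tcase\t_\t_\n"
    "3\tn2\tn2\tNOUN\t_\tCase=Abs\t4\tobj\t_\t_\n"
    "4\tv1\tv1\tVERB\t_\t_\t0\troot\t_\t_\n"
)
case_examples = feature_examples(SAMPLE, "Case")
```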
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 6 lemmas as copulas (cop). Examples: nan, kaw, zan, nik, aw, nuk.
- This corpus uses 8 lemmas as auxiliaries (aux). Examples: nan, aw, zoy, nik, nuk, zan, kaw, niwingwa.
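The copula and auxiliary inventories above are lemmas of words attached with the `cop` or `aux` relation. A minimal Python sketch of that lookup follows; `lemmas_by_deprel` and the toy `SAMPLE` (invented placeholder lemmas, not Ika data) are hypothetical, and folding subtypes such as `aux:pass` into their base relation is an assumption.

```python
from collections import Counter

def lemmas_by_deprel(text, deprel):
    """Lemmas of words attached with `deprel` (subtypes folded in), most frequent first."""
    counts = Counter()
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit() and cols[7].split(":")[0] == deprel:
            counts[cols[2]] += 1
    return [lemma for lemma, _ in counts.most_common()]

# Hypothetical toy sentences with invented placeholder lemmas (not real Ika data).
SAMPLE = (
    "1\tc1\tc1\tAUX\t_\t_\t2\tcop\t_\t_\n"
    "2\tn1\tn1\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tc1\tc1\tAUX\t_\t_\t2\tcop\t_\t_\n"
    "2\tn2\tn2\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tv1\tv1\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\ta1\ta1\tAUX\t_\t_\t1\taux\t_\t_\n"
    "3\tc2\tc2\tAUX\t_\t_\t4\tcop\t_\t_\n"
    "4\tn3\tn3\tNOUN\t_\t_\t1\tccomp\t_\t_\n"
)
copulas = lemmas_by_deprel(SAMPLE, "cop")
```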
Core Arguments, Oblique Arguments and Adjuncts
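Here we consider only relations between verbs (parent) and nouns or pronouns (child). Each pattern gives the parent and child UPOS, followed by the child's Case value or, as ADP(...), the lemma of an adposition attached to it; the number in parentheses is the frequency of the pattern.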
- nsubj
  - VERB--NOUN (4)
  - VERB--NOUN-ADP(=se') (24)
  - VERB--NOUN-ADP(_) (1)
  - VERB--NOUN-Abs (57)
  - VERB--NOUN-Erg (14)
  - VERB--NOUN-Nom (1)
  - VERB--PRON (4)
  - VERB--PRON-ADP(=se') (16)
  - VERB--PRON-Abs (21)
  - VERB--PRON-Erg (15)
  - VERB--PRON-Nom (7)
- obj
  - VERB--NOUN (13)
  - VERB--NOUN-Abs (173)
  - VERB--PRON (5)
  - VERB--PRON-ADP(=a'ba') (1)
  - VERB--PRON-ADP(=zey) (1)
  - VERB--PRON-ADP(_) (2)
  - VERB--PRON-Abs (51)
- iobj
  - VERB--PRON (11)
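The argument patterns above can be approximated by walking each sentence's tree. The following minimal Python sketch is hypothetical (`argument_patterns` and the toy `SAMPLE` with invented placeholder forms are not part of the treebank or its tooling); in particular, letting an attached case-marking adposition take precedence over the child's Case feature is an assumption about how the patterns are labelled.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a FEATS column such as 'Case=Abs' into a dict."""
    return {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))

def argument_patterns(text, relations=("nsubj", "obj", "iobj")):
    """Count VERB-headed NOUN/PRON arguments as (deprel, pattern) pairs."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        words = {row[0]: row for row in rows if row[0].isdigit()}
        for row in words.values():
            head = words.get(row[6])
            if row[7] not in relations or head is None:
                continue
            if head[3] != "VERB" or row[3] not in ("NOUN", "PRON"):
                continue
            pattern = f"{head[3]}--{row[3]}"
            # Assumption: an attached case-marking ADP is reported instead of the Case value.
            adps = [d[2] for d in words.values()
                    if d[6] == row[0] and d[7] == "case" and d[3] == "ADP"]
            case = parse_feats(row[5]).get("Case")
            if adps:
                pattern += f"-ADP({adps[0]})"
            elif case:
                pattern += f"-{case}"
            counts[(row[7], pattern)] += 1
    return counts

# Hypothetical toy sentences with invented placeholder forms (not real Ika data).
SAMPLE = (
    "1\tn1\tn1\tNOUN\t_\tCase=Erg\t4\tnsubj\t_\t_\n"
    "2\t=k'\t=k'\tADP\t_\t_\t1\tcase\t_\t_\n"
    "3\tn2\tn2\tNOUN\t_\tCase=Abs\t4\tobj\t_\t_\n"
    "4\tv1\tv1\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tp1\tp1\tPRON\t_\tCase=Abs\t2\tnsubj\t_\t_\n"
    "2\tv2\tv2\tVERB\t_\t_\t0\troot\t_\t_\n"
)
counts = argument_patterns(SAMPLE)
```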
Relations Overview
- This corpus uses 8 relation subtypes: advcl:purp, advmod:emph, compound:redup, nmod:poss, nsubj:outer, obl:arg, obl:lmod, obl:tmod
- The following 6 relation types are not used in this corpus at all: vocative, expl, clf, fixed, list, goeswith
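Both lists above follow from the relation labels listed under "Relations": subtypes are the labels containing a colon, and the unused types are the universal relations that never appear as a base label. A minimal Python sketch, with a hypothetical `relation_overview` helper and the 37 universal relations of UD v2 written out by hand:

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp", "clf",
    "compound", "conj", "cop", "csubj", "dep", "det", "discourse", "dislocated",
    "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark", "nmod", "nsubj",
    "nummod", "obj", "obl", "orphan", "parataxis", "punct", "reparandum", "root",
    "vocative", "xcomp",
}

def relation_overview(deprels):
    """Return (subtyped relations used, universal relations never used as a base label)."""
    used = set(deprels)
    subtypes = sorted(d for d in used if ":" in d)
    unused = sorted(UD_RELATIONS - {d.split(":")[0] for d in used})
    return subtypes, unused

# The relation labels listed under "Relations" above.
USED = ("acl advcl advcl:purp advmod advmod:emph amod appos aux case cc ccomp compound "
        "compound:redup conj cop csubj dep det discourse dislocated flat iobj mark nmod "
        "nmod:poss nsubj nsubj:outer nummod obj obl obl:arg obl:lmod obl:tmod orphan "
        "parataxis punct reparandum root xcomp").split()
subtypes, unused = relation_overview(USED)
```

With this input, `subtypes` contains the eight subtypes and `unused` the six unused relation types stated above.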