UD Kiche IU
Language: Kiche (code: quc)
Family: Mayan
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Francis Tyers.
Repository: UD_Kiche-IU
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples, wiki, bible, fiction, government, legal, medical
Questions, comments? General annotation questions (either Kiche-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ftyers (æt) iu • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD Kʼicheʼ-IU is a treebank of sentences drawn from a variety of text domains, principally dictionary example sentences and linguistic examples.
The treebank was pre-annotated for morphology using the apertium-quc morphological analyser (Richardson and Tyers, 2021).
The morphological analyses were disambiguated and annotated for dependency structure by hand.
Acknowledgments
We would like to thank the following for giving permission to use their sentences:
- Telma Can Pixabaj
- Academia de Lenguas Mayas de Guatemala
References
- Richardson, I. and Tyers, F. M. (2021). “A morphological analyser for Kʼicheʼ”. Procesamiento de Lenguaje Natural, No. 66, pp. 99–109.
Statistics of UD Kiche IU
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Abbr – AdvType – Animacy – Aspect – Clitic – Definite – Degree – Focus – Foreign – Gender – Mood – Movement – NounType – Number – Number[obj] – Number[psor] – Number[subj] – NumType – Person – Person[obj] – Person[psor] – Person[subj] – Polarity – Polite – PronType – Reflex – Subcat – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – advmod:neg – amod – appos – aux – case – cc – ccomp – clf – compound – conj – csubj – dep – dep:agr – dep:ss – det – discourse – dislocated – fixed – flat – goeswith – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1435 sentences, 9396 tokens and 10013 syntactic words.
- This corpus contains 1675 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 4 types of words that contain both letters and punctuation. Examples: COVID-19, T.b.g, T.b.r., b'i
- This corpus contains 611 multi-word tokens. On average, one multi-word token consists of 2.01 syntactic words.
- There are 292 types of multi-word tokens. Examples: che, chech, chwe, bʼik, chke, kanoq, xeatinik, chawe, chike, xkilo, ekʼo, katʼek, xinwilo, xulik, Xepetik, katpetik, kinchakunik, xawilo, xinkosik, xojʼek, xrilo, xuqʼiʼo, Xinqʼiʼo, Xojbʼinik, Xojpetik, chuloqʼik, chupam, chutijik, chuwach, kaqatijo, kinwarik, kinwaʼik, kuʼano, rqasok, xabʼano, xatulik, xeʼqilala, xinpetik, xkamik, xokik, xolqakʼamaʼ, xutijo, Chatbʼinoq, Kinkowinik, Matbʼinik, Wemna, Xinchapo, Xineʼwaʼoq, Xinnaʼo, Xinsachik.
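The counts above follow from how CoNLL-U distinguishes surface tokens from syntactic words. The following is a minimal sketch of that bookkeeping, assuming standard 10-column CoNLL-U input; the function name `count_units`, the helper `row`, and the toy sentence with placeholder forms are illustrative and not part of the treebank or its tooling.

```python
def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space_after": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():           # a blank line ends a sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        if "." in tok_id:              # empty node: neither token nor word
            continue
        if "-" in tok_id:              # range line such as "1-2": one surface token
            covered_until = int(tok_id.split("-")[1])
            counts["mwts"] += 1
            is_token = True
        else:                          # plain integer ID: one syntactic word
            counts["words"] += 1
            is_token = int(tok_id) > covered_until
        if is_token:
            counts["tokens"] += 1
            if "SpaceAfter=No" in misc.split("|"):
                counts["no_space_after"] += 1
    return counts


def row(tok_id, form, misc="_"):
    """Build a 10-column CoNLL-U line; unused columns are left as "_"."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", "_", "_", "_", misc])


# Toy sentence with placeholder forms (not treebank data): one two-word
# multi-word token, and a word written without a following space.
SAMPLE = "\n".join([
    "# text = ab c.",
    row("1-2", "ab"),
    row("1", "a"),
    row("2", "b"),
    row("3", "c", "SpaceAfter=No"),
    row("4", "."),
    "",
])

print(count_units(SAMPLE))
```

The reported figures are consistent with this bookkeeping: 10013 words minus 9396 tokens leaves 617 extra words, so the 611 multi-word tokens cover 611 + 617 = 1228 words, or about 2.01 words each.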
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 38 word types tagged as particles (PART): a, are, areʼ, aʼ, bʼa, chik, eʼ, ik, ilo, k, kamik, kʼ, kʼo, kʼol, kʼu, kʼut, la, lo, ma, maj, maja, man, mna, na, nu, o, oq, q, qa, si, sin, tiko, u, uʼ, w, wi, wuʼ, ʼek
- This corpus contains 23 lemmas tagged as pronouns (PRON): alaq, areʼ, at, aʼre, e, eʼareʼ, in, ix, jachike, jachin, jas, la, lal, laʼ, le, oj, ri, riʼ, su, uj, waʼ, we, weriʼ
- This corpus contains 7 lemmas tagged as determiners (DET): e, jun, le, ri, taq, waʼ, we
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: e, le, ri, waʼ, we
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): taj, tajin
- There are 3 (de)verbal forms:
- Fin
- VERB: kawaj, xubʼij, kubʼan, xeatin, kel, kutij, xinloqʼ, xinwil, xsachon, kimbʼe
- Inf
- VERB: ukemik, utijik, utzakik, waʼim, rilik, uloqʼik, kikunaxik, qilik, ubʼanik, ubʼixik
- Part
- VERB: sachinaq, Petinaq, Sipojinaq, atijom, bʼenaq, chʼajom, kiriqom, kʼamom, petnaq, pisom
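The tag inventory and the PRON/DET overlap reported above can be re-derived with plain set operations. This is a small sketch, not the script that generated these statistics; the variable names are illustrative, and the tag and lemma lists are copied from the bullets above.

```python
# The 17 universal POS tags defined by UD v2.
UNIVERSAL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags used in this corpus, as listed above.
USED_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB",
}

# Lemma lists for PRON and DET, as listed above.
PRON_LEMMAS = {
    "alaq", "areʼ", "at", "aʼre", "e", "eʼareʼ", "in", "ix", "jachike",
    "jachin", "jas", "la", "lal", "laʼ", "le", "oj", "ri", "riʼ", "su",
    "uj", "waʼ", "we", "weriʼ",
}
DET_LEMMAS = {"e", "jun", "le", "ri", "taq", "waʼ", "we"}

unused_tags = sorted(UNIVERSAL_UPOS - USED_UPOS)    # tags never used
shared_lemmas = sorted(PRON_LEMMAS & DET_LEMMAS)    # lemmas seen as both PRON and DET

print(unused_tags)
print(shared_lemmas)
```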
Nominal Features
- Gender
- Fem
- NOUN: nan, al, ali
- PROPN: Luʼs, Naʼ, Ixchel, Leʼn, Niʼl, Siʼl, Talin, Toʼn, Waʼn, Weʼl
- Masc
- NOUN: a, tat, tataʼ
- PROPN: Teʼk, Xwan, Luʼ, Poʼx, Wel, Jun, Kel, Max, Pal, Xep
- Animacy
- Anim
- DET: e
- PRON: e
- Number
- Plur
- ADJ: nimaq, nimaʼq, chomaʼq, nitzʼaʼq
- DET: taq, e
- NOUN: akʼalabʼ, achijabʼ, awajibʼ, ixoqibʼ, ajchakibʼ, ajtijabʼ, alabʼomabʼ, alitomabʼ, tijoxelabʼ, ajxojolobʼ
- PRON: e, oj, ix, Aʼre, uj, eʼareʼ
- Sing
- PRON: in, areʼ, at, are, Lal
- Definite
- Def
- DET: ri, le, we, r, l
- Ind
- DET: jun, ju
Degree and Polarity
- Degree
- Ints
- ADJ: sibʼalaj, nimalaj, utzalaj, loqʼalaj, jeʼlalaj
- Polarity
- Neg
- AUX: ta, taj, tuʼ
- PART: na, man, ma, maj, mna
- VERB: Matbʼin, Makʼam, Mawil, machap, matij
Verbal Features
- Aspect
- Imp
- VERB-Fin: kawaj, kubʼan, kel, kutij, kimbʼe, kinbʼij, kabʼan, kinbʼe, kinwil, kuyaʼ
- Perf
- VERB-Fin: xubʼij, xeatin, xinloqʼ, xinwil, xsachon, xutij, xbʼe, xkam, xnaʼtaj, xretaʼmaj
- Prog
- AUX: tajin
- Mood
- Imp
- VERB: Chitatabʼej, Chatbʼin, Matbʼin, chatchakun, chpe, Chatan, Chatmochʼochʼ, Chawil, Chinakuy, Chinatisaj
- Irr
- AUX: ta, taj
- Tense
- Fut
- ADV: na, nuʼ
- Past
- VERB-Part: sachinaq, Petinaq, Sipojinaq, atijom, bʼenaq, chʼajom, kiriqom, kʼamom, petnaq, pisom
- Voice
- AgFoc
- VERB-Fin: xetoʼw, xinchʼayow, xinkunan, xintoʼw, kabʼanow, kinloqʼon, kixtzuquw, kpaqʼow, ktijow, xbʼanow
- Antip
- VERB: Xsachan, kaloqʼon, ketikon, Katzukun, Kinbʼison, Kintobʼan, Kintzukun, Kkunan, Xinkunan, Xkunan
- VERB-Fin: Xsachan, kaloqʼon, ketikon, Katzukun, Kinbʼison, Kintobʼan, Kintzukun, Kkunan, Xinkunan, Xkunan
- VERB-Inf: kʼayinem, tojonik, yuqʼunik
- Pass
- VERB-Fin: xnaʼtaj, kabʼan, kbʼan, Xkʼam, Xkʼis, kkoj, kyaʼ, xkunataj, Xqʼupitaj, kabʼantaj
- VERB-Inf: ukemik, utijik, utzakik, rilik, uloqʼik, kikunaxik, qilik, ubʼanik, ubʼixik, ukunaxik
Pronouns, Determiners, Quantifiers
- PronType
- Art
- DET: ri, le, jun, we, r, waʼ, ju, l
- Dem
- PRON: riʼ, laʼ, waʼ, weriʼ, we
- Int
- ADV: jawi, jas, jampaʼ, jawijeʼ, Jampa, Jawchiʼ
- PART: la
- PRON: jas, jachin, su, Jachike
- Prs
- PRON: in, la, alaq, areʼ, at, are, e, oj, ix, Aʼre
- Rel
- PRON: le, ri
- NumType
- Ord
- NUM: nabʼe, ukabʼ, uwuq, Ukʼabʼ, Uwaq, urox
- Reflex
- Yes
- NOUN: qibʼ, wibʼ, ribʼ, kibʼ, awibʼ, ibʼ, kʼibʼ
- Person
- 1
- PRON: in, oj, uj
- 2
- PRON: at, ix, Lal
- 3
- PRON: areʼ, are, e, Aʼre, eʼareʼ
- Polite
- Form
- PRON: la, alaq, Lal, luʼ
- Number[psor]
- Plur
- NOUN: ke, kech, kukʼ, qibʼ, kiwach, konojel, kibʼ, qachak, qasok, iwach
- Sing
- NOUN: e, rukʼ, rumal, uwach, rech, we, ech, ronojel, awe, upam
Other Features
- Abbr
- Yes
- NOUN: T.b.g, T.b.r.
- AdvType
- Dir
- ADV: bʼi, kan, loq, uloq, ulo, apan, la, ubʼi, aqʼan, b'i
- Clitic
- Yes
- PRON: la, in, alaq, e, at, oj, areʼ, luʼ, Uj, are
- Focus
- Yes
- PART: are, wi, wuʼ, areʼ, w
- Foreign
- Yes
- NOUN: coronavirus, nodo
- Movement
- Abl
- VERB-Fin: x, Xineʼwaʼ, xeʼkiqʼatuj, Kateʼbʼin, Xateʼqil, Xeʼok, Xeʼul, Xinesach, Xineʼbʼin, Xiʼnwil
- Lat
- VERB-Fin: xolkikʼam, xolqakʼam, Kojulkitzukuj, Xatalqil, Xuʼlukʼam, kojalwoʼ, kolqakʼam, xatalkiʼkot, xojalwoʼ, xolkil
- NounType
- Clf
- NOUN: a, nan, al, tat, ali, tataʼ
- Relat
- NOUN: e, rukʼ, rumal, rech, ke, uwach, we, ech, ronojel, awe
- Number[obj]
- Plur
- VERB-Fin: Xekiqʼil, Xojutaqchiʼij, kujutoʼ, xojukʼam, Kenchʼabʼej, Keqeqaj, Kojulkitzukuj, Xekikoj, Xenutaqchʼij, Xeqil
- Sing
- VERB: kawaj, xubʼij, kubʼan, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xretaʼmaj, xril
- VERB-Fin: kawaj, xubʼij, kubʼan, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xretaʼmaj, xril
- Number[subj]
- Plur
- VERB: xeatin, kewaʼ, xkil, xojʼe, xkibʼij, Chitatabʼej, Xepet, kaqatij, kujbʼe, xkibʼan
- VERB-Fin: xeatin, kewaʼ, xkil, xojʼe, xkibʼij, Xepet, kaqatij, kujbʼe, xkibʼan, xkitij
- VERB-Inf: kikunaxik, qilik
- VERB-Part: kiriqom
- Sing
- VERB: kawaj, xubʼij, kubʼan, ukemik, kel, kutij, xinloqʼ, xinwil, xsachon, kimbʼe
- VERB-Fin: kawaj, xubʼij, kubʼan, kel, kutij, xinloqʼ, xinwil, xsachon, kimbʼe, kinbʼij
- VERB-Inf: ukemik, utijik, utzakik, rilik, uloqʼik, ubʼanik, ubʼixik, ukunaxik, utasik, utikik
- VERB-Part: atijom, bʼenaq
- Person[obj]
- 1
- VERB: Kinuloqʼoj, Xojutaqchiʼij, kinraj, kujutoʼ, xojukʼam, Chinakuy, Kinusikʼij, Kojulkitzukuj, Xinaxibʼij, Xinukatz
- VERB-Fin: Kinuloqʼoj, Xojutaqchiʼij, kinraj, kujutoʼ, xojukʼam, Kinusikʼij, Kojulkitzukuj, Xinaxibʼij, Xinukatz, Xinutiʼ
- 2
- VERB: xatkisikʼij, xatutzukuj, Xatalqil, Xateʼqil, Xatinwil, Xatriyeʼj, Xixqachʼabʼej, Xixusikʼij, chattoʼw, katinchʼabʼej
- VERB-Fin: xatkisikʼij, xatutzukuj, Xatalqil, Xateʼqil, Xatinwil, Xatriyeʼj, Xixqachʼabʼej, Xixusikʼij, katinchʼabʼej, katintoʼ
- 3
- VERB: kawaj, xubʼij, kubʼan, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xretaʼmaj, xril
- VERB-Fin: kawaj, xubʼij, kubʼan, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xretaʼmaj, xril
- Person[psor]
- 1
- NOUN: we, nutat, qibʼ, wibʼ, wech, nutzʼiʼ, nunan, nuqʼabʼ, nuchak, wachalal
- 2
- NOUN: awe, awumal, awukʼ, anan, atat, awa, ajiʼ, akʼajol, akʼojol, arajil
- 3
- NOUN: e, rukʼ, rumal, uwach, rech, ke, ech, ronojel, upam, ubʼiʼ
- Person[subj]
- 1
- VERB: xinloqʼ, xinwil, kawaj, kimbʼe, kinbʼij, kinbʼe, kinwil, xinkos, xinta, xojʼe
- VERB-Fin: xinloqʼ, xinwil, kawaj, kimbʼe, kinbʼij, kinbʼe, kinwil, xinkos, xinta, xojʼe
- VERB-Inf: qilik, nukunaxik
- 2
- VERB: kawaj, katʼe, xawil, Chitatabʼej, kabʼan, kabʼij, katbʼe, katpe, katpet, xaloqʼ
- VERB-Fin: kawaj, katʼe, xawil, kabʼan, kabʼij, katbʼe, katpe, katpet, xaloqʼ, kat
- VERB-Inf: awilik
- VERB-Part: atijom
- 3
- VERB: xubʼij, kubʼan, ukemik, xeatin, kel, kutij, xsachon, xutij, xbʼe, xkam
- VERB-Fin: xubʼij, kubʼan, xeatin, kel, kutij, xsachon, xutij, xbʼe, xkam, xnaʼtaj
- VERB-Inf: ukemik, utijik, utzakik, rilik, uloqʼik, kikunaxik, ubʼanik, ubʼixik, ukunaxik, utasik
- VERB-Part: bʼenaq, kiriqom
- Subcat
- Intr
- VERB: xeatin, kel, xsachon, kimbʼe, xbʼe, xkam, kinbʼe, xok, katʼe, kewaʼ
- VERB-Fin: xeatin, kel, xsachon, kimbʼe, xbʼe, xkam, kinbʼe, xok, katʼe, kewaʼ
- VERB-Inf: waʼim, waram, muxanik, usikʼik, utzʼibʼaxik
- VERB-Part: sachinaq, Petinaq, Sipojinaq, bʼenaq, petnaq, qʼayinaq, qʼaynaq
- Tran
- VERB: kawaj, xubʼij, kubʼan, ukemik, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xnaʼtaj
- VERB-Fin: kawaj, xubʼij, kubʼan, kutij, xinloqʼ, xinwil, kinbʼij, xutij, xnaʼtaj, xretaʼmaj
- VERB-Inf: ukemik, utijik, utzakik, rilik, uloqʼik, kikunaxik, qilik, ubʼanik, ubʼixik, ukunaxik
- VERB-Part: atijom, chʼajom, kiriqom, kʼamom, pisom
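The feature listings above include layered features such as Number[subj] and Person[obj], and group examples under labels such as VERB-Fin. The sketch below shows one way to read a CoNLL-U FEATS value into a dictionary and to build such a label. It assumes the labels combine UPOS with the VerbForm value, which is an inference from the listings rather than a documented rule. The function names and the example FEATS string are hypothetical.

```python
def parse_feats(feats):
    """Turn a FEATS string such as "Aspect=Perf|VerbForm=Fin" into a dict."""
    if feats == "_":                   # "_" marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def category_label(upos, feats):
    """Return "UPOS-VerbForm" when VerbForm is set, otherwise just the UPOS."""
    verb_form = parse_feats(feats).get("VerbForm")
    return f"{upos}-{verb_form}" if verb_form else upos


# Hypothetical FEATS value assembled from feature names listed above;
# it is not copied from the treebank.
example = "Aspect=Perf|Number[subj]=Sing|Person[subj]=3|Subcat=Intr|VerbForm=Fin"
feats = parse_feats(example)

print(feats["Person[subj]"])             # layered features keep their brackets
print(category_label("VERB", example))
print(category_label("NOUN", "_"))
```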
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 2 lemmas as auxiliaries (aux). Examples: taj, tajin.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (3)
- VERB--PRON (1)
- VERB-Fin--NOUN (429)
- VERB-Fin--PRON (36)
- VERB-Inf--NOUN (25)
- VERB-Inf--PRON (1)
- VERB-Part--NOUN (8)
- obj
- VERB--NOUN (7)
- VERB--PRON (7)
- VERB-Fin--NOUN (341)
- VERB-Fin--PRON (24)
- VERB-Inf--NOUN (22)
- VERB-Inf--PRON (3)
- VERB-Part--NOUN (2)
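A tally like the one above can be produced by pairing each nsubj or obj dependent with its head. The following is a rough sketch under simplifying assumptions: each sentence is a list of `(id, upos, feats, head, deprel)` tuples, and the parent label is UPOS plus VerbForm. The names `verb_label` and `core_argument_pairs` and the toy tree are hypothetical and not taken from the treebank.

```python
from collections import Counter

CORE = {"nsubj", "obj"}
NOMINAL = {"NOUN", "PRON"}


def verb_label(upos, feats):
    """Label a verb as "VERB-<VerbForm>" when FEATS carries VerbForm."""
    for pair in feats.split("|"):
        if pair.startswith("VerbForm="):
            return f"{upos}-{pair.split('=', 1)[1]}"
    return upos


def core_argument_pairs(sentence):
    """Count (relation, parent label, child UPOS) for verb-headed nsubj/obj.

    `sentence` is a list of (id, upos, feats, head, deprel) tuples.
    """
    by_id = {tok[0]: tok for tok in sentence}
    pairs = Counter()
    for tok_id, upos, feats, head, deprel in sentence:
        parent = by_id.get(head)       # None for the root (head 0)
        if deprel in CORE and upos in NOMINAL and parent and parent[1] == "VERB":
            pairs[(deprel, verb_label(parent[1], parent[2]), upos)] += 1
    return pairs


# Toy dependency tree (hypothetical, not treebank data).
toy = [
    (1, "VERB", "VerbForm=Fin", 0, "root"),
    (2, "DET", "_", 3, "det"),
    (3, "NOUN", "_", 1, "obj"),
    (4, "PRON", "_", 1, "nsubj"),
]

for key, count in sorted(core_argument_pairs(toy).items()):
    print(key, count)
```

Summing the counts listed above gives 503 verb-headed nsubj dependents and 406 verb-headed obj dependents that are nouns or pronouns.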
Verbs with Reflexive Core Objects
- This corpus contains 16 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: il wibʼ, riq qibʼ, tijoj ribʼ, toʼ qibʼ, xeʼj ribʼ, xeʼj wibʼ, atinsaj awibʼ, riq kibʼ, sok wibʼ, sokaj wibʼ, sol awibʼ, tijoj kibʼ, toqʼ kibʼ, tor wibʼ, xeʼj kibʼ, yuk kʼibʼ
- Out of those, 1 lemma occurred more than once, but never without a reflexive dependent. Example: tijoj
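The two reflexive counts above can be approximated by checking, for every verb, whether it has an obj or iobj child marked Reflex=Yes. This is a sketch of one possible reading of "never without a reflexive dependent". The tuple layout, the function name and the placeholder lemmas are all hypothetical.

```python
from collections import defaultdict


def reflexive_verb_lemmas(sentences):
    """Return (lemmas seen with a reflexive core object,
    lemmas occurring more than once and always with one).

    Each sentence is a list of (id, lemma, upos, feats, head, deprel) tuples.
    """
    total = defaultdict(int)       # verb lemma -> number of occurrences
    reflexive = defaultdict(int)   # verb lemma -> occurrences with a reflexive obj/iobj
    for sent in sentences:
        heads_with_reflexive = set()
        for tok_id, lemma, upos, feats, head, deprel in sent:
            if deprel in ("obj", "iobj") and "Reflex=Yes" in feats.split("|"):
                heads_with_reflexive.add(head)
        for tok_id, lemma, upos, feats, head, deprel in sent:
            if upos == "VERB":
                total[lemma] += 1
                if tok_id in heads_with_reflexive:
                    reflexive[lemma] += 1
    ever = sorted(reflexive)
    always = sorted(l for l in reflexive if total[l] > 1 and reflexive[l] == total[l])
    return ever, always


# Placeholder data (not from the treebank): "v1" is always reflexive,
# "v2" is reflexive once and takes an ordinary object once.
refl = (2, "self", "NOUN", "Reflex=Yes", 1, "obj")
plain = (2, "n", "NOUN", "_", 1, "obj")
toy = [
    [(1, "v1", "VERB", "_", 0, "root"), refl],
    [(1, "v1", "VERB", "_", 0, "root"), refl],
    [(1, "v2", "VERB", "_", 0, "root"), refl],
    [(1, "v2", "VERB", "_", 0, "root"), plain],
]

print(reflexive_verb_lemmas(toy))
```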
Relations Overview
- This corpus uses 3 relation subtypes: advmod:neg, dep:agr, dep:ss
- The following 6 relation types are not used in this corpus at all: iobj, expl, cop, list, orphan, reparandum
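Both statements above can be checked against the relation inventory listed earlier on this page. The sketch below compares that inventory with the 37 universal relations of UD v2; the variable names are illustrative.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations used in this corpus, copied from the Relations list above.
USED = {
    "acl", "advcl", "advmod", "advmod:neg", "amod", "appos", "aux", "case",
    "cc", "ccomp", "clf", "compound", "conj", "csubj", "dep", "dep:agr",
    "dep:ss", "det", "discourse", "dislocated", "fixed", "flat", "goeswith",
    "mark", "nmod", "nsubj", "nummod", "obj", "obl", "parataxis", "punct",
    "root", "vocative", "xcomp",
}

subtypes = sorted(rel for rel in USED if ":" in rel)      # language-specific subtypes
base_used = {rel.split(":", 1)[0] for rel in USED}        # strip subtypes to base relations
unused = sorted(UNIVERSAL_RELATIONS - base_used)          # universal relations never used

print(subtypes)
print(unused)
```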