UD Western Sierra Puebla Nahuatl MesoTree
Language: Western Sierra Puebla Nahuatl (code: nhi)
Family: Uto-Aztecan
This treebank has been part of Universal Dependencies since the UD v2.11 release.
The following people have contributed to making this treebank part of UD: Robert Pugh, Marivel Huerta Mendez, Mitsuya Sasaki, Francis Tyers, María Ximena Juarez Huerta, Ángeles Márquez Hernández.
Repository: UD_Western_Sierra_Puebla_Nahuatl-MesoTree
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: spoken, fiction, grammar-examples, nonfiction
Questions, comments? General annotation questions (either Western Sierra Puebla Nahuatl-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [pughrob (æt) iu • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD Western Sierra Puebla Nahuatl-MesoTree combines the existing UD Western Sierra Puebla Nahuatl-IU treebank (ITML), with some annotations updated to correct errors or to reflect changed annotation decisions, and new sentences annotated as part of the NSF-funded project “Syntactically-annotated corpora for endangered languages in areal contact” (MesoTree).
The ITML treebank was pre-annotated for morphology using apertium-nhi (Pugh et al., 2021). The morphological analyses were then disambiguated by hand, and the dependency structure was annotated manually. The MesoTree data does not include morphological analyses at this time.
The treebank consists of sentences from written fiction and non-fiction, spontaneous speech, and grammar examples. The new additions also include a large set of sentences (ALIMG) translated into two subvarieties of the language, one from San Miguel Tenango, Zacatlán, and another from Omitlán, Tepetzintla.
Acknowledgments
We would like to thank the following people for permission to use their sentences:
- Elizabeth Márquez Hernández
- Jaime Hernández Juárez
- Ubaldo Márquez Pérez
- Petra Schroeder
References
- Pugh, R., Tyers, F., and Huerta Mendez, M. (2021). Towards an open source finite-state morphological analyzer for Zacatlán-Ahuacatlán-Tepetzintla Nahuatl. In Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers), pages 80–85.
Statistics of UD Western Sierra Puebla Nahuatl MesoTree
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Degree – ExtPos – Foreign – Gender – Mood – Movement – NounType – Number – Number[obj] – Number[psor] – Number[subj] – Person – Person[obj] – Person[psor] – Person[subj] – Polarity – Polite – PronType – Reflex – Subcat – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – advmod:neg – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – goeswith – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 3024 sentences, 19191 tokens and 19535 syntactic words.
- This corpus contains 4988 tokens (26%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 2 types of words that contain both letters and punctuation. Examples: inchancahuan:, tetame:
- This corpus contains 336 multi-word tokens. On average, one multi-word token consists of 2.02 syntactic words.
- There are 246 types of multi-word tokens. Examples: den, ican, yen, okatka, nisihtzin, nakin, nicniu, ocatca, yopeu, amotlen, mai, mokiseguiro, nima, santipitzin, yomic, yotiehkokeh, Yomononotskeh, del, intlaxcal, manioh, matiakan, matikchiwakah, moito, momoskalti, naquin, natl, nikankah, nohtli, nokse, ococh, onicatca, saoyah, yocatca, yocholoh, yomotlaleh, yotiquitiya, yowalah, Amocaten, Ikanon, Incaxmeh, Inyajtiwitzis, Ixcatqui, Mattemotin, Moyolcatl, Natentli, Nepaca, Nichan, Ninchan, Ninmitlauan, Nocse.
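The counts above are mutually consistent: the 19535 − 19191 = 344 extra syntactic words, spread over 336 multi-word tokens, give 1 + 344/336 ≈ 2.02 words per multi-word token. The sketch below shows one way such counts could be derived from a CoNLL-U file, assuming the standard conventions (range IDs such as `1-2` for multi-word tokens, `SpaceAfter=No` in the MISC column). The helper names `row` and `count_units` and the toy input are hypothetical and are not taken from the treebank or its tooling.

```python
def row(tok_id, form, misc="_"):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 7 + [misc])

# Toy input invented for illustration; these are not treebank sentences.
SAMPLE = "\n".join([
    "# text = ab c.",
    row("1-2", "ab"),
    row("1", "a"),
    row("2", "b"),
    row("3", "c", "SpaceAfter=No"),
    row("4", "."),
    "",
    "# text = d!",
    row("1", "d", "SpaceAfter=No"),
    row("2", "!"),
    "",
])

def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0,
             "mwts": 0, "mwt_words": 0, "no_space": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():           # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        tok_id, misc = cols[0], cols[9].split("|")
        if "." in tok_id:              # empty node: neither a token nor a word
            continue
        if "-" in tok_id:              # multi-word token, e.g. "1-2"
            start, end = (int(part) for part in tok_id.split("-"))
            stats["mwts"] += 1
            stats["mwt_words"] += end - start + 1
            stats["tokens"] += 1
            stats["no_space"] += "SpaceAfter=No" in misc
            covered_until = end
            continue
        stats["words"] += 1
        if int(tok_id) > covered_until:  # word that is its own surface token
            stats["tokens"] += 1
            stats["no_space"] += "SpaceAfter=No" in misc
    return stats

stats = count_units(SAMPLE)
print(stats)
```

With these definitions, `words == tokens - mwts + mwt_words` holds by construction, which is the same relation that links the token, word and multi-word-token figures reported above.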
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 1 word type tagged as particle (PART): amo
- This corpus contains 119 lemmas tagged as pronouns (PRON): akaj, akih, akij, akin, akinoh, algo, ana, aqui, aquih, aquihque, aquin, atl, catli, catliye, ce, ciqui, ese, eso, eua, inin, itl, itlah, les, lo, miac, miak, mik, miqui, mowisiotzin, nada, nakin, namehuan, namejwan, ne, neca, necah, necateh, neci, nehhuatl, nehuatl, nehuatluatl, nehwatl, nej, nejuatl, nejwatl, nicanca, nicancah, nikanka, nin, ninih, nochi, nomehwah, non, nonoh, notewah, noyohca, ocsiqui, ocsiquin, okse, oksikin, que, quen, quesqui, quexquich, se, semeh, sihtli, siki, sikin, siqui, tatita, te, teh, tehhuan, tehhuatl, tehuan, tehuat, tehuatl, tehuatzin, tehwa, tehwah, tehwan, tehwatl, tej, tejuatl, tejwan, temiston, tercero, tlan, tleh, tlen, tlenic, tleno, tlenoh, tlenohoh, tlenoj, tlenon, tlensa, tlensaso, todo, touatzin, tzocotzi, uan, ye, yeh, yehhuan, yehhuatl, yehua, yehuan, yehuatl, yehwa, yehwah, yehwatl, yehwuatl, yej, yejuatl, yejwan, yejwatl, yo
- This corpus contains 66 lemmas tagged as determiners (DET): cada, catli, catliye, catqui, ce, ciqui, comitl, cualquier, de, det, dion, ecan, el, icanca, in, incoyotl, inin, itscuintli, la, las, miac, miak, miaqui, mic, mik, miqui, mismo, n, nakin, neca, necah, nicanca, nicancah, nicxi, nikanka, nikankah, nin, nion, nitil, nochi, nochin, nochtin, non, nonoh, oc, occe, occiqui, ocse, ocsiqui, ok, okse, oksiki, quesqui, se, siki, sikin, siqui, siquin, temachtani, tich, tlaxcal, tlen, tleno, tlenoh, un, uno
- Out of the above, 29 lemmas occurred sometimes as PRON and sometimes as DET: catli, catliye, ce, ciqui, inin, miac, miak, mik, miqui, nakin, neca, necah, nicanca, nicancah, nikanka, nin, nochi, non, nonoh, ocsiqui, okse, quesqui, se, siki, sikin, siqui, tlen, tleno, tlenoh
- This corpus contains 24 lemmas tagged as auxiliaries (AUX): _, catqui, estar, haber, huili, i, katki, kisa, ma, mach, mo, nimi, o, oc, pehua, peua, pewi, ser, uili, wili, witsi, witzi, yen, youi
- Out of the above, 14 lemmas occurred sometimes as AUX and sometimes as VERB: _, catqui, huili, i, katki, kisa, nimi, peua, pewi, ser, uili, wili, witsi, youi
- There are 2 (de)verbal forms (VerbForm):
  - Fin
    - VERB: katki, oquihtoh, onauat, oyah, mota, niquihtoz, yuwi, nesi, niquihlnamiqui, niyaz
  - Inf
    - VERB: ver, dar
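The PRON/DET and AUX/VERB overlaps reported in the Tags list above amount to intersecting the lemma sets collected per UPOS tag. A minimal sketch of that idea follows, assuming CoNLL-U input with the lemma in column 3 and the UPOS tag in column 4. The function names are hypothetical, and the rows are hand-built toy lines that reuse lemmas listed on this page; they are not sentences from the treebank.

```python
from collections import defaultdict

def word_row(tok_id, form, lemma, upos):
    """Build one 10-column CoNLL-U line with only ID, FORM, LEMMA, UPOS filled."""
    return "\t".join([tok_id, form, lemma, upos] + ["_"] * 6)

# Hand-built toy rows (not treebank sentences) using lemmas listed above.
TOY_ROWS = "\n".join([
    word_row("1", "se", "se", "DET"),
    word_row("2", "se", "se", "PRON"),
    word_row("3", "tlen", "tlen", "PRON"),
    word_row("4", "tlen", "tlen", "DET"),
    word_row("5", "amo", "amo", "PART"),
    word_row("6", "catqui", "catqui", "AUX"),
    word_row("7", "catqui", "catqui", "VERB"),
    word_row("8", "nehuatl", "nehuatl", "PRON"),
])

def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas that occur with it."""
    table = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10:
            continue
        if "-" in cols[0] or "." in cols[0]:   # skip multi-word tokens and empty nodes
            continue
        table[cols[3]].add(cols[2])
    return table

def shared_lemmas(table, upos_a, upos_b):
    """Lemmas that occur at least once with each of the two tags."""
    return sorted(table[upos_a] & table[upos_b])

table = lemmas_by_upos(TOY_ROWS)
print(shared_lemmas(table, "PRON", "DET"))
print(shared_lemmas(table, "AUX", "VERB"))
```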
Nominal Features
- Gender
  - Fem
    - ADJ: primera, primer
    - NOUN: escuela, rana, danzas, irana, guerra, fiesta, máquina, conchas, días, historia
  - Masc
    - ADJ: Nuevo, mexicano, patronal, reconocido, avanzado, chistoso, mismo, patronales, pavimentado
    - NOUN: pueblo, topueblo, burro, años, frasco, ejemplo, mardomos, pollito, Rancho, amigos
    - PRON: lo
    - PROPN: estados, unidos
- Number
  - Plur
    - ADJ: diferentes, huehhueyen, tlaltitikten, tzahtzayactique, wehwinyi, wihwinyeh, xitlatztiqueh, amables, chichiltiqueh, malos
    - NOUN: tokniwah, ceraokwilimeh, coyomeh, mopiluan, niconeuan, tipemeh, ichcame, siwameh, danzas, años
    - NUM: nahuen, naweh, yeyen, millones
    - PRON: tehwah, yejwan, tehuan, yehuan, yehwah, tehhuan, ninqueh, notewah, tehwan, tejwan
    - PROPN: estados, unidos
  - Sing
    - ADJ: cualli, kwale, weyi, kwaltsih, kwali, igual, chikawak, kwalli, tixajkaloj, uelic
    - NOUN: ich, itich, ica, ika, tonal, atl, itoka, iwah, miston, itzcuintli
    - PRON: neh, yeh, teh, yej, touatzin, ye, nehwatl, nej, tej, yehwatl
    - PROPN: Ticpintzin
- Case
  - Abs
    - NOUN: atl, itzcuintli, telpukatl, ilwitl, itskwintli, tlakatl, masatl, altipetl, kowatl, tikitl
  - Acc
    - PRON: lo
Degree and Polarity
- Degree
  - Dim
    - ADJ: chihchikichih, kwaltsih, kualtzin, tzocotzin, hueyihtzin, nitzotzocotzi, titzocotzitzin, tlacualtzincan, tzocotzitzin
    - ADV: tzocotzin, kwaltsin, tipitzin, tzokotzitzin
    - NOUN: isihtzin, tenantzin, tipitzin, Nosintitzin, guitarritas, moawitzin, namokniwantzitzin, nisihtzin, nisijtzin, nowewetzin
    - PRON: namejwantzitzin, mowisiotzin
    - PROPN: Ticpintzin
- Polarity
  - Neg
    - ADV: amo, akmo
Verbal Features
- Aspect
  - Imp
    - AUX: katka, catca, nicatca, ocatca, Okatka, catcah, ticatca, ticatcah, uilia, uiliah
    - VERB: okatka, onipiyaya, ocatca, oniniquiya, ocmatia, ocpiyaya, onechilhuaya, oniquilhuaya, otiquitiya, oyaya
    - VERB-Fin: okatka, onipiyaya, ocatca, oniniquiya, ocmatia, ocpiyaya, onechilhuaya, oniquilhuaya, otiquitiya, oyaya
  - Perf
    - VERB: oquihtoh, onauat, oyah, opew, opeu, oyahkeh, ocholoh, octlahtlanih, owits, oyaj
    - VERB-Fin: oquihtoh, onauat, oyah, opew, opeu, oyahkeh, ocholoh, octlahtlanih, owits, oyaj
  - Prog
    - VERB: tikitok, kichihchiwtok, kipixtok, nitiquitoc, tentok, tsikwintok, cholohtokeh, cualantoc, molevantarohtok, niyolpactoc
    - VERB-Fin: tikitok, kichihchiwtok, kipixtok, nitiquitoc, tentok, tsikwintok, cholohtokeh, cualantoc, molevantarohtok, niyolpactoc
- Mood
  - Cnd
    - VERB-Fin: ontlasojtlaskia, oxnechoncaquinih, Onyani, occhiuilsquiah, okaltlachixtoskia, okyektlalani, onmitzwalikiliskia, onmokowani, onmokowiskia, onyaskia
  - Imp
    - VERB: xiyo, ixquita, Ixcochi, ixmeua, ixtlaocoya, nikixmati, xiyahkah, xoncualani, Ixcaqui, Ixnechmaka
    - VERB-Fin: xiyo, ixquita, Ixcochi, ixmeua, ixtlaocoya, nikixmati, xiyahkah, xoncualani, Ixcaqui, Ixnechmaka
  - Ind
    - AUX: huili, o, opeu
    - VERB: katki, oquihtoh, onauat, oyah, mota, niquihtoz, yuwi, nesi, niquihlnamiqui, niyaz
    - VERB-Fin: katki, oquihtoh, onauat, oyah, mota, niquihtoz, yuwi, nesi, niquihlnamiqui, niyaz
    - VERB-Inf: ver, dar
  - Opt
    - AUX: ito
    - VERB: kiseguiro, Chaueh, Ixcana, cequitta, moskalti, motlamochiwa, tiakan, Chuhue, ceicxipalti, cequimpiya
    - VERB-Fin: kiseguiro, Chaueh, cequitta, moskalti, motlamochiwa, tiakan, Chuhue, ceicxipalti, cequimpiya, cpiyacan
  - Prp
    - VERB: oquinpaleuito
  - Sub
    - VERB-Fin: sea
- Tense
  - Fut
    - AUX: wilis, niiski, pewis, uilis
    - VERB: niquihtoz, niyaz, niyas, tiyas, itmatis, tlamis, icchiuas, itkwalikas, Tikwikas, atliz
    - VERB-Fin: niquihtoz, niyaz, tiyas, itmatis, niyas, tlamis, icchiuas, itkwalikas, Tikwikas, atliz
  - Past
    - AUX: katka, peuh, catca, nicatca, opeh, Okatka, catcah, ticatca, ticatcah, uilia
    - VERB: oquihtoh, onauat, oyah, okatka, opew, onipiyaya, opeu, oyahkeh, ocatca, ocholoh
    - VERB-Fin: oquihtoh, onauat, oyah, okatka, opew, onipiyaya, opeu, oyahkeh, ocatca, ocholoh
  - Pqp
    - VERB-Fin: ocholohca, yomikka
  - Pres
    - AUX: pewi, wili, huili, catqui, katej, nica, peweh, o
    - VERB: katki, yuwi, kah, yuweh, kateh, moweyilihtih, nesi, tikitok, ehko, kinchiwaj
    - VERB-Fin: katki, yuwi, kah, yuweh, kateh, moweyilihtih, nesi, tikitok, ehko, kinchiwaj
- Voice
  - Act
    - VERB-Fin: oyah, tehco
Pronouns, Determiners, Quantifiers
- PronType
  - Prs
    - PRON: neh, yeh, teh, yej, tehwah, yejwan, ye, non, tlen, tlenoh
- Reflex
  - Yes
    - VERB-Fin: moniki, omonacasmahman, mocelebraroa, mochiwa, mokawa, omochih, inmokawas, ixmeua, moapareserowa, molevantarohtok
- Person
  - 1
    - PRON: neh, tehwah, nej, tehuan, nehwatl, tehhuan, notewah, tehwan, tejwan
  - 2
    - PRON: teh, touatzin, tej
  - 3
    - PRON: yeh, yej, yejwan, ye, yehuan, yehwah, yehwatl, lo, yehhuan
- Polite
  - Form
    - NOUN: tonnomaman
    - PRON: touatzin, Tojuatzin
    - VERB: oxnechoncaquinih, xoncualani, Inmitzontlasoj, Inmitzontlatlawtia, Innamechonnonotzas, Itkomonikiltijtzinowa, Ixnechonmaka, inmitzontlalilis, itconchiuas, itkonikis
    - VERB-Fin: oxnechoncaquinih, xoncualani, Inmitzontlasoj, Inmitzontlatlawtia, Innamechonnonotzas, Itkomonikiltijtzinowa, Ixnechonmaka, itconchiuas, itkonikis, itkonkowas
- Number[psor]
  - Plur
    - NOUN: tlaxcal, tocniuan
  - Sing
    - NOUN: ica, itich, iixco, ich, imaman, iuan, nocax, noconeu, temachtani
Other Features
- ExtPos
  - ADP
    - CCONJ: wan
  - ADV
    - ADP: a, de, en
    - ADV: ok, amo, sa, san, za, Kamach, zan, ahora, más
    - DET: n
  - CCONJ
    - ADJ: igual
    - ADP: de
  - INTJ
    - CCONJ: o
    - INTJ: Ayy, Ja, aay
  - PRON
    - ADV: ok
    - PRON: lo
  - SCONJ
    - ADP: para
- Foreign
  - Yes
    - ADJ: Nuevo, atrasado, cerca, cerquita, civil, mexicano, patronal, primera, reconocido, tranquilo
    - ADP: de, para, por, hasta, a, en, desde, como, sin
    - ADV: después, entonces, pues, ahorita, siempre, igual, ahora, bueno, más, casi
    - AUX: es
    - CCONJ: pero, y, o
    - DET: cada, l, las, cualquier, un
    - INTJ: bueno, sí, A
    - NOUN: pueblo, topueblo, escuela, rana, danzas, irana, burro, guerra, vez, años
    - NUM: ocho, quince, dieciocho, nueve, siete, veinte, millones
    - PRON: eso, que, nada, tercero, todo
    - PROPN: Juan, estados, unidos, español, Juana, dios
    - SCONJ: porque, que, como, cuando, para, hasta, Mejor
    - VERB-Fin: sé, Anda, ponen, sale, sea, sirves
    - VERB-Inf: ver, dar
- Movement
  - And
    - VERB: nitiquititi
  - Ven
    - VERB: onechtlahpaloco, otualah
- NounType
  - Relat
    - NOUN: ica, itich, ich, iuan
- Number[obj]
  - Plur
    - VERB: onquimitac
  - Sing
    - VERB: Xictiqui, cniqui, nechnamiqui, nicmati, nicniqui, nictlamia, nitlatooctoc, onechtlahpaloco, onicpahpactoya, oniquinamacac
    - VERB-Fin: cniqui
- Number[subj]
  - Plur
    - ADJ: chichiltiqueh
    - AUX: ocatca
    - NOUN: caxmeh, telpocameh, tocniuan
    - VERB: Onicsoquiyoteh, cocoxqueh, mouiqueh, oquinpaleuito, otechpanouihque, oticaxitihqueh, otimocauqueh, xitechon
  - Sing
    - ADJ: istac, tliltic
    - AUX: catqui, huili, o, opeu
    - NOUN: ica, itich, tlacatl, altipetl, atl, cali, corral, pouitl, calihtic, comal
    - VERB: nitiquititi, niyas, omic, ticpatla, tolohtoc, Ixnechnextili, Oquis, Xictiqui, Xinechmaca, chocholoca
    - VERB-Fin: cniqui
- Person[obj]
  - 1
    - VERB: onechtlahpaloco, otinechtlaocole
  - 3
    - VERB: Xictiqui, cniqui, nechnamiqui, nicmati, nicniqui, nictlamia, niliutoc, nitlatooctoc, onicpahpactoya, oniquinamacac
    - VERB-Fin: cniqui
- Person[psor]
  - 1
    - NOUN: noconeu, tocniuan
  - 3
    - NOUN: ica, itich, iixco, ich, imaman, iuan, temachtani, tlaxcal
- Person[subj]
  - 1
    - VERB: nitiquititi, niyas, nicmati, nicniqui, oniquinamacac, oniyectlacua, onquimitac
  - 2
    - NOUN: caxmeh
    - VERB: Ixnechnextili, ixmomachihchicaua, otinechtlaocole, otiuala, otualah, xitiquiti
  - 3
    - ADJ: chichiltiqueh, istac, tliltic
    - AUX: catqui, huili, o, ocatca
    - NOUN: ica, itich, tlacatl, altipetl, atl, cali, caxmeh, corral, pouitl, calihtic
    - VERB: tolohtoc, Oquis, Xictiqui, chocholoca, cniqui, cocoxqueh, micqui, mocaua, mouiqueh, nechnamiqui
    - VERB-Fin: cniqui
- Subcat
  - Intr
    - AUX: catqui, ocatca, huili, o
    - VERB: nitiquititi, niyas, tolohtoc, Oquis, cocoxqueh, mouiqueh, omic, otiuala, peua, tookatoc
  - Tran
    - VERB: Onictlapan, onictlame, Ixnechnextili, Xictiqui, Xinechmaca, ixmomachihchicaua, nacacuah, nechnamiqui, nicmati, nicniqui
- Typo
  - Yes
    - ADJ: poyic
    - ADV: oc, Amo
    - AUX: Uislis
    - DET: ok, oc
    - NOUN: mo, no, láps
    - VERB: in, nimo, oc, otimo, se, xic, ik, xitechon
    - VERB-Fin: in, ik
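Several of the features above are layered, such as `Number[subj]` and `Person[obj]`, where the bracketed part names the argument that the value refers to. The sketch below shows one way to split a CoNLL-U FEATS string into feature-value pairs and to separate a layered name from its layer. The function names are hypothetical, and the example string only combines values that this page lists for the AUX catqui; it is not claimed to be the annotation of any particular token.

```python
import re

# A layered feature name looks like "Number[subj]": base name plus a layer in brackets.
LAYERED = re.compile(r"^(?P<name>[A-Za-z0-9]+)\[(?P<layer>[a-z0-9]+)\]$")

def parse_feats(feats):
    """Turn a FEATS column such as 'A=x|B=y' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def split_layer(feature_name):
    """Return (base name, layer), with layer None for unlayered features."""
    match = LAYERED.match(feature_name)
    if match:
        return match["name"], match["layer"]
    return feature_name, None

# Illustrative FEATS string (an assumption, see the paragraph above).
example = "Number[subj]=Sing|Person[subj]=3|Subcat=Intr"
for name, value in parse_feats(example).items():
    print(split_layer(name), value)
```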
Syntax
Auxiliary Verbs and Copula
- This corpus uses 14 lemmas as copulas (cop). Examples: catqui, yehuatl, katki, yehwatl, youi, ye, ser, yeh, yejuatl, i, yehhuatl, yej, yejwatl, yen.
- This corpus uses 19 lemmas as auxiliaries (aux). Examples: ma, uili, o, mo, peua, pewi, oc, mach, wili, _, huili, catqui, kisa, pehua, estar, haber, nimi, witsi, witzi.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (378)
  - VERB--NOUN-ADP(icanca) (1)
  - VERB--NOUN-Abs (10)
  - VERB--PRON (289)
  - VERB-Fin--NOUN (215)
  - VERB-Fin--NOUN-ADP(de) (2)
  - VERB-Fin--NOUN-Abs (109)
  - VERB-Fin--NOUN-Abs-ADP(de) (1)
  - VERB-Fin--PRON (211)
- obj
  - VERB--NOUN (513)
  - VERB--NOUN-Abs (12)
  - VERB--PRON (67)
  - VERB-Fin--NOUN (217)
  - VERB-Fin--NOUN-ADP(de) (1)
  - VERB-Fin--NOUN-Abs (100)
  - VERB-Fin--NOUN-Abs-ADP(quemeh) (1)
  - VERB-Fin--PRON (67)
  - VERB-Fin--PRON-Acc (1)
- iobj
  - VERB--NOUN (11)
  - VERB--NOUN-ADP(para) (1)
  - VERB--PRON (7)
  - VERB-Fin--NOUN (7)
  - VERB-Fin--PRON (2)
  - VERB-Inf--NOUN-ADP(a) (1)
  - VERB-Inf--PRON (1)
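The counts above pair a verbal parent with a nominal or pronominal child for each core relation. A simplified sketch of that tally follows. It assumes each word is already available as an `(id, upos, head, deprel)` tuple and it only uses UPOS tags, so it does not reproduce the finer labels above that also encode VerbForm, Case, or an attached adposition. The function name and the toy trees are hypothetical and are not drawn from the treebank.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}
NOMINAL_TAGS = {"NOUN", "PRON"}

def core_argument_counts(sentences):
    """Count core relations whose parent is a VERB and whose child is NOUN or PRON."""
    counts = Counter()
    for sentence in sentences:
        # Look up the UPOS tag of any word in this sentence by its ID.
        upos_of = {tok_id: upos for tok_id, upos, _head, _deprel in sentence}
        for _tok_id, upos, head, deprel in sentence:
            base = deprel.split(":")[0]          # drop any relation subtype
            if (base in CORE_RELATIONS
                    and upos in NOMINAL_TAGS
                    and upos_of.get(head) == "VERB"):
                counts[(base, f"VERB--{upos}")] += 1
    return counts

# Toy dependency trees invented for illustration: (id, upos, head, deprel).
TOY_SENTENCES = [
    [(1, "PRON", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "DET", 4, "det"),
     (4, "NOUN", 2, "obj"), (5, "PUNCT", 2, "punct")],
    # The subject of a non-verbal predicate is skipped by the VERB-parent filter.
    [(1, "NOUN", 2, "nsubj"), (2, "ADJ", 0, "root")],
]

for key, count in sorted(core_argument_counts(TOY_SENTENCES).items()):
    print(key, count)
```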
Relations Overview
- This corpus uses 2 relation subtypes: acl:relcl, advmod:neg
- The following 3 relation types are not used in this corpus at all: expl, clf, list
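Both statements can be checked against the relation list given earlier on this page: subtypes are the labels containing a colon, and the unused types are the universal relations whose base label never appears. The sketch below assumes the 37 universal relations defined by UD v2; the variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2 (assumed inventory).
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations listed for this treebank in the Relations index above.
USED_RELATIONS = [
    "acl", "acl:relcl", "advcl", "advmod", "advmod:neg", "amod", "appos",
    "aux", "case", "cc", "ccomp", "compound", "conj", "cop", "csubj", "dep",
    "det", "discourse", "dislocated", "fixed", "flat", "goeswith", "iobj",
    "mark", "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis",
    "punct", "reparandum", "root", "vocative", "xcomp",
]

# A subtype is written as base:subtype.
subtypes = sorted(rel for rel in USED_RELATIONS if ":" in rel)
# A universal relation is unused if no listed label has it as its base.
used_bases = {rel.split(":")[0] for rel in USED_RELATIONS}
unused = sorted(UNIVERSAL_RELATIONS - used_bases)

print(subtypes)
print(unused)
```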