UD Mbya Guarani Thomas
Language: Mbya Guarani (code: gun)
Family: Tupian
This treebank has been part of Universal Dependencies since the UD v2.4 release.
The following people have contributed to making this treebank part of UD: Guillaume Thomas.
Repository: UD_Mbya_Guarani-Thomas
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-NC-SA 4.0
Genre: nonfiction
Questions, comments? General annotation questions (either Mbya Guarani-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [guillaume • thomas (æt) utoronto • ca]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | assigned by a program, not checked manually |
UPOS | assigned by a program, with some manual corrections, but not a full manual verification |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Relations | assigned by a program, with some manual corrections, but not a full manual verification |
Description
UD Mbya_Guarani-Thomas is a corpus of Mbyá Guaraní (Tupian) texts collected by Guillaume Thomas. The current version of the corpus consists of three speeches by Paulina Kerechu Núñez Romero, a Mbyá Guaraní speaker from Ytu, Caazapá Department, Paraguay.
The speeches were recorded in August 2017 in the Mbyá Guaraní community of Ytu, Caazapá Department, Paraguay. They were transcribed by Ronaldi Recalde Centurion (Ytu community) and translated into Brazilian Portuguese by Alberto Álvares. The texts were interlinearized in SIL FieldWorks Language Explorer (Black and Simons 2006) and manually annotated in UD in Arborator (Gerdes 2013) by Guillaume Thomas. Features were converted automatically from the morphological glosses added in SIL FieldWorks Language Explorer.
Consider using the development version of the corpus, which contains the latest improvements; the official release is updated only every 6 months:
- https://github.com/UniversalDependencies/UD_Mbya_Guarani-Thomas/tree/dev
Acknowledgments
The development of the corpus was supported by a Connaught New Researcher Award to Guillaume Thomas at the University of Toronto.
Special thanks are due to Paulina Kerechu Núñez Romero for allowing us to use these recordings, and to Ronaldi Recalde Centurion and Alberto Álvares for their essential role in transcribing and translating these recordings.
References
- Andrew Black and Gary Simons. 2006. The SIL FieldWorks Language Explorer Approach to Morphological Parsing. Computational Linguistics for Less-Studied Languages: Texas Linguistics Society, 10. SIL.
- Kim Gerdes. 2013. Collaborative Dependency Annotation. In Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013), 88-97.
Statistics of UD Mbya Guarani Thomas
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Clusivity – Clusivity[obj] – Clusivity[psor] – Clusivity[subj] – Mood – Number – Number[psor] – NumType – Person – Person[obj] – Person[subj] – Polarity – PronType – Subcat – VerbForm
Relations
acl – advcl – advmod – amod – appos – case – cc – ccomp – compound – compound:svc – conj – cop – csubj – dep:mod – det – discourse – dislocated – dislocated:cleft – fixed – flat – list – mark – nmod – nsubj – nummod – obj – obl – obl:sentcon – parataxis – parataxis:rep – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 98 sentences and 1318 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 110 types of words that contain both letters and punctuation. Examples: ha'e, ka'aguy, va'e, va'ekue, cheremiarirõ'i, mba'e, he'i, ko'agã, kova'e, romba'apo, ete'i, ko'ape, mba'eicha, nda'u, ra'e, ramo'i, va'erã, ho'u, ndoro'ui, ro'u, roñea'ã, Mava'e, Mba'echa, amombe'u, añemoñe'ẽ, e'ỹ, ha'e'i, hũ'i, kirami'i, kue'iry, kyri'ĩ, kyrĩgue'i, kyrĩngue'i, ome'ẽ, oreka'aguy, oñembo'e, oñemokyri'ĩmba, porã'i, rei'i, rãe'i, ñañoty'i, 'rã, Aroñoty'i, Ha'eve, Ko'apy, Mbegueve-gueve, Tembiapo'i, a'i, ague'i, aipoa'e
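The counts above can be recomputed from a CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the script that generated this page: the function name `tokenization_stats` and the toy input are assumptions made for illustration, and the word forms in the toy rows are taken from word lists on this page but arranged arbitrarily. The sketch counts one token per integer-ID line, which matches the token count only if the file has no multiword-token ranges.

```python
import unicodedata

# Toy CoNLL-U input (hypothetical): forms come from word lists on this page
# but are arranged arbitrarily; these rows are NOT real treebank sentences.
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tha'e\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\toiko\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tMbegueve-gueve\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tou\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
)


def tokenization_stats(conllu_text):
    """Return (sentence count, token count, types mixing letters and punctuation)."""
    sentences = 0
    tokens = 0
    mixed_types = set()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        form = cols[1]
        has_letter = any(ch.isalpha() for ch in form)
        has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
        if has_letter and has_punct:
            mixed_types.add(form)
    return sentences, tokens, mixed_types
```

Punctuation is detected through the Unicode general category (any category starting with `P`), so both the apostrophe in `ha'e` and the hyphen in `Mbegueve-gueve` qualify.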
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 44 word types tagged as particles (PART): 'rã, Aromeno, Upearã, ani, ave, avi, e'ỹ, ete, ete'i, gui, jepe, jepi, jevy, ju, katu, ke, ko, kue'iry, kuera, kuery, kuri, ma, mavoi, mi, nda'u, neĩ, ngo, ni, pa, po, poteri, ra'e, ramo, ramo'i, rei, rei'i, ri, rã, ta, te, teri, tove, upei, voi
- This corpus contains 23 lemmas tagged as pronouns (PRON): che, chee, eta, ha'e, ichupe, kirami, ko'ava, kova'e, mava'e, mba'e, ndee, ndeevy, opa, ore, oregui, pende, pendevy, umia, upe, upea, upeve, ñande, ñandevy
- This corpus contains 6 lemmas tagged as determiners (DET): javive, kova'e, mava'e, mba'e, pavẽ, pe
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: kova'e, mava'e, mba'e
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): iko, ĩ
- Out of the above, 2 lemmas occurred sometimes as AUX and sometimes as VERB: iko, ĩ
- There are 5 (de)verbal forms:
  - Fin
    - VERB: oiko, opa, ou, cheayvu, roju, romba'apo, oñevanga, roiko, rojapo, roñoty
  - Inf
    - VERB: ha'e, he'i, ndaipovei, aa, kyri'ĩ, aipoa'e, ha'e'i, ja'e, marã, ndaa'ei
  - Post
    - VERB: kuaa'i, pota
  - Ser
    - VERB: erekovy, ainy, eravy-ravy'i, oikovy, oiny, okuapy, ovy-ovy, rekovy
  - Vnoun
    - VERB: ka'aguy
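The statements about lemmas that occur under two different tags (PRON and DET, AUX and VERB) amount to a set intersection over the LEMMA and UPOS columns. Here is a small sketch of that check, assuming standard 10-column CoNLL-U input. The function names and the toy rows are hypothetical; the lemma/tag pairs in the toy rows mirror the lists above, and FORM is left as `_` because only LEMMA and UPOS matter here.

```python
from collections import defaultdict

# Toy rows (hypothetical): only LEMMA (column 3) and UPOS (column 4) are filled.
SAMPLE = (
    "1\t_\tiko\tAUX\t_\t_\t_\t_\t_\t_\n"
    "2\t_\tiko\tVERB\t_\t_\t_\t_\t_\t_\n"
    "3\t_\tkova'e\tPRON\t_\t_\t_\t_\t_\t_\n"
    "4\t_\tkova'e\tDET\t_\t_\t_\t_\t_\t_\n"
    "5\t_\tche\tPRON\t_\t_\t_\t_\t_\t_\n"
    "\n"
)


def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas observed with it."""
    by_tag = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        by_tag[cols[3]].add(cols[2])
    return by_tag


def shared_lemmas(by_tag, tag_a, tag_b):
    """Return the sorted lemmas that occur under both tags."""
    return sorted(by_tag[tag_a] & by_tag[tag_b])


by_tag = lemmas_by_upos(SAMPLE)
aux_and_verb = shared_lemmas(by_tag, "AUX", "VERB")
pron_and_det = shared_lemmas(by_tag, "PRON", "DET")
```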
Nominal Features
- Number
  - Plur
    - NOUN: kyrĩgue'i
    - PRON: ore, ñande, ñandevy, pende, pendevy, oregui
  - Sing
    - PRON: chee, che, ndee, ndevy
Degree and Polarity
- Polarity
  - Neg
    - PRON: mba'eve
    - VERB-Fin: ndoro'ui, ndapechai, Ndoroikuaavei, ndo'ui, ndojapoi
    - VERB-Inf: ndaipovei, ndaa'ei
Verbal Features
- Mood
  - Des
    - VERB-Fin: tojapouka, toendu
  - Imp
    - VERB-Fin: eekombo'e
  - Ind
    - VERB-Fin: oiko, opa, ou, cheayvu, roju, romba'apo, oñevanga, roiko, rojapo, roñoty
    - VERB-Inf: ha'e, he'i, ndaipovei, aa, kyri'ĩ, aipoa'e, ha'e'i, ja'e, marã, ndaa'ei
    - VERB-Ser: erekovy, ainy, eravy-ravy'i, oikovy, oiny, okuapy, ovy-ovy, rekovy
    - VERB-Vnoun: ka'aguy
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - DET: kova'e, pe
    - PRON: upea, upe, kirami'i, kova'e, pea, ko'ava, umia, upeve
  - Ind
    - PRON: heta
  - Int
    - DET: mba'e, Mava'e
    - PRON: mba'e, Mava'e
  - Prs
    - PRON: ore, chee, ha'e, ichupe, ñande, che, ñandevy, ndee, pende, pendevy
  - Tot
    - DET: pavẽ, javive
    - PRON: opa
- NumType
  - Card
    - NUM: peteĩ, mokoĩ, mokoĩ'i
- Person
  - 1
    - PRON: ore, chee, ñande, che, ñandevy, oregui
    - VERB-Ser: ainy
  - 2
    - PRON: ndee, pende, pendevy, ndevy
  - 3
    - PRON: ha'e, ichupe, heta
    - VERB-Ser: oikovy, oiny, okuapy
- Number[psor]
  - Plur
    - NOUN: ñanerembiapo, oreayvu, orechy, orejaryi, oreka'aguy, pendejaryi, Ñanderu, oremba'e'i, orerembi'u, Ñanderuvicha
  - Sing
    - NOUN: cheremiarirõ'i, cheru, chejaryi, chetuva, Cheapoare, chehistoria, chememby, chememby'i, chepi'a'i, cherakykue
Other Features
- Clusivity
  - Ex
    - PRON: ore, oregui
  - In
    - PRON: ñande, ñandevy
- Clusivity[obj]
  - Ex
    - VERB-Fin: orepytyvõ'i, orereko, orereroayvu
  - In
    - VERB-Fin: ñanderayvu
- Clusivity[psor]
  - Ex
    - NOUN: oreayvu, orechy, orejaryi, oreka'aguy, oremba'e'i, orerembi'u
  - In
    - NOUN: ñanerembiapo, Ñanderu, Ñanderuvicha, ñandejara, ñandejaryi, ñandepy'a, ñandereko, ñaneñe'ẽ
- Clusivity[subj]
  - Ex
    - VERB-Fin: roju, romba'apo, roiko, rojapo, roñoty, ndoro'ui, ro'u, roñea'ã, Ndoroikuaavei, Romoĩ
  - In
    - VERB-Fin: jaecha, jajapo, jaraa, jareko, ñañoty'i, ja'u, jaiko, jaikuaapa, jaje'apa, jajoguereko
- Person[obj]
  - 1
    - VERB-Fin: chemoirũ, chereroñe'ẽ, orepytyvõ'i, orereko, orereroayvu, ñanderayvu
  - 3
    - VERB-Fin: imoiny
- Person[subj]
  - 1
    - VERB-Fin: cheayvu, roju, romba'apo, roiko, rojapo, roñoty, aiko, aipota, ajapo, areko
  - 2
    - VERB-Fin: ndapechai, reju, eekombo'e, ereikuaa, erendu, pemokañy, penderecharãi, pereko, remoñendu
  - 3
    - VERB-Fin: oiko, opa, ou, oñevanga, ho'u, oikuaa, oja, ojapo, opyta, heta
- Subcat
  - Ditr
    - VERB-Fin: ome'ẽ, tojapouka
  - Indir
    - VERB-Fin: roñea'ã, jaje'apa, oña'ã, penderecharãi
    - VERB-Inf: ñeñandu
  - Intr
    - VERB-Fin: oiko, opa, ou, cheayvu, roju, romba'apo, oñevanga, roiko, aiko, oja
    - VERB-Inf: ndaipovei, aa, kyri'ĩ, marã, porãina, tuichapa'i
    - VERB-Vnoun: ka'aguy
  - Tran
    - VERB-Fin: rojapo, roñoty, aipota, ajapo, areko, ho'u, jaecha, ndoro'ui, oikuaa, ojapo
    - VERB-Inf: ha'e, he'i, aipoa'e, ha'e'i, ja'e, ndaa'ei
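Several of the features listed here are layered, that is, written with a bracketed layer such as `Clusivity[subj]` or `Person[obj]`. The sketch below shows one way to split a CoNLL-U FEATS value into (feature, layer) keys. The function name `parse_feats` is an assumption, and the example string is reconstructed from the lists on this page for the form roju (Fin, Ind, Intr, Clusivity[subj]=Ex, Person[subj]=1); it is not copied from the treebank file, where the full annotation may contain more features.

```python
import re

# A feature name optionally followed by a lowercase layer in square brackets.
FEATURE_KEY = re.compile(r"^(?P<name>[A-Z][A-Za-z0-9]*)(?:\[(?P<layer>[a-z0-9]+)\])?$")


def parse_feats(feats):
    """Parse a FEATS string into {(name, layer): [values]}; layer is None if absent."""
    if feats == "_":
        return {}
    parsed = {}
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        match = FEATURE_KEY.match(key)
        if match is None:
            raise ValueError(f"unexpected feature name: {key!r}")
        # A feature may carry several comma-separated values.
        parsed[(match.group("name"), match.group("layer"))] = value.split(",")
    return parsed


# Reconstructed from the lists on this page (assumption, not a treebank row).
ROJU_FEATS = "Clusivity[subj]=Ex|Mood=Ind|Person[subj]=1|Subcat=Intr|VerbForm=Fin"
roju = parse_feats(ROJU_FEATS)
```

Keeping the layer as a separate key component makes it easy to select, for example, all subject-agreement features with `{k: v for k, v in roju.items() if k[1] == "subj"}`.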
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): iko.
- This corpus does not use the aux relation, so no token is attached as an auxiliary, even though two lemmas (iko, ĩ) are tagged AUX.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB-Fin--NOUN (29)
  - VERB-Fin--PRON (44)
  - VERB-Inf--NOUN (10)
  - VERB-Inf--PRON (2)
- obj
  - VERB-Fin--NOUN (39)
  - VERB-Fin--PRON (15)
  - VERB-Inf--NOUN (2)
  - VERB-Inf--PRON (2)
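Counts of this kind can be reproduced by walking every dependent, looking up its head, and keeping only verb parents with noun or pronoun children. The following is a rough sketch under stated assumptions: the function names are hypothetical, the input is standard 10-column CoNLL-U, and the toy rows use placeholder forms (`w1`, `w2`, ...) with invented annotation, so they say nothing about the actual treebank.

```python
from collections import Counter

# Invented toy input with placeholder forms; NOT treebank data.
SAMPLE = (
    "1\tw1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\t_\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_\n"
    "3\tw3\t_\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
    "1\tw4\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw5\t_\tVERB\t_\tVerbForm=Inf\t0\troot\t_\t_\n"
    "3\tw6\t_\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "\n"
)


def read_sentences(conllu_text):
    """Yield each sentence as a list of column lists (integer-ID lines only)."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():
            sentence.append(cols)
    if sentence:
        yield sentence


def verb_form(feats):
    """Return the VerbForm value from a FEATS string, or None if absent."""
    for pair in feats.split("|"):
        if pair.startswith("VerbForm="):
            return pair.split("=", 1)[1]
    return None


def core_argument_counts(conllu_text, relations=("nsubj", "obj")):
    """Count (relation, 'VERB-Form--CHILDPOS') pairs for verb parents."""
    counts = Counter()
    for sentence in read_sentences(conllu_text):
        by_id = {cols[0]: cols for cols in sentence}
        for cols in sentence:
            upos, head, deprel = cols[3], cols[6], cols[7].split(":")[0]
            if deprel not in relations or upos not in ("NOUN", "PRON"):
                continue
            parent = by_id.get(head)
            if parent is None or parent[3] != "VERB":
                continue
            form = verb_form(parent[5])
            parent_label = f"VERB-{form}" if form else "VERB"
            counts[(deprel, f"{parent_label}--{upos}")] += 1
    return counts


counts = core_argument_counts(SAMPLE)
```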
Relations Overview
- This corpus uses 5 relation subtypes: compound:svc, dep:mod, dislocated:cleft, obl:sentcon, parataxis:rep
- The following main type is never used alone; it is always subtyped: dep
- The following 6 relation types are not used in this corpus at all: iobj, expl, aux, clf, orphan, goeswith
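The three statements in this overview follow directly from the relation inventory listed under Relations above, compared against the 37 universal relations of UD v2. The sketch below recomputes them; `relation_overview` is a hypothetical helper name, `USED` is copied from the Relations list on this page, and `UD_RELATIONS` is the standard UD v2 inventory.

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels used in this treebank, as listed under "Relations".
USED = [
    "acl", "advcl", "advmod", "amod", "appos", "case", "cc", "ccomp",
    "compound", "compound:svc", "conj", "cop", "csubj", "dep:mod", "det",
    "discourse", "dislocated", "dislocated:cleft", "fixed", "flat", "list",
    "mark", "nmod", "nsubj", "nummod", "obj", "obl", "obl:sentcon",
    "parataxis", "parataxis:rep", "punct", "reparandum", "root", "vocative",
    "xcomp",
]


def relation_overview(used_labels):
    """Return (subtypes, main types never used bare, unused universal relations)."""
    used = set(used_labels)
    subtypes = sorted(label for label in used if ":" in label)
    main_types = {label.split(":")[0] for label in used}
    always_subtyped = sorted(m for m in main_types if m not in used)
    unused = sorted(UD_RELATIONS - main_types)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED)
```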