UD Nheengatu CompLin
Language: Nheengatu (code: yrl)
Family: Tupian
This treebank has been part of Universal Dependencies since the UD v2.11 release.
The following people have contributed to making this treebank part of UD: Leonel Figueiredo de Alencar, Dominick Maia Alexandre.
Repository: UD_Nheengatu-CompLin
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-NC-SA 4.0
Genre: spoken, bible, fiction, nonfiction, grammar-examples
Questions, comments? General annotation questions (either Nheengatu-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [leonel • de • alencar (æt) ufc • br]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | annotated manually |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD_Nheengatu-CompLin is a treebank of Nheengatu, also known as Modern Tupi and Língua Geral Amazônica (ISO 639: yrl). It comprises sentences drawn from a wide range of published sources, including spontaneous speech, grammatical descriptions, fables, myths, coursebooks, and dictionaries.
This is the first morphosyntactic treebank of Nheengatu. It remains a work in progress, with ongoing expansion planned for the coming months.
The published sources are freely available online. The sentences were extracted from PDF text files, transcribed from non-searchable (image-only) PDFs, or manually converted from phonetic transcriptions into orthography. Throughout the treebank, we generally adopt the spelling system proposed by Avila (2021), diverging from it only in a few cases.
The annotation was performed semi-automatically: we first applied the Yauti morphosyntactic analyzer (de Alencar 2023, 2025) to each sentence and then manually revised the output.
The development of this treebank and related tools is part of the research activities of the Research Group on Computation and Natural Language (Computação e Linguagem Natural — CompLin) at the Humanities Center of the Federal University of Ceará, Brazil. The main contributor to this effort is Leonel Figueiredo de Alencar, coordinator of the CompLin group. Additional annotators include Dominick Maia Alexandre, Hélio Leonam Barroso Silva, and Juliana Lopes Gurgel, who was a scholarship holder in the DACILAT project funded by the São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo — FAPESP), Process No. 22/09158-5.
The following repository contains the most up-to-date development version of the treebank, as well as related tools and resources:
https://github.com/CompLin/nheengatu
The treebank currently includes examples from Seixas (1853), Hartt (1872), Magalhães (1876), Sympson (1877), Rodrigues (1890), Aguiar (1898), Costa (1909), Studart (1926), Amorim (1928), Hartt (1938), Moore, Facundes, and Pires (1994), Casasnovas (2006), Cruz (2011), Comunidade de Terra Preta (2013), Stradelli (1929/2014), Navarro (2016), Melgueiro, Câmara, and Martins (2019), Muller et al. (2019), de Alencar (2021), Avila (2021), and Melgueiro (2022), as well as from the Novo Testamento na língua Nyengatu (1973/2019) and issues 3 and 17 of the Leetra Indígena journal (Universidade Federal de São Carlos, 2014, 2015).
Acknowledgments
We thank Eduardo de Almeida Navarro (University of São Paulo) for kindly allowing us to use examples and texts from his coursebook (Navarro 2016), whose glossary served as the initial basis for the morphological analyzer used to annotate the UD_Nheengatu-CompLin treebank.
We are greatly indebted to the dictionary by Avila (2021), from which numerous treebank sentences are drawn. This resource also provided invaluable lexical, grammatical, and semantic information for the further development of the morphological analyzer and related annotation tools. We are especially grateful to its author, Marcel Twardowsky Avila, for making the XML version of the dictionary available to us and for clarifying many questions regarding its entries.
We gratefully acknowledge the scholarships awarded to annotators by the São Paulo Research Foundation (FAPESP), through the DACILAT project (Process No. 22/09158-5), and by the Foundation for the Support and Development of Research in the State of Ceará (FUNCAP).
We are indebted to Gabriela Lourenço Fernandes and Susan Gabriela Huallpa Huanacuni, interns at the Biblioteca Brasiliana Guita e José Mindlin of the University of São Paulo (USP), as well as to its research specialist and curator, João Marcos Cardoso, for their transcriptions of stories from Amorim (1928) and Rodrigues (1890).
We also thank the Federal University of Amazonas Press (Editora da Universidade Federal do Amazonas — UFAM), particularly its director, Sérgio Freire, for granting permission to incorporate texts from Casasnovas (2006) into the treebank.
License
The copyright of the treebank sentences and their translations remains with their respective authors. This data is made available solely to support research, teaching, and the learning of the Nheengatu language. It should not be used for commercial purposes. For more information, see LICENSE.txt.
References
- Aguiar, Costa. (1898). Doutrina christã destinada aos naturaes do Amazonas em nhihingatu com traducção portugueza em face. Pap. e Tip. Pacheco, Silva & C.
- Avila, Marcel Twardowsky. (2021). Proposta de dicionário nheengatu-português (Doctoral dissertation, University of São Paulo). https://doi.org/10.11606/T.8.2021.tde-10012022-201925
- Casasnovas, Afonso. (2006). Noções de língua geral ou nheengatú: Gramática, lendas e vocabulário (2nd ed.). Editora da Universidade Federal do Amazonas; Faculdade Salesiana Dom Bosco.
- Comunidade de Terra Preta. (2013). Fábulas de Terra Preta: Uma coletânea bilíngue.
- Costa, D. Frederico. (1909). Carta pastoral de D. Frederico Costa bispo do Amazonas a seus amados diocesanos. Typ. Minerva.
- Cruz, Aline da. (2011). Fonologia e gramática do nheengatú: A língua falada pelos povos Baré, Warekena e Baniwa. Netherlands National Graduate School of Linguistics.
- de Alencar, Leonel Figueiredo. (2021). Uma gramática computacional de um fragmento do nheengatu / A computational grammar for a fragment of Nheengatu. Revista de Estudos da Linguagem, 29(3), 1717–1777. http://dx.doi.org/10.17851/2237-2083.29.3.1717-1777
- de Amorim, Antonio Brandão. (1928). Lendas em nheêngatú e em portuguez. Revista do Instituto Historico e Geographico Brasileiro, 154(100), 9–475.
- de Magalhães, J. V. C. (1876). O selvagem. Typographia da Reforma.
- Hartt, Charles Frederick. (1872). Notes on the Lingoa Geral or Modern Tupi of the Amazonas. Transactions of the American Philological Association, 3, 58–76. https://www.jstor.org/stable/310258
- Hartt, Charles Frederick. (1938). Notas sobre a língua geral, ou tupí moderno do Amazonas. Anais da Biblioteca Nacional do Rio de Janeiro, 51, 305–390. Rio de Janeiro: M. E. S. Serviço Gráfico.
- Maslova, Irina. (2018). Tradução comentada de mitos e lendas amazônicas do nheengatu para o russo (Master’s thesis, University of São Paulo). https://doi.org/10.11606/D.8.2019.tde-22022019-175350
- Melgueiro, Edilson Martins, Câmara, Ana Suelly Arruda, & Martins, Marci Fileti. (2019). Orações relativas em Nheengatú ou Ingatú. Revista Brasileira de Linguística Antropológica, 11(2), 16. https://doi.org/10.26512/rbla.v11i02.28115
- Melgueiro, Edilson Martins. (2022). O Nheengatu de Stradelli aos dias atuais: uma contribuição aos estudos lexicais de línguas Tupí-Guaraní em perspectiva diacrônica (Doctoral dissertation, University of Brasília). http://repositorio2.unb.br/jspui/handle/10482/44655
- Moore, Denny, Facundes, Sidney, & Pires, Nádia. (1994). Nheengatu (Língua Geral Amazônica), its history, and the effects of language contact. Department of Linguistics, University of California, Berkeley. https://escholarship.org/uc/item/7tb981s1
- Muller, Jean-Claude, Dietrich, Wolf, Monserrat, Ruth, Barros, Cândida, Arenz, Karl-Heinz, & Prudente, Gabriel (Eds.). (2019). Dicionário de língua geral amazônica. Universitätsverlag Potsdam; Museu Paraense Emílio Goeldi.
- Navarro, Eduardo de Almeida. (2016). Curso de língua geral (nheengatu ou tupi moderno): A língua das origens da civilização amazônica (2nd ed.). Centro Angel Rama, FFLCH, Universidade de São Paulo.
- Novo Testamento na língua Nyengatu (2nd ed.). (2019). Missão Novas Tribos do Brasil. (Original work published 1973)
- Rodrigues, João Barbosa. (1890). Poranduba amazonense ou kochiyma-uara porandub, 1872–1887. Typ. de G. Leuzinger & Filhos.
- Seixas, Manoel Justiniano de. (1853). Vocabulario da lingua indigena geral para o uso do Seminario Episcopal do Pará. Typ. de Mattos e Compª.
- Stradelli, Ermanno. (2014). Vocabulário português-nheengatu, nheengatu-português. Ateliê Editorial. (Original work published 1929)
- Studart, Jorge. (1926). Ligeiras noções de língua geral. Revista do Instituto do Ceará, 40, 26–38.
- Sympson, Pedro Luiz. (1877). Grammatica da lingua brazilica geral, fallada pelos aborigines das provincias do Pará e Amazonas. Typographia do Commercio do Amazonas.
- Universidade Federal de São Carlos, Laboratório de Linguagens LEETRA. (2014). Leetra Indígena, 3(3) [Edição especial: Yasú Yapurũgtitá Yẽgatú]. São Carlos, SP: UFSCar.
- Universidade Federal de São Carlos, Laboratório de Linguagens LEETRA. (2015). Leetra Indígena, 1(17) [Edição especial: Escola Kariamã conta umbuesá]. São Carlos, SP: UFSCar.
Statistics of UD Nheengatu CompLin
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
AdpType – AdvType – Aspect – Case – Clitic – Compound – Definite – Degree – Deixis – Derivation – Evident – ExtPos – Foc – Modality – Mood – Number – Number[grnd] – Number[psor] – NumType – PartType – Person – Person[grnd] – Person[psor] – Polarity – Poss – PronType – PunctType – Red – Rel – Style – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:relcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – expl – fixed – flat – goeswith – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 2839 sentences, 26444 tokens and 26848 syntactic words.
- This corpus contains 7905 tokens (30%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 210 types of words that contain both letters and punctuation. Examples: waá-itá, mira-itá, kwá-itá, amú-itá, kunhã-itá, apigawa-itá, anama-itá, maã-itá, nhaã-itá, kunhã-etá, taína-itá, pirá-itá, raíra-itá, rimirikú-itá, kamarara-itá, kariwa-itá, tayera-itá, yawé-yawé, mirá-piranga, pindá-itá, rundewara-itá, wirá-itá, amú-etá, apigawa-etá, kunhamukú-itá, mimbira-itá, mira-etá, mirá-itá, mú-itá, pirá-mirĩ, suú-itá, taria-itá, taíra-itá, tuixawa-etá, wirá-mirĩ, yepé-yepé, amú-tetamawara, amú-wirandé, arú-itá, ikewara-itá, iwá-itá, kunawarú-etá, kurabí-itá, kurasí-ara, kurumiwasú-itá, kurumĩ-itá, kurupira-itá, kuẽma-piranga, mbira-itá, mena-itá
- This corpus contains 404 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 221 types of multi-word tokens. Examples: árupi, pitérupi, maita, wírupi, iwí-pe, kaá-pe, resú-putari, asú-putari, kupixá-pe, paraname, Maã-ta, igarupá-pe, ipí-pe, kupé-pe, rembií-pe, ukwáu-putari, xamunhã-kwáu, Tupayú-pe, asú-kwáu, gantime, marã, pausá-pe, putiá-pe, remenari-putari, rupitá-pe, usú-putari, uwatá-kwáu, uyuká-putari, uyuyuká-putari, xasú-putari, Piauíwara, Ukiririntu, ambaú-putari, amuriwera, apurakí-putari, awá-ta, mixukúi, pawasá-pe, piá-pe, rasú-kwáu, resá-pe, resú-kwáu, ripí-pe, rumasá-pe, tatá-pe, unheẽwera, upisika-putari, xaseruka-kari, xawitá-kwáu, xibentu.
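The relationship between the token, word, and multi-word-token counts above can be illustrated with a minimal sketch. It assumes the standard CoNLL-U layout used by UD treebanks (ten tab-separated columns, range IDs such as `1-2` for multi-word tokens); the function name `count_units` and the toy sample with placeholder forms are hypothetical and not taken from the treebank or from the script that produced these statistics.

```python
def row(tok_id, form):
    """Build a 10-column CoNLL-U line with placeholder annotations."""
    return "\t".join([tok_id, form] + ["_"] * 8)

# Toy sample (placeholder forms, not treebank data): two sentences,
# the first containing one multi-word token spanning words 1-2.
SAMPLE = "\n".join([
    row("1-2", "ab"), row("1", "a"), row("2", "b"), row("3", "c"),
    "",
    row("1", "d"),
])

def count_units(conllu_text):
    """Return (sentences, surface tokens, syntactic words, multi-word tokens)."""
    sentences = tokens = words = mwts = 0
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        if not rows:
            continue
        sentences += 1
        covered_until = 0  # last word ID covered by the current multi-word token
        for cols in rows:
            tok_id = cols[0]
            if "-" in tok_id:        # range line: one multi-word token
                mwts += 1
                tokens += 1
                covered_until = int(tok_id.split("-")[1])
            elif "." in tok_id:      # empty node: neither token nor word
                continue
            else:                    # ordinary syntactic word
                words += 1
                if int(tok_id) > covered_until:
                    tokens += 1      # word that is also a surface token
    return sentences, tokens, words, mwts

print(count_units(SAMPLE))

# Consistency check on the reported figures: each multi-word token replaces
# its syntactic words (2 each, per the reported average) with one token.
assert 26848 - 2 * 404 + 404 == 26444
# Share of tokens not followed by a space, rounded as reported.
assert round(100 * 7905 / 26444) == 30
```

The two closing assertions only restate arithmetic on the numbers listed above; they do not read the treebank itself.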
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 64 word types tagged as particles (PART): Aikuré, Aé, Eẽ, Kusukúi, Masekúi, Teẽ, aikwewara, aikwé, amú, ana, arama, arã, ba, emú, eré, imú, inté, intí, intíu, ipú, katú, ku, kurí, kwera, maã, nti, nẽ, p, pawa, paá, presizu, pu, pá, páu, ra, rakú, ranhẽ, raĩ, raẽ, rã, rẽ, saĩ, será, supí, ta, taá, taé, te, tenhẽ, tenki, tenupá, ti, tu, té, tẽ, umbaá, wana, warama, wera, wã, wé, xukúi, yepé, ã
- This corpus contains 45 lemmas tagged as pronouns (PRON): aintá, aité, amú, awá, aé, aúna, i, indé, indéu, iné, inéu, ixé, ixéu, kwaá, kwá, manungara, maã, muiriira, mukũi, mukũi-itá, muíri, ne, nhaã, panhẽ, pawé, pe, penhẽ, se, setá, siiya, sitá, siya, siía, ta, turusú, upawé, upaĩ, waá, xe, yandé, yané, yanéu, yawé, yepé, yepé-yepé
- This corpus contains 22 lemmas tagged as determiners (DET): aité, amú, awá, aé, kwaá, kwá, mawaá, maã, muíri, nhaã, panhẽ, paĩ, setá, siiya, siya, siía, turusú, upanhẽ, upaĩ, yawé, yepé, yepé-yepé
- Out of the above, 19 lemmas occurred sometimes as PRON and sometimes as DET: aité, amú, awá, aé, kwaá, kwá, maã, muíri, nhaã, panhẽ, setá, siiya, siya, siía, turusú, upaĩ, yawé, yepé, yepé-yepé
- This corpus contains 8 lemmas tagged as auxiliaries (AUX): ikú, kari, kwá, kwáu, puderi, putari, sú, yuíri
- Out of the above, 6 lemmas occurred sometimes as AUX and sometimes as VERB: ikú, kwá, kwáu, putari, sú, yuíri
- There are 3 (de)verbal forms:
- Fin
- AUX: uikú, usú, yasú, asú, xaikú, xasú, aikú, reikú, yaikú, Ekũi
- VERB: unheẽ, usú, usika, umaã, umunhã, urikú, upitá, upisika, umbeú, uri
- Inf
- AUX: putari, kwáu, ikú, kari, kwá, vutari
- VERB: yumunhã, putari, rasú, yuká, munhã, nupá, watá, kutuka, kwáu, mukaẽ
- Vnoun
- VERB: ukwawasawa, pemanduarisawa, remanduarisawa, uyumimisawa, yamanduarisawa, hamanduarisawa, pekwasawa, rekwawasawa, ukaamunusawa, umundisá
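The tag and lemma overlaps reported above (the unused UPOS tag and the lemmas occurring both as PRON and as DET) amount to simple set operations. The following sketch reproduces them from the lists as printed on this page; the helper `lemmas` and the variable names are illustrative, and the strings are normalized to NFC as a precaution, on the assumption that accented forms may otherwise be encoded inconsistently.

```python
import unicodedata

def lemmas(text):
    """Split a comma-separated lemma list and normalize each item to NFC."""
    return {unicodedata.normalize("NFC", item.strip()) for item in text.split(",")}

# The 17 UPOS tags of UD v2 and the 16 listed as used in this corpus.
UPOS_ALL = set("ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN "
               "PUNCT SCONJ SYM VERB X".split())
UPOS_USED = set("ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN "
                "PUNCT SCONJ VERB X".split())

# Lemma lists copied from the statistics above.
PRON = lemmas(
    "aintá, aité, amú, awá, aé, aúna, i, indé, indéu, iné, inéu, ixé, ixéu, "
    "kwaá, kwá, manungara, maã, muiriira, mukũi, mukũi-itá, muíri, ne, nhaã, "
    "panhẽ, pawé, pe, penhẽ, se, setá, siiya, sitá, siya, siía, ta, turusú, "
    "upawé, upaĩ, waá, xe, yandé, yané, yanéu, yawé, yepé, yepé-yepé"
)
DET = lemmas(
    "aité, amú, awá, aé, kwaá, kwá, mawaá, maã, muíri, nhaã, panhẽ, paĩ, "
    "setá, siiya, siya, siía, turusú, upanhẽ, upaĩ, yawé, yepé, yepé-yepé"
)

unused_tags = UPOS_ALL - UPOS_USED   # tags never used in the corpus
both = PRON & DET                    # lemmas tagged sometimes PRON, sometimes DET
det_only = DET - PRON                # lemmas that occur only as DET

print(sorted(unused_tags))
print(len(PRON), len(DET), len(both))
print(sorted(det_only))
```

Run on these lists, the intersection has 19 members, matching the count reported above, and the determiner-only lemmas are mawaá, paĩ, and upanhẽ.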
Nominal Features
- Plur
- AUX-Fin: yasú, yaikú, pesú, yapuderi, Pekũi, Pepuderi, peikú, Tausú, taikú, tasú
- DET: kwá-itá, nhaã-itá, amú-itá
- NOUN: mira-itá, kunhã-itá, apigawa-itá, anama-itá, maã-itá, kunhã-etá, taína-itá, pirá-itá, kamarara-itá, kariwa-itá
- PRON: aintá, yané, ta, yandé, penhẽ, waá-itá, pe, amú-itá, kwá-itá, nhaã-itá
- VERB-Fin: yamunhã, yasú, yamaã, pemunhã, yaú, pemaã, yamanú, taunheẽ, pesendú, yayuká
- VERB-Vnoun: pemanduarisawa, yamanduarisawa, pekwasawa
- Sing
- AUX-Fin: asú, xaikú, xasú, aikú, reikú, Ekũi, resú, Kũi, Hapuderi, Hasú
- DET: nhaã, kwá, kwaá, amú, amu
- NOUN: ara, mira, apigawa, manha, igara, tupana, paraná, kunhã, pituna, yautí
- PRON: i, se, waá, aé, ne, ixé, indé, nhaã, kwá, amú
- PROPN: Tupayú
- VERB-Fin: xasú, reputari, rerikú, asú, resú, remaã, remunhã, xarikú, amaã, amunhã
- VERB-Vnoun: remanduarisawa, hamanduarisawa, rekwawasawa, xarikusawa
- Acc,Nom
- PRON: aé, aintá, ixé, indé, ta, penhẽ, yandé, iné, yané, aúna
- Dat
- PRON: ixéu, inéu, yanéu, indéu
- Gen
- PRON: i, se, ne, yané, aintá, pe, ta, xe, yandé, U
- Ind
- DET: yepé, muyepé
- PRON: yepé
Degree and Polarity
- Aug
- ADJ: Sepiasú, panemawasú, pixeasú
- NOUN: buyawasú, miráwasú, pitunawasú, kiririwasú, iwawasú, iwiwasú, marikawasú, piawasú, tiapuwasú, yawaratewasú-itá
- VERB-Fin: kirimawausú, xirĩwasú
- Cmp
- ADV: piri
- Dim
- ADJ: purangamirĩ
- NOUN: Abumirĩ, fardamirĩ, kunhamirĩ, kurumirĩ, kurusamirĩ-etá, makakaí, wirawasumirĩ-etá, yasimirĩ-itá
- PRON: setaíra
- Sup
- ADV: piri
- Neg
- PART: ti, intí, te, nẽ, nti, tenhẽ, umbaá, Teẽ, intíu, inté
- Pos
- PART: eré, Eẽ, Aé
Verbal Features
- Compl
- PART: pawa, pá, páu, p
- Cont
- PART: wé
- Freq
- ADV: Asuiwara, Ikewara, kwayewara, sewara, yawewara
- NOUN: arawara, rukawara
- PART: wera, aikwewara
- VERB-Fin: Amanduariwara, Asuwara
- Frus
- PART: yepé
- Hab
- SCONJ: rametiwa
- VERB-Fin: ambautiwa, ukanhemutiwa, umundutiwa, upinaitikatiwa, upurungitatiwa, usutiwa, uyukatiwa
- Imp
- PART: rẽ, ranhẽ, raẽ, raĩ, saĩ
- Iter
- AUX-Fin: yayuíri
- VERB-Fin: uyuíri, xayuíri
- Perf
- PART: ana, ã, wã, wana
- Imp
- AUX-Fin: Ekũi, Kũi, resú, Pekũi, pesú
- VERB-Fin: remaã, yuri, Ekũi, Epurú, Iruri, retirika, eyuri, ikũi, pemunhã, remeẽ
- Imp,Ind
- AUX-Fin: reikú, resú, pesú, Pepuderi
- VERB-Fin: rerikú, remunhã, resú, Remaã, remundú, rembeú, reruri, pemaã, pemunhã, pewatá
- Ind
- AUX-Fin: uikú, usú, yasú, asú, xaikú, xasú, aikú, yaikú, reikú, upuderi
- VERB-Fin: unheẽ, usú, usika, umaã, umunhã, urikú, upitá, upisika, umbeú, uri
- Fut
- PART: kurí, arama, arã, ku, rã, warama
- Past
- PART: kwera, wera
- Pres
- ADV: Asuiwara, Ikewara, kwayewara, sewara, yawewara
- NOUN: arawara, rukawara
- PART: aikwewara
- VERB-Fin: Amanduariwara, Asuwara
- Mid,Pass
- VERB-Fin: uyumunhã, uyuyuká, uyuyumimi, xayumumeú, xayuruyari, Reyumupuranga, Reyuyumimi, Uyupurungitá, Xayumusakú, hayumukwaíra
- VERB-Inf: yumunhã, Yukindawa, Yumuatiri, yemuí, yumumeú, yumuseruka, yumuí, yupiruka, yusalvari
- Nfh
- PART: paá
Pronouns, Determiners, Quantifiers
- Art
- DET: yepé, muyepé
- PRON: yepé
- Dem
- ADV: iké, ape, kwá, akití, aape, mi, kí, Mimi, mikití, ké
- DET: nhaã, kwá, kwaá, kwá-itá, aé, nhaã-itá
- PRON: nhaã, kwá, kwá-itá, kwaá, nhaã-itá, aé
- Emp
- DET: aité
- PRON: aité
- Ind
- ADV: mairamé, makití, mayé, masuí, marupí
- DET: amú, siiya, maã, siía, muíri, setá, yawé, siya, turusú, yawé-yawé
- PRON: maã, awá, amú, manungara, amú-itá, siya, siiya, mukũi-itá, setá, amú-etá
- Int
- ADV: mayé, mamé, makití, mairamé, marama, marupí, mayawé, masuí, maita, Maí
- DET: maã, muíri, awá, Mawaá
- PRON: maã, awá, Muíri
- Prs
- PRON: i, se, aintá, aé, ne, ixé, indé, yané, ta, yandé
- Rel
- ADV: mamé, makití, mayé, marupí, masuí, mairamé
- DET: maã
- PRON: waá, waá-itá, awá, maã
- Tot
- DET: panhẽ, upaĩ, muíri, upanhẽ, paiu
- PRON: panhẽ, upaĩ, muíri, pawé, upawé
- Card
- NUM: mukũi, musapiri, yepé, pú-mukũi, sete, 1930, Oito, irundí, kwaru, nove
- Ord
- ADV: mukũisawa, primeru
- Yes
- PRON: se, i, ne, yané, aintá, ta, pe, xe, yandé
- 1
- AUX-Fin: yasú, asú, xaikú, xasú, aikú, yaikú, yapuderi, Hapuderi, Hasú, apuderi
- PRON: se, ixé, yané, yandé, ixéu, xe, yanéu, su
- VERB-Fin: xasú, asú, xarikú, yamunhã, yasú, amaã, amunhã, aputari, xamunhã, yamaã
- VERB-Vnoun: yamanduarisawa, hamanduarisawa, xarikusawa
- 2
- AUX-Fin: reikú, Ekũi, resú, pesú, Kũi, Pekũi, Pepuderi, peikú, repuderi
- PRON: ne, indé, penhẽ, pe, iné, inéu, indéu, intí, n
- VERB-Fin: reputari, rerikú, resú, remaã, remunhã, pemunhã, reyuri, pemaã, renheẽ, rembeú
- VERB-Vnoun: pemanduarisawa, remanduarisawa, pekwasawa, rekwawasawa
- 3
- AUX-Fin: uikú, usú, upuderi, Tausú, taikú, tasú, taupuderi, urikú
- PRON: i, aintá, aé, ta, aúna, U, intá
- VERB-Fin: unheẽ, usú, usika, umaã, umunhã, urikú, upitá, upisika, umbeú, uri
- VERB-Vnoun: ukwawasawa, uyumimisawa, ukaamunusawa, umundisá, upukasawa, uputarisá, usikiesá, uyanasá
- Sing
- NOUN: suka, sera, ximirikú, taíra, ximiára, sesá, suaxara, sakakwera, sawa, sukwera
Other Features
- AdpType
- Post
- ADP: upé, kití, suí, irumu, rupí, supé, arama, xupé, resé, ramé
- Prep
- ADP: até, té
- AdvType
- Cau
- ADV: aresé, ape, aramé, nhaãsé, kurumú, marama, Mairamé, marã
- Con
- ADV: Ma, nuká
- Deg
- ADV: reté, katú, piri, xinga, retana, yuíri, mirĩ, turusú, retã, puru
- Loc
- ADV: iké, apekatú, mamé, ape, makití, marupí, kwá, masuí, akití, arupí
- Man
- ADV: yawé, mayé, puranga, kwayé, kutara, puxí, kirimbawa, amurupí, katú, kurutẽi
- Mod
- ADV: kuité, kuté
- Tim
- ADV: asuí, kuíri, ape, aramé, aiwana, yeperesé, wirandé, ariré, kuxiima, aape
- Clitic
- Yes
- ADP: pe, upé, wara, me, arã
- ADV: ntu, mi
- PART: taá, wera, ta
- Compound
- Yes
- AUX-Inf: putari, kwáu, kari, kwá, vutari
- Deixis
- Prox
- ADV: iké, kwá, kí, ké, kwaá
- DET: kwá, kwaá, kwá-itá
- PRON: kwá, kwá-itá, kwaá
- Remt
- ADV: ape, akití, aape, mi, Mimi, mikití, mumi
- DET: nhaã, aé, nhaã-itá
- PRON: nhaã, nhaã-itá, aé
- Derivation
- Coll
- NOUN: itatiwa, kapĩtiwa, mirawasutiwa, sakaitiwa, siringatiwa, wakutiwa
- Priv
- ADJ: iwasuíma, santaíma, uyiima
- ADV: tiapuíma
- NOUN: Adana-ima, apisaíma, ara-ima, kiinha-ima, payaíma, sawa-ima, seraíma, tĩ-ima, ximirikú-ima
- VERB-Fin: kiaíma
- ExtPos
- ADV
- ADV: yawé, Kutara, yawewara
- PART: intí, ti, aikwé, nẽ
- PRON: maã
- CCONJ
- CCONJ: u
- DET
- ADV: mayé
- PRON
- ADV: Maí
- PART: nẽ
- SCONJ
- PRON: waá
- Foc
- Yes
- PART: tẽ, tenhẽ, té, katú, ra
- Modality
- Cond
- PART: maã, imú, amú, emú
- Proh
- PART: te, tenhẽ, Teẽ
- Number[grnd]
- Sing
- ADP: sesé, suakí, sakakwera, sesewara, suaxara
- PartType
- Emp
- PART: tẽ, tenhẽ, té, katú, ra
- Exs
- PART: aikwé, Aikuré, aikwewara
- Int
- PART: taá, será, ta, taé, tu
- Mod
- PART: paá, pu, maã, supí, te, eré, tenhẽ, tenupá, tenki, ipú
- Neg
- PART: ti, intí, nẽ, nti, umbaá, intíu, inté
- Prs
- PART: xukúi, Kusukúi, Masekúi
- Person[grnd]
- 3
- ADP: sesé, suakí, sakakwera, sesewara, suaxara
- Person[psor]
- 3
- NOUN: suka, sera, ximirikú, taíra, ximiára, sesá, suaxara, sakakwera, sawa, sukwera
- PunctType
- Elip
- PUNCT: [...]
- Red
- Yes
- ADJ: purapuranga, aíwa-aíwa, pixuna-pixuna
- DET: yawé-yawé
- NOUN: tapurú-tapurú
- PRON: yawé-yawé
- VERB-Fin: uyawiyawika, Akaá-kaá, Tasuú-suú, Utuká-tuká, aganaganari, atuká-tuká, ipukukapukuka, takaú-kaú, ukaúkaú, ukikiri
- Rel
- Abs
- NOUN: uka, tatá, ukara, tuixawa, tetama, timbiú, ukena, pé, teapú, tendawa
- Cont
- ADP: resé, resewara, ruakí, rakakwera, renundé, aresé, rakwera, ruaxara, renuné, rikuyara
- NOUN: ruka, raíra, ramunha, retama, rimirikú, rapé, rera, rupitá, rangawa, resá
- SCONJ: resewara
- VERB-Fin: rurí, resarái, raisú, rikwé, ranhẽ, rakú, rapí, rawa, renúi
- VERB-Inf: ripiaka
- NCont
- ADP: sesé, suakí, sakakwera, sesewara, suaxara
- NOUN: suka, sera, ximirikú, taíra, ximiára, sesá, suaxara, sakakwera, sawa, sukwera
- VERB-Fin: surí, sasí, tiapú, sakú, sikwé, setá, tipí, Ikupukú, sesaíma, sawa
- Style
- Arch
- ADP: aresé, resewara
- AUX-Fin: xaikú, xasú
- AUX-Inf: ikú
- NOUN: suá, tuixawa, ukena, rangawa, Rapé, imirikú, ií, sakapira, sapiá, sera
- PRON: se, yané, ne, yandé, aé, maã, i, pe, ixé, aúna
- SCONJ: kurumu
- VERB-Fin: xasú, xarikú, xamunhã, xaputari, xaú, xakwáu, xanheẽ, xawasemu, xayuíri, raisú
- VERB-Inf: yumunhã, putari, rasú, yuká, munhã, nupá, watá, kutuka, kwáu, yakáu
- VERB-Vnoun: xarikusawa
- Rare
- ADP: renuné
- NOUN: Yukasara, teapú
- PRON: Se, ixé
- VERB-Fin: Ururi, upiama, Uxipiá, umunhã, upena-upena
- VERB-Inf: piamu, Xari, maramunhã, piama, puapuãmu
- Typo
- Yes
- ADJ: Iwatí, katú, menasara, puranga-itá, puriaisúa, suai, xapanema, xapuriaisúa
- ADP: rũ, aresé, pu, suí, aramé, iruma, pipé, rumu
- ADV: Maí, Mairamé, inte, Arareneíma, Mamé, Maramé, iramé, maãkití, mené, mumi
- AUX-Fin: urikú
- AUX-Inf: vutari
- CCONJ: yuri
- DET: muyepé, Maã, amu, paiu, riya
- NOUN: kaxiwer, kunhã, kunhãbukú, mirikú, rupirunawa, uka, Mukura, Sumura-etá, Tapayuma, ara
- NUM: Yepé, muyepé
- PART: Aé, Ti, p, Aikuré, Intí, Masekúi, inté, maã, saĩ, tu
- PRON: maã, i, Nhaã, U, intá, intí, n, se, su
- SCONJ: Sa
- VERB-Fin: ipiama, Humbú, Pempisasúa, Xamuarpakwári, Xapetika, Xenheẽ, Yapituú, a, akanhemu, imaasí
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): ikú.
- This corpus uses 8 lemmas as auxiliaries (aux). Examples: sú, ikú, putari, kwáu, puderi, kari, kwá, yuíri.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (922)
- VERB-Fin--PRON (227)
- VERB-Fin--PRON-Acc,Nom (571)
- VERB-Fin--PRON-Gen (96)
- VERB-Inf--NOUN (40)
- VERB-Inf--PRON (4)
- VERB-Inf--PRON-Acc,Nom (15)
- VERB-Vnoun--NOUN (1)
- VERB-Vnoun--PRON (1)
- obj
- VERB-Fin--NOUN (1057)
- VERB-Fin--NOUN-ADP(resé) (3)
- VERB-Fin--PRON (212)
- VERB-Fin--PRON-ADP(irũ) (1)
- VERB-Fin--PRON-Acc,Nom (225)
- VERB-Fin--PRON-Gen (8)
- VERB-Fin--PRON-Gen-ADP(irumu) (1)
- VERB-Inf--NOUN (8)
- VERB-Inf--PRON (1)
- VERB-Inf--PRON-Acc,Nom (4)
- VERB-Inf--PRON-Gen (25)
- VERB-Vnoun--PRON (1)
- iobj
- VERB-Fin--NOUN (2)
- VERB-Fin--NOUN-ADP(resé) (1)
- VERB-Fin--NOUN-ADP(rã) (1)
- VERB-Fin--NOUN-ADP(supé) (43)
- VERB-Fin--NOUN-ADP(supé)-ADP(arama) (1)
- VERB-Fin--NOUN-ADP(xupé) (6)
- VERB-Fin--NOUN-ADP(xupé)-ADP(arama) (2)
- VERB-Fin--PRON (3)
- VERB-Fin--PRON-ADP(supé) (1)
- VERB-Fin--PRON-ADP(supé)-ADP(arama) (1)
- VERB-Fin--PRON-Acc,Nom (11)
- VERB-Fin--PRON-Acc,Nom-ADP(arama) (27)
- VERB-Fin--PRON-Acc,Nom-ADP(arã) (18)
- VERB-Fin--PRON-Dat (25)
- VERB-Fin--PRON-Gen-ADP(arama) (2)
- VERB-Fin--PRON-Gen-ADP(supé) (17)
- VERB-Fin--PRON-Gen-ADP(xupé) (48)
- VERB-Fin--PRON-Gen-ADP(xupé)-ADP(arã) (2)
- VERB-Inf--PRON-Acc,Nom-ADP(supé) (1)
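Pattern labels such as `VERB-Fin--PRON-Acc,Nom` or `VERB-Fin--NOUN-ADP(supé)` combine the parent's VerbForm with the child's UPOS, Case, and any adposition attached to the child. The sketch below shows one way such labels could be counted from CoNLL-U data. It is an approximation under stated assumptions, not the script that generated these statistics: the function names are hypothetical, adpositions are identified by lemma via the `case` relation, and the toy sample uses placeholder forms rather than treebank sentences.

```python
from collections import Counter

def feats_to_dict(feats):
    """Parse a FEATS string such as 'Case=Acc,Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))

def arg_patterns(conllu_text, deprel):
    """Count parent--child patterns for NOUN/PRON children of VERB parents."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if line and not line.startswith("#") and cols[0].isdigit():
                words[cols[0]] = cols  # skip range lines and empty nodes
        for cols in words.values():
            if cols[7] != deprel or cols[3] not in ("NOUN", "PRON"):
                continue
            parent = words.get(cols[6])
            if parent is None or parent[3] != "VERB":
                continue
            label = "VERB"
            verb_form = feats_to_dict(parent[5]).get("VerbForm")
            if verb_form:
                label += "-" + verb_form
            label += "--" + cols[3]
            case = feats_to_dict(cols[5]).get("Case")
            if case:
                label += "-" + case
            # Assumption: adpositions are the child's 'case' dependents, by lemma.
            for dep in words.values():
                if dep[6] == cols[0] and dep[7] == "case" and dep[3] == "ADP":
                    label += "-ADP(" + dep[2] + ")"
            counts[label] += 1
    return counts

# Toy sentence with placeholder forms (ID, FORM, LEMMA, UPOS, XPOS, FEATS,
# HEAD, DEPREL, DEPS, MISC); only the lemma 'supé' is taken from this page.
SAMPLE = "\n".join("\t".join(r) for r in [
    ("1", "w1", "w1", "PRON", "_", "Case=Acc,Nom", "4", "nsubj", "_", "_"),
    ("2", "w2", "w2", "NOUN", "_", "_", "4", "iobj", "_", "_"),
    ("3", "w3", "supé", "ADP", "_", "_", "2", "case", "_", "_"),
    ("4", "w4", "w4", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"),
])

print(arg_patterns(SAMPLE, "nsubj"))
print(arg_patterns(SAMPLE, "iobj"))
```

On the toy sentence, the subject pronoun yields the label `VERB-Fin--PRON-Acc,Nom` and the postpositional noun yields `VERB-Fin--NOUN-ADP(supé)`, in the same shape as the entries listed above.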
Relations Overview
- This corpus uses 3 relation subtypes: acl:relcl, advcl:relcl, nmod:poss
- The following 2 relation types are not used in this corpus at all: clf, list
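Both figures in this overview can be re-derived from the list in the Relations section above. The sketch below does so with plain set operations; `UNIVERSAL` is the inventory of 37 universal relations in UD v2, `USED` is copied from this page, and the variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = (
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp"
).split()

# Relations listed for this corpus in the Relations section.
USED = (
    "acl acl:relcl advcl advcl:relcl advmod amod appos aux case cc ccomp "
    "compound conj cop csubj dep det discourse dislocated expl fixed flat "
    "goeswith iobj mark nmod nmod:poss nsubj nummod obj obl orphan parataxis "
    "punct reparandum root vocative xcomp"
).split()

# A subtype is any label of the form 'relation:subtype'.
subtypes = [rel for rel in USED if ":" in rel]

# Universal relations whose base label never occurs in the corpus.
base_used = {rel.split(":")[0] for rel in USED}
unused = sorted(set(UNIVERSAL) - base_used)

print(subtypes)
print(unused)
```

The result is three subtypes (acl:relcl, advcl:relcl, nmod:poss) and two unused universal relations (clf, list), in agreement with the figures above.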