UD Nheengatu CompLin
Language: Nheengatu (code: yrl)
Family: Tupian
This treebank has been part of Universal Dependencies since the UD v2.11 release.
The following people have contributed to making this treebank part of UD: Leonel Figueiredo de Alencar.
Repository: UD_Nheengatu-CompLin
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-NC-SA 4.0
Genre: spoken, bible, fiction, nonfiction, grammar-examples
Questions, comments? General annotation questions (either Nheengatu-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [leonel • de • alencar (æt) ufc • br]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | annotated manually |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD_Nheengatu-CompLin is a treebank of Nheengatu or Nhengatu (ISO 639: yrl), also known, inter alia, as Modern Tupi and Língua Geral Amazônica. It comprises sentences from diverse published sources, e.g., spontaneous speech, grammatical descriptions, fables, myths, coursebooks, and dictionaries.
To our knowledge, this is the first treebank of Nheengatu. It is a work in progress. The initial release contained only a couple of hundred sentences. This new release encompasses more than nine times that number. We plan to keep expanding the resource in the coming months.
The published sources are freely available on the Internet. The sentences were either extracted from text-based PDF files, transcribed from non-searchable (image-only) PDF files, or manually converted from phonetic transcriptions to orthography. Throughout the treebank, we use the spelling system proposed by Avila (2021). The annotation was performed semi-automatically: we first applied the Yauti morphosyntactic analyzer (de Alencar 2023) to each sentence and then manually revised the output.
The development of this treebank and related tools and resources is part of the research activities of the Research Group on Computation and Natural Language (Computação e Linguagem Natural — CompLin) at the Humanities Center of the Federal University of Ceará in Brazil. The main contributor to this effort is Leonel Figueiredo de Alencar, coordinator of the CompLin group. Additional annotators include Dominick Maia Alexandre, Hélio Leonam Barroso Silva, and Juliana Lopes Gurgel, scholarship holder with the DACILAT project, funded by the São Paulo State Research Support Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo — FAPESP) under Process No. 22/09158-5.
The following repository contains the most up-to-date development version of the treebank as well as related tools and resources:
https://github.com/CompLin/nheengatu
So far, the treebank includes examples from Seixas (1853), Magalhães (1876), Sympson (1877), Rodrigues (1890), Aguiar (1898), Costa (1909), Studart (1926), Amorim (1928), Hartt (1938), Moore, Facundes, and Pires (1994), Casasnovas (2006), Cruz (2011), Comunidade de Terra Preta (2013), Stradelli (1929/2014), Navarro (2016), Muller et al. (2019), Alencar (2021), Avila (2021), and Melgueiro (2022) as well as from the New Testament (Novo Testamento na língua Nyengatu, 1973/2019).
Acknowledgments
We thank Eduardo de Almeida Navarro (University of São Paulo) for kindly allowing us to use examples and texts from his coursebook (Navarro 2016), whose glossary was the first basis for the morphological analyzer we implemented to annotate the UD_Nheengatu-CompLin treebank.
We owe much to Avila (2021)’s dictionary, from which numerous treebank sentences stem. This dictionary also provided invaluable lexical, grammatical, and semantic information for the further development of the morphological analyzer and related treebank annotation tools. We are much obliged to its author, Marcel Twardowsky Avila, for making the XML version of the dictionary available to us and clarifying many questions about the dictionary entries.
We gratefully acknowledge the scholarships provided to annotators by both the São Paulo State Research Support Foundation (FAPESP) through the DACILAT project under Process No. 22/09158-5 and the Foundation for the Support and Development of Research in the State of Ceará (FUNCAP).
We are indebted to Gabriela Lourenço Fernandes and Susan Gabriela Huallpa Huanacuni, interns at the Biblioteca Brasiliana Guita e José Mindlin of the University of São Paulo (USP), as well as to its research specialist and curator João Marcos Cardoso, for transcriptions of stories from Amorim (1928) and Rodrigues (1890).
Thanks are also due to the Federal University of Amazonas Press (Editora da Universidade Federal do Amazonas — UFAM), particularly to its director, Sérgio Freire, for granting permission to incorporate texts from Casasnovas (2006) into the treebank.
License
Copyright of the treebank sentences and their translations belongs to their respective authors. This data is made available here solely to promote research, teaching, and learning of the Nheengatu language. Therefore, it shouldn’t be used for any commercial purposes. For more information, see LICENSE.txt.
References
- Aguiar, Costa. (1898). Doutrina christã destinada aos naturaes do amazonas em nhihingatu’ com traducção portugueza em face. Pap. e Tip. Pacheco, Silva & C.
- Avila, Marcel Twardowsky. (2021). Proposta de dicionário nheengatu-português [Doctoral dissertation, University of São Paulo]. doi:10.11606/T.8.2021.tde-10012022-201925
- Casasnovas, Afonso. (2006). Noções de língua geral ou nheengatú: Gramática, lendas e vocabulário (2nd ed.). Editora da Universidade Federal do Amazonas; Faculdade Salesiana Dom Bosco.
- Comunidade de Terra Preta. (2013). Fábulas de Terra Preta: Uma coletânea bilingüe.
- Costa, D. Frederico. (1909). Carta pastoral de D. Frederico Costa bispo do Amazonas a seus amados diocesanos. Typ. Minerva.
- Cruz, Aline da. (2011). Fonologia e gramática do nheengatú: A língua falada pelos povos Baré, Warekena e Baniwa. Netherlands National Graduate School of Linguistics.
- de Alencar, Leonel Figueiredo. (2021). Uma gramática computacional de um fragmento do nheengatu / A computational grammar for a fragment of Nheengatu. Revista de Estudos da Linguagem, 29(3), 1717-1777. doi:10.17851/2237-2083.29.3.1717-1777
- de Amorim, Antonio Brandão. (1928). Lendas em nheêngatú e em portuguez. Revista do Instituto Historico e Geographico Brasileiro, 154(100), 9-475.
- de Magalhães, J. V. C. (1876). O selvagem. Typographia da Reforma.
- Maslova, Irina. (2018). Tradução Comentada de Mitos e Lendas Amazônicas do Nheengatu para o Russo. [Master’s thesis, University of São Paulo]. doi:10.11606/D.8.2019.tde-22022019-175350
- Melgueiro, Edilson Martins. (2022). O Nheengatu de Stradelli aos dias atuais: uma contribuição aos estudos lexicais de línguas Tupí-Guaraní em perspectiva diacrônica. [Doctoral dissertation, University of Brasília]. http://repositorio2.unb.br/jspui/handle/10482/44655
- Moore, Denny, Facundes, Sidney, & Pires, Nádia. (1994). Nheengatu (Língua Geral Amazônica), its History, and the Effects of Language Contact. UC Berkeley: Department of Linguistics. Retrieved from https://escholarship.org/uc/item/7tb981s1
- Muller, Jean-Claude, Dietrich, Wolf, Monserrat, Ruth, Barros, Cândida, Arenz, Karl-Heinz, & Prudente, Gabriel. (Eds.). (2019). Dicionário De Língua Geral Amazônica. Universitätsverlag Potsdam; Museu Paraense Emilio Goeldi.
- Navarro, Eduardo de Almeida. (2016). Curso de Língua Geral (nheengatu ou tupi moderno): A língua das origens da civilização amazônica (2nd ed.). Centro Angel Rama da Faculdade de Filosofia, Letras e Ciências Humanas da Universidade de São Paulo.
- Novo Testamento na língua Nyengatu (2nd ed.). (2019). Missão Novas Tribos do Brasil. (Original work published 1973)
- Rodrigues, João Barbosa. (1890). Poranduba amazonense ou kochiyma-uara porandub, 1872-1887. Typ. de G. Leuzinger & Filhos.
- Seixas, Manoel Justiniano de. (1853). Vocabulario da lingua indigena geral para o uso do Seminario Episcopal do Pará. Typ. de Mattos e Compª.
- Stradelli, Ermanno. (2014). Vocabulário português-nheengatu, nheengatu-português. Ateliê Editorial. (Original work published 1929)
- Studart, Jorge. (1926). Ligeiras noções de língua geral. Revista do Instituto do Ceará, 40, 26–38.
- Sympson, Pedro Luiz. (1877). Grammatica da lingua brazilica geral, fallada pelos aborigines das provincias do Pará e Amazonas. Typographia do Commercio do Amazonas.
Statistics of UD Nheengatu CompLin
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
AdpType – AdvType – Aspect – Case – Clitic – Compound – Definite – Degree – Deixis – Derivation – Evident – ExtPos – Foc – Modality – Mood – Number – Number[grnd] – Number[psor] – NumType – PartType – Person – Person[grnd] – Person[psor] – Polarity – Poss – PronType – PunctType – Red – Rel – Style – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:relcl – advmod – amod – appos – aux – case – cc – ccomp – conj – cop – csubj – dep – det – discourse – dislocated – expl – fixed – flat – goeswith – iobj – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1832 sentences, 19037 tokens and 19278 syntactic words.
- This corpus contains 5468 tokens (29%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 179 types of words that contain both letters and punctuation. Examples: waá-itá, mira-itá, kwá-itá, amú-itá, kunhã-itá, apigawa-itá, anama-itá, nhaã-itá, taína-itá, maã-itá, pirá-itá, raíra-itá, rimirikú-itá, kunhã-etá, tayera-itá, mirá-piranga, pindá-itá, rundewara-itá, wirá-itá, kariwa-itá, kunhamukú-itá, kurasí-ara, mira-etá, mirá-itá, mú-itá, pirá-mirĩ, suú-itá, taria-itá, taíra-itá, wirá-mirĩ, yawé-yawé, yepé-yepé, amú-etá, amú-tetamawara, amú-wirandé, apigawa-etá, arú-itá, ikewara-itá, iwá-itá, kurabí-itá, kurumiwasú-itá, kurumĩ-itá, kurupira-itá, kuẽma-piranga, mbira-itá, mena-itá, mimbira-itá, mukũi-itá, nheenga-itá, paya-itá
- This corpus contains 241 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 134 types of multi-word tokens. Examples: pitérupi, maita, árupi, wírupi, asú-putari, resú-putari, iwí-pe, kaá-pe, paraname, asú-kwáu, gantime, ipí-pe, kupixá-pe, kupé-pe, putiá-pe, rupitá-pe, uyuká-putari, Maã-ta, Ukiririntu, ambaú-putari, amuriwera, apurakí-putari, marã, resú-kwáu, ripí-pe, rumasá-pe, tatá-pe, ukwáu-putari, unheẽwera, usú-putari, uwatá-kwáu, yakumame, Amaã-putari, Amaãntu, Amunhã-kari, Apiripana-putari, Apituú-putari, Asenúi-kari, Awá-ta, Ayuíri-putari, Igarapé-pe, Indé-ta, Kuíri-ta, Marã-ta, Piauíwara, Rekiri-putari, Remukaturu-kari, Resikari-putari, Reumpuka-putari, Sakakwerantu.
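The counts above follow the CoNLL-U distinction between surface tokens and syntactic words. As a rough sketch of how such figures could be recomputed, the Python snippet below walks a CoNLL-U string and counts sentences, tokens, words, multi-word tokens, and tokens marked `SpaceAfter=No`. The function name `segmentation_stats` and the two toy sentences are assumptions made for illustration: the forms are taken from the examples listed here, but the annotation values are invented and not copied from the treebank. The last lines check that the reported figures are mutually consistent.

```python
# Toy CoNLL-U input. The forms appear in the lists above, but both
# "sentences" and all annotation values are invented for illustration.
SAMPLE = (
    "# text = Usú kaá-pe.\n"
    "1\tUsú\tsú\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2-3\tkaá-pe\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "2\tkaá\tkaá\tNOUN\t_\t_\t1\tobl\t_\t_\n"
    "3\tpe\tpe\tADP\t_\t_\t2\tcase\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# text = Asú.\n"
    "1\tAsú\tsú\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)


def segmentation_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space_after": 0}
    for block in conllu_text.strip().split("\n\n"):
        stats["sentences"] += 1
        covered_until = 0  # last word ID covered by the most recent multi-word token
        for line in block.splitlines():
            if line.startswith("#"):
                continue  # sentence-level comment
            cols = line.split("\t")
            tok_id, misc = cols[0], cols[9].split("|")
            if "." in tok_id:
                continue  # empty node (enhanced dependencies only)
            if "-" in tok_id:
                # Range line such as "2-3": one surface token, several words.
                covered_until = int(tok_id.split("-")[1])
                stats["mwt"] += 1
                is_token = True
            else:
                stats["words"] += 1
                # A word inside a multi-word token is not a token of its own.
                is_token = int(tok_id) > covered_until
            if is_token:
                stats["tokens"] += 1
                stats["no_space_after"] += "SpaceAfter=No" in misc
    return stats


toy_stats = segmentation_stats(SAMPLE)

# Consistency check on the figures reported in this section.
REPORTED = {"tokens": 19037, "words": 19278, "mwt": 241, "no_space_after": 5468}
extra_words = REPORTED["words"] - REPORTED["tokens"]
share_no_space = round(100 * REPORTED["no_space_after"] / REPORTED["tokens"])
```

Each multi-word token of n words contributes n - 1 words beyond the token count, so 241 multi-word tokens averaging 2.00 words account exactly for the difference between 19278 words and 19037 tokens.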
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 49 word types tagged as particles (PART): Eẽ, Kusukúi, aikwewara, aikwé, ana, arama, arã, ba, eré, intí, intíu, ipú, katú, ku, kurí, kwera, maã, nti, nẽ, pawa, paá, presizu, pu, pá, ra, rakú, ranhẽ, raĩ, raẽ, rã, rẽ, será, supí, ta, taá, te, tenhẽ, tenki, tenupá, ti, té, tẽ, umbaá, wana, wera, wã, xukúi, yepé, ã
- This corpus contains 37 lemmas tagged as pronouns (PRON): aintá, aité, amú, awá, aé, i, indé, iné, ixé, ixéu, kwaá, kwá, manungara, maã, muiriira, mukũi-itá, muíri, ne, nhaã, panhẽ, pe, penhẽ, se, setá, siiya, siya, siía, ta, turusú, upaĩ, waá, xe, yandé, yané, yawé, yepé, yepé-yepé
- This corpus contains 20 lemmas tagged as determiners (DET): aité, amú, awá, aé, kwaá, kwá, maã, muíri, nhaã, panhẽ, setá, siiya, siya, siía, turusú, upanhẽ, upaĩ, yawé, yepé, yepé-yepé
- Out of the above, 19 lemmas occurred sometimes as PRON and sometimes as DET: aité, amú, awá, aé, kwaá, kwá, maã, muíri, nhaã, panhẽ, setá, siiya, siya, siía, turusú, upaĩ, yawé, yepé, yepé-yepé
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): ikú, kari, kwáu, puderi, putari, sú, yuíri
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: ikú, kwáu, putari, sú, yuíri
- There are 3 (de)verbal forms:
- Fin
- AUX: uikú, usú, asú, aikú, yasú, reikú, yaikú, upuderi, uputari, xaikú
- VERB: unheẽ, usú, usika, umaã, umunhã, upitá, urikú, upisika, umbeú, upurandú
- Inf
- AUX: putari, kwáu, kari, ikú
- VERB: mukaẽ, nheengari, rasú, suruka, yatiri, yumumeú, Meẽ, Munhã, kamunú, kataka
- Vnoun
- NOUN: manú
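The tag inventory and the PRON/DET and AUX/VERB overlaps reported in this section could be recomputed along the following lines. This is a minimal Python sketch, not the script that generated the page: the helper names `lemma_tags` and `shared_lemmas` are hypothetical, and the sample rows are invented token lines (not a real sentence) that reuse lemmas listed above.

```python
from collections import defaultdict

# The 17 universal POS tags defined by UD.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# The 16 tags this page lists as used in the corpus.
CORPUS_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X",
}

# Invented token rows for illustration only (ID, FORM, LEMMA, UPOS, ...).
SAMPLE_ROWS = (
    "1\tkwá\tkwá\tDET\t_\t_\t_\t_\t_\t_\n"
    "2\tkwá\tkwá\tPRON\t_\t_\t_\t_\t_\t_\n"
    "3\taintá\taintá\tPRON\t_\t_\t_\t_\t_\t_\n"
    "4\tuikú\tikú\tAUX\t_\t_\t_\t_\t_\t_\n"
    "5\tuikú\tikú\tVERB\t_\t_\t_\t_\t_\t_\n"
    "6\tkari\tkari\tAUX\t_\t_\t_\t_\t_\t_\n"
)


def lemma_tags(conllu_text):
    """Map each lemma to the set of UPOS tags it occurs with."""
    tags = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and a plain integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        tags[cols[2]].add(cols[3])
    return tags


def shared_lemmas(tags, tag_a, tag_b):
    """Lemmas attested with both tag_a and tag_b, in sorted order."""
    return sorted(lemma for lemma, seen in tags.items() if tag_a in seen and tag_b in seen)


toy_tags = lemma_tags(SAMPLE_ROWS)
pron_det = shared_lemmas(toy_tags, "PRON", "DET")
aux_verb = shared_lemmas(toy_tags, "AUX", "VERB")
unused_tags = UD_UPOS - CORPUS_UPOS
```

On the toy rows, `kwá` is found under both PRON and DET and `ikú` under both AUX and VERB, while `kari` stays AUX-only, mirroring the kind of overlap reported above.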
Nominal Features
- Plur
- AUX-Fin: yasú, yaikú, pesú, yapuderi, Pekũi, Pepuderi, peikú, taikú, tasú, yayuíri
- DET: kwá-itá, nhaã-itá, amú-itá
- NOUN: mira-itá, kunhã-itá, apigawa-itá, anama-itá, taína-itá, maã-itá, pirá-itá, kunhã-etá, pindá-itá, wirá-itá
- PRON: aintá, ta, yané, yandé, waá-itá, penhẽ, pe, amú-itá, kwá-itá, nhaã-itá
- VERB-Fin: yamunhã, yamaã, yasú, pemunhã, yaú, pemaã, taunheẽ, pesendú, pexari, peú
- Sing
- AUX-Fin: asú, aikú, reikú, xaikú, resú, xasú, aputari, Ekũi, Kũi, Repuderi
- DET: nhaã, kwá, amú, kwaá
- NOUN: ara, mira, manha, igara, pituna, kunhã, apigawa, paraná, yautí, kaá
- NOUN-Vnoun: manú
- PRON: i, se, waá, aé, ne, ixé, indé, nhaã, kwá, amú
- VERB-Fin: asú, reputari, remaã, rerikú, resú, amaã, amunhã, akwáu, arikú, remunhã
- Acc,Nom
- PRON: aé, yandé
- Dat
- PRON: ixéu
- Gen
- PRON: i, se, ne, aintá, yané, ta, pe, yandé, Xe, U
- Ind
- DET: yepé
- PRON: yepé
Degree and Polarity
- Aug
- NOUN: buyawasú, miráwasú, iwiwasú, kiririwasú, tiapuwasú, yawaratewasú-itá, amanawasú, awawasú, inayawasú, ipawawasú
- Cmp
- ADV: piri
- Dim
- ADJ: purangamirĩ
- NOUN: Abumirĩ, fardamirĩ, kurumirĩ, kurusamirĩ-etá, makakaí, wirawasumirĩ-etá
- PRON: setaíra
- Sup
- ADV: piri
- Neg
- PART: ti, intí, nẽ, te, nti, umbaá, tenhẽ, intíu
- Pos
- PART: eré, Eẽ
Verbal Features
- Compl
- PART: pawa, pá
- Freq
- ADV: Asuiwara, Ikewara, kwayewara, sewara, yawewara
- NOUN: arawara, rukawara
- PART: wera, aikwewara
- VERB-Fin: Amanduariwara, Asuwara
- Frus
- PART: yepé
- Hab
- SCONJ: rametiwa
- VERB-Fin: ambautiwa, ukanhemutiwa, upinaitikatiwa, upurungitatiwa, usutiwa, uyukatiwa
- Imp
- PART: rẽ, raĩ, raẽ, ranhẽ
- Iter
- AUX-Fin: ayuíri, yayuíri
- Perf
- PART: ana, ã, wã, wana
- Imp
- AUX-Fin: Pekũi, Ekũi, Kũi, pesú
- VERB-Fin: Iruri, yuri, remaã, ikũi, pemunhã, retirika, Epirari, Epurú, Pemaã, eruri
- Imp,Ind
- AUX-Fin: reikú, resú, pesú, Pepuderi
- VERB-Fin: Remaã, remunhã, remundú, rerikú, pemaã, rembeú, reruri, Retirika, pemunhã, pemuturusú
- Ind
- AUX-Fin: uikú, usú, asú, aikú, yasú, yaikú, upuderi, uputari, reikú, xaikú
- VERB-Fin: unheẽ, usú, usika, umaã, umunhã, upitá, urikú, upisika, umbeú, upurandú
- Fut
- PART: kurí, arama, arã, ku, rã
- Past
- PART: kwera, wera
- Pres
- ADV: Asuiwara, Ikewara, kwayewara, sewara, yawewara
- NOUN: arawara, rukawara
- PART: aikwewara
- VERB-Fin: Amanduariwara, Asuwara
- Mid,Pass
- VERB-Fin: uyumunhã, Uyupurungitá, reyumumeú, uyumuapiri, uyumuaíwa, uyumusangawa, uyumusarái, uyumutawarí, uyumuyuka, uyuputari
- VERB-Inf: yumumeú, yumunhã
- Nfh
- PART: paá
Pronouns, Determiners, Quantifiers
- Art
- DET: yepé
- PRON: yepé
- Dem
- ADV: iké, ape, kwá, akití, aape, Mimi, kí, Ikewara
- DET: nhaã, kwá, kwá-itá, kwaá, aé, nhaã-itá
- PRON: nhaã, kwá, kwá-itá, nhaã-itá, kwaá, aé
- Emp
- DET: aité
- PRON: aité
- Ind
- ADV: mairamé, makití, masuí, marupí
- DET: amú, siiya, siía, muíri, maã, setá, yawé, turusú, amú-itá, siya
- PRON: maã, awá, amú, amú-itá, manungara, siiya, siya, mukũi-itá, setá, amú-etá
- Int
- ADV: mayé, mamé, makití, marupí, mairamé, marama, masuí, maita, mayawé, marã
- DET: Maã, muíri, awá
- PRON: maã, awá, Muíri
- Prs
- PRON: i, aintá, se, aé, ne, ixé, indé, ta, yané, yandé
- Rel
- ADV: mamé, makití, mayé, mairamé, masuí, marupí
- PRON: waá, waá-itá, awá, maã
- Tot
- DET: panhẽ, upaĩ, muíri, upanhẽ
- PRON: panhẽ, muíri, upaĩ
- Card
- NUM: mukũi, musapiri, yepé, sete, 1930, Oito, nove, oitu, pú-mukũi
- Ord
- ADV: mukũisawa, primeru
- Yes
- PRON: se, i, ne, yané, aintá, ta, pe, yandé, Xe
- 1
- AUX-Fin: asú, aikú, yasú, yaikú, xaikú, xasú, yapuderi, aputari, apuderi, ayuíri
- PRON: se, ixé, yané, yandé, ixéu, Xe
- VERB-Fin: asú, yamunhã, amaã, amunhã, akwáu, arikú, aputari, yamaã, yasú, ambeú
- 2
- AUX-Fin: reikú, pesú, resú, Pekũi, Ekũi, Kũi, Pepuderi, Repuderi, peikú, rekwáu
- PRON: ne, indé, penhẽ, pe, iné
- VERB-Fin: reputari, remaã, rerikú, resú, remunhã, pemunhã, reyuri, pemaã, remaú, rembeú
- 3
- AUX-Fin: uikú, usú, upuderi, uputari, taikú, tasú
- PRON: i, aintá, aé, ta, U
- VERB-Fin: unheẽ, usú, usika, umaã, umunhã, upitá, urikú, upisika, umbeú, upurandú
- Sing
- NOUN: sera, suka, ximirikú, taíra, sawa, sesá, sukwera, sumuara, ximiára, sakakwera
Other Features
- AdpType
- Post
- ADP: upé, kití, irumu, suí, rupí, supé, arama, xupé, yawé, ramé
- Prep
- ADP: até, té
- AdvType
- Cau
- ADV: nhaãsé, Ape, aresé, aramé, marama, kurumú, marã
- Con
- ADV: Ma, nuká
- Deg
- ADV: reté, katú, xinga, retana, piri, yuíri, mirĩ, turusú, retã, puru
- Loc
- ADV: iké, apekatú, mamé, ape, makití, marupí, masuí, akití, arupí, kwá
- Man
- ADV: yawé, mayé, puranga, kwayé, kutara, kirimbawa, puxí, merupí, katú, tiapú
- Mod
- ADV: kuité, kuté
- Tim
- ADV: asuí, kuíri, ape, aramé, ariré, wirandé, yeperesé, aiwana, aape, kuxiima
- Clitic
- Yes
- ADP: upé, pe, me, wara, arã
- ADV: ntu
- PART: taá, wera, ta
- Compound
- Yes
- AUX-Inf: putari, kwáu, kari
- Deixis
- Prox
- ADV: iké, kwá, kí
- DET: kwá, kwá-itá, kwaá
- PRON: kwá, kwá-itá, kwaá
- Remt
- ADV: ape, akití, aape, Mimi
- DET: nhaã, aé, nhaã-itá
- PRON: nhaã, nhaã-itá, aé
- Derivation
- Coll
- NOUN: itatiwa, kapĩtiwa, mirawasutiwa, sakaitiwa, wakutiwa
- Priv
- ADJ: Adana-ima, apisaíma, ara-ima, kiinha-ima, paya-ima, santaíma, sawa-ima, tĩ-ima, ximirikú-ima
- ADV: tiapuíma
- VERB-Fin: kiaíma
- ExtPos
- ADV
- ADV: Kutara
- PART: aikwé
- DET
- ADV: mayé
- PRON
- PART: nẽ
- SCONJ
- PRON: waá
- Foc
- Yes
- PART: tẽ, tenhẽ, katú, ra, té
- Modality
- Cond
- PART: maã
- Proh
- PART: te, tenhẽ
- Number[grnd]
- Sing
- ADP: sesé, suakí, sesewara, sakakwera, suaxara
- PartType
- Emp
- PART: tẽ, tenhẽ, katú, ra, té
- Exs
- PART: aikwé, aikwewara
- Int
- PART: taá, será, ta
- Mod
- PART: paá, pu, maã, supí, eré, te, tenki, tenupá, ipú, presizu
- Neg
- PART: ti, intí, nẽ, nti, umbaá, intíu
- Prs
- PART: xukúi, Kusukúi
- Person[grnd]
- 3
- ADP: sesé, suakí, sesewara, sakakwera, suaxara
- Person[psor]
- 3
- NOUN: sera, suka, ximirikú, taíra, sawa, sesá, sukwera, sumuara, ximiára, sakakwera
- PunctType
- Elip
- PUNCT: [...]
- Red
- Yes
- ADJ: purapuranga
- DET: yawé-yawé
- NOUN: tapurú-tapurú
- PRON: yawé-yawé
- VERB-Fin: uyawiyawika, Akaá-kaá, Tasuú-suú, Utuká-tuká, aganaganari, atuká-tuká, takaú-kaú, ukaúkaú, ukikiri, upinú-pinú
- Rel
- Abs
- NOUN: uka, tatá, ukara, tetama, timbiú, ukena, tuixawa, tendawa, teapú, tuwí
- Cont
- ADP: resé, resewara, ruakí, rakakwera, aresé, rakwera, renundé, ruaxara, rikuyara
- NOUN: ruka, ramunha, raíra, retama, rapé, rupitá, rangawa, riiya, resá, rimirikú
- SCONJ: resewara
- VERB-Fin: rurí, ranhẽ, rakú, rawa, resarái, rikwé
- VERB-Inf: renúi, ripiaka
- NCont
- ADP: sesé, suakí, sesewara, sakakwera, suaxara
- NOUN: sera, suka, ximirikú, taíra, sawa, sesá, sukwera, sumuara, ximiára, sakakwera
- VERB-Fin: surí, sakú, sasí, sikwé, setá, tiapú, Ikupukú, sesaíma, tipí, sawa
- Style
- Arch
- ADP: aresé, resewara
- AUX-Fin: xaikú, xasú
- AUX-Inf: ikú
- NOUN: ukena, imirikú, manú, rangawa, sakapira, siiya, sikú, tuwí, uka, ukara
- NOUN-Vnoun: manú
- PRON: yandé, ne, aé, se, i, penhẽ
- VERB-Fin: xasú, xarasú, xarikú, Ururi, Uxipiá, Xakitika, Xapiiri, Xaputari, Xasaisú, Xasika
- VERB-Inf: rasú, yumumeú, Meẽ, Munhã, kamunú, kataka, kutuka, muapisí, mundú, puitá
- Rare
- NOUN: Yukasara, teapú
- VERB-Inf: maramunhã
- Typo
- Yes
- ADP: aresé, pu, py, rumu
- NOUN: Mukura, kaziwera, kunhaitãi, miarerú, remiré
- PART: maã
- PRON: U, i
- VERB-Fin: pasú, poréi, repi, ta, uarasú, uimú, uyupi, wasemu
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): ikú.
- This corpus uses 7 lemmas as auxiliaries (aux). Examples: sú, ikú, putari, kwáu, puderi, kari, yuíri.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (657)
- VERB-Fin--PRON (625)
- VERB-Fin--PRON-Gen (51)
- VERB-Inf--NOUN (7)
- VERB-Inf--PRON (8)
- VERB-Inf--PRON-Gen (1)
- obj
- VERB-Fin--NOUN (729)
- VERB-Fin--NOUN-ADP(resé) (3)
- VERB-Fin--PRON (347)
- VERB-Fin--PRON-Gen (1)
- VERB-Fin--PRON-Gen-ADP(irumu) (1)
- VERB-Inf--NOUN (2)
- VERB-Inf--PRON-Gen (6)
- iobj
- VERB-Fin--NOUN (3)
- VERB-Fin--NOUN-ADP(supé) (32)
- VERB-Fin--NOUN-ADP(xupé) (5)
- VERB-Fin--NOUN-ADP(xupé)-ADP(arama) (2)
- VERB-Fin--PRON (11)
- VERB-Fin--PRON-ADP(arama) (18)
- VERB-Fin--PRON-ADP(arã) (13)
- VERB-Fin--PRON-ADP(supé) (1)
- VERB-Fin--PRON-ADP(supé)-ADP(arama) (1)
- VERB-Fin--PRON-Dat (3)
- VERB-Fin--PRON-Gen-ADP(supé) (15)
- VERB-Fin--PRON-Gen-ADP(xupé) (36)
- VERB-Fin--PRON-Gen-ADP(xupé)-ADP(arã) (1)
- VERB-Inf--PRON-ADP(supé) (1)
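The pattern labels above combine the parent's tag and VerbForm with the child's tag, its Case value (if any), and the lemmas of any adpositions attached to the child via `case`. The Python sketch below shows one way such labels could be derived from a CoNLL-U sentence. It is an assumption about the labelling scheme inferred from the labels themselves, not the actual UD statistics script; the function name `core_argument_patterns` is hypothetical, and the toy sentence and its annotation are invented for illustration.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}

# Invented toy sentence; forms and lemmas echo the lists above, but the
# annotation is illustrative and not copied from the treebank.
TOY_SENTENCE = (
    "# text = Umbeú kunhã se xupé.\n"
    "1\tUmbeú\tmbeú\tVERB\t_\tMood=Ind|Number=Sing|Person=3|VerbForm=Fin\t0\troot\t_\t_\n"
    "2\tkunhã\tkunhã\tNOUN\t_\tNumber=Sing\t1\tnsubj\t_\t_\n"
    "3\tse\tse\tPRON\t_\tCase=Gen|Number=Sing|Person=1|PronType=Prs\t1\tiobj\t_\t_\n"
    "4\txupé\txupé\tADP\t_\tAdpType=Post\t3\tcase\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)


def parse_feats(feats):
    """Turn 'A=x|B=y' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def core_argument_patterns(sentence):
    """Count (relation, 'PARENT--CHILD') labels for verb-headed core arguments."""
    rows = {}
    for line in sentence.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multi-word token ranges and empty nodes
        rows[int(cols[0])] = {
            "lemma": cols[2],
            "upos": cols[3],
            "feats": parse_feats(cols[5]),
            "head": int(cols[6]),
            "deprel": cols[7],
        }

    counts = Counter()
    for idx, child in rows.items():
        parent = rows.get(child["head"])
        if parent is None or child["deprel"] not in CORE_RELATIONS:
            continue
        if parent["upos"] != "VERB" or child["upos"] not in {"NOUN", "PRON"}:
            continue
        parent_label = "-".join(filter(None, ["VERB", parent["feats"].get("VerbForm")]))
        child_label = "-".join(filter(None, [child["upos"], child["feats"].get("Case")]))
        # Append every adposition attached to the child as a case marker.
        for dep in rows.values():
            if dep["head"] == idx and dep["deprel"] == "case" and dep["upos"] == "ADP":
                child_label += f"-ADP({dep['lemma']})"
        counts[(child["deprel"], f"{parent_label}--{child_label}")] += 1
    return counts


toy_patterns = core_argument_patterns(TOY_SENTENCE)
```

Summing the resulting counters over all sentences of a file would give a table of the same shape as the one above.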