UD for Kaapor 
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. Description of exceptions follows.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words).
- There are no adjectives in Tupinambá. Modification is made by composition, so when a lexical root is modified by another, a new word appears, as in kuɲãporaŋ (kuɲã ‘woman’ + poraŋ ‘beauty’). Such words are treated as multiword tokens.
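The tokenization rules above can be sketched in Python. This is only an illustration under stated assumptions: `tokenize`, `conllu_rows`, and the `COMPOUNDS` lexicon are hypothetical names that do not come from the treebank tooling, and only the ID and FORM columns of CoNLL-U are produced. Punctuation is split off at word edges only, so word-internal hyphens and apostrophes stay attached.

```python
import unicodedata

# Hypothetical lexicon of compounds that are split into syntactic words.
COMPOUNDS = {"kuɲãporaŋ": ["kuɲã", "poraŋ"]}


def is_punct(ch):
    """True if the character belongs to a Unicode punctuation category."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text):
    """Split on whitespace, then detach leading and trailing punctuation."""
    tokens = []
    for chunk in text.split():
        start, end = 0, len(chunk)
        leading, trailing = [], []
        while start < end and is_punct(chunk[start]):
            leading.append(chunk[start])
            start += 1
        while end > start and is_punct(chunk[end - 1]):
            end -= 1
            trailing.append(chunk[end])
        tokens.extend(leading)
        if start < end:
            tokens.append(chunk[start:end])
        tokens.extend(reversed(trailing))
    return tokens


def conllu_rows(tokens):
    """Emit ID and FORM only; a compound gets a range line plus one line per word."""
    rows, word_id = [], 1
    for tok in tokens:
        parts = COMPOUNDS.get(tok, [tok])
        if len(parts) > 1:
            rows.append(f"{word_id}-{word_id + len(parts) - 1}\t{tok}")
        for part in parts:
            rows.append(f"{word_id}\t{part}")
            word_id += 1
    return rows
```

Calling `conllu_rows(tokenize(text))` on a sentence containing a listed compound yields a range line such as `1-2` for the surface token, followed by one line for each of its words.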
Morphology
- Tupinambá nouns are not marked for gender. Number is optionally marked.
- Nouns can take the following Cases: `Tra` and `Loc`. There are different locatives, which are assigned the following features: `Case=LocPunc` (punctual locative), `Case=LocDif` (diffuse locative).
- What has traditionally been called circumstantial mood or indicative II in some Tupí-Guaraní languages refers to the nominalization of a predicate and the fronting of an adverbial expression. The nominalized form of the verb takes `Nomz=Circ` as feature and value.
Tags
This is an overview only. For more detailed discussion and examples, see the list of Czech POS tags and Czech features.
- Tupinambá uses 16 of the 17 universal POS categories. ADJ is not used.
- The (de)verbal forms used are: infinitive `Inf`, finite verb `Fin`, converb `Conv`, gerund `Ger`.
Mapping UPOS to XPOS Ka’apor
| UPOS | XPOS |
|---|---|
| ADV | adv |
| INTJ | intj |
| NOUN | n |
| PROPN | ppn |
| VERB | v, vi, vt |
| ADP | pp |
| AUX | aux |
| CCONJ | cc |
| DET | det |
| NUM | num |
| PART | pcl |
| PRON | pro |
| SCONJ | sc |
| PUNCT | punct |
| SYM | sym |
| X | x |
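
The mapping in the table can be held in a small lookup structure. The sketch below is illustrative only: the names `UPOS_TO_XPOS`, `XPOS_TO_UPOS`, and `upos_for` are assumptions made for this example and are not part of any treebank tooling. It inverts the table so that a language-specific tag can be resolved to its universal category.

```python
# UPOS -> XPOS tags, copied from the table above.
UPOS_TO_XPOS = {
    "ADV": ["adv"],
    "INTJ": ["intj"],
    "NOUN": ["n"],
    "PROPN": ["ppn"],
    "VERB": ["v", "vi", "vt"],
    "ADP": ["pp"],
    "AUX": ["aux"],
    "CCONJ": ["cc"],
    "DET": ["det"],
    "NUM": ["num"],
    "PART": ["pcl"],
    "PRON": ["pro"],
    "SCONJ": ["sc"],
    "PUNCT": ["punct"],
    "SYM": ["sym"],
    "X": ["x"],
}

# Inverted view: every XPOS tag belongs to exactly one UPOS category.
XPOS_TO_UPOS = {
    xpos: upos for upos, tags in UPOS_TO_XPOS.items() for xpos in tags
}


def upos_for(xpos):
    """Return the UPOS category for an XPOS tag, or None if the tag is unknown."""
    return XPOS_TO_UPOS.get(xpos)
```

Because `VERB` is the only category with more than one XPOS tag, the inverted dictionary has 18 entries for 16 categories.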
Features
- The relational markers `Rel`, which indicate contiguity or non-contiguity between a head and its dependent, take respectively the following features: `Rel=Cont` and `Rel=NCont`. A third type of relational indicates that a possessor is not present, neither contiguously nor non-contiguously. This relational is tagged `Rel=Abs`, for relational absolute.
- As a head-marking language, Tupinambá cross-references arguments on the predicate, mostly when the object is third person: a-s-epjak 1.SG-3-see ‘I see him’. The PERSON feature in this case will be `Person=33`.
- The portmanteau markers, 1 -> 2, are assigned the PERSON feature `Person12Sg` and `Person12Pl`.
- Tupinambá is rich in nominalizations. Lexical roots can be nominalized by suffixes that receive the following features: nominalization of circumstance `Nomzr=Circ` (-saβ ‘thing, way of VERB’), deverbal passive nominalization `Nomzr=DevPass` (-pɨr ‘one that is VERB’).
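
A minimal sketch of how the language-specific feature values named on this page could be checked in a CoNLL-U FEATS string. The names `DOCUMENTED`, `parse_feats`, and `undocumented` are hypothetical, and the inventory covers only the `Case`, `Nomz`, `Nomzr`, and `Rel` values listed here, so any other feature is deliberately ignored rather than flagged.

```python
# Values taken from this page only; this is an overview, not a full inventory.
DOCUMENTED = {
    "Case": {"Tra", "Loc", "LocPunc", "LocDif"},
    "Nomz": {"Circ"},
    "Nomzr": {"Circ", "DevPass"},
    "Rel": {"Cont", "NCont", "Abs"},
}


def parse_feats(feats):
    """Parse a FEATS string such as 'Case=LocPunc|Rel=Cont' into a dict."""
    if feats == "_":  # CoNLL-U uses an underscore for an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def undocumented(feats):
    """List (name, value) pairs whose value is not listed for a known feature."""
    return [
        (name, value)
        for name, value in feats.items()
        if name in DOCUMENTED and value not in DOCUMENTED[name]
    ]
```

For example, `undocumented(parse_feats("Case=LocDif|Rel=Abs"))` returns an empty list, while a misspelled value for `Rel` would be reported.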
Treebanks
There is 1 Kaapor UD treebank: