UD for Karo
Tokenization and Word Segmentation
- Words are delimited by whitespace characters.
- According to typographical rules, many punctuation marks are attached to a neighboring word. They are treated as separate tokens (words).
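The two rules above can be sketched in Python as follows. This is a minimal illustration, not the tokenizer used for any Karo treebank: the function names are hypothetical, and the sketch assumes that every character in a Unicode punctuation category is split off, so word-internal hyphens or apostrophes would need an explicit exception. The sample string in the usage note is a placeholder, not Karo data.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """True if the character belongs to a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then detach punctuation as separate tokens.

    Assumption: every punctuation character becomes its own token.
    """
    tokens = []
    for chunk in text.split():  # rule 1: whitespace delimits words
        word = ""
        for ch in chunk:
            if is_punct(ch):  # rule 2: punctuation is a separate token
                if word:
                    tokens.append(word)
                    word = ""
                tokens.append(ch)
            else:
                word += ch
        if word:
            tokens.append(word)
    return tokens
```

For example, `tokenize("aaa bbb, ccc.")` yields the five tokens `aaa`, `bbb`, `,`, `ccc`, and `.`.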
Morphology
Tags
- Karo uses all 17 universal POS categories.
Mapping UPOS to XPOS in Karo
UPOS | XPOS |
---|---|
ADJ | adj |
ADV | adv |
INTJ | intj |
NOUN | n |
PROPN | ppn |
VERB | v, vi, vt |
ADP | pp |
AUX | aux |
CCONJ | cc |
DET | det |
NUM | num |
PART | pcl |
PRON | pro |
SCONJ | sc |
PUNCT | punct |
SYM | sym |
X | x |
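For scripts that convert or validate annotations, the table above can be encoded as a lookup. The sketch below is only an illustration: the names `UPOS_TO_XPOS`, `XPOS_TO_UPOS`, and `upos_for` are hypothetical, and the values are copied from the table.

```python
# UPOS -> list of XPOS tags, copied from the mapping table.
UPOS_TO_XPOS = {
    "ADJ": ["adj"],
    "ADV": ["adv"],
    "INTJ": ["intj"],
    "NOUN": ["n"],
    "PROPN": ["ppn"],
    "VERB": ["v", "vi", "vt"],
    "ADP": ["pp"],
    "AUX": ["aux"],
    "CCONJ": ["cc"],
    "DET": ["det"],
    "NUM": ["num"],
    "PART": ["pcl"],
    "PRON": ["pro"],
    "SCONJ": ["sc"],
    "PUNCT": ["punct"],
    "SYM": ["sym"],
    "X": ["x"],
}

# Inverse lookup: each XPOS tag belongs to exactly one UPOS category.
XPOS_TO_UPOS = {
    xpos: upos for upos, tags in UPOS_TO_XPOS.items() for xpos in tags
}


def upos_for(xpos: str) -> str:
    """Return the UPOS category for an XPOS tag; raises KeyError if unknown."""
    return XPOS_TO_UPOS[xpos]
```

`upos_for("vt")` returns `"VERB"`, since `v`, `vi`, and `vt` all map to the same universal category.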
Nominal Features
- Karo nouns are not marked for gender. Number is optionally marked.
- Nouns can take the following Cases: ?, ?, ?, ?, ?.
- Pronouns (PRON) are marked for Gender only in ?.
- Personal Pronouns and Person Markers distinguish Number (Singular or Plural). They also distinguish Clusivity in the 1st person plural.
- The relational markers `Rel`, which indicate contiguity or non-contiguity between a head and its dependent, take the features `Rel=Cont` and `Rel=NCont`, respectively. A third type of relational indicates that a possessor is not present, neither contiguously nor non-contiguously. This relational is tagged `Rel=Abs`, for absolute. The reflexive/coreferential morpheme o., which is often referred to as ‘relational3’, is associated with the feature-value `Reflex=Yes`.
- Tupinambá is rich in nominalizations. Lexical roots can be nominalized by suffixes that receive the following features: nominalization of circumstance `Nomzr=Circ` (-saβ ‘thing, way of VERB’), passive nominalization `Nomzr=Pas`, deverbal passive nominalization `Nomzr=DevPass` (-pɨr ‘one that is VERB past participle’), and `Nomzr=Ag` (-sar ‘the VERB-_er_’).
- Nouns may also be reduplicated in both ways, denoting plurality, collectivity, superlativity, and other semantic nuances. Numerals may also be reduplicated in order to indicate distribution.
- Nouns are also marked for tense.
- Tupinambá is an omnipredicative language: lexical roots are existential predicates. For a root to function as an argument, the referential marker (a ~ ∅) is required. It is marked as `Case=Ref`, although its function is nothing like that of nominal cases.
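In CoNLL-U files, features such as those listed above are stored in the FEATS column as `Name=Value` pairs separated by `|`, with `_` standing for an empty set. The following sketch shows one way to read and write that column. The helper names are hypothetical, and the feature combinations used below are illustrative only, not attested annotations.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string into a {name: value} dictionary."""
    if feats == "_":  # "_" marks an empty feature set
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict[str, str]) -> str:
    """Serialize features back to FEATS form, sorted by feature name.

    Sorting is case-insensitive, following the CoNLL-U convention.
    """
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)
```

For instance, `parse_feats("Case=Ref|Rel=NCont")` gives a dictionary with the keys `Case` and `Rel`, and `format_feats` restores the sorted string form.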
Verbal Features
- Verbs have a lexical Aspect: imperfective (Imp), perfective (Perf), iterative (Iter).
- As a head-marking language, Tupinambá cross-references both arguments of a two-place predicate, S and O, only when O is third person: a-s-epjak 1.SG-3(RELNCONT)-see ‘I see him/her/it/them’. The PERSON feature in this case has the value `Person=13`, indicating that A is 1st and O 3rd person.
- The portmanteau markers indicating 1 -> 2 (A is 1st and U is 2nd person) are assigned the PERSON feature `Person12Sg` and `Person12Pl`.
- The lexical root in the gerund (VerbForm=Ger) is marked as VERB even when combining with a relational.
- Verbs are marked for aspect: `Compl` (completive), `Iter` (iterative), `Suc` (successive).
- Verbs are also marked for mood: `Perm` (permissive).
- Lexical roots may be reduplicated in two different ways: monosyllabic reduplication (`Red=Mo`) and disyllabic reduplication (`Red=Di`). They modify the aspect of the verb in different ways: disyllabic reduplication indicates the repetition or duration of an action; monosyllabic reduplication indicates iteration of the action.
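The combined value `Person=13` described above packs two persons into one value. A minimal sketch of how a script might unpack it is given below. It assumes that the first digit is the person of A and the second the person of O. The function name is hypothetical, and the portmanteau values with a number suffix are deliberately not handled, because their exact form is not settled here.

```python
from typing import Optional


def decode_person(value: str) -> tuple[int, Optional[int]]:
    """Split a Person value into (A person, O person).

    Assumption: a two-digit value such as "13" encodes A first and O second;
    a one-digit value indexes a single argument, so O is None.
    """
    if not value.isdigit() or len(value) not in (1, 2):
        raise ValueError(f"unsupported Person value: {value!r}")
    a_person = int(value[0])
    o_person = int(value[1]) if len(value) == 2 else None
    return a_person, o_person
```

With this convention, `decode_person("13")` returns `(1, 3)`, matching the gloss of a-s-epjak, where A is 1st person and O is 3rd person.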
Syntax
There are three types of sentences in Karo according to their illocutionary force: declarative, interrogative, and imperative.
- Declarative sentences may also contain adjuncts in their periphery (postpositional phrases or adverbials).
- Focused constituents are fronted.
- Interrogative illocutionary force can be of two types: yes-no questions and information questions.
Treebanks
There are N Karo UD treebanks:
Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective `*-index.html` page in the `treebanks` folder, using the language code and the treebank code in the file name.