UD for Chukchi

Tokenisation and Word Segmentation

Tokens in the corpus are defined as sequences of non-whitespace characters, or of punctuation characters, separated by whitespace.
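The definition above can be sketched in code. The following is a minimal illustration under one possible reading, not the corpus's actual tokeniser: text is split on whitespace, and punctuation at the edges of each chunk is peeled off as separate tokens, while word-internal punctuation is left in place. The function names `is_punct` and `tokenise` are hypothetical, and using Unicode general category "P" to decide what counts as punctuation is an assumption.

```python
import unicodedata


def is_punct(ch):
    # Assumption: "punctuation" means Unicode general category P*.
    return unicodedata.category(ch).startswith("P")


def tokenise(text):
    """Split text on whitespace, separating edge punctuation into its own tokens."""
    tokens = []
    for chunk in text.split():
        # Peel punctuation off the left edge, one character per token.
        start = 0
        while start < len(chunk) and is_punct(chunk[start]):
            tokens.append(chunk[start])
            start += 1
        # Peel punctuation off the right edge, remembering it for later.
        end = len(chunk)
        trailing = []
        while end > start and is_punct(chunk[end - 1]):
            trailing.append(chunk[end - 1])
            end -= 1
        # Whatever remains in the middle is a single word token.
        if start < end:
            tokens.append(chunk[start:end])
        # Restore the original left-to-right order of trailing punctuation.
        tokens.extend(reversed(trailing))
    return tokens
```

With placeholder input, `tokenise("aaa bbb, ccc.")` yields the five tokens `aaa`, `bbb`, `,`, `ccc` and `.`. Word-internal characters such as hyphens stay inside their token under this sketch.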

Underneath tokens are syntactic words, that is, the words which take part in dependency relations.

A single token may be split into more than one word based on the following criteria:

Incorporation into lexical verbs does not involve splitting in the basic representation: sentences are annotated according to their surface syntax. For example, a verb with an incorporated object is annotated as intransitive, following its agreement morphology.

In the enhanced representation, the argument structure is recovered and incorporated items are given their own nodes.
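The contrast between the two representations can be sketched as a small data structure. This is an illustration only. It assumes that an incorporated item is represented as an empty node with a decimal ID (for example `2.1` after word `2`), as CoNLL-U allows for enhanced graphs; the text above only says that such items are "given nodes". The names `Node` and `add_incorporated_node`, and the default relation `obj`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    id: str                        # "2" for a word, "2.1" for an empty node
    form: str
    head: Optional[str] = None     # basic-tree head (None for empty nodes)
    deprel: Optional[str] = None   # basic-tree relation
    deps: List[Tuple[str, str]] = field(default_factory=list)  # enhanced edges


def add_incorporated_node(verb, stem, relation="obj"):
    """Create an empty node for an incorporated stem.

    The node is attached to the verb in the enhanced graph only, so the
    basic tree keeps the verb unsplit, as in the surface syntax.
    """
    node = Node(id=f"{verb.id}.1", form=stem)
    node.deps.append((verb.id, relation))
    return node
```

For a verb node with ID `2`, the function returns a node with ID `2.1` that has no basic head and a single enhanced edge to `2`.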



The following are described by Dunn as particles, but are annotated here as AUX, given that they express TAM (tense, aspect and mood).

The following are copulae:




