This page pertains to UD version 2.

Tokenization

The low-level tokenization of the UD Eastern Armenian Treebank follows the tokenization of the Հայերենի ծառադարան - Eastern Armenian Treebank:

Multi-word tokens

See above, the “infixed” punctuation.

Pronouns and adverbs

Verb forms, analytical grammatical forms, negation

Sentence splitting

Each sentence contains exactly one root. Sentences are usually split after an end-of-sentence full stop, or after a dot, ellipsis, or colon when that punctuation mark separates unrelated subparts of a sentence. Items in a list are sometimes rendered as separate sentences.
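The one-root constraint can be checked mechanically on CoNLL-U data. The following is a minimal sketch, not part of the treebank's own tooling: it assumes the standard 10-column CoNLL-U layout, in which the HEAD column (the seventh) is `0` for the root, and the function names `count_roots` and `sentences_with_bad_root_count` are hypothetical. The toy tokens `w1` to `w4` are placeholders, not treebank data.

```python
def count_roots(conllu_sentence: str) -> int:
    """Count tokens whose HEAD column is 0 in one CoNLL-U sentence block."""
    roots = 0
    for line in conllu_sentence.splitlines():
        # Skip blank lines and comment lines such as "# sent_id = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, head = cols[0], cols[6]
        # Multi-word token ranges (e.g. "1-2") and empty nodes (e.g. "1.1")
        # carry no HEAD value, so they are ignored.
        if "-" in token_id or "." in token_id:
            continue
        if head == "0":
            roots += 1
    return roots


def sentences_with_bad_root_count(conllu_text: str) -> list[int]:
    """Return 0-based indices of sentences that do not have exactly one root."""
    blocks = [b for b in conllu_text.strip().split("\n\n") if b.strip()]
    return [i for i, block in enumerate(blocks) if count_roots(block) != 1]


# Toy input with placeholder tokens: the second sentence has two roots.
sample = (
    "# sent_id = toy-1\n"
    "1\tw1\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tw4\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
)
print(sentences_with_bad_root_count(sample))
```

A sentence flagged by this check would typically need to be split further or have its second root reattached; which of the two applies depends on the annotation decision, not on the script.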