
This page pertains to UD version 2.

Tokenization

The low-level tokenization of the Eastern Armenian UD treebank follows the tokenization of the ՀայՇտեմ - Eastern Armenian Dependency Treebank 1.0 (ArmDT-East):

Multi-word tokens

See the note on “infixed” punctuation above.

Pronouns and adverbs

Verb forms, analytical grammatical forms, negation

Sentence splitting

Each sentence contains exactly one root. Splitting is usually performed after a sentence-final full stop, or after a dot, an ellipsis or a colon when these punctuation marks separate unrelated parts of a sentence. Items in a list are sometimes rendered as separate sentences.
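The single-root requirement can be checked mechanically on a CoNLL-U file. The following is a minimal sketch, not the official UD validator: it assumes standard CoNLL-U input (sentences separated by blank lines, comment lines starting with `#`, HEAD in the seventh tab-separated column, with `0` marking the root). The function names `split_sentences`, `count_roots` and `sentences_without_single_root` are hypothetical, chosen for illustration.

```python
def split_sentences(conllu_text):
    """Group the non-blank lines of a CoNLL-U text into sentences (blank line = boundary)."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def count_roots(sentence_lines):
    """Count syntactic words whose HEAD column is 0 in one sentence."""
    roots = 0
    for line in sentence_lines:
        # Comment lines such as "# text = ..." carry no token data.
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id = cols[0]
        # Multi-word token ranges ("3-4") and empty nodes ("5.1") have no basic HEAD.
        if "-" in token_id or "." in token_id:
            continue
        if cols[6] == "0":
            roots += 1
    return roots


def sentences_without_single_root(conllu_text):
    """Return 1-based indices of sentences that do not have exactly one root."""
    return [
        i
        for i, sentence in enumerate(split_sentences(conllu_text), start=1)
        if count_roots(sentence) != 1
    ]
```

Running `sentences_without_single_root` over a treebank file returns an empty list when every sentence satisfies the rule; any index it reports points to a sentence that was either split too late (two roots) or is malformed (no root). The check cannot judge whether a given dot, ellipsis or colon separates unrelated parts of a sentence; that decision remains manual.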