
This page pertains to UD version 2.

UD for Russian

Tokenization and Word Segmentation

Russian UD treebanks do not contain multiword tokens.
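One way to check this on a given file is to scan the ID column: in the CoNLL-U format, a multiword token is written as a range line whose ID looks like `1-2`. The following is a minimal sketch, not part of the official UD tooling. The function name `find_multiword_tokens` and the sample sentence with its annotations are illustrative assumptions.

```python
def find_multiword_tokens(conllu_text):
    """Return (line_number, token_id) pairs for multiword-token range lines."""
    hits = []
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        # Range IDs such as "1-2" mark multiword tokens;
        # empty nodes use a dot ("1.1") and are not matched here.
        if "-" in token_id:
            hits.append((lineno, token_id))
    return hits


# Hypothetical sample sentence (annotations are illustrative only).
sample = "\n".join([
    "# text = Он спит.",
    "1\tОн\tон\tPRON\t_\tCase=Nom|Gender=Masc|Number=Sing|Person=3\t2\tnsubj\t_\t_",
    "2\tспит\tспать\tVERB\t_\tAspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

print(find_multiword_tokens(sample))
```

For a file that follows the statement above, the returned list is expected to be empty.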

Morphology

Tags

All corpora use the full range of UPOS tags. In the GSD and Taiga treebanks, the XPOS column uses a version of the Penn Treebank tagset; see https://github.com/olesar/ruUD/blob/master/conversion/RussianUD_XPOSlist.md.

Features

Morphological features are included in all corpora. In GSD and Taiga, they are tagged manually. In SynTagRus, they are converted from the features manually tagged in the source treebank. In PUD, they are added automatically and then manually checked.

The following feature subtypes are used in Russian:

The following universal features are not used in Russian: Clusivity, Definite, Evident, NounClass, Polite.
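The list of unused features can be checked against a file by collecting the feature names from the FEATS column (the sixth CoNLL-U column). The sketch below assumes plain tab-separated CoNLL-U input. The names `UNUSED_IN_RUSSIAN`, `feature_names` and `unexpected_features`, and the sample line, are hypothetical and serve only as an illustration.

```python
# Feature names listed above as not used in Russian.
UNUSED_IN_RUSSIAN = {"Clusivity", "Definite", "Evident", "NounClass", "Polite"}


def feature_names(conllu_text):
    """Collect the feature names that occur in the FEATS column."""
    names = set()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A token line has ten columns; "_" means no features.
        if len(cols) != 10 or cols[5] == "_":
            continue
        for pair in cols[5].split("|"):
            name = pair.split("=", 1)[0]
            # Drop a layer suffix such as "[psor]" if present.
            names.add(name.split("[", 1)[0])
    return names


def unexpected_features(conllu_text):
    """Return the sorted feature names that the list above rules out."""
    return sorted(feature_names(conllu_text) & UNUSED_IN_RUSSIAN)


# Hypothetical token line (annotation is illustrative only).
line = "1\tкнига\tкнига\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Fem|Number=Sing\t0\troot\t_\t_"
print(unexpected_features(line))
```

Any name this returns would point either to an annotation error or to a treebank that departs from the description above.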

Syntax

Core Arguments, Oblique Arguments and Adjuncts

Copula Clauses

Expletives

Adjectival Clauses

Nominal phrases

Function words

Other relations

Relations Overview

Treebanks

There are four Russian UD treebanks: