home edit page issue tracker

This page pertains to UD version 2.

UD Romanian RRT

Language: Romanian (code: ro)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v1.2 release.

The following people have contributed to making this treebank part of UD: Verginica Barbu Mititelu, Elena Irimia, Cenel-Augusto Perez, Radu Ion, Radu Simionescu, Martin Popel.

Repository: UD_Romanian-RRT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.6

License: CC BY-SA 4.0

Genre: wiki, legal, news, fiction, medical, nonfiction, academic

Questions, comments? General annotation questions (either Romanian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [vergi (æt) racai • ro]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas assigned by a program, not checked manually
UPOS annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS assigned by a program, not checked manually
Features annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Relations annotated manually, natively in UD style

Description

The Romanian UD treebank (called RoRefTrees) (Barbu Mititelu et al., 2016) is the reference treebank in UD format for standard Romanian.

It is based on RACAI-RoTb (Irimia and Barbu Mititelu, 2015) and on UAIC-RoTb (Perez, 2014). The distribution of text genres in RoRefTrees is not balanced: literature - 1818 sentences, law - 1606 sentences, medical - 1210 sentences, FrameNet translations - 1092 sentences, academic writing - 950 sentences, news - 933 sentences, science - 362 sentences, wikipedia - 251 sentences, miscellanea - 1301 sentences. These genres can be told apart by the sentences id.

Acknowledgments

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-1362.

Statistics of UD Romanian RRT

POS Tags

ADJADPADVAUXCCONJDETINTJNOUNNUMPARTPRONPROPNPUNCTSCONJSYMVERBX

Features

AbbrAdpTypeCaseDefiniteDegreeForeignGenderMoodNumberNumber[psor]NumFormNumTypePartTypePersonPolarityPositionPossPronTypeReflexStrengthTenseVariantVerbForm

Relations

acladvcladvcl:tcladvmodadvmod:tmodamodapposauxaux:passcasecccc:preconjccompccomp:pmodcompoundconjcopcsubjcsubj:passdepdetdiscourseexplexpl:impersexpl:passexpl:possexpl:pvfixedflatgoeswithiobjlistmarknmodnmod:agentnmod:pmodnmod:tmodnsubjnsubj:passnummodobjoblorphanparataxispunctreparandumrootvocativexcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Reflexive Verbs

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview