UD Romanian MolDoRo
Language: Romanian (code: ro)
Family: Indo-European (IE)
This treebank has been part of Universal Dependencies since the UD v2.17 release.
The following people have contributed to making this treebank part of UD: Olesea Caftanatov, Atul Kr. Ojha.
Repository: UD_Romanian-MolDoRo
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Romanian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [olesea • caftanatov (æt) math • md]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | not available |
| Relations | annotated manually, natively in UD style |
Description
A small treebank of sentences in Moldovan Romanian, using the Cyrillic writing system (as used in Moldova until 1989).
…
Acknowledgments
…
References
- (citation)
Statistics of UD Romanian MolDoRo
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – PART – PRON – PUNCT – SCONJ – VERB
Features
Relations
acl – advcl – advmod – amod – case – cc – conj – cop – det – expl:pv – fixed – iobj – mark – nmod – nsubj – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 30 sentences, 239 tokens and 241 syntactic words.
- This corpus contains 75 tokens (31%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: -й, Лас', н'
- This corpus contains 2 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 2 types of multi-word tokens. Examples: Лас’сэ, н’ау.
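The counts in this section can be reproduced from a CoNLL-U file. The following is a minimal sketch, assuming the standard ten-column CoNLL-U layout (ID ranges such as `1-2` for multi-word tokens, `SpaceAfter=No` in the MISC column). The function name `tokenization_stats`, the helper `row` and the sample data are illustrative; the sample is invented toy data, not taken from this treebank.

```python
def row(*cols):
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Invented toy data (NOT from the treebank): two sentences, one multi-word token.
SAMPLE = "\n".join([
    "# text = ab cd.",
    row("1-2", "ab", "_", "_", "_", "_", "_", "_", "_", "_"),
    row("1", "a", "a", "PART", "_", "_", "2", "advmod", "_", "_"),
    row("2", "b", "b", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", "cd", "cd", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    "",
    "# text = ef.",
    row("1", "ef", "ef", "NOUN", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("2", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
    "",
])

def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space_after": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:                       # blank line ends a sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):           # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        tok_id, misc = cols[0], cols[9]
        no_space = "SpaceAfter=No" in misc.split("|")
        if "." in tok_id:                  # empty node: neither token nor word
            continue
        if "-" in tok_id:                  # multi-word token range, e.g. 1-2
            covered_until = int(tok_id.split("-")[1])
            stats["mwt"] += 1
            stats["tokens"] += 1
            stats["no_space_after"] += no_space
            continue
        stats["words"] += 1
        if int(tok_id) > covered_until:    # word that is its own surface token
            stats["tokens"] += 1
            stats["no_space_after"] += no_space
    return stats

print(tokenization_stats(SAMPLE))
# {'sentences': 2, 'tokens': 5, 'words': 6, 'mwt': 1, 'no_space_after': 2}
```

The same bookkeeping explains the figures reported above: 239 tokens, of which 2 are multi-word tokens of 2 words each, give 239 - 2 + 4 = 241 syntactic words.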
Morphology
Tags
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, PART, PRON, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PROPN, NUM, INTJ, SYM, X
- This corpus contains 4 word types tagged as particles (PART): А, н', ну, сэ
- This corpus contains 4 lemmas tagged as pronouns (PRON): ел, каре, нимень, сине
- This corpus contains 4 lemmas tagged as determiners (DET): мулт, орьче, сэу, ун
- This corpus contains 1 lemma tagged as an auxiliary (AUX): фи
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: фи
- This corpus does not use the VerbForm feature.
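The used and unused tag lists above are complements within the 17 universal POS tags. A short sketch of that check follows; the list `USED` is copied from the statistics above, while the names `UD_UPOS` and `unused_upos` are illustrative.

```python
# The 17 universal POS tags defined by UD v2.
UD_UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

# Tags reported as used in this corpus (copied from the list above).
USED = "ADJ ADP ADV AUX CCONJ DET NOUN PART PRON PUNCT SCONJ VERB".split()

def unused_upos(used):
    """Return the universal tags that do not appear in `used`."""
    used = set(used)
    unknown = used - set(UD_UPOS)
    if unknown:
        raise ValueError("not universal POS tags: %s" % sorted(unknown))
    return [tag for tag in UD_UPOS if tag not in used]

print(len(USED), "of", len(UD_UPOS))   # 12 of 17
print(unused_upos(USED))               # ['INTJ', 'NUM', 'PROPN', 'SYM', 'X']
```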
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): фи.
- This corpus does not contain auxiliaries: no word is attached with the aux relation.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (20)
  - VERB--NOUN-ADP(де) (1)
  - VERB--NOUN-ADP(дупэ)-ADP(ку) (1)
  - VERB--PRON (2)
- obj
  - VERB--NOUN (7)
  - VERB--NOUN-ADP(де) (1)
  - VERB--NOUN-ADP(ку) (1)
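Patterns such as `VERB--NOUN-ADP(де)` can be derived by walking the dependency tree. The sketch below shows one possible way to do it, under the assumption that the `ADP(...)` part lists the lemmas of adpositions attached to the argument with the `case` relation; the exact procedure behind the statistics is not stated here. The function names and the sample sentence (placeholder forms `n1`, `v1`, `n2`) are invented for illustration; only the lemma `де` comes from the list above.

```python
from collections import Counter

def row(*cols):
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Invented toy sentence (NOT from the treebank).
SAMPLE = "\n".join([
    row("1", "n1", "n1", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "v1", "v1", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", "де", "де", "ADP", "_", "_", "4", "case", "_", "_"),
    row("4", "n2", "n2", "NOUN", "_", "_", "2", "obj", "_", "_"),
    row("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
])

def sentences(conllu_text):
    """Yield each sentence as {word_id: fields}; assumes "\\n" line endings."""
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Skip comments, multi-word token ranges (1-2) and empty nodes (1.1).
            if line.startswith("#") or not cols[0].isdigit():
                continue
            words[int(cols[0])] = {"lemma": cols[2], "upos": cols[3],
                                   "head": int(cols[6]), "deprel": cols[7]}
        if words:
            yield words

def argument_patterns(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for NOUN/PRON children of VERB parents."""
    counts = Counter()
    for words in sentences(conllu_text):
        for wid, word in words.items():
            if word["deprel"] not in relations or word["upos"] not in ("NOUN", "PRON"):
                continue
            parent = words.get(word["head"])
            if parent is None or parent["upos"] != "VERB":
                continue
            # Adpositions attached to the argument as case markers, in word order.
            adps = [w["lemma"] for _, w in sorted(words.items())
                    if w["head"] == wid and w["deprel"] == "case" and w["upos"] == "ADP"]
            pattern = "VERB--" + word["upos"] + "".join("-ADP(" + a + ")" for a in adps)
            counts[(word["deprel"], pattern)] += 1
    return counts

for (relation, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
# nsubj VERB--NOUN 1
# obj VERB--NOUN-ADP(де) 1
```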
Reflexive Verbs
- This corpus contains 6 lemmas that occur at least once with an expl:pv child. Examples: фаче се, авя ышь, адуна се, муя се, теме се, ымфла се
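A list like the one above can be collected by pairing every `expl:pv` dependent with the lemma of its head. The following is a minimal sketch of that idea. The lemma pair теме / се is taken from the examples above, but the sentence built around it, including the surface forms, is an invented placeholder, and the function names are illustrative.

```python
def row(*cols):
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Invented toy sentence (NOT from the treebank); forms simply repeat the lemmas.
SAMPLE = "\n".join([
    row("1", "се", "се", "PRON", "_", "_", "2", "expl:pv", "_", "_"),
    row("2", "теме", "теме", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
])

def sentences(conllu_text):
    """Yield each sentence as {word_id: fields}; assumes "\\n" line endings."""
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Skip comments, multi-word token ranges (1-2) and empty nodes (1.1).
            if line.startswith("#") or not cols[0].isdigit():
                continue
            words[int(cols[0])] = {"lemma": cols[2], "head": int(cols[6]),
                                   "deprel": cols[7]}
        if words:
            yield words

def expl_pv_pairs(conllu_text):
    """Return sorted (head lemma, child lemma) pairs linked by expl:pv."""
    pairs = set()
    for words in sentences(conllu_text):
        for word in words.values():
            if word["deprel"] != "expl:pv":
                continue
            head = words.get(word["head"])
            if head is not None:           # head 0 (root) has no entry
                pairs.add((head["lemma"], word["lemma"]))
    return sorted(pairs)

print(expl_pv_pairs(SAMPLE))   # [('теме', 'се')]
```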
Relations Overview
- This corpus uses 1 relation subtype: expl:pv
- The following 1 main type is never used alone; it is always subtyped: expl
- The following 16 relation types are not used in this corpus at all: csubj, ccomp, vocative, dislocated, discourse, aux, appos, nummod, clf, flat, compound, list, orphan, goeswith, reparandum, dep
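The three statements above follow from comparing the relations used in the corpus with the 37 universal relations of UD v2. A small sketch of that comparison is given below; `USED` is copied from the Relations list on this page, and the names `UD_RELATIONS` and `relation_overview` are illustrative.

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = (
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj "
    "dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod "
    "nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp"
).split()

# Relations reported as used in this corpus (copied from the Relations list).
USED = (
    "acl advcl advmod amod case cc conj cop det expl:pv fixed iobj mark nmod "
    "nsubj obj obl parataxis punct root xcomp"
).split()

def relation_overview(used):
    """Return (subtypes, main types only seen subtyped, unused main types)."""
    used = set(used)
    subtypes = sorted(r for r in used if ":" in r)
    main_used = {r.split(":")[0] for r in used}
    always_subtyped = sorted(m for m in main_used if m not in used)
    unused = sorted(r for r in UD_RELATIONS if r not in main_used)
    return subtypes, always_subtyped, unused

subtypes, always_subtyped, unused = relation_overview(USED)
print(subtypes)          # ['expl:pv']
print(always_subtyped)   # ['expl']
print(len(unused))       # 16
```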