This page pertains to UD version 2.

Treebank Statistics: UD_Romanian-RRT: POS Tags: NUM

There are 933 NUM lemmas (5%), 980 NUM types (3%) and 5549 NUM tokens (3%). Out of 16 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 12 in number of tokens.
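The three counts above can be reproduced from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: the sample sentence and the `upos_counts` helper are hypothetical, and it assumes that types are distinct word forms compared case-sensitively.

```python
# Hypothetical toy sample in CoNLL-U format (tab-separated, 10 columns);
# the real figures come from the full UD_Romanian-RRT files.
SAMPLE = "\n".join([
    "# text = doi și două",
    "1\tdoi\tdoi\tNUM\t_\t_\t0\troot\t_\t_",
    "2\tși\tși\tCCONJ\t_\t_\t3\tcc\t_\t_",
    "3\tdouă\tdoi\tNUM\t_\t_\t1\tconj\t_\t_",
    "",
])

def upos_counts(conllu_text, upos):
    """Return (number of lemmas, number of types, number of tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM column
            lemmas.add(cols[2])  # LEMMA column
    return len(lemmas), len(types), tokens

print(upos_counts(SAMPLE, "NUM"))  # (1, 2, 2)
```

In the toy sample both NUM tokens share the lemma "doi" but have different forms, so there is one lemma, two types and two tokens.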

The 10 most frequent NUM lemmas: doi, 1, 2, prim, 3, trei, 4, unu, 5, 6

The 10 most frequent NUM types: 1, 2, 3, două, 4, trei, 5, 6, primul, doi

The 10 most frequent ambiguous lemmas: prim (NUM 248, ADJ 2), întâi (NUM 15, ADV 13), dintâi (NUM 10, ADV 2), zero (NUM 10, NOUN 1), X (NOUN 18, NUM 3), _ (X 82, NUM 2, PUNCT 1), i (NOUN 3, NUM 1), xi (NOUN 1, NUM 1), 5a (ADV 3, X 2, NUM 1, PROPN 1), V. (NOUN 3, NUM 1)

The 10 most frequent ambiguous types: 2 (NUM 281, X 2), 3 (NUM 203, X 1), i (PRON 104, NOUN 3, NUM 1), 9 (NUM 42, X 1), primele (NUM 30, NOUN 1), o (DET 1814, PRON 187, NUM 27, PART 9, AUX 7, ADV 1), 0 (NUM 22, X 2), 100 (NUM 22, X 3), un (DET 1610, NUM 16, X 2), V (NOUN 12, NUM 10)
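An ambiguous lemma or type is one that occurs with more than one POS tag. A minimal sketch of that grouping, assuming the observations are available as (key, tag) pairs; the `ambiguous` helper and the sample pairs are hypothetical and do not reflect the real counts:

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, UPOS) observations for illustration only.
observations = [
    ("prim", "NUM"), ("prim", "NUM"), ("prim", "ADJ"),
    ("trei", "NUM"),
]

def ambiguous(pairs, upos):
    """Return keys seen with `upos` and at least one other tag, with per-tag counts."""
    tags = defaultdict(Counter)
    for key, tag in pairs:
        tags[key][tag] += 1
    return {k: dict(c) for k, c in tags.items() if len(c) > 1 and upos in c}

print(ambiguous(observations, "NUM"))  # {'prim': {'NUM': 2, 'ADJ': 1}}
```

The same function works for types if the pairs are built from the FORM column instead of the LEMMA column.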

Morphology

The form / lemma ratio of NUM is 1.050375 (the average of all parts of speech is 1.814756).
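The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given earlier on this page. A quick check under that assumption:

```python
num_types = 980   # NUM types reported on this page
num_lemmas = 933  # NUM lemmas reported on this page

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.050375
```

A ratio this close to 1 means that most NUM lemmas appear in a single form.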

The highest number of forms (12) was observed with the lemma “prim”: prim, prim-, prima, prime, primei, primele, primelor, primii, primilor, primul, primului, primă.

The 2nd highest number of forms (10) was observed with the lemma “ultim”: ultim, ultima, ultime, ultimei, ultimele, ultimelor, ultimii, ultimilor, ultimul, ultimului.

The 3rd highest number of forms (6) was observed with the lemma “doi”: doi, doilea, doua, două, ii, secund.

NUM occurs with 9 features: NumType (5549; 100% instances), Number (5533; 100% instances), NumForm (5498; 99% instances), Gender (940; 17% instances), Case (495; 9% instances), Definite (470; 8% instances), Typo (49; 1% instances), PronType (48; 1% instances), Foreign (1; 0% instances)

NUM occurs with 17 feature-value pairs: Case=Acc,Nom, Case=Dat,Gen, Definite=Def, Definite=Ind, Foreign=Yes, Gender=Fem, Gender=Masc, NumForm=Combi, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Ord, Number=Plur, Number=Sing, PronType=Tot, Typo=Yes

NUM occurs with 44 feature combinations. The most frequent feature combination is Number=Sing|NumForm=Digit|NumType=Card (3535 tokens). Examples: 1, 2, 3, 4, 5, 6, 7, 8, 2004, 10
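A feature combination is the full content of the CoNLL-U FEATS column, with pairs separated by `|`. The sketch below splits such a string into feature-value pairs; the `feature_pairs` helper is hypothetical, and it keeps a multi-valued entry such as `Case=Acc,Nom` as one pair, which matches how the pairs are listed on this page.

```python
def feature_pairs(feats):
    """Split a CoNLL-U FEATS string into (feature, value) tuples."""
    if feats == "_":  # underscore marks an empty FEATS column
        return []
    return [tuple(pair.split("=", 1)) for pair in feats.split("|")]

print(feature_pairs("Number=Sing|NumForm=Digit|NumType=Card"))
# [('Number', 'Sing'), ('NumForm', 'Digit'), ('NumType', 'Card')]
```

Counting distinct tuples over all NUM tokens would give the number of feature-value pairs, and counting distinct whole strings would give the number of feature combinations.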

Relations

NUM nodes are attached to their parents using 20 different relations: nummod (4242; 76% instances), parataxis (751; 14% instances), conj (292; 5% instances), nsubj (73; 1% instances), compound (47; 1% instances), obj (29; 1% instances), fixed (24; 0% instances), root (22; 0% instances), appos (14; 0% instances), nsubj:pass (13; 0% instances), nmod (9; 0% instances), xcomp (8; 0% instances), advcl (5; 0% instances), obl (4; 0% instances), orphan (4; 0% instances), acl (3; 0% instances), csubj (3; 0% instances), flat (3; 0% instances), dep (2; 0% instances), iobj (1; 0% instances)
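The percentages in the relation list appear to be each count divided by the 5549 NUM tokens, rounded to the nearest whole number. A quick check under that assumption, using a few of the counts listed above (the `pct` helper is hypothetical):

```python
TOTAL_NUM_TOKENS = 5549  # NUM tokens reported on this page

def pct(count, total=TOTAL_NUM_TOKENS):
    """Share of `count` in `total`, as a whole-number percentage."""
    return round(100 * count / total)

for relation, count in [("nummod", 4242), ("parataxis", 751), ("conj", 292), ("fixed", 24)]:
    print(f"{relation}: {pct(count)}%")
# nummod: 76%
# parataxis: 14%
# conj: 5%
# fixed: 0%
```

This also explains the entries shown as 0%: they are non-zero counts whose share is below half a percent.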

Parents of NUM nodes belong to 13 different parts of speech: NOUN (3745; 67% instances), VERB (1117; 20% instances), NUM (441; 8% instances), PROPN (81; 1% instances), ADJ (57; 1% instances), ADV (39; 1% instances), no POS tag, i.e. the artificial root (22; 0% instances), ADP (18; 0% instances), PRON (17; 0% instances), X (5; 0% instances), AUX (4; 0% instances), DET (2; 0% instances), INTJ (1; 0% instances)

2875 (52%) NUM nodes are leaves.

1326 (24%) NUM nodes have one child.

1087 (20%) NUM nodes have two children.

261 (5%) NUM nodes have three or more children.

The highest child degree of a NUM node is 10.
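The child degree of a node is the number of tokens whose HEAD column points to it. A minimal sketch for a single sentence, assuming CoNLL-U input; the sample and the `child_degrees` helper are hypothetical:

```python
from collections import Counter

# Hypothetical one-sentence sample (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = "\n".join([
    "1\tdoi\tdoi\tNUM\t_\t_\t0\troot\t_\t_",
    "2\tși\tși\tCCONJ\t_\t_\t3\tcc\t_\t_",
    "3\ttrei\ttrei\tNUM\t_\t_\t1\tconj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

def child_degrees(sentence, upos):
    """Map child degree -> number of `upos` nodes with that degree, for one sentence."""
    rows = [line.split("\t") for line in sentence.splitlines()
            if line and not line.startswith("#")]
    rows = [r for r in rows if r[0].isdigit()]  # basic tokens only
    n_children = Counter(r[6] for r in rows)    # HEAD id -> number of dependents
    return Counter(n_children[r[0]] for r in rows if r[3] == upos)

print(dict(child_degrees(SAMPLE, "NUM")))  # {2: 1, 1: 1}
```

Token IDs restart in every sentence, so a treebank-wide distribution would sum these per-sentence counters; a degree of 0 corresponds to a leaf.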

Children of NUM nodes are attached using 23 different relations: punct (1914; 43% instances), case (879; 20% instances), det (392; 9% instances), conj (302; 7% instances), advmod (234; 5% instances), cc (167; 4% instances), nmod (145; 3% instances), nummod (112; 3% instances), compound (61; 1% instances), goeswith (61; 1% instances), cop (30; 1% instances), nsubj (28; 1% instances), appos (25; 1% instances), acl (22; 0% instances), amod (17; 0% instances), parataxis (9; 0% instances), mark (8; 0% instances), advcl (6; 0% instances), aux (6; 0% instances), dep (6; 0% instances), fixed (4; 0% instances), obl:pmod (2; 0% instances), flat (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: PUNCT (1914; 43% instances), ADP (881; 20% instances), NUM (441; 10% instances), DET (433; 10% instances), NOUN (196; 4% instances), ADV (191; 4% instances), CCONJ (178; 4% instances), X (63; 1% instances), AUX (36; 1% instances), VERB (31; 1% instances), PRON (24; 1% instances), ADJ (18; 0% instances), SCONJ (13; 0% instances), PROPN (10; 0% instances), PART (2; 0% instances)