
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-GSD: POS Tags: NUM

There are 683 NUM lemmas (4%), 728 NUM types (2%) and 2031 NUM tokens (2%). Out of 16 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 9 in number of tokens.
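As a rough illustration of how counts like these could be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens for one UPOS tag in CoNLL-U text. The function name `upos_counts` and the toy sample are hypothetical. It assumes that a type is a distinct FORM string exactly as written; the script that generated this page may normalize forms differently.

```python
def upos_counts(conllu_text, upos="NUM"):
    """Return (n_lemmas, n_types, n_tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM column
            lemmas.add(cols[2])  # LEMMA column
    return len(lemmas), len(types), tokens


# Hypothetical toy sample, not taken from UD_Russian-GSD.
SAMPLE = "\n".join([
    "# text = два дома и две книги",
    "1\tдва\tдва\tNUM\t_\tCase=Nom|NumType=Card\t2\tnummod:gov\t_\t_",
    "2\tдома\tдом\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\tи\tи\tCCONJ\t_\t_\t5\tcc\t_\t_",
    "4\tдве\tдва\tNUM\t_\tCase=Nom|NumType=Card\t5\tnummod:gov\t_\t_",
    "5\tкниги\tкнига\tNOUN\t_\t_\t2\tconj\t_\t_",
])

n_lemmas, n_types, n_tokens = upos_counts(SAMPLE)
print(n_lemmas, n_types, n_tokens)
```

In the toy sample the two forms "два" and "две" share one lemma, so the lemma count is lower than the type count.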

The 10 most frequent NUM lemmas: ОДИН, ДВА, НЕСКОЛЬКО, ТРИ, 2, 1, 10, ЧЕТЫРЕ, 4, 3

The 10 most frequent NUM types: 2, два, один, несколько, 1, двух, 10, 4, три, 3

The 10 most frequent ambiguous lemmas: ОДИН (NUM 185, ADV 1), НЕСКОЛЬКО (NUM 68, ADV 5), ТРИ (NUM 58, ADV 1), 2 (NUM 56, ADV 22, ADJ 9), 1 (NUM 43, ADJ 33, ADV 19), 10 (NUM 40, ADJ 14, ADV 8), 4 (NUM 35, ADJ 14, ADV 13), 3 (NUM 31, ADV 13, ADJ 8), 5 (NUM 29, ADJ 9, ADV 5), МНОГО (NUM 29, ADV 9)

The 10 most frequent ambiguous types: 2 (NUM 56, ADV 22, ADJ 9), один (NUM 42, ADV 1), несколько (NUM 41, ADV 5), 1 (NUM 43, ADJ 33, ADV 19), 10 (NUM 40, ADJ 14, ADV 8), 4 (NUM 35, ADJ 14, ADV 13), три (NUM 29, ADV 1), 3 (NUM 30, ADV 13, ADJ 8), 5 (NUM 29, ADJ 9, ADV 5), 20 (NUM 24, ADJ 12, ADV 11)

Morphology

The form / lemma ratio of NUM is 1.065886 (the average of all parts of speech is 1.592402).
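The NUM ratio is consistent with dividing the number of NUM types by the number of NUM lemmas reported on this page. A minimal check of that arithmetic, assuming this is how the ratio is defined (the variable names are illustrative):

```python
# Counts reported on this page for NUM in UD_Russian-GSD.
num_types = 728
num_lemmas = 683

# Assumed definition: distinct word forms per distinct lemma.
form_lemma_ratio = num_types / num_lemmas
print(f"{form_lemma_ratio:.6f}")
```

A ratio this close to 1 means that most NUM lemmas appear in only one form, which fits the large share of digit-only numerals among the most frequent lemmas.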

The highest number of forms (10) was observed with the lemma “ОДИН”: один, одна, одним, одних, одно, одного, одной, одном, одному, одну.

The 2nd highest number of forms (5) was observed with the lemma “ДВА”: два, две, двум, двумя, двух.

The 3rd highest number of forms (5) was observed with the lemma “МНОГО”: более, больше, многим, многих, много.

NUM occurs with 6 features: NumType (2028; 100% instances), Case (2027; 100% instances), Animacy (1013; 50% instances), Gender (601; 30% instances), Number (316; 16% instances), Degree (2; 0% instances)

NUM occurs with 15 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Degree=Cmp, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Card, Number=Plur, Number=Sing

NUM occurs with 84 feature combinations. The most frequent feature combination is Case=Nom|NumType=Card (470 tokens). Examples: 10, 5, 0, 16, 15, 20, 6, 11, 12, 13
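The feature statistics above can be approximated by counting the FEATS column of every token with the tag of interest. The following is a sketch under stated assumptions: `feature_stats` and the sample rows are hypothetical, and a token with an empty FEATS value (`_`) is counted as its own combination.

```python
from collections import Counter


def feature_stats(conllu_text, upos="NUM"):
    """Count full FEATS combinations and individual feature names for one UPOS."""
    combos = Counter()    # e.g. "Case=Nom|NumType=Card" -> token count
    features = Counter()  # e.g. "Case" -> number of tokens carrying it
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit() or cols[3] != upos:
            continue
        feats = cols[5]
        combos[feats] += 1
        if feats != "_":
            for pair in feats.split("|"):
                name, _, _value = pair.partition("=")
                features[name] += 1
    return combos, features


# Hypothetical toy rows, not taken from UD_Russian-GSD.
FEATS_SAMPLE = "\n".join([
    "1\t10\t10\tNUM\t_\tCase=Nom|NumType=Card\t2\tnummod:gov\t_\t_",
    "2\tлет\tгод\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "1\t5\t5\tNUM\t_\tCase=Nom|NumType=Card\t2\tnummod:gov\t_\t_",
    "2\tдней\tдень\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "1\tодного\tодин\tNUM\t_\tCase=Gen|Gender=Masc|NumType=Card\t2\tnummod\t_\t_",
    "2\tгода\tгод\tNOUN\t_\t_\t0\troot\t_\t_",
])

combos, features = feature_stats(FEATS_SAMPLE)
print(combos.most_common(1))
print(dict(features))
```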

Relations

NUM nodes are attached to their parents using 21 different relations: nummod:gov (855; 42% instances), nummod (641; 32% instances), appos (73; 4% instances), nmod (68; 3% instances), root (66; 3% instances), conj (60; 3% instances), compound (50; 2% instances), obl (43; 2% instances), list (28; 1% instances), amod (26; 1% instances), obj (25; 1% instances), nsubj (22; 1% instances), goeswith (20; 1% instances), parataxis (19; 1% instances), xcomp (13; 1% instances), advmod (11; 1% instances), nsubj:pass (5; 0% instances), iobj (2; 0% instances), orphan (2; 0% instances), acl (1; 0% instances), ccomp (1; 0% instances)

Parents of NUM nodes belong to 12 different parts of speech: NOUN (1530; 75% instances), NUM (113; 6% instances), VERB (113; 6% instances), SYM (83; 4% instances), no part of speech because the NUM node is the root (66; 3% instances), PROPN (61; 3% instances), ADJ (30; 1% instances), ADV (17; 1% instances), AUX (11; 1% instances), PRON (3; 0% instances), ADP (2; 0% instances), PUNCT (2; 0% instances)

1523 (75%) NUM nodes are leaves.

305 (15%) NUM nodes have one child.

84 (4%) NUM nodes have two children.

119 (6%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
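The leaf and child-degree figures above could be reproduced by counting, per sentence, how many tokens name each NUM token as their HEAD. The sketch below assumes plain CoNLL-U input; `child_degree_histogram` and the sample trees are hypothetical and only illustrate the counting.

```python
from collections import Counter


def child_degree_histogram(conllu_text, upos="NUM"):
    """Map number of children -> number of nodes with that many children."""
    hist = Counter()
    sentence = []  # (ID, UPOS, HEAD) for each token of the current sentence

    def flush():
        # Count how often each ID occurs as a HEAD within the sentence.
        children = Counter(head for _, _, head in sentence)
        for token_id, tag, _ in sentence:
            if tag == upos:
                hist[children[token_id]] += 1
        sentence.clear()

    # The extra empty string forces a flush after the last sentence.
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            flush()
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append((cols[0], cols[3], cols[6]))
    return hist


# Hypothetical toy trees, not taken from UD_Russian-GSD.
TREE_SAMPLE = "\n".join([
    "1\tдва\tдва\tNUM\t_\t_\t2\tnummod:gov\t_\t_",
    "2\tдома\tдом\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "1\t2\t2\tNUM\t_\t_\t0\troot\t_\t_",
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "3\t3\t3\tNUM\t_\t_\t1\tconj\t_\t_",
])

hist = child_degree_histogram(TREE_SAMPLE)
print(sorted(hist.items()))
print("highest child degree:", max(hist))
```

Here `hist[0]` is the number of leaves, and `max(hist)` corresponds to the highest child degree.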

Children of NUM nodes are attached using 23 different relations: punct (303; 33% instances), nmod (173; 19% instances), case (100; 11% instances), advmod (76; 8% instances), nsubj (67; 7% instances), conj (50; 6% instances), cc (31; 3% instances), list (19; 2% instances), goeswith (15; 2% instances), appos (14; 2% instances), discourse (12; 1% instances), orphan (9; 1% instances), parataxis (9; 1% instances), cop (8; 1% instances), nummod (6; 1% instances), amod (4; 0% instances), compound (4; 0% instances), dep (2; 0% instances), det (2; 0% instances), acl (1; 0% instances), advcl (1; 0% instances), nummod:gov (1; 0% instances), obl (1; 0% instances)

Children of NUM nodes belong to 14 different parts of speech: PUNCT (329; 36% instances), NOUN (207; 23% instances), NUM (113; 12% instances), ADP (75; 8% instances), ADV (62; 7% instances), CCONJ (28; 3% instances), PART (28; 3% instances), SYM (13; 1% instances), PROPN (11; 1% instances), PRON (10; 1% instances), ADJ (9; 1% instances), VERB (9; 1% instances), AUX (8; 1% instances), DET (6; 1% instances)