
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: NUM

There are 376 NUM lemmas (2%), 439 NUM types (1%) and 3085 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 7 in number of lemmas, 7 in number of types and 13 in number of tokens.

The 10 most frequent NUM lemmas: много, 2, один, 3, два, 1, несколько, 5, 4, сколько

The 10 most frequent NUM types: много, 2, 3, 1, 5, несколько, 4, два, сколько, один

The 10 most frequent ambiguous lemmas: много (NUM 285, ADV 18, X 1), 2 (NUM 221, ADJ 15), один (DET 236, NUM 194), 3 (NUM 161, ADJ 10), 1 (NUM 145, ADJ 14), несколько (NUM 121, ADV 5), 5 (NUM 119, ADJ 7), 4 (NUM 104, ADJ 7), сколько (NUM 80, ADV 3, CCONJ 2), мало (NUM 66, ADV 20)

The 10 most frequent ambiguous types: много (NUM 181, ADV 15, X 1), 2 (NUM 217, ADJ 16), 3 (NUM 155, ADJ 10), 1 (NUM 145, ADJ 14), 5 (NUM 118, ADJ 7), несколько (NUM 89, ADV 4), 4 (NUM 104, ADJ 7), сколько (NUM 45, CCONJ 2, ADV 1), один (NUM 64, DET 57), 10 (NUM 64, ADJ 4)
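A listing like the ambiguous lemmas above can be approximated from (item, tag) pairs, where the item is a lemma or a word form. The following is a minimal sketch, not the script that generated this page: the function name `ambiguous_items` is hypothetical, and sorting by NUM count is an assumption that matches the lemma list but apparently not the type list.

```python
from collections import Counter, defaultdict

def ambiguous_items(tagged, upos="NUM", top=10):
    """tagged: iterable of (item, tag) pairs; item is a lemma or a word form."""
    tags_by_item = defaultdict(Counter)
    for item, tag in tagged:
        tags_by_item[item][tag] += 1

    # An item is ambiguous here if it occurs with the target tag and at least one other tag.
    ambiguous = [
        (item, tags)
        for item, tags in tags_by_item.items()
        if upos in tags and len(tags) > 1
    ]
    # Assumed ordering: descending count of the target tag.
    ambiguous.sort(key=lambda pair: pair[1][upos], reverse=True)
    return [(item, tags.most_common()) for item, tags in ambiguous[:top]]
```

Each returned entry pairs the item with its tag counts in descending order, mirroring the "много (NUM 285, ADV 18, X 1)" layout.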

Morphology

The form / lemma ratio of NUM is 1.167553 (the average of all parts of speech is 1.875784).
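The reported ratio is consistent with dividing the number of NUM types (439) by the number of NUM lemmas (376) given at the top of this page. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Average number of distinct word forms per lemma."""
    if n_lemmas == 0:
        raise ValueError("at least one lemma is required")
    return n_types / n_lemmas

# Counts taken from the summary at the top of this page.
print(f"{form_lemma_ratio(439, 376):.6f}")  # prints 1.167553
```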

The highest number of forms (10) was observed with the lemma “один”: оден, один, одна, одним, одно, одного, одной, одном, одному, одну.

The second highest number of forms (6) was observed with the lemma “несколько”: неск, неск., несколькими, нескольких, несколько, нескольку.

The third highest number of forms (6) was observed with the lemma “оба”: оба, обе, обеим, обеих, обоим, обоих.

NUM occurs with 9 features: NumForm (3082; 100% instances), NumType (2990; 97% instances), Case (1151; 37% instances), Gender (358; 12% instances), Number (194; 6% instances), Animacy (177; 6% instances), Degree (57; 2% instances), Abbr (3; 0% instances), Typo (3; 0% instances)

NUM occurs with 22 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Degree=Cmp, Gender=Fem, Gender=Masc, Gender=Neut, NumForm=Combi, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumType=Sets, Number=Sing, Typo=Yes

NUM occurs with 87 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (1682 tokens). Examples: 2, 3, 1, 5, 4, 10, 7, 30, 6, 20
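The three feature counts above (features, feature-value pairs, feature combinations) can be derived from the FEATS column of a CoNLL-U file, where features are written as `Name=Value` pairs joined by `|` and `_` means no features. The sketch below assumes the FEATS strings of the NUM tokens have already been collected; the function names are hypothetical, and a multi-valued feature such as `Case=Acc,Nom` is treated as a single value.

```python
from collections import Counter

def parse_feats(feats):
    """Split a FEATS string such as 'NumForm=Digit|NumType=Card' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def feature_stats(feats_column):
    """Count features, feature-value pairs and full combinations."""
    features, pairs, combinations = Counter(), Counter(), Counter()
    for feats in feats_column:
        combinations[feats] += 1
        for name, value in parse_feats(feats).items():
            features[name] += 1
            pairs[(name, value)] += 1
    return features, pairs, combinations
```

With these counters, `combinations.most_common(1)` gives the most frequent feature combination, and `len(pairs)` gives the number of distinct feature-value pairs.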

Relations

NUM nodes are attached to their parents using 26 different relations: nummod:gov (1370; 44% instances), nummod (546; 18% instances), root (279; 9% instances), nmod (221; 7% instances), conj (147; 5% instances), parataxis (144; 5% instances), appos (109; 4% instances), obl (77; 2% instances), obj (48; 2% instances), nsubj (42; 1% instances), list (16; 1% instances), compound (15; 0% instances), flat (15; 0% instances), advcl (13; 0% instances), xcomp (11; 0% instances), ccomp (7; 0% instances), acl (6; 0% instances), acl:relcl (6; 0% instances), csubj (3; 0% instances), dep (2; 0% instances), nsubj:pass (2; 0% instances), orphan (2; 0% instances), amod (1; 0% instances), goeswith (1; 0% instances), iobj (1; 0% instances), mark (1; 0% instances)

Parents of NUM nodes belong to 15 different parts of speech: NOUN (2031; 66% instances), ROOT, i.e. no parent (279; 9% instances), VERB (275; 9% instances), NUM (260; 8% instances), ADJ (69; 2% instances), SYM (47; 2% instances), X (47; 2% instances), PROPN (34; 1% instances), PRON (23; 1% instances), ADV (6; 0% instances), AUX (4; 0% instances), CCONJ (3; 0% instances), INTJ (3; 0% instances), DET (2; 0% instances), PART (2; 0% instances)
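Relation and parent statistics of this kind can be recomputed from the treebank's CoNLL-U files, whose ten tab-separated columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. The sketch below is one possible way to do it, not the generator of this page; the function names and the `"ROOT"` label for tokens with HEAD 0 are assumptions made for illustration.

```python
from collections import Counter

def read_sentences(conllu_text):
    """Yield each sentence as a list of column lists, skipping comment lines,
    multiword-token ranges (IDs like 1-2) and empty nodes (IDs like 1.1)."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows

def attachment_stats(conllu_text, upos="NUM"):
    """Count incoming relations and parent UPOS tags for tokens tagged `upos`."""
    relations, parent_pos = Counter(), Counter()
    for rows in read_sentences(conllu_text):
        upos_by_id = {cols[0]: cols[3] for cols in rows}
        for cols in rows:
            if cols[3] != upos:
                continue
            relations[cols[7]] += 1
            # HEAD "0" has no entry in the sentence, so it is counted as ROOT.
            parent_pos[upos_by_id.get(cols[6], "ROOT")] += 1
    return relations, parent_pos
```

Calling `relations.most_common()` on the result yields a ranking in the same shape as the relation list above.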

1798 (58%) NUM nodes are leaves.

812 (26%) NUM nodes have one child.

216 (7%) NUM nodes have two children.

259 (8%) NUM nodes have three or more children.

The highest child degree of a NUM node is 19.
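The child-degree figures above partition all 3085 NUM tokens: 1798 + 812 + 216 + 259 = 3085. A minimal sketch of how such a histogram could be computed is shown below; it assumes each sentence is given as a list of hypothetical `(token_id, head_id, upos)` tuples already extracted from the treebank.

```python
from collections import Counter

def child_degree_histogram(sentences, upos="NUM"):
    """Map number of children -> number of `upos` nodes with that many children."""
    histogram = Counter()
    for tokens in sentences:
        # Count how many tokens point to each head within the sentence.
        n_children = Counter(head for _, head, _ in tokens)
        for token_id, _, tag in tokens:
            if tag == upos:
                histogram[n_children[token_id]] += 1
    return histogram

# Sanity check on the figures reported on this page.
reported = {"leaves": 1798, "one child": 812, "two children": 216, "three or more": 259}
total = sum(reported.values())
print(total)  # prints 3085
print({name: round(100 * count / total) for name, count in reported.items()})
```

The maximum key of the returned histogram corresponds to the highest child degree.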

Children of NUM nodes are attached using 31 different relations: punct (755; 34% instances), advmod (280; 12% instances), nsubj (240; 11% instances), nmod (201; 9% instances), case (164; 7% instances), conj (161; 7% instances), parataxis (94; 4% instances), obl (68; 3% instances), cc (62; 3% instances), iobj (27; 1% instances), mark (25; 1% instances), amod (21; 1% instances), compound (16; 1% instances), cop (16; 1% instances), det (16; 1% instances), flat (14; 1% instances), orphan (12; 1% instances), advcl (11; 0% instances), aux (11; 0% instances), fixed (8; 0% instances), list (8; 0% instances), appos (7; 0% instances), discourse (6; 0% instances), acl:relcl (5; 0% instances), expl (4; 0% instances), flat:foreign (4; 0% instances), nummod (3; 0% instances), nummod:gov (3; 0% instances), acl (2; 0% instances), dep (1; 0% instances), flat:name (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (755; 34% instances), NOUN (382; 17% instances), NUM (260; 12% instances), ADV (219; 10% instances), ADP (153; 7% instances), PART (81; 4% instances), VERB (70; 3% instances), ADJ (63; 3% instances), CCONJ (61; 3% instances), PRON (59; 3% instances), SYM (43; 2% instances), AUX (27; 1% instances), DET (25; 1% instances), SCONJ (22; 1% instances), X (14; 1% instances), PROPN (9; 0% instances), INTJ (3; 0% instances)