
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: NUM

There are 542 NUM lemmas (1%), 645 NUM types (0%) and 12848 NUM tokens (1%). Out of 17 observed tags, the rank of NUM is: 7 in number of lemmas, 7 in number of types and 14 in number of tokens.
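The lemma, type and token counts above can be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: the function name `upos_counts` and the helper `row` are hypothetical, the sample sentence is made up, and counting types as case-sensitive word forms is an assumption.

```python
def upos_counts(conllu_text, upos):
    """Return (lemmas, types, tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank separators and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            lemmas.add(cols[2])   # LEMMA column
            types.add(cols[1])    # FORM column (assumed case-sensitive)
            tokens += 1
    return len(lemmas), len(types), tokens


def row(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line (illustrative helper)."""
    return "\t".join([str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


# Made-up sample sentence, not taken from the treebank.
sample = "\n".join([
    "# text = две книги и два дома",
    row(1, "две", "два", "NUM", 2, "nummod:gov"),
    row(2, "книги", "книга", "NOUN", 0, "root"),
    row(3, "и", "и", "CCONJ", 5, "cc"),
    row(4, "два", "два", "NUM", 5, "nummod:gov"),
    row(5, "дома", "дом", "NOUN", 2, "conj"),
])

print(upos_counts(sample, "NUM"))
```

In the sample, the two forms две and два share the lemma два, so the function reports one lemma, two types and two tokens for NUM.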

The 10 most frequent NUM lemmas: один, два, много, три, несколько, оба, 2, 1, 3, сколько

The 10 most frequent NUM types: два, много, несколько, три, один, двух, две, 2, 1, 3

The 10 most frequent ambiguous lemmas: один (DET 2678, NUM 1901), много (NUM 954, ADV 127), несколько (NUM 703, ADV 106), 2 (NUM 385, ADJ 28), 1 (NUM 332, ADJ 23), 3 (NUM 301, ADJ 20, PROPN 2), сколько (NUM 282, CCONJ 37, ADV 35, SCONJ 3), мало (NUM 208, ADV 101), 5 (NUM 199, ADJ 14), 4 (NUM 193, ADJ 14)

The 10 most frequent ambiguous types: много (NUM 635, ADV 85, X 2), несколько (NUM 493, ADV 102), один (DET 488, NUM 421), 2 (NUM 381, ADJ 29), 1 (NUM 332, ADJ 23), 3 (NUM 295, ADJ 20), одной (DET 308, NUM 289), сколько (NUM 180, CCONJ 37, ADV 27, SCONJ 3), одного (DET 242, NUM 226), одно (DET 260, NUM 193)

Morphology

The form / lemma ratio of NUM is 1.190037 (the average of all parts of speech is 2.706171).
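The ratio follows directly from the counts reported above (645 NUM types over 542 NUM lemmas). A short check, with variable names chosen here for illustration:

```python
num_types = 645   # distinct NUM word forms reported on this page
num_lemmas = 542  # distinct NUM lemmas reported on this page

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")
```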

The highest number of forms (10) was observed with the lemma “один”: оден, один, одна, одним, одно, одного, одной, одном, одному, одну.

The second highest number of forms (8) was observed with the lemma “оба”: оба, обе, обеим, обеими, обеих, обоим, обоими, обоих.

The third highest number of forms (7) was observed with the lemma “два”: Дв-ва, два, две, двум, двумя, двух, д’ве.

NUM occurs with 10 features: NumForm (12848; 100% instances), NumType (12848; 100% instances), Case (9044; 70% instances), Gender (4144; 32% instances), Number (1901; 15% instances), Animacy (1223; 10% instances), Degree (252; 2% instances), Typo (7; 0% instances), Abbr (3; 0% instances), ExtPos (3; 0% instances)

NUM occurs with 25 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Degree=Cmp, ExtPos=NUM, ExtPos=PRON, Gender=Fem, Gender=Masc, Gender=Neut, NumForm=Combi, NumForm=Cyril, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumType=Sets, Number=Sing, Typo=Yes

NUM occurs with 107 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (3406 tokens). Examples: 2, 1, 3, 5, 4, 10, 6, 20, 7, 30
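The three feature statistics above (features, feature-value pairs, feature combinations) can all be tallied from the FEATS column. The following is a minimal sketch under the assumption that FEATS values are `|`-separated `Name=Value` pairs, with `_` for an empty set; the function name `feature_stats` and the sample values are hypothetical.

```python
from collections import Counter


def feature_stats(feats_values):
    """Count features, feature-value pairs and full combinations."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_values:
        combos[feats] += 1          # the whole FEATS string is one combination
        if feats == "_":
            continue                # "_" means no features on this token
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


# Made-up FEATS values for three NUM tokens.
sample_feats = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Digit|NumType=Card",
    "Case=Nom|NumForm=Word|NumType=Card",
]

features, pairs, combos = feature_stats(sample_feats)
print(len(features), len(pairs), len(combos))
print(combos.most_common(1))
```

For the sample this yields 3 features, 4 feature-value pairs and 2 combinations, the most frequent being `NumForm=Digit|NumType=Card`.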

Relations

NUM nodes are attached to their parents using 31 different relations: nummod:gov (5335; 42% instances), nummod (3573; 28% instances), root (752; 6% instances), nmod (570; 4% instances), conj (553; 4% instances), parataxis (443; 3% instances), appos (399; 3% instances), nsubj (255; 2% instances), obj (242; 2% instances), compound (196; 2% instances), obl (117; 1% instances), obl:tmod (72; 1% instances), list (64; 0% instances), xcomp (62; 0% instances), advcl (41; 0% instances), ccomp (27; 0% instances), obl:pronmod (27; 0% instances), obl:float (17; 0% instances), orphan (17; 0% instances), flat (16; 0% instances), acl:relcl (13; 0% instances), nsubj:pass (11; 0% instances), acl (10; 0% instances), amod (10; 0% instances), iobj (8; 0% instances), csubj (6; 0% instances), fixed (4; 0% instances), dep (3; 0% instances), obl:depict (2; 0% instances), parataxis:discourse (2; 0% instances), flat:foreign (1; 0% instances)

Parents of NUM nodes belong to 16 different parts of speech: NOUN (9335; 73% instances), VERB (941; 7% instances), NUM (922; 7% instances), ROOT [no parent] (752; 6% instances), ADJ (325; 3% instances), PRON (171; 1% instances), X (109; 1% instances), PROPN (97; 1% instances), SYM (83; 1% instances), DET (55; 0% instances), ADV (31; 0% instances), INTJ (14; 0% instances), AUX (4; 0% instances), PART (4; 0% instances), CCONJ (3; 0% instances), ADP (2; 0% instances)
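The parent statistics can be tallied by looking up each NUM token's HEAD within its sentence. A NUM node whose HEAD is 0 is the sentence root and has no parent; the sketch below files such nodes under a hypothetical `ROOT` label, which would account for the 752 parentless instances matching the 752 `root` relations. The token dictionaries and the function name `parent_upos_counts` are assumptions made for illustration.

```python
from collections import Counter


def parent_upos_counts(sentences, upos):
    """Count the UPOS of the parent of every token tagged `upos`."""
    counts = Counter()
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            if tok["upos"] != upos:
                continue
            parent = by_id.get(tok["head"])  # HEAD 0 has no entry: root
            counts[parent["upos"] if parent else "ROOT"] += 1
    return counts


# Two made-up sentences: "две книги" and "Три."
sentences = [
    [
        {"id": 1, "form": "две", "upos": "NUM", "head": 2},
        {"id": 2, "form": "книги", "upos": "NOUN", "head": 0},
    ],
    [
        {"id": 1, "form": "Три", "upos": "NUM", "head": 0},
        {"id": 2, "form": ".", "upos": "PUNCT", "head": 1},
    ],
]

parents = parent_upos_counts(sentences, "NUM")
print(dict(parents))
```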

8196 (64%) NUM nodes are leaves.

3149 (25%) NUM nodes have one child.

669 (5%) NUM nodes have two children.

834 (6%) NUM nodes have three or more children.

The highest child degree of a NUM node is 29.
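The four child-degree counts above partition all 12848 NUM tokens, and the percentages are those counts rounded to whole percent. A short consistency check, with the dictionary keys chosen here for illustration:

```python
# Number of NUM nodes by number of children, as reported above.
degree_counts = {"0": 8196, "1": 3149, "2": 669, "3+": 834}

total = sum(degree_counts.values())
shares = {degree: round(100 * count / total) for degree, count in degree_counts.items()}

print(total)
print(shares)
```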

Children of NUM nodes are attached using 38 different relations: punct (2640; 34% instances), advmod (1477; 19% instances), conj (556; 7% instances), nmod (543; 7% instances), nsubj (538; 7% instances), case (402; 5% instances), cc (282; 4% instances), obl (249; 3% instances), parataxis (214; 3% instances), compound (190; 2% instances), det (151; 2% instances), cop (101; 1% instances), mark (75; 1% instances), amod (69; 1% instances), advcl (64; 1% instances), iobj (49; 1% instances), list (40; 1% instances), parataxis:discourse (38; 0% instances), orphan (32; 0% instances), appos (25; 0% instances), obl:tmod (24; 0% instances), flat (21; 0% instances), aux (13; 0% instances), nummod:gov (13; 0% instances), acl (11; 0% instances), discourse (11; 0% instances), acl:relcl (8; 0% instances), obl:pronmod (8; 0% instances), expl (7; 0% instances), vocative (6; 0% instances), nummod (5; 0% instances), flat:foreign (4; 0% instances), fixed (3; 0% instances), ccomp (1; 0% instances), dep (1; 0% instances), flat:name (1; 0% instances), goeswith (1; 0% instances), obj (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (2640; 34% instances), NUM (922; 12% instances), ADV (912; 12% instances), NOUN (905; 11% instances), PART (644; 8% instances), ADP (368; 5% instances), CCONJ (280; 4% instances), ADJ (239; 3% instances), VERB (236; 3% instances), DET (192; 2% instances), PRON (186; 2% instances), AUX (114; 1% instances), SYM (87; 1% instances), SCONJ (78; 1% instances), PROPN (36; 0% instances), X (28; 0% instances), INTJ (7; 0% instances)