Treebank Statistics: UD_Russian-Taiga: POS Tags: NUM
There are 542 NUM lemmas (1%), 645 NUM types (0%) and 12848 NUM tokens (1%).
Out of 17 observed tags, the rank of NUM is: 7 in number of lemmas, 7 in number of types and 14 in number of tokens.
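The lemma, type and token counts above can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that produced this page: the function name `upos_counts` and the toy `SAMPLE` sentence are hypothetical, and it assumes that a "type" is a distinct word form compared without case normalization.

```python
def upos_counts(conllu_text, upos="NUM"):
    """Return (lemmas, types, tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM (assumption: no case folding)
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical toy input, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = два стола двух",
    "1\tдва\tдва\tNUM\t_\t_\t2\tnummod:gov\t_\t_",
    "2\tстола\tстол\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\tдвух\tдва\tNUM\t_\t_\t2\tconj\t_\t_",
])

print(upos_counts(SAMPLE))  # (lemmas, types, tokens) for NUM
```

On the toy sample the two NUM tokens share one lemma but have two distinct forms, which is the same distinction that separates the lemma and type counts reported above.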
The 10 most frequent NUM lemmas: один, два, много, три, несколько, оба, 2, 1, 3, сколько
The 10 most frequent NUM types: два, много, несколько, три, один, двух, две, 2, 1, 3
The 10 most frequent ambiguous lemmas: один (DET 2678, NUM 1901), много (NUM 954, ADV 127), несколько (NUM 703, ADV 106), 2 (NUM 385, ADJ 28), 1 (NUM 332, ADJ 23), 3 (NUM 301, ADJ 20, PROPN 2), сколько (NUM 282, CCONJ 37, ADV 35, SCONJ 3), мало (NUM 208, ADV 101), 5 (NUM 199, ADJ 14), 4 (NUM 193, ADJ 14)
The 10 most frequent ambiguous types: много (NUM 635, ADV 85, X 2), несколько (NUM 493, ADV 102), один (DET 488, NUM 421), 2 (NUM 381, ADJ 29), 1 (NUM 332, ADJ 23), 3 (NUM 295, ADJ 20), одной (DET 308, NUM 289), сколько (NUM 180, CCONJ 37, ADV 27, SCONJ 3), одного (DET 242, NUM 226), одно (DET 260, NUM 193)
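A lemma counts as ambiguous here when it is observed with NUM and at least one other tag. The sketch below illustrates that idea on a hypothetical list of (lemma, tag) pairs; the function name `ambiguous_lemmas` and the toy counts are assumptions, and the ordering by count under the target tag is inferred from the lemma list above rather than documented.

```python
from collections import Counter, defaultdict


def ambiguous_lemmas(pairs, upos="NUM"):
    """List lemmas seen with `upos` and at least one other tag."""
    by_lemma = defaultdict(Counter)
    for lemma, tag in pairs:
        by_lemma[lemma][tag] += 1
    # Keep only lemmas that occur with the target tag and some other tag.
    hits = [(lemma, tags) for lemma, tags in by_lemma.items()
            if upos in tags and len(tags) > 1]
    # Assumption: order by frequency under the target tag, highest first.
    return sorted(hits, key=lambda item: -item[1][upos])


# Hypothetical toy counts, not the treebank's figures.
pairs = ([("один", "DET")] * 3 + [("один", "NUM")] * 2
         + [("много", "NUM")] * 4 + [("много", "ADV")]
         + [("три", "NUM")] * 5)

for lemma, tags in ambiguous_lemmas(pairs):
    print(lemma, dict(tags))
```

In the toy data "три" is dropped because it only ever appears as NUM, while "много" precedes "один" because it has more NUM occurrences.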
Morphology
The form / lemma ratio of NUM is 1.190037 (the average of all parts of speech is 2.706171).
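The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given earlier on this page. The check below assumes that this is how the ratio is defined; the variable names are illustrative only.

```python
num_types = 645   # NUM types reported above
num_lemmas = 542  # NUM lemmas reported above

# Assumption: form / lemma ratio = distinct forms (types) per lemma.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")
```

A ratio this close to 1 means most NUM lemmas appear in a single form, which fits the large share of digit-written numerals in this tag.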
The highest number of forms (10) was observed with the lemma “один”: оден, один, одна, одним, одно, одного, одной, одном, одному, одну.
The second-highest number of forms (8) was observed with the lemma “оба”: оба, обе, обеим, обеими, обеих, обоим, обоими, обоих.
The third-highest number of forms (7) was observed with the lemma “два”: Дв-ва, два, две, двум, двумя, двух, д’ве.
NUM occurs with 10 features: NumForm (12848; 100% instances), NumType (12848; 100% instances), Case (9044; 70% instances), Gender (4144; 32% instances), Number (1901; 15% instances), Animacy (1223; 10% instances), Degree (252; 2% instances), Typo (7; 0% instances), Abbr (3; 0% instances), ExtPos (3; 0% instances)
NUM occurs with 25 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Degree=Cmp, ExtPos=NUM, ExtPos=PRON, Gender=Fem, Gender=Masc, Gender=Neut, NumForm=Combi, NumForm=Cyril, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumType=Sets, Number=Sing, Typo=Yes
NUM occurs with 107 feature combinations.
The most frequent feature combination is NumForm=Digit|NumType=Card (3406 tokens).
Examples: 2, 1, 3, 5, 4, 10, 6, 20, 7, 30
Relations
NUM nodes are attached to their parents using 31 different relations: nummod:gov (5335; 42% instances), nummod (3573; 28% instances), root (752; 6% instances), nmod (570; 4% instances), conj (553; 4% instances), parataxis (443; 3% instances), appos (399; 3% instances), nsubj (255; 2% instances), obj (242; 2% instances), compound (196; 2% instances), obl (117; 1% instances), obl:tmod (72; 1% instances), list (64; 0% instances), xcomp (62; 0% instances), advcl (41; 0% instances), ccomp (27; 0% instances), obl:pronmod (27; 0% instances), obl:float (17; 0% instances), orphan (17; 0% instances), flat (16; 0% instances), acl:relcl (13; 0% instances), nsubj:pass (11; 0% instances), acl (10; 0% instances), amod (10; 0% instances), iobj (8; 0% instances), csubj (6; 0% instances), fixed (4; 0% instances), dep (3; 0% instances), obl:depict (2; 0% instances), parataxis:discourse (2; 0% instances), flat:foreign (1; 0% instances)
Parents of NUM nodes belong to 16 different parts of speech: NOUN (9335; 73% instances), VERB (941; 7% instances), NUM (922; 7% instances), [no parent; root] (752; 6% instances), ADJ (325; 3% instances), PRON (171; 1% instances), X (109; 1% instances), PROPN (97; 1% instances), SYM (83; 1% instances), DET (55; 0% instances), ADV (31; 0% instances), INTJ (14; 0% instances), AUX (4; 0% instances), PART (4; 0% instances), CCONJ (3; 0% instances), ADP (2; 0% instances)
8196 (64%) NUM nodes are leaves.
3149 (25%) NUM nodes have one child.
669 (5%) NUM nodes have two children.
834 (6%) NUM nodes have three or more children.
The highest child degree of a NUM node is 29.
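The four child-degree buckets above partition all NUM tokens, so their counts should sum to the 12848 tokens reported at the top of the page. The short check below recomputes the total and the rounded percentages from the figures in the text; the dictionary keys are illustrative labels.

```python
# Counts copied from the child-degree statements above.
buckets = {
    "leaf": 8196,
    "one child": 3149,
    "two children": 669,
    "three or more children": 834,
}

total = sum(buckets.values())
# Percentage of NUM nodes in each bucket, rounded to whole percent.
shares = {name: round(100 * count / total) for name, count in buckets.items()}

print(total)
print(shares)
```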
Children of NUM nodes are attached using 38 different relations: punct (2640; 34% instances), advmod (1477; 19% instances), conj (556; 7% instances), nmod (543; 7% instances), nsubj (538; 7% instances), case (402; 5% instances), cc (282; 4% instances), obl (249; 3% instances), parataxis (214; 3% instances), compound (190; 2% instances), det (151; 2% instances), cop (101; 1% instances), mark (75; 1% instances), amod (69; 1% instances), advcl (64; 1% instances), iobj (49; 1% instances), list (40; 1% instances), parataxis:discourse (38; 0% instances), orphan (32; 0% instances), appos (25; 0% instances), obl:tmod (24; 0% instances), flat (21; 0% instances), aux (13; 0% instances), nummod:gov (13; 0% instances), acl (11; 0% instances), discourse (11; 0% instances), acl:relcl (8; 0% instances), obl:pronmod (8; 0% instances), expl (7; 0% instances), vocative (6; 0% instances), nummod (5; 0% instances), flat:foreign (4; 0% instances), fixed (3; 0% instances), ccomp (1; 0% instances), dep (1; 0% instances), flat:name (1; 0% instances), goeswith (1; 0% instances), obj (1; 0% instances)
Children of NUM nodes belong to 17 different parts of speech: PUNCT (2640; 34% instances), NUM (922; 12% instances), ADV (912; 12% instances), NOUN (905; 11% instances), PART (644; 8% instances), ADP (368; 5% instances), CCONJ (280; 4% instances), ADJ (239; 3% instances), VERB (236; 3% instances), DET (192; 2% instances), PRON (186; 2% instances), AUX (114; 1% instances), SYM (87; 1% instances), SCONJ (78; 1% instances), PROPN (36; 0% instances), X (28; 0% instances), INTJ (7; 0% instances)