
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-SynTagRus: POS Tags: NUM

There are 1259 NUM lemmas (2%), 1357 NUM types (1%) and 19431 NUM tokens (1%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 13 in number of tokens.
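Counts of this kind can be reproduced from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: the function name `pos_counts` and the two-sentence `SAMPLE` fragment are hypothetical, and it assumes that a "type" is a distinct word form compared case-sensitively, which the page does not state.

```python
# Hypothetical CoNLL-U fragment for illustration only (not taken from the treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = """\
1\tДва\tдва\tNUM\t_\tCase=Nom|NumType=Card\t2\tnummod:gov\t_\t_
2\tдома\tдом\tNOUN\t_\t_\t0\troot\t_\t_

1\tдвух\tдва\tNUM\t_\tCase=Gen|NumType=Card\t2\tnummod\t_\t_
2\tдомов\tдом\tNOUN\t_\t_\t0\troot\t_\t_
"""


def pos_counts(conllu_text, upos):
    """Return (lemma count, type count, token count) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            lemmas.add(cols[2])
            types.add(cols[1])  # assumption: forms are compared as written
    return len(lemmas), len(types), tokens


print(pos_counts(SAMPLE, "NUM"))
```

On the sample, the two NUM tokens share the lemma "два" but have two distinct forms, so the call returns one lemma, two types and two tokens.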

The 10 most frequent NUM lemmas: один, два, много, несколько, три, 1, 10, 20, четыре, 2

The 10 most frequent NUM types: один, несколько, два, три, одной, многие, 1, 10, двух, две

The 10 most frequent ambiguous lemmas: один (NUM 2706, DET 984, NOUN 3), много (NUM 1077, ADV 729), несколько (NUM 1038, ADV 111), 1 (NUM 417, ADJ 26), 10 (NUM 407, ADJ 20), 20 (NUM 324, ADJ 14), 2 (NUM 309, ADJ 14), 15 (NUM 280, ADJ 17), 5 (NUM 257, ADJ 12), 3 (NUM 242, ADJ 11)

The 10 most frequent ambiguous types: один (NUM 697, DET 179), несколько (NUM 744, ADV 103), одной (NUM 409, DET 139), многие (NUM 301, DET 46, PRON 34, ADJ 3), 1 (NUM 417, ADJ 26), 10 (NUM 407, ADJ 21), 20 (NUM 323, ADJ 14), 2 (NUM 309, ADJ 14), одного (NUM 278, DET 91), одна (NUM 242, DET 94)

Morphology

The form / lemma ratio of NUM is 1.077840 (the average of all parts of speech is 2.654430).
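The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given above. A short check, assuming that "forms" here means the same thing as "types" (the variable names are illustrative):

```python
# Figures reported on this page for NUM.
num_lemmas = 1259
num_types = 1357

# Assumption: form / lemma ratio = distinct word forms (types) / distinct lemmas.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.077840
```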

The 1st highest number of forms (12) was observed with the lemma “один”: один, одна, одни, одним, одними, одно, одного, одной, одном, одному, одною, одну.

The 2nd highest number of forms (9) was observed with the lemma “оба”: оба, обе, обеим, обеими, обеих, обоего, обоим, обоими, обоих.

The 3rd highest number of forms (7) was observed with the lemma “много”: больше, мн, многие, многим, многими, многих, много.

NUM occurs with 6 features: NumType (15372; 79% instances), Case (8672; 45% instances), Gender (4097; 21% instances), Animacy (2023; 10% instances), Degree (92; 0% instances), Number (6; 0% instances)

NUM occurs with 14 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Degree=Cmp, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Card, Number=Sing

NUM occurs with 65 feature combinations. The most frequent feature combination is NumType=Card (9133 tokens). Examples: многие, 1, 10, один, несколько, два, многих, 20, 2, 15

Relations

NUM nodes are attached to their parents using 28 different relations: nummod (9087; 47% instances), nummod:gov (3410; 18% instances), nmod (1743; 9% instances), obl (1265; 7% instances), nsubj (688; 4% instances), amod (560; 3% instances), conj (484; 2% instances), appos (453; 2% instances), root (369; 2% instances), parataxis (310; 2% instances), nummod:entity (309; 2% instances), compound (217; 1% instances), flat:name (102; 1% instances), obj (97; 0% instances), nsubj:pass (75; 0% instances), xcomp (75; 0% instances), advcl (37; 0% instances), orphan (35; 0% instances), iobj (34; 0% instances), ccomp (31; 0% instances), acl:relcl (12; 0% instances), fixed (12; 0% instances), acl (11; 0% instances), flat (5; 0% instances), list (4; 0% instances), advmod (3; 0% instances), csubj (2; 0% instances), obl:tmod (1; 0% instances)

Parents of NUM nodes belong to 13 different parts of speech: NOUN (13179; 68% instances), VERB (2196; 11% instances), NUM (1505; 8% instances), SYM (1088; 6% instances), PROPN (582; 3% instances), no parent, i.e. the NUM node is the root (369; 2% instances), ADJ (314; 2% instances), PRON (97; 0% instances), ADV (71; 0% instances), DET (15; 0% instances), X (11; 0% instances), PART (3; 0% instances), ADP (1; 0% instances)

11625 (60%) NUM nodes are leaves.

5314 (27%) NUM nodes have one child.

1516 (8%) NUM nodes have two children.

976 (5%) NUM nodes have three or more children.

The highest child degree of a NUM node is 13.
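The child-degree figures above can be derived from the HEAD column by counting how many nodes point at each NUM node. The sketch below is an illustration under assumptions: `child_degree_histogram` and the toy sentence `TOY` are hypothetical, and the `REPORTED` dictionary simply restates the numbers from this page to check that they add up to the NUM token count.

```python
from collections import Counter


def child_degree_histogram(sentence, upos):
    """sentence: list of (id, head, upos) tuples for one tree.

    Returns a Counter mapping child degree -> number of nodes tagged `upos`.
    """
    # How many nodes name each id as their head.
    n_children = Counter(head for _, head, _ in sentence)
    # A Counter returns 0 for ids that never occur as a head, i.e. leaves.
    return Counter(n_children[i] for i, _, tag in sentence if tag == upos)


# Hypothetical toy tree: node 2 (NUM) has one child, node 4 (NUM) is a leaf.
TOY = [(1, 2, "ADV"), (2, 3, "NUM"), (3, 0, "NOUN"), (4, 3, "NUM")]
print(child_degree_histogram(TOY, "NUM"))

# Figures reported on this page, keyed by child degree.
REPORTED = {"0": 11625, "1": 5314, "2": 1516, "3+": 976}
total = sum(REPORTED.values())  # equals the 19431 NUM tokens
shares = {k: round(100 * v / total) for k, v in REPORTED.items()}
print(total, shares)
```

The four reported groups sum to 19431, the NUM token count, and the rounded shares come out as 60, 27, 8 and 5 percent, matching the page.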

Children of NUM nodes are attached using 30 different relations: punct (3112; 26% instances), nmod (2878; 24% instances), advmod (1586; 13% instances), case (1047; 9% instances), flat (740; 6% instances), conj (422; 4% instances), nsubj (380; 3% instances), cc (299; 2% instances), obl (250; 2% instances), parataxis (243; 2% instances), amod (232; 2% instances), compound (163; 1% instances), det (157; 1% instances), cop (113; 1% instances), mark (101; 1% instances), appos (72; 1% instances), fixed (49; 0% instances), orphan (42; 0% instances), acl (40; 0% instances), advcl (29; 0% instances), acl:relcl (23; 0% instances), flat:foreign (15; 0% instances), iobj (14; 0% instances), csubj (6; 0% instances), list (6; 0% instances), expl (3; 0% instances), nummod (3; 0% instances), discourse (2; 0% instances), flat:name (2; 0% instances), obj (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (3112; 26% instances), NOUN (2835; 24% instances), NUM (1505; 13% instances), ADP (1059; 9% instances), ADV (855; 7% instances), PART (812; 7% instances), ADJ (427; 4% instances), PRON (310; 3% instances), CCONJ (295; 2% instances), VERB (234; 2% instances), DET (170; 1% instances), PROPN (125; 1% instances), SCONJ (124; 1% instances), AUX (113; 1% instances), SYM (48; 0% instances), X (5; 0% instances), INTJ (1; 0% instances)