This page pertains to UD version 2.

Treebank Statistics: UD_Russian-SynTagRus: POS Tags: NOUN

There are 16462 NOUN lemmas (36%), 41929 NOUN types (36%) and 271122 NOUN tokens (25%). Among the 17 observed tags, NOUN ranks first in number of lemmas, first in number of types and first in number of tokens.

The 10 most frequent NOUN lemmas: год, человек, время, страна, дело, работа, система, жизнь, власть, вопрос

The 10 most frequent NOUN types: время, года, лет, году, раз, человек, жизни, люди, людей, власти

The 10 most frequent ambiguous lemmas: страна (NOUN 1652, X 1), раз (NOUN 880, SCONJ 37, ADV 4), ученый (NOUN 601, ADJ 25), право (NOUN 473, ADV 4), правда (ADV 195, NOUN 119), больной (NOUN 106, ADJ 46), рабочий (ADJ 200, NOUN 86), теракт (NOUN 74, X 1), пол (NUM 83, NOUN 70), фон (NOUN 65, PART 11)

The 10 most frequent ambiguous types: раз (NOUN 694, SCONJ 28, ADV 4), ученые (NOUN 203, ADJ 3), право (NOUN 161, ADJ 2, ADV 2), начала (NOUN 170, VERB 46), ученых (NOUN 163, ADJ 7), данным (NOUN 154, ADJ 2), дома (NOUN 140, ADV 50), права (NOUN 135, ADJ 3), целом (NOUN 136, ADJ 5), главное (NOUN 96, ADJ 62)

Morphology

The form / lemma ratio of NOUN is 2.547017 (the average of all parts of speech is 2.589298).
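The ratio can be reproduced from the counts reported at the top of this page. The sketch below assumes that "forms" here means distinct NOUN types and divides them by distinct NOUN lemmas; the variable names are illustrative only.

```python
# Counts reported on this page for NOUN in UD_Russian-SynTagRus.
noun_lemmas = 16462
noun_types = 41929

# Assumption: the form / lemma ratio is distinct types over distinct lemmas.
form_lemma_ratio = noun_types / noun_lemmas

print(f"{form_lemma_ratio:.6f}")
```

Under that assumption the result agrees with the reported value to six decimal places.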

The highest number of forms (15) was observed with the lemma “тоннель”: тоннеле, тоннелей, тоннели, тоннель, тоннелю, тоннеля, тоннелям, тоннелями, тоннелях, туннеле, туннелем, туннель, туннелю, туннеля, туннелями.

The second-highest number of forms (14) was observed with the lemma “год”: г, г., гг, гг., год, года, годам, годами, годах, годов, годом, году, годы, лет.

The third-highest number of forms (13) was observed with the lemma “век”: в, в., вв, век, века, векам, веками, веках, веке, веков, веком, веку, полвека.

NOUN occurs with 5 features: Animacy (271053; 100% instances), Case (271022; 100% instances), Number (271022; 100% instances), Gender (270589; 100% instances), Degree (1; 0% instances)

NOUN occurs with 16 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Par, Case=Voc, Degree=Pos, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing

NOUN occurs with 92 feature combinations. The most frequent feature combination is Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing (21337 tokens). Examples: страны, жизни, экономики, власти, стороны, работы, системы, науки, войны, воды
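A feature combination such as the one above is a pipe-separated list of `Feature=Value` pairs, as in the FEATS column of a CoNLL-U file, where `_` marks an empty feature set. As a minimal sketch, a hypothetical helper `parse_feats` (not part of any UD tool) could split such a string into a dictionary:

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS string into a feature -> value mapping."""
    # "_" is the CoNLL-U placeholder for "no features".
    if feats == "_":
        return {}
    # Each pair looks like "Case=Gen"; split on the first "=" only.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# The most frequent NOUN feature combination reported on this page.
most_frequent = parse_feats("Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing")
print(most_frequent["Case"])
```

Counting distinct FEATS strings over all NOUN tokens would give the number of feature combinations, and counting the distinct pairs would give the number of feature-value pairs.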

Relations

NOUN nodes are attached to their parents using 31 different relations: nmod (79942; 29% instances), obl (64827; 24% instances), nsubj (42788; 16% instances), obj (26896; 10% instances), conj (22065; 8% instances), root (6956; 3% instances), parataxis (6412; 2% instances), nsubj:pass (6008; 2% instances), iobj (4168; 2% instances), fixed (3796; 1% instances), appos (3177; 1% instances), advcl (1021; 0% instances), flat (748; 0% instances), acl (537; 0% instances), orphan (448; 0% instances), compound (422; 0% instances), ccomp (409; 0% instances), nummod:gov (224; 0% instances), acl:relcl (140; 0% instances), flat:foreign (42; 0% instances), csubj (33; 0% instances), nummod:entity (18; 0% instances), flat:name (13; 0% instances), xcomp (8; 0% instances), amod (6; 0% instances), vocative (6; 0% instances), nummod (5; 0% instances), advmod (4; 0% instances), aux (1; 0% instances), aux:pass (1; 0% instances), expl (1; 0% instances)
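Counts like these can be gathered by tallying the DEPREL column of every token whose UPOS is NOUN. The sketch below is an illustration under assumptions, not the script that generated this page: `count_deprels` is a hypothetical helper, and the five-token sample sentence is invented for the example rather than taken from the treebank.

```python
from collections import Counter


def count_deprels(conllu_text: str, upos: str = "NOUN") -> Counter:
    """Count dependency relations of tokens carrying the given UPOS tag."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular tokens: 10 columns and a plain integer ID.
        # This skips multiword ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:      # column 4 is UPOS
            counts[cols[7]] += 1  # column 8 is DEPREL
    return counts


# Invented sample in CoNLL-U column order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
rows = [
    ("1", "Система", "система", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "работает", "работать", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "в", "в", "ADP", "_", "_", "4", "case", "_", "_"),
    ("4", "стране", "страна", "NOUN", "_", "_", "2", "obl", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
sample = "\n".join("\t".join(row) for row in rows)

relation_counts = count_deprels(sample)
print(relation_counts.most_common())
```

Dividing each count by the total number of NOUN tokens and rounding gives percentages of the kind shown in the list above.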

Parents of NOUN nodes belong to 17 different categories (16 parts of speech plus the artificial root, which has no part of speech): VERB (132970; 49% instances), NOUN (105279; 39% instances), ADJ (10584; 4% instances), artificial root (6956; 3% instances), ADP (3406; 1% instances), ADV (3315; 1% instances), PROPN (3064; 1% instances), NUM (2525; 1% instances), PRON (1627; 1% instances), DET (540; 0% instances), SYM (413; 0% instances), SCONJ (201; 0% instances), PART (195; 0% instances), X (32; 0% instances), CCONJ (9; 0% instances), INTJ (5; 0% instances), AUX (1; 0% instances)

48860 (18%) NOUN nodes are leaves.

94214 (35%) NOUN nodes have one child.

76617 (28%) NOUN nodes have two children.

51431 (19%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 17.
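The four child-degree figures above partition the NOUN tokens, so they should add up to the token total and their shares to roughly 100%. A short check, using only numbers from this page (the dictionary keys are illustrative labels):

```python
# NOUN token total and child-degree counts as reported on this page.
noun_tokens = 271122
degree_counts = {
    "leaf": 48860,
    "one child": 94214,
    "two children": 76617,
    "three or more children": 51431,
}

# The four buckets should cover every NOUN token exactly once.
total = sum(degree_counts.values())

# Share of each bucket, rounded to a whole percent as on this page.
shares = {
    label: round(100 * count / noun_tokens)
    for label, count in degree_counts.items()
}

print(total == noun_tokens)
print(shares)
```

The rounded shares come out as 18, 35, 28 and 19 percent, matching the figures given above.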

Children of NOUN nodes are attached using 35 different relations: nmod (90003; 21% instances), amod (85505; 20% instances), case (80307; 18% instances), punct (51034; 12% instances), det (23915; 5% instances), conj (21424; 5% instances), cc (14400; 3% instances), acl (8866; 2% instances), nummod (8726; 2% instances), advmod (7264; 2% instances), appos (7133; 2% instances), parataxis (7129; 2% instances), nsubj (5494; 1% instances), acl:relcl (5456; 1% instances), obl (4379; 1% instances), nummod:gov (3957; 1% instances), mark (3465; 1% instances), cop (2047; 0% instances), flat:foreign (1574; 0% instances), iobj (730; 0% instances), orphan (613; 0% instances), compound (472; 0% instances), advcl (289; 0% instances), fixed (273; 0% instances), csubj (243; 0% instances), discourse (225; 0% instances), ccomp (78; 0% instances), nummod:entity (58; 0% instances), aux (55; 0% instances), flat:name (40; 0% instances), obj (23; 0% instances), flat (12; 0% instances), expl (6; 0% instances), vocative (2; 0% instances), xcomp (2; 0% instances)

Children of NOUN nodes belong to 17 different parts of speech: NOUN (105279; 24% instances), ADJ (83486; 19% instances), ADP (80468; 18% instances), PUNCT (51035; 12% instances), DET (24388; 6% instances), VERB (21382; 5% instances), PROPN (18156; 4% instances), CCONJ (14214; 3% instances), NUM (12854; 3% instances), PRON (6444; 1% instances), ADV (6090; 1% instances), PART (5928; 1% instances), SCONJ (3294; 1% instances), AUX (1524; 0% instances), SYM (336; 0% instances), X (298; 0% instances), INTJ (23; 0% instances)