
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-GSD: POS Tags: NOUN

There are 6399 NOUN lemmas (33%), 11537 NOUN types (38%) and 27249 NOUN tokens (27%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
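As a rough illustration of how counts like these can be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens for one tag in CoNLL-U text. The function name `pos_counts` and the toy sample are hypothetical, and the page does not say whether types are case-folded before counting, so this sketch simply compares forms exactly as written.

```python
def pos_counts(conllu_text, upos):
    """Return (n_lemmas, n_types, n_tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed rows, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue
        form, lemma, tag = cols[1], cols[2], cols[3]
        if tag == upos:
            lemmas.add(lemma)
            types.add(form)  # assumption: no case folding
            tokens += 1
    return len(lemmas), len(types), tokens


# Toy data for illustration only, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = toy example",
    "1\tгода\tгод\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tгоду\tгод\tNOUN\t_\t_\t1\tconj\t_\t_",
    "3\tгода\tгод\tNOUN\t_\t_\t1\tconj\t_\t_",
    "4\tв\tв\tADP\t_\t_\t5\tcase\t_\t_",
    "5\tгороде\tгород\tNOUN\t_\t_\t1\tnmod\t_\t_",
    "",
])

print(pos_counts(SAMPLE, "NOUN"))  # (2, 3, 4)
```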

The 10 most frequent NOUN lemmas: ГОД, ВРЕМЯ, ГОРОД, ЧЕЛОВЕК, ЧАСТЬ, РАЙОН, ОБЛАСТЬ, СОСТАВ, НАСЕЛЕНИЕ, РЕКА

The 10 most frequent NOUN types: года, году, время, области, лет, человек, войны, реки, год, км

The 10 most frequent ambiguous lemmas: Г. (NOUN 58, PROPN 1), ЧЛЕН (NOUN 57, ADV 1), ЗЕМЛЯ (NOUN 54, PROPN 1), ОСТРОВ (NOUN 52, PROPN 1), ПЕСНЯ (NOUN 46, AUX 1), ДОМ (NOUN 44, PROPN 1), АВГУСТ (NOUN 41, PROPN 6), ВОСТОК (NOUN 41, PROPN 1), СЛОВО (NOUN 36, PROPN 1), УЛИЦА (NOUN 31, ADV 1)

The 10 most frequent ambiguous types: мм (NOUN 27, ADJ 5), имени (NOUN 25, ADP 1), м (NOUN 23, ADJ 12), песни (NOUN 21, AUX 1), No (NOUN 21, PART 1), дома (NOUN 13, ADV 3), основном (NOUN 16, ADJ 1), начала (NOUN 12, VERB 5), б (NOUN 6, ADJ 1), типа (ADP 14, NOUN 11)

Morphology

The form / lemma ratio of NOUN is 1.802938 (the average of all parts of speech is 1.592402).
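The page does not spell out the formula, but dividing the number of NOUN types by the number of NOUN lemmas reported above reproduces the figure, which suggests that "form" here means a distinct word form. A minimal check, with hypothetical variable names:

```python
# Counts copied from the summary line at the top of this page.
noun_lemmas = 6399
noun_types = 11537

# Assumption: the ratio is distinct forms (types) per distinct lemma.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 1.802938
```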

The 1st highest number of forms (10) was observed with the lemma “АКТЕР”: актер, актера, актеров, актеры, актёр, актёра, актёрами, актёров, актёром, актёры.

The 2nd highest number of forms (10) was observed with the lemma “ГОД”: год, года, годам, годами, годах, годов, годом, году, годы, лет.

The 3rd highest number of forms (10) was observed with the lemma “ФИЛЬМ”: фильм, фильма, фильмам, фильмами, фильмах, фильме, фильмов, фильмом, фильму, фильмы.

NOUN occurs with 4 features: Animacy (27194; 100% instances), Case (27194; 100% instances), Gender (27194; 100% instances), Number (27194; 100% instances)

NOUN occurs with 15 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Par, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing

NOUN occurs with 73 feature combinations. The most frequent feature combination is Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing (3017 tokens). Examples: года, города, мира, века, декабря, района, сентября, января, марта, июня
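A feature combination such as the one above is written in the CoNLL-U FEATS column as `Name=Value` pairs joined by `|`, with `_` standing for "no features". The helper below is a small sketch (the name `parse_feats` is hypothetical) that splits such a string into a dictionary.

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into a {feature: value} dictionary."""
    if not feats or feats == "_":
        return {}
    # Each pair looks like "Case=Gen"; split on the first "=" only.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


combo = parse_feats("Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing")
print(combo["Case"], combo["Number"])  # Gen Sing
```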

Relations

NOUN nodes are attached to their parents using 30 different relations: nmod (9302; 34% instances), obl (5995; 22% instances), nsubj (3147; 12% instances), obj (2460; 9% instances), conj (2291; 8% instances), appos (1101; 4% instances), root (756; 3% instances), nsubj:pass (547; 2% instances), iobj (480; 2% instances), xcomp (259; 1% instances), goeswith (201; 1% instances), obl:agent (179; 1% instances), parataxis (144; 1% instances), orphan (91; 0% instances), fixed (65; 0% instances), list (63; 0% instances), nummod:gov (56; 0% instances), ccomp (27; 0% instances), acl:relcl (18; 0% instances), acl (16; 0% instances), amod (11; 0% instances), compound (11; 0% instances), discourse (8; 0% instances), advcl (7; 0% instances), flat (5; 0% instances), nummod (4; 0% instances), vocative (2; 0% instances), dep (1; 0% instances), flat:foreign (1; 0% instances), mark (1; 0% instances)

Parents of NOUN nodes belong to 13 different parts of speech: NOUN (12250; 45% instances), VERB (12088; 44% instances), ROOT (756; 3% instances), ADJ (682; 3% instances), PROPN (451; 2% instances), ADV (395; 1% instances), AUX (245; 1% instances), NUM (207; 1% instances), ADP (61; 0% instances), SYM (52; 0% instances), PRON (30; 0% instances), DET (20; 0% instances), PUNCT (12; 0% instances)

3883 (14%) NOUN nodes are leaves.

8951 (33%) NOUN nodes have one child.

8637 (32%) NOUN nodes have two children.

5778 (21%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 31.
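The four child-count figures above partition the NOUN tokens: they add up to 27249, the token count given at the top of the page. The sketch below shows one way such buckets could be computed from `(id, head, upos)` triples and then checks the reported numbers. The function name and the toy sentence are hypothetical, and this is not necessarily how the statistics were actually generated.

```python
from collections import Counter


def child_degree_buckets(sentences, upos="NOUN"):
    """Count nodes of one UPOS tag with 0, 1, 2 or 3+ children (key 3 means 3+)."""
    buckets = Counter()
    for sent in sentences:
        # Number of children of each node = how often its id appears as a head.
        n_children = Counter(head for _, head, _ in sent)
        for tok_id, _, tag in sent:
            if tag == upos:
                buckets[min(n_children[tok_id], 3)] += 1
    return buckets


# Toy sentence for illustration only: (id, head, upos) triples.
toy = [[(1, 0, "NOUN"), (2, 1, "ADJ"), (3, 1, "NOUN"), (4, 3, "ADP"), (5, 1, "NOUN")]]
toy_buckets = child_degree_buckets(toy)
# Node 1 has three children, node 3 has one, node 5 is a leaf.

# Consistency check on the figures reported on this page.
reported = {"leaves": 3883, "one child": 8951, "two children": 8637, "three or more": 5778}
total = sum(reported.values())
shares = {name: round(100 * count / total) for name, count in reported.items()}
print(total)   # 27249
print(shares)  # {'leaves': 14, 'one child': 33, 'two children': 32, 'three or more': 21}
```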

Children of NOUN nodes are attached using 36 different relations: nmod (10685; 22% instances), amod (10430; 22% instances), case (8698; 18% instances), punct (6078; 13% instances), appos (2430; 5% instances), conj (2217; 5% instances), cc (1317; 3% instances), det (1284; 3% instances), acl (1017; 2% instances), nummod:gov (852; 2% instances), nsubj (643; 1% instances), nummod (557; 1% instances), acl:relcl (506; 1% instances), advmod (340; 1% instances), parataxis (192; 0% instances), cop (145; 0% instances), list (84; 0% instances), goeswith (81; 0% instances), orphan (81; 0% instances), compound (69; 0% instances), discourse (55; 0% instances), iobj (41; 0% instances), mark (41; 0% instances), advcl (34; 0% instances), obl (27; 0% instances), dep (19; 0% instances), fixed (18; 0% instances), obl:agent (15; 0% instances), ccomp (13; 0% instances), flat (7; 0% instances), obj (3; 0% instances), xcomp (3; 0% instances), aux (2; 0% instances), aux:pass (2; 0% instances), flat:foreign (1; 0% instances), vocative (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: NOUN (12250; 26% instances), ADJ (10298; 21% instances), ADP (8641; 18% instances), PUNCT (6147; 13% instances), PROPN (3198; 7% instances), VERB (1680; 4% instances), NUM (1530; 3% instances), DET (1434; 3% instances), CCONJ (1297; 3% instances), ADV (868; 2% instances), PRON (211; 0% instances), PART (168; 0% instances), AUX (162; 0% instances), SYM (62; 0% instances), SCONJ (40; 0% instances), X (2; 0% instances)