
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: NOUN

There are 19424 NOUN lemmas (34%), 51715 NOUN types (33%) and 378122 NOUN tokens (21%). Out of 17 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 2 in number of tokens.

The 10 most frequent NOUN lemmas: слово, язык, человек, год, время, век, искусство, рука, жизнь, место

The 10 most frequent NOUN types: слова, время, слово, языка, раз, в., жизни, слов, человек, языке

The 10 most frequent ambiguous lemmas: год (NOUN 3065, X 1), раз (NOUN 1435, SCONJ 62, VERB 7, X 3), дом (NOUN 1270, X 1), речь (NOUN 1099, VERB 2), город (NOUN 784, X 1), земля (NOUN 664, X 1), друг (PRON 810, NOUN 644), пора (NOUN 597, VERB 74), правда (NOUN 551, ADV 30), свет (NOUN 526, PROPN 1)

The 10 most frequent ambiguous types: раз (NOUN 1175, SCONJ 43, VERB 5, X 2), времени (NOUN 733, X 1), место (NOUN 681, X 1), лет (NOUN 696, X 2), правда (NOUN 215, ADV 30), дома (NOUN 396, ADV 185), глава (NOUN 44, X 2), песни (NOUN 200, X 1), год (NOUN 205, X 1), начала (NOUN 212, VERB 99)
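
The ambiguity lists above can be reproduced in spirit by tallying, for each lemma (or type), how often it occurs with each tag. The following is a minimal sketch on a handful of hypothetical (lemma, tag) pairs, not the script that generated this page. The helper name `ambiguous_lemmas` and the ordering by NOUN count are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, UPOS) pairs; a real run would read them
# from the LEMMA and UPOS columns of the treebank's CoNLL-U files.
tokens = [
    ("раз", "NOUN"), ("раз", "NOUN"), ("раз", "SCONJ"),
    ("правда", "NOUN"), ("правда", "ADV"),
    ("слово", "NOUN"),
]

def ambiguous_lemmas(pairs, tag="NOUN"):
    """Return {lemma: Counter of tags} for lemmas seen with `tag`
    and with at least one other tag."""
    by_lemma = defaultdict(Counter)
    for lemma, upos in pairs:
        by_lemma[lemma][upos] += 1
    return {
        lemma: tags
        for lemma, tags in by_lemma.items()
        if tag in tags and len(tags) > 1
    }

result = ambiguous_lemmas(tokens)
# Assumed ordering: most frequent as NOUN first.
for lemma, tags in sorted(result.items(), key=lambda kv: -kv[1]["NOUN"]):
    print(lemma, dict(tags))
```

Applying the same function to (form, tag) pairs instead of (lemma, tag) pairs would give the corresponding list of ambiguous types.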

Morphology

The form / lemma ratio of NOUN is 2.662428 (the average of all parts of speech is 2.706171).
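
The reported ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas given at the top of this page. A quick check, assuming that "forms" here means the distinct word types counted there:

```python
# Counts copied from the summary at the top of this page.
noun_lemmas = 19424
noun_types = 51715

# Assumption: the form / lemma ratio is types divided by lemmas.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # prints 2.662428
```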

The 1st highest number of forms (22) was observed with the lemma “вода”: вада, вада́, вадой, во́дам, во́дами, во́дах, во́ду, во́ды, вод, вода, вода́, водам, водами, водах, воде, водой, водою, воду, воды, воды́, выда, вэда.

The 2nd highest number of forms (21) was observed with the lemma “сестра”: се, сестер, сестра, сестра́, сестра́м, сестрами, сестре, сестре́, сестрой, сестрою, сестру, сестры, сестры́, сестёр, сетры, систра́, сястре́, сястры́, сёстрами, сёстры, сёстрый.

The 3rd highest number of forms (19) was observed with the lemma “девушка”: де́вушек, дев-ушк-а, девушек, девушек-□, девушк-а, девушк-ам, девушк-е, девушк-и, девушк-у, девушка, девушкам, девушками, девушках, девушке, девушки, девушкой, девушкою, девушку, евушка.

NOUN occurs with 9 features: Animacy (373634; 99% instances), Case (373634; 99% instances), Gender (373634; 99% instances), Number (373634; 99% instances), Abbr (4087; 1% instances), InflClass (2183; 1% instances), Typo (620; 0% instances), ExtPos (24; 0% instances), Foreign (16; 0% instances)

NOUN occurs with 24 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Par, Case=Voc, ExtPos=ADV, ExtPos=INTJ, ExtPos=NOUN, ExtPos=VERB, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, InflClass=Ind, Number=Dual, Number=Plur, Number=Sing, Typo=Yes

NOUN occurs with 206 feature combinations. The most frequent feature combination is Animacy=Inan|Case=Nom|Gender=Fem|Number=Sing (23665 tokens). Examples: правда, жизнь, глава, форма, литература, история, речь, часть, мысль, вода
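
A feature combination such as the one above is the content of the FEATS column in CoNLL-U: feature=value pairs joined with `|`, or `_` when the token has no features. A small sketch of splitting such a string into its pairs follows; the function name `parse_feats` is hypothetical.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

combo = parse_feats("Animacy=Inan|Case=Nom|Gender=Fem|Number=Sing")
print(combo["Case"], combo["Gender"])
```

Counting feature combinations then amounts to tallying the raw FEATS strings, while counting feature-value pairs means tallying the items of each parsed dict.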

Relations

NOUN nodes are attached to their parents using 39 different relations: nmod (79217; 21% instances), obl (71262; 19% instances), nsubj (62644; 17% instances), obj (45757; 12% instances), conj (42430; 11% instances), root (17768; 5% instances), iobj (11996; 3% instances), obl:tmod (10264; 3% instances), appos (9508; 3% instances), parataxis (6284; 2% instances), nsubj:pass (5382; 1% instances), xcomp (3708; 1% instances), fixed (2537; 1% instances), obl:agent (2264; 1% instances), vocative (1736; 0% instances), parataxis:discourse (1287; 0% instances), orphan (891; 0% instances), list (659; 0% instances), ccomp (608; 0% instances), advcl (510; 0% instances), acl (286; 0% instances), flat (210; 0% instances), acl:relcl (192; 0% instances), dislocated (183; 0% instances), nummod (183; 0% instances), csubj (109; 0% instances), compound (88; 0% instances), nummod:gov (72; 0% instances), flat:name (26; 0% instances), case (15; 0% instances), nsubj:outer (13; 0% instances), advmod (7; 0% instances), obl:depict (7; 0% instances), dep (6; 0% instances), obl:pronmod (4; 0% instances), flat:foreign (3; 0% instances), reparandum (3; 0% instances), discourse (2; 0% instances), csubj:outer (1; 0% instances)

Parents of NOUN nodes fall into 17 categories (16 parts of speech, plus NOUN nodes that are attached as root and therefore have no parent): VERB (196310; 52% instances), NOUN (132405; 35% instances), no parent, i.e. root (17768; 5% instances), ADJ (14797; 4% instances), ADV (3391; 1% instances), PROPN (3243; 1% instances), PRON (3116; 1% instances), DET (2437; 1% instances), ADP (2198; 1% instances), NUM (905; 0% instances), X (579; 0% instances), PART (505; 0% instances), SCONJ (199; 0% instances), AUX (150; 0% instances), INTJ (72; 0% instances), SYM (45; 0% instances), CCONJ (2; 0% instances)

69548 (18%) NOUN nodes are leaves.

136805 (36%) NOUN nodes have one child.

97245 (26%) NOUN nodes have two children.

74524 (20%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 23.
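
The child-degree figures above count, for each NOUN node, how many other nodes name it as their head. The sketch below shows one way to compute this from the ID, UPOS and HEAD columns of a CoNLL-U sentence. The sample sentence and the function name `noun_child_degrees` are hypothetical, and this is not necessarily how the page's own numbers were produced.

```python
from collections import Counter

# Hypothetical one-sentence sample in CoNLL-U format (10 tab-separated columns).
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "Большой", "большой", "ADJ", "_", "_", "2", "amod", "_", "_"),
    ("2", "дом", "дом", "NOUN", "_", "_", "4", "nsubj", "_", "_"),
    ("3", "отца", "отец", "NOUN", "_", "_", "2", "nmod", "_", "_"),
    ("4", "стоит", "стоять", "VERB", "_", "_", "0", "root", "_", "_"),
    ("5", "у", "у", "ADP", "_", "_", "6", "case", "_", "_"),
    ("6", "реки", "река", "NOUN", "_", "_", "4", "obl", "_", "_"),
    ("7", ".", ".", "PUNCT", "_", "_", "4", "punct", "_", "_"),
])

def noun_child_degrees(conllu_sentence):
    """Return (form, number of children) for each NOUN node of one sentence."""
    rows = [line.split("\t") for line in conllu_sentence.splitlines()
            if line and not line.startswith("#")]
    # Keep only basic nodes: skip multiword-token ranges ("1-2") and empty nodes ("1.1").
    rows = [r for r in rows if r[0].isdigit()]
    # The HEAD column (index 6) names the parent, so counting HEAD values
    # gives the number of children of each node ID.
    children = Counter(r[6] for r in rows)
    return [(r[1], children[r[0]]) for r in rows if r[3] == "NOUN"]

degrees = noun_child_degrees(SAMPLE)
print(degrees)
```

In this sample, one NOUN is a leaf, one has one child and one has two children. Punctuation attached to a noun would count as a child as well, in line with punct appearing among the child relations listed on this page.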

Children of NOUN nodes are attached using 44 different relations: amod (113026; 18% instances), case (112082; 18% instances), punct (98069; 16% instances), nmod (89865; 15% instances), det (45174; 7% instances), conj (42467; 7% instances), cc (23183; 4% instances), appos (16731; 3% instances), acl (13439; 2% instances), advmod (12854; 2% instances), nsubj (9651; 2% instances), parataxis (9299; 2% instances), acl:relcl (6218; 1% instances), nummod:gov (5152; 1% instances), nummod (3596; 1% instances), cop (2626; 0% instances), parataxis:discourse (1900; 0% instances), mark (1875; 0% instances), orphan (907; 0% instances), obl (796; 0% instances), expl (714; 0% instances), discourse (629; 0% instances), iobj (520; 0% instances), list (508; 0% instances), vocative (361; 0% instances), advcl (352; 0% instances), csubj (288; 0% instances), compound (154; 0% instances), obl:tmod (141; 0% instances), aux (64; 0% instances), dislocated (56; 0% instances), obl:float (41; 0% instances), ccomp (38; 0% instances), fixed (33; 0% instances), flat:name (33; 0% instances), flat (21; 0% instances), dep (15; 0% instances), obj (9; 0% instances), goeswith (8; 0% instances), xcomp (6; 0% instances), flat:foreign (4; 0% instances), reparandum (3; 0% instances), csubj:outer (2; 0% instances), nsubj:outer (1; 0% instances)

Children of NOUN nodes belong to 17 different parts of speech: NOUN (132405; 22% instances), ADJ (115620; 19% instances), ADP (108831; 18% instances), PUNCT (98069; 16% instances), DET (46762; 8% instances), VERB (28006; 5% instances), CCONJ (22484; 4% instances), PROPN (17837; 3% instances), NUM (9335; 2% instances), PART (9333; 2% instances), ADV (7300; 1% instances), PRON (7279; 1% instances), SCONJ (4874; 1% instances), AUX (2714; 0% instances), X (1399; 0% instances), INTJ (336; 0% instances), SYM (327; 0% instances)