This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: PROPN

There are 2052 PROPN lemmas (10%), 2536 PROPN types (7%) and 4461 PROPN tokens (2%). Out of 17 observed tags, the rank of PROPN is: 4 in number of lemmas, 4 in number of types and 11 in number of tokens.
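To illustrate how the three counts and the ranks relate, the following minimal sketch tallies distinct lemmas, distinct word forms (types) and tokens per tag, then ranks one tag against the others. Comparing forms case-sensitively is an assumption, suggested by the type list on this page keeping "россии" and "Россия" apart. The names `tag_stats` and `rank` and the toy rows are hypothetical; this is not the script that generated this page.

```python
from collections import defaultdict

def tag_stats(rows):
    """rows: iterable of (form, lemma, upos). Returns {upos: (lemmas, types, tokens)}."""
    lemmas = defaultdict(set)
    types = defaultdict(set)
    tokens = defaultdict(int)
    for form, lemma, upos in rows:
        lemmas[upos].add(lemma)
        types[upos].add(form)   # case-sensitive comparison (assumption)
        tokens[upos] += 1
    return {tag: (len(lemmas[tag]), len(types[tag]), tokens[tag]) for tag in tokens}

def rank(stats, tag, column):
    """1-based rank of `tag` by one column (0 = lemmas, 1 = types, 2 = tokens)."""
    ordered = sorted(stats, key=lambda t: stats[t][column], reverse=True)
    return ordered.index(tag) + 1

# Toy input, invented for illustration only.
rows = [
    ("Россия", "Россия", "PROPN"),
    ("россии", "Россия", "PROPN"),
    ("Москве", "Москва", "PROPN"),
    ("в", "в", "ADP"),
    ("в", "в", "ADP"),
    ("в", "в", "ADP"),
    ("в", "в", "ADP"),
]
stats = tag_stats(rows)
print(stats["PROPN"])            # (2, 3, 3)
print(rank(stats, "PROPN", 0))   # 1: most lemmas
print(rank(stats, "PROPN", 2))   # 2: fewer tokens than ADP
```

The toy data shows the same pattern as the figures above: a tag can rank high by lemmas and types yet low by tokens when each of its words occurs only a few times.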

The 10 most frequent PROPN lemmas: @xxxxxx, Петрович, Россия, Москва, Жириновский, Крым, Украина, Ирина, Яблоко, США

The 10 most frequent PROPN types: @xxxxxx, Петрович, россии, жириновский, сша, ЛДПР, парнас, Россия, сочи, яблоко

The 10 most frequent ambiguous lemmas: КПРФ (PROPN 15, NOUN 1), Н. (PROPN 11, ADJ 1), С. (PROPN 6, X 1), П. (PROPN 4, X 1), А (CCONJ 2, NOUN 2, PROPN 2), ВОВ (PROPN 2, NOUN 1), МИД (PROPN 2, NOUN 1), ПЦР (PROPN 2, NOUN 1), ЧМЗ (PROPN 2, NOUN 1), 5-ка (NOUN 1, PROPN 1)

The 10 most frequent ambiguous types: яблоко (NOUN 5, PROPN 1), Востока (PROPN 13, NOUN 1), Н. (PROPN 11, DET 1), Наука (PROPN 10, NOUN 1), яблока (NOUN 1, PROPN 1), С. (PROPN 6, X 1), ржд (NOUN 1, PROPN 1), тик (PROPN 5, NOUN 1), Звезды (PROPN 4, NOUN 3), П. (PROPN 4, X 1)

Morphology

The form / lemma ratio of PROPN is 1.235867 (the average of all parts of speech is 1.875784).
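The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of this page. That reading is an assumption about how the figure was derived; the short check below uses only the two counts from this page, and the function name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma."""
    return n_types / n_lemmas

# 2536 PROPN types and 2052 PROPN lemmas, as reported on this page.
propn_ratio = form_lemma_ratio(2536, 2052)
print(round(propn_ratio, 6))  # 1.235867
```

A ratio close to 1 means that most proper nouns occur in only one form in the treebank, compared with the average of 1.875784 across all parts of speech.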

The 1st highest number of forms (6) was observed with the lemma “Америка”: Америка, Америке, америки, америку, омерикой, омерику.

The 2nd highest number of forms (6) was observed with the lemma “Москва”: М., Москва, Москве, Москвой, москву, москвы.

The 3rd highest number of forms (6) was observed with the lemma “Россия”: Рoccии, Россией, Россиею, Россия, россии, россию.

PROPN occurs with 8 features: NameType (4445; 100% instances), Animacy (3794; 85% instances), Case (3794; 85% instances), Number (3794; 85% instances), Gender (3793; 85% instances), Abbr (411; 9% instances), Typo (38; 1% instances), Foreign (17; 0% instances)

PROPN occurs with 26 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Com, NameType=Geo, NameType=Giv, NameType=Oth, NameType=Patrn, NameType=Pro, NameType=Prs, NameType=Sur, NameType=Zoo, Number=Plur, Number=Sing, Typo=Yes

PROPN occurs with 195 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing (370 tokens). Examples: жириновский, Леонов, Явлинский, путин, Петров, Тихонов, Александров, Касьянов, Зюганов, Ким

Relations

PROPN nodes are attached to their parents using 27 different relations: nmod (932; 21% instances), nsubj (649; 15% instances), obl (604; 14% instances), flat:name (522; 12% instances), appos (448; 10% instances), conj (398; 9% instances), root (209; 5% instances), vocative (196; 4% instances), parataxis (131; 3% instances), obj (122; 3% instances), iobj (59; 1% instances), list (45; 1% instances), flat:foreign (36; 1% instances), compound (21; 0% instances), advcl (15; 0% instances), orphan (14; 0% instances), xcomp (14; 0% instances), obl:agent (13; 0% instances), nsubj:pass (11; 0% instances), ccomp (5; 0% instances), acl:relcl (4; 0% instances), dislocated (4; 0% instances), acl (3; 0% instances), csubj (2; 0% instances), dep (2; 0% instances), flat (1; 0% instances), goeswith (1; 0% instances)
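The percentages in this list appear to be each relation's count divided by the total number of PROPN tokens (4461), rounded to a whole percent. The sketch below checks that reading against the first four relations; the rounding rule is an assumption, and the helper name `share` is hypothetical.

```python
TOTAL_PROPN_TOKENS = 4461  # PROPN token count reported on this page

# Counts copied from the list above (first four relations only).
relation_counts = {"nmod": 932, "nsubj": 649, "obl": 604, "flat:name": 522}

def share(count: int, total: int) -> int:
    """Share of `count` in `total` as a whole-number percentage."""
    return round(100 * count / total)

shares = {rel: share(n, TOTAL_PROPN_TOKENS) for rel, n in relation_counts.items()}
print(shares)  # {'nmod': 21, 'nsubj': 15, 'obl': 14, 'flat:name': 12}
```

Rounding to whole percents explains why relations with fewer than about 22 instances are shown as 0%.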

Parents of PROPN nodes belong to 15 different parts of speech: NOUN (1530; 34% instances), VERB (1397; 31% instances), PROPN (1009; 23% instances), [no parent: root] (209; 5% instances), ADJ (123; 3% instances), PRON (48; 1% instances), X (46; 1% instances), ADV (45; 1% instances), PART (19; 0% instances), DET (18; 0% instances), NUM (9; 0% instances), AUX (3; 0% instances), INTJ (3; 0% instances), SCONJ (1; 0% instances), SYM (1; 0% instances)

1906 (43%) PROPN nodes are leaves.

1413 (32%) PROPN nodes have one child.

606 (14%) PROPN nodes have two children.

536 (12%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 19.
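The child-degree figures above can be reproduced in principle by counting, for every PROPN node, how many tokens in the same sentence name it as their head. The following is a minimal sketch that assumes plain CoNLL-U input with tab-separated columns; the function name `propn_child_degrees` and the sample sentence are invented for illustration and do not come from the treebank.

```python
from collections import Counter

# Toy CoNLL-U sentence, invented for illustration only.
SAMPLE = """\
# text = Ирина живёт в Москве .
1\tИрина\tИрина\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tживёт\tжить\tVERB\t_\t_\t0\troot\t_\t_
3\tв\tв\tADP\t_\t_\t4\tcase\t_\t_
4\tМоскве\tМосква\tPROPN\t_\t_\t2\tobl\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

def propn_child_degrees(conllu: str, upos: str = "PROPN") -> Counter:
    """Bucket nodes with the given UPOS by child count: 0, 1, 2, or 3 (three or more)."""
    buckets = Counter()
    for block in conllu.strip().split("\n\n"):  # sentences are separated by blank lines
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep regular tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
        rows = [r for r in rows if r[0].isdigit()]
        n_children = Counter(r[6] for r in rows)  # column 7 is HEAD
        for r in rows:
            if r[3] == upos:                      # column 4 is UPOS
                buckets[min(n_children[r[0]], 3)] += 1
    return buckets

degrees = propn_child_degrees(SAMPLE)
print(degrees[0], degrees[1])  # 1 1: "Ирина" is a leaf, "Москве" has one child
```

On this page the four buckets (1906, 1413, 606 and 536) add up to 4461, the total number of PROPN tokens.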

Children of PROPN nodes are attached using 31 different relations: punct (1196; 25% instances), case (1080; 23% instances), flat:name (513; 11% instances), conj (494; 10% instances), amod (218; 5% instances), cc (202; 4% instances), parataxis (154; 3% instances), appos (144; 3% instances), advmod (134; 3% instances), list (114; 2% instances), nsubj (105; 2% instances), nmod (98; 2% instances), det (64; 1% instances), mark (33; 1% instances), discourse (32; 1% instances), acl (28; 1% instances), acl:relcl (24; 1% instances), cop (20; 0% instances), orphan (15; 0% instances), compound (13; 0% instances), flat:foreign (11; 0% instances), vocative (8; 0% instances), obl (7; 0% instances), flat (5; 0% instances), goeswith (4; 0% instances), advcl (3; 0% instances), dep (3; 0% instances), nummod (3; 0% instances), expl (2; 0% instances), iobj (2; 0% instances), nummod:gov (1; 0% instances)

Children of PROPN nodes belong to 17 different parts of speech: PUNCT (1196; 25% instances), ADP (1058; 22% instances), PROPN (1009; 21% instances), NOUN (418; 9% instances), ADJ (265; 6% instances), CCONJ (203; 4% instances), VERB (105; 2% instances), ADV (93; 2% instances), PART (85; 2% instances), DET (77; 2% instances), PRON (55; 1% instances), SCONJ (53; 1% instances), NUM (34; 1% instances), X (26; 1% instances), SYM (24; 1% instances), AUX (20; 0% instances), INTJ (9; 0% instances)