
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-GSD: POS Tags: PROPN

There are 4481 PROPN lemmas (23%), 5011 PROPN types (16%) and 6617 PROPN tokens (7%). Out of 16 observed tags, PROPN ranks 2nd in number of lemmas, 3rd in number of types and 6th in number of tokens.
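The three counts above can be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: the function name `pos_counts` and the sample rows are hypothetical, and it assumes that "types" means distinct word forms compared case-sensitively.

```python
# Hypothetical sample in CoNLL-U column order (ID, FORM, LEMMA, UPOS, XPOS,
# FEATS, HEAD, DEPREL, DEPS, MISC); these rows are not taken from the treebank.
ROWS = [
    ("1", "Москва", "Москва", "PROPN", "_", "_", "0", "root", "_", "_"),
    ("2", "и", "и", "CCONJ", "_", "_", "3", "cc", "_", "_"),
    ("3", "Россия", "Россия", "PROPN", "_", "_", "1", "conj", "_", "_"),
    ("4", "Москве", "Москва", "PROPN", "_", "_", "1", "conj", "_", "_"),
    ("5", "Москва", "Москва", "PROPN", "_", "_", "1", "conj", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def pos_counts(conllu_text, upos="PROPN"):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank separator lines and sentence-level comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


print(pos_counts(SAMPLE))  # (2, 3, 4)
```

In the sample, the form "Москва" occurs twice, so it adds two tokens but only one type.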

The 10 most frequent PROPN lemmas: Россия, США, СССР, Украина, Франция, Москва, Германия, Александр, Испания, Владимир

The 10 most frequent PROPN types: России, США, СССР, Украины, Франции, Германии, Европы, Испании, РФ, Александра

The 10 most frequent ambiguous lemmas: Мария (PROPN 7, X 1), Вест (PROPN 4, X 1), ISO (X 4, PROPN 2), POST (PROPN 2, X 1), орда (NOUN 1, PROPN 1), форт (NOUN 3, PROPN 2), BBC (PROPN 1, X 1), FM (NOUN 1, PROPN 1), HD (NOUN 3, PROPN 1), MTV (X 3, PROPN 1)

The 10 most frequent ambiguous types: ВС (PROPN 4, NOUN 1), Вест (PROPN 4, X 1), Мария (PROPN 4, X 1), ЦК (NOUN 4, PROPN 4), И (CCONJ 15, PROPN 3), Сити (NOUN 4, PROPN 3, X 1), ISO (X 4, PROPN 2), POST (PROPN 2, X 1), Динамо (NOUN 2, PROPN 2), Ла (PART 4, PROPN 2, NOUN 1, X 1)

Morphology

The form / lemma ratio of PROPN is 1.118277 (the average of all parts of speech is 1.598617).
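The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given earlier on this page. The check below is only that arithmetic; whether the generating script computes the ratio exactly this way is an assumption.

```python
# Counts quoted on this page for PROPN.
propn_types = 5011
propn_lemmas = 4481

# Assumed definition: distinct forms divided by distinct lemmas.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.118277
```

A ratio close to 1 means that most proper-noun lemmas are attested in a single form in this treebank.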

The highest number of forms (6) was observed with the lemma “Москва”: М., Москва, Москве, Москвой, Москву, Москвы.

The second-highest number of forms (5) was observed with the lemma “Вильгельм”: Вильге́льм, Вильгельм, Вильгельма, Вильгельмом, Вильгельму.

The third-highest number of forms (5) was observed with the lemma “Владимир”: Влади́мир, Владимир, Владимира, Владимиром, Владимиру.

PROPN occurs with 6 features: Animacy (6585; 100% instances), Case (6585; 100% instances), Gender (6584; 100% instances), Number (6584; 100% instances), Abbr (13; 0% instances), Foreign (2; 0% instances)

PROPN occurs with 15 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing

PROPN occurs with 54 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing (1563 tokens). Examples: Владимир, Александр, Джон, Карл, Михаил, Сергей, Юрий, Алексей, Виктор, Иван
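Feature and feature-combination counts like those above can be derived from the FEATS column. The following is a minimal sketch under stated assumptions: `feature_stats` is a hypothetical helper, and the rows (including their feature values) are illustrative rather than copied from the treebank.

```python
from collections import Counter

# Hypothetical (FORM, UPOS, FEATS) rows; the annotations are illustrative only.
ROWS = [
    ("Владимир", "PROPN", "Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing"),
    ("Иван", "PROPN", "Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing"),
    ("Москвы", "PROPN", "Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing"),
    ("США", "PROPN", "Abbr=Yes|Animacy=Inan|Case=Gen|Gender=Neut|Number=Plur"),
    ("город", "NOUN", "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing"),
]


def feature_stats(rows, upos="PROPN"):
    """Count feature names and full feature combinations for one UPOS tag."""
    features = Counter()      # feature name -> tokens carrying that feature
    combinations = Counter()  # full FEATS string -> tokens with that string
    for _form, tag, feats in rows:
        if tag != upos:
            continue
        combinations[feats] += 1
        if feats != "_":  # "_" marks a token without features
            for pair in feats.split("|"):
                name, _, _value = pair.partition("=")
                features[name] += 1
    return features, combinations


features, combinations = feature_stats(ROWS)
print(features["Animacy"], features["Abbr"])  # 4 1
print(combinations.most_common(1)[0][1])      # 2
```

Counting the whole FEATS string as one key works because CoNLL-U requires features to be listed in a fixed alphabetical order, so equal combinations yield equal strings.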

Relations

PROPN nodes are attached to their parents using 23 different relations: nmod (1706; 26% instances), appos (1248; 19% instances), flat:name (986; 15% instances), nsubj (904; 14% instances), conj (595; 9% instances), obl (569; 9% instances), obj (112; 2% instances), flat:foreign (87; 1% instances), list (87; 1% instances), nsubj:pass (63; 1% instances), iobj (60; 1% instances), root (56; 1% instances), obl:agent (47; 1% instances), parataxis (38; 1% instances), orphan (13; 0% instances), amod (10; 0% instances), compound (10; 0% instances), dep (10; 0% instances), xcomp (10; 0% instances), vocative (3; 0% instances), acl:relcl (1; 0% instances), ccomp (1; 0% instances), flat (1; 0% instances)

Parents of PROPN nodes belong to 12 different parts of speech (counting the artificial root, which has no tag, as one): NOUN (3007; 45% instances), PROPN (1812; 27% instances), VERB (1563; 24% instances), ADJ (99; 1% instances), root node without a tag (56; 1% instances), X (34; 1% instances), PART (19; 0% instances), ADV (12; 0% instances), NUM (10; 0% instances), PRON (3; 0% instances), DET (1; 0% instances), SYM (1; 0% instances)
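The relation and parent-tag distributions can be tallied per sentence from the HEAD and DEPREL columns. This is a hedged sketch: `parent_stats` and both sample sentences are hypothetical, and representing the artificial root's missing tag as an empty string is an assumption about how such a parent would be recorded.

```python
from collections import Counter

# Hypothetical sentences as (ID, FORM, UPOS, HEAD, DEPREL) tuples.
SENT = [
    (1, "Владимир", "PROPN", 3, "nsubj"),
    (2, "Путин", "PROPN", 1, "flat:name"),
    (3, "посетил", "VERB", 0, "root"),
    (4, "Москву", "PROPN", 3, "obj"),
]
TITLE = [(1, "Москва", "PROPN", 0, "root")]


def parent_stats(sentence, upos="PROPN"):
    """Count incoming relations and parent UPOS tags of nodes tagged `upos`."""
    tag_of = {tok_id: tag for tok_id, _form, tag, _head, _rel in sentence}
    relations, parent_tags = Counter(), Counter()
    for _id, _form, tag, head, rel in sentence:
        if tag != upos:
            continue
        relations[rel] += 1
        # HEAD 0 is the artificial root; it has no tag, so use "" for it.
        parent_tags[tag_of.get(head, "")] += 1
    return relations, parent_tags


relations, parent_tags = parent_stats(SENT)
print(sorted(relations.items()))    # [('flat:name', 1), ('nsubj', 1), ('obj', 1)]
print(sorted(parent_tags.items()))  # [('PROPN', 1), ('VERB', 2)]
print(parent_stats(TITLE)[1][""])   # 1
```

On this page, the 56 PROPN nodes attached as root match the 56 parents listed without a part-of-speech tag.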

3013 (46%) PROPN nodes are leaves.

2123 (32%) PROPN nodes have one child.

828 (13%) PROPN nodes have two children.

653 (10%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 19.
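The child-degree figures above can be obtained by counting, for each node, how many tokens name it as HEAD. The sketch below uses a hypothetical helper `child_degrees` and an illustrative sentence; the final lines only confirm that the four bucket counts quoted on this page add up to the 6617 PROPN tokens.

```python
from collections import Counter

# Hypothetical sentence as (ID, UPOS, HEAD) tuples.
SENT = [
    (1, "PROPN", 3),
    (2, "PROPN", 1),
    (3, "VERB", 0),
    (4, "ADP", 5),
    (5, "PROPN", 3),
    (6, "PUNCT", 3),
]


def child_degrees(sentence, upos="PROPN"):
    """Return the number of children of every node tagged `upos`."""
    n_children = Counter(head for _id, _tag, head in sentence)
    return [n_children[tok_id] for tok_id, tag, _head in sentence if tag == upos]


degrees = child_degrees(SENT)
print(degrees)       # [1, 0, 1]
print(max(degrees))  # 1

# Bucket counts quoted on this page: leaves, one, two, three or more children.
page_buckets = [3013, 2123, 828, 653]
print(sum(page_buckets))  # 6617
```

The rounded percentages (46, 32, 13, 10) sum to 101 only because each is rounded independently.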

Children of PROPN nodes are attached using 28 different relations: punct (1915; 31% instances), case (1053; 17% instances), flat:name (1009; 16% instances), conj (636; 10% instances), amod (357; 6% instances), appos (270; 4% instances), cc (270; 4% instances), nmod (225; 4% instances), flat:foreign (87; 1% instances), list (84; 1% instances), parataxis (67; 1% instances), acl:relcl (65; 1% instances), nsubj (53; 1% instances), nummod:entity (49; 1% instances), acl (44; 1% instances), advmod (32; 1% instances), orphan (14; 0% instances), det (13; 0% instances), nummod (7; 0% instances), advcl (2; 0% instances), expl (2; 0% instances), nummod:gov (2; 0% instances), ccomp (1; 0% instances), compound (1; 0% instances), cop (1; 0% instances), flat (1; 0% instances), iobj (1; 0% instances), obl (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PUNCT (1915; 31% instances), PROPN (1812; 29% instances), ADP (1039; 17% instances), ADJ (433; 7% instances), NOUN (415; 7% instances), CCONJ (268; 4% instances), VERB (133; 2% instances), NUM (78; 1% instances), PART (58; 1% instances), X (48; 1% instances), DET (26; 0% instances), ADV (23; 0% instances), PRON (7; 0% instances), SYM (5; 0% instances), AUX (1; 0% instances), SCONJ (1; 0% instances)