This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: PROPN

There are 8289 PROPN lemmas (15%), 12937 PROPN types (8%) and 67407 PROPN tokens (4%). Out of 17 observed tags, the rank of PROPN is: 4 in number of lemmas, 4 in number of types and 9 in number of tokens.

The 10 most frequent PROPN lemmas: А, В, Толик, И, Н, М, С, Пушкин, Лиза, П

The 10 most frequent PROPN types: А., В., И., Толик, Н., М., С., П., Ф., Г.

The 10 most frequent ambiguous lemmas: А (PROPN 2302, NOUN 3, CCONJ 2, X 2), В (PROPN 1477, NOUN 1), И (PROPN 1080, CCONJ 1), Н (PROPN 1036, ADJ 1), С (PROPN 802, X 5, NOUN 1), П (PROPN 533, NOUN 10, X 2), К (PROPN 398, X 3), Е (PROPN 235, X 4), Б (PROPN 188, NOUN 2, X 1), Т (PROPN 146, NOUN 1, X 1)

The 10 most frequent ambiguous types: В. (PROPN 1477, NOUN 2, ADJ 1), И. (PROPN 1080, ADJ 1), Н. (PROPN 1036, DET 1), С. (PROPN 801, NOUN 21, X 1), П. (PROPN 532, X 1), Мишка (PROPN 313, NOUN 3), Б. (PROPN 188, ADJ 1), Т. (PROPN 146, NOUN 22, PRON 7, ADV 6, ADJ 1), О. (PROPN 142, NOUN 2, PART 1), Возрождения (PROPN 138, NOUN 15)

Morphology

The form / lemma ratio of PROPN is 1.560743 (the average of all parts of speech is 2.706111).
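The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of this page. The short check below is only an arithmetic sketch under that assumption; the variable names are illustrative and not part of any UD tool.

```python
# Counts reported on this page for PROPN in UD_Russian-Taiga.
propn_types = 12937   # distinct word forms tagged PROPN
propn_lemmas = 8289   # distinct lemmas tagged PROPN

# Assumption: "form / lemma ratio" means types divided by lemmas.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")
```

A ratio close to 1 would mean most proper-noun lemmas occur in a single form; the value here is well below the all-tags average quoted above.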

The 1st highest number of forms (11) was observed with the lemma “Иванович”: Ивановича, Ивановичами, Ивановиче, Ивановичем, Ивановичу, Иваныч, Иваныча, Иваныче, Иванычем, Иванычу, иванович.

The 2nd highest number of forms (11) was observed with the lemma “Михайлович”: Миха́йлович, Михайлович, Михайловича, Михайловиче, Михайловичем, Михайловичу, Михайлыч, Михайлыча, Михайлычем, Михайлычу, Михалыч.

The 3rd highest number of forms (10) was observed with the lemma “Алена”: Алена, Алене, Алену, Алены, Алён, Алёна, Алёне, Алёной, Алёны, алёну.

PROPN occurs with 10 features: NameType (67403; 100% instances), Animacy (54576; 81% instances), Case (54576; 81% instances), Number (54576; 81% instances), Gender (54575; 81% instances), Abbr (12570; 19% instances), InflClass (4635; 7% instances), Typo (115; 0% instances), Foreign (17; 0% instances), ExtPos (1; 0% instances)

PROPN occurs with 28 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, ExtPos=PROPN, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, InflClass=Ind, NameType=Com, NameType=Geo, NameType=Giv, NameType=Oth, NameType=Pat, NameType=Pro, NameType=Prs, NameType=Sur, NameType=Zoo, Number=Plur, Number=Sing, Typo=Yes

PROPN occurs with 356 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing (7795 tokens). Examples: Толик, Мишка, Кузька, иван, Вовка, Юра, Андрюша, Вик, Павел, Петр
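A feature combination such as the one above is a list of Feature=Value pairs joined by the | character, as in the FEATS column of a CoNLL-U file. A minimal sketch of splitting such a string into its pairs follows; the helper name parse_feats is hypothetical and it assumes each feature carries a single value.

```python
def parse_feats(feats):
    """Split a FEATS string into a {feature: value} dict.

    Hypothetical helper for illustration; "_" marks an empty feature set.
    """
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# The most frequent PROPN feature combination reported on this page.
combo = "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing"
feats = parse_feats(combo)
print(feats["NameType"], feats["Case"])
```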

Relations

PROPN nodes are attached to their parents using 30 different relations: flat:name (16816; 25% instances), nsubj (14420; 21% instances), nmod (10423; 15% instances), appos (6379; 9% instances), conj (6327; 9% instances), obl (4421; 7% instances), root (1927; 3% instances), obj (1791; 3% instances), vocative (1491; 2% instances), parataxis (1116; 2% instances), iobj (1109; 2% instances), obl:agent (262; 0% instances), list (234; 0% instances), xcomp (211; 0% instances), nsubj:pass (156; 0% instances), orphan (116; 0% instances), advcl (43; 0% instances), ccomp (37; 0% instances), compound (36; 0% instances), dislocated (24; 0% instances), obl:tmod (23; 0% instances), acl (16; 0% instances), acl:relcl (8; 0% instances), csubj (5; 0% instances), flat (5; 0% instances), discourse (3; 0% instances), parataxis:discourse (3; 0% instances), amod (2; 0% instances), dep (2; 0% instances), nsubj:outer (1; 0% instances)

Parents of PROPN nodes belong to 16 different parts of speech: PROPN (22729; 34% instances), VERB (21982; 33% instances), NOUN (17837; 26% instances), no parent (root) (1927; 3% instances), ADJ (1051; 2% instances), PRON (711; 1% instances), ADV (376; 1% instances), PART (294; 0% instances), DET (229; 0% instances), X (204; 0% instances), NUM (36; 0% instances), INTJ (19; 0% instances), SYM (6; 0% instances), AUX (4; 0% instances), CCONJ (1; 0% instances), SCONJ (1; 0% instances)

36785 (55%) PROPN nodes are leaves.

15330 (23%) PROPN nodes have one child.

8012 (12%) PROPN nodes have two children.

7280 (11%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 27.
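The child-degree figures above (leaves, one child, two children, three or more) can be reproduced in principle by counting, for each PROPN token, how many other tokens name it as their head. The sketch below is not the script that generated this page; it assumes standard 10-column CoNLL-U input, the function name is hypothetical, and the toy sentence is invented for illustration.

```python
from collections import Counter


def child_degree_histogram(conllu_text, upos="PROPN"):
    """Map number of children -> how many nodes with the given UPOS have it."""
    hist = Counter()
    for block in conllu_text.strip().split("\n\n"):
        tags = {}               # token id -> UPOS tag
        n_children = Counter()  # head id -> number of dependents
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            tags[cols[0]] = cols[3]
            n_children[cols[6]] += 1
        for tok_id, tag in tags.items():
            if tag == upos:
                hist[n_children[tok_id]] += 1
    return hist


# Invented toy sentence (not taken from the treebank).
rows = [
    ["1", "Лиза", "Лиза", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "увидела", "увидеть", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "Толика", "Толик", "PROPN", "_", "_", "2", "obj", "_", "_"],
    ["4", "Иванова", "Иванов", "PROPN", "_", "_", "3", "flat:name", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
toy = "\n".join("\t".join(r) for r in rows)
hist = child_degree_histogram(toy)
print(dict(hist))
```

In the toy sentence two PROPN nodes are leaves and one has a single flat:name child; the maximum key of the histogram corresponds to the highest child degree.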

Children of PROPN nodes are attached using 39 different relations: flat:name (16865; 28% instances), punct (15694; 26% instances), case (7649; 13% instances), conj (6787; 11% instances), amod (2760; 5% instances), cc (2513; 4% instances), parataxis (1310; 2% instances), nmod (973; 2% instances), advmod (882; 1% instances), appos (829; 1% instances), acl (773; 1% instances), det (697; 1% instances), nsubj (482; 1% instances), acl:relcl (385; 1% instances), list (324; 1% instances), orphan (240; 0% instances), parataxis:discourse (174; 0% instances), cop (117; 0% instances), mark (117; 0% instances), discourse (99; 0% instances), obl (23; 0% instances), nummod:gov (21; 0% instances), compound (19; 0% instances), vocative (19; 0% instances), dislocated (17; 0% instances), expl (15; 0% instances), advcl (14; 0% instances), nummod (11; 0% instances), obl:tmod (8; 0% instances), iobj (7; 0% instances), aux (6; 0% instances), flat (6; 0% instances), goeswith (4; 0% instances), dep (3; 0% instances), obl:float (3; 0% instances), reparandum (2; 0% instances), flat:foreign (1; 0% instances), obj (1; 0% instances), obl:depict (1; 0% instances)

Children of PROPN nodes belong to 17 different parts of speech: PROPN (22729; 38% instances), PUNCT (15694; 26% instances), ADP (7445; 12% instances), NOUN (3243; 5% instances), ADJ (3216; 5% instances), CCONJ (2488; 4% instances), VERB (1639; 3% instances), DET (925; 2% instances), PART (887; 1% instances), ADV (519; 1% instances), PRON (398; 1% instances), SCONJ (313; 1% instances), AUX (124; 0% instances), NUM (97; 0% instances), INTJ (62; 0% instances), X (41; 0% instances), SYM (31; 0% instances)