
This page pertains to UD version 2.

Treebank Statistics: UD_Czech: POS Tags: PROPN

There are 15241 PROPN lemmas (25%), 21954 PROPN types (17%) and 84031 PROPN tokens (6%). Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 3 in number of types and 6 in number of tokens.

The 10 most frequent PROPN lemmas: Praha, ČR, Evropa, LN, Jan, Jiří, Německo, Brno, ODS, USA

The 10 most frequent PROPN types: Praha, ČR, Praze, LN, ODS, USA, J, Jiří, Jan, OSN

The 10 most frequent ambiguous lemmas: J (PROPN 422, ADJ 30), M (PROPN 244, NOUN 8, ADJ 1), V (PROPN 210, NUM 23, NOUN 7, ADJ 5), A (PROPN 172, ADJ 8, NOUN 8), York (PROPN 165, ADJ 5), P (PROPN 136, ADJ 4, NOUN 2), čt (NOUN 4, PROPN 2), S (PROPN 116, ADJ 12, NOUN 2), Washington (PROPN 111, ADJ 1), r (NOUN 55, ADV 1, PROPN 1)

The 10 most frequent ambiguous types: J (PROPN 422, ADJ 30, NOUN 3), M (PROPN 244, NOUN 51, X 3, ADJ 1), V (ADP 3736, PROPN 210, NUM 23, NOUN 15, ADJ 6, ADV 2), A (CCONJ 1042, PROPN 172, NOUN 93, ADJ 19, X 4), Rusko (PROPN 163, ADJ 3), Německo (PROPN 144, ADJ 2), P (PROPN 136, NOUN 124, ADJ 17, ADP 1), S (ADP 470, PROPN 117, NOUN 38, ADJ 14, X 3), r (NOUN 433, ADV 1, PROPN 1), F (PROPN 99, NOUN 27, ADJ 10)
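As an illustration of what "ambiguous type" means here, the following minimal sketch counts the tags observed for each word form and keeps the forms that occur with the target tag and at least one other tag. The function name `ambiguous_types` and the toy rows are hypothetical. The ranking criterion used for the lists above is not stated on this page, so the sketch returns the counts without ordering them.

```python
from collections import Counter, defaultdict


def ambiguous_types(rows, upos):
    """Map each ambiguous form to its tag counts.

    rows is an iterable of (form, upos) pairs. A form is kept when it was
    seen with the requested tag and with at least one other tag.
    """
    counts = defaultdict(Counter)
    for form, tag in rows:
        counts[form][tag] += 1
    return {
        form: dict(tags)
        for form, tags in counts.items()
        if upos in tags and len(tags) > 1
    }


# Toy rows (hypothetical, not taken from the treebank files).
toy_rows = [
    ("V", "ADP"), ("V", "ADP"), ("V", "PROPN"),
    ("Praha", "PROPN"), ("Praha", "PROPN"),
    ("A", "CCONJ"), ("A", "PROPN"),
]
ambiguous = ambiguous_types(toy_rows, "PROPN")
print(ambiguous)
```

Forms are compared case-sensitively in this sketch, which matches the way upper-case forms such as V and A are listed separately above.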

Morphology

The form / lemma ratio of PROPN is 1.440457 (the average of all parts of speech is 2.181829).
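The ratio reported above is consistent with dividing the number of distinct PROPN types by the number of distinct PROPN lemmas given at the top of this page (21954 and 15241). A minimal sketch of that computation follows; the function name `form_lemma_ratio` and the toy rows are hypothetical, and the sketch assumes forms are compared case-sensitively.

```python
def form_lemma_ratio(rows, upos):
    """Return (#distinct forms) / (#distinct lemmas) for one UPOS tag.

    rows is an iterable of (form, lemma, upos) triples.
    """
    forms, lemmas = set(), set()
    for form, lemma, tag in rows:
        if tag == upos:
            forms.add(form)
            lemmas.add(lemma)
    return len(forms) / len(lemmas) if lemmas else 0.0


# Toy rows (hypothetical, not taken from the treebank files).
toy_rows = [
    ("Praha", "Praha", "PROPN"),
    ("Praze", "Praha", "PROPN"),
    ("Prahy", "Praha", "PROPN"),
    ("Jan", "Jan", "PROPN"),
    ("v", "v", "ADP"),
]
toy_ratio = form_lemma_ratio(toy_rows, "PROPN")  # 4 forms over 2 lemmas

# Figures reported on this page: 21954 PROPN types, 15241 PROPN lemmas.
page_ratio = 21954 / 15241
print(round(page_ratio, 6))
```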

The highest number of forms (11) was observed with the lemma “Čech”: ČECH, ČEŠI, Čech, Čecha, Čechem, Čechovi, Čechy, Čechů, Čechům, Češi, Češích.

The 2nd highest number of forms (10) was observed with the lemma “Jan”: JAN, JANA, Jan, Jana, Janem, Janovi, Janové, Janu, Jany, Janů.

The 3rd highest number of forms (10) was observed with the lemma “Němec”: NĚMCI, NĚMCŮ, NĚMEC, Němce, Němcem, Němci, Němcích, Němců, Němcům, Němec.

PROPN occurs with 9 features: NameType (84031; 100% instances), Polarity (84031; 100% instances), Gender (82083; 98% instances), Number (68761; 82% instances), Case (66478; 79% instances), Animacy (48949; 58% instances), Abbr (13042; 16% instances), Foreign (3684; 4% instances), Style (155; 0% instances)

PROPN occurs with 46 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Com, NameType=Com,Geo, NameType=Com,Giv, NameType=Com,Giv,Sur, NameType=Com,Nat, NameType=Com,Pro, NameType=Com,Sur, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Sur, NameType=Geo,Oth, NameType=Geo,Pro, NameType=Geo,Sur, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Giv,Pro, NameType=Giv,Pro,Sur, NameType=Giv,Sur, NameType=Nat, NameType=Nat,Sur, NameType=Oth, NameType=Pro, NameType=Pro,Sur, NameType=Sur, Number=Plur, Number=Sing, Polarity=Pos, Style=Arch, Style=Coll, Style=Expr, Style=Rare

PROPN occurs with 613 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos (14098 tokens). Examples: Klaus, Havel, Svoboda, Mečiar, Jelcin, John, Zeman, Němec, Novák, Benda
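Feature combinations such as the one above follow the CoNLL-U FEATS convention: `Name=Value` pairs joined by `|`, with `_` standing for an empty set, and multiple values of one feature joined by commas (as in `NameType=Com,Geo`). The sketch below parses such strings and tallies combinations and individual features. The helper name `parse_feats` and the toy list are hypothetical.

```python
from collections import Counter


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a {name: value} dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# The most frequent combination reported on this page.
top_combo = "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos"
parsed = parse_feats(top_combo)

# Toy FEATS values (hypothetical, not taken from the treebank files).
toy_feats = [
    top_combo,
    top_combo,
    "Case=Loc|Gender=Fem|NameType=Geo|Number=Sing|Polarity=Pos",
    "_",
]
combo_counts = Counter(toy_feats)  # tokens per full combination
feature_counts = Counter(name for f in toy_feats for name in parse_feats(f))
print(combo_counts.most_common(1))
print(feature_counts)
```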

Relations

PROPN nodes are attached to their parents using 27 different relations: nmod (26735; 32% instances), nsubj (14713; 18% instances), flat (13595; 16% instances), conj (7679; 9% instances), obl (6762; 8% instances), root (5373; 6% instances), dep (2878; 3% instances), obj (2057; 2% instances), appos (1361; 2% instances), obl:arg (751; 1% instances), iobj (558; 1% instances), orphan (495; 1% instances), flat:foreign (431; 1% instances), nsubj:pass (351; 0% instances), advcl (155; 0% instances), obl:agent (34; 0% instances), xcomp (32; 0% instances), vocative (21; 0% instances), cc (18; 0% instances), ccomp (8; 0% instances), amod (6; 0% instances), case (6; 0% instances), acl (5; 0% instances), parataxis (3; 0% instances), csubj (2; 0% instances), csubj:pass (1; 0% instances), punct (1; 0% instances)
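The percentages above can be reproduced from the raw counts and the total of 84031 PROPN tokens. The short check below does this for the six most frequent relations; rounding to the nearest whole percent is an assumption about how the page was generated, though it agrees with the figures shown.

```python
# Counts copied from this page; 84031 is the total number of PROPN tokens.
propn_tokens = 84031
relation_counts = {
    "nmod": 26735,
    "nsubj": 14713,
    "flat": 13595,
    "conj": 7679,
    "obl": 6762,
    "root": 5373,
}

# Share of each relation, rounded to a whole percent (assumed rounding rule).
shares = {rel: round(100 * n / propn_tokens) for rel, n in relation_counts.items()}
for rel, pct in shares.items():
    print(f"{rel}: {pct}%")
```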

Parents of PROPN nodes fall into 15 categories (14 parts of speech plus the artificial root, which has no tag): NOUN (26293; 31% instances), PROPN (26160; 31% instances), VERB (22399; 27% instances), root (5373; 6% instances), ADJ (2773; 3% instances), NUM (403; 0% instances), ADV (375; 0% instances), DET (109; 0% instances), PRON (100; 0% instances), PART (18; 0% instances), ADP (12; 0% instances), SYM (7; 0% instances), CCONJ (5; 0% instances), INTJ (2; 0% instances), PUNCT (2; 0% instances)

33788 (40%) PROPN nodes are leaves.

27275 (32%) PROPN nodes have one child.

12895 (15%) PROPN nodes have two children.

10073 (12%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 29.
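The child-degree figures above can be derived from the HEAD column of a CoNLL-U file: a node's degree is the number of other nodes that name it as their head. The following minimal sketch buckets nodes of one tag into 0, 1, 2 and 3-or-more children. The function name `child_degree_buckets` and the toy sentence, including its annotation, are hypothetical.

```python
from collections import Counter


def child_degree_buckets(sentence, upos):
    """Count nodes of one UPOS tag by number of children (3 means 3 or more).

    sentence is a list of (id, head, upos) triples with integer ids;
    head 0 denotes the artificial root.
    """
    n_children = Counter(head for _, head, _ in sentence)
    buckets = Counter()
    for node_id, _, tag in sentence:
        if tag == upos:
            buckets[min(n_children[node_id], 3)] += 1
    return buckets


# Toy sentence "Václav Havel navštívil Prahu ." (hypothetical annotation).
toy_sentence = [
    (1, 3, "PROPN"),  # Václav, subject of the verb
    (2, 1, "PROPN"),  # Havel, attached to Václav
    (3, 0, "VERB"),   # navštívil, root
    (4, 3, "PROPN"),  # Prahu, object of the verb
    (5, 3, "PUNCT"),  # .
]
buckets = child_degree_buckets(toy_sentence, "PROPN")
print(dict(buckets))
```

On the full treebank the four buckets would be summed over all sentences; the counts reported above (33788, 27275, 12895 and 10073) add up to the 84031 PROPN tokens.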

Children of PROPN nodes are attached using 31 different relations: punct (19366; 21% instances), case (16403; 18% instances), flat (13613; 15% instances), nmod (13165; 14% instances), conj (8172; 9% instances), amod (5217; 6% instances), cc (3647; 4% instances), dep (3377; 4% instances), nummod (1629; 2% instances), flat:foreign (1570; 2% instances), acl (1565; 2% instances), appos (1312; 1% instances), advmod:emph (1233; 1% instances), orphan (493; 1% instances), xcomp (392; 0% instances), mark (325; 0% instances), det (113; 0% instances), advmod (84; 0% instances), parataxis (80; 0% instances), cop (67; 0% instances), obl (67; 0% instances), nsubj (62; 0% instances), nummod:gov (40; 0% instances), advcl (9; 0% instances), obj (8; 0% instances), det:numgov (5; 0% instances), aux (3; 0% instances), det:nummod (3; 0% instances), ccomp (2; 0% instances), obl:arg (2; 0% instances), expl:pv (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (26160; 28% instances), PUNCT (19368; 21% instances), ADP (16536; 18% instances), NOUN (12888; 14% instances), ADJ (6624; 7% instances), CCONJ (3993; 4% instances), NUM (2594; 3% instances), VERB (1849; 2% instances), ADV (1092; 1% instances), SCONJ (335; 0% instances), DET (257; 0% instances), PART (175; 0% instances), AUX (70; 0% instances), PRON (52; 0% instances), SYM (24; 0% instances), INTJ (8; 0% instances)