
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: PROPN

There are 15241 PROPN lemmas (25%), 21954 PROPN types (17%) and 84031 PROPN tokens (6%). Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 3 in number of types and 6 in number of tokens.
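As a rough illustration of how such lemma, type and token counts could be obtained, the sketch below scans CoNLL-U text and counts distinct lemmas, distinct word forms and total tokens for one UPOS tag. It assumes that "types" means case-sensitive distinct word forms. The sample sentences and the function name `upos_counts` are made up for this example and are not taken from the treebank or from the tool that generated this page.

```python
# Tiny hand-made CoNLL-U fragment (illustrative only, not treebank data).
SAMPLE = """\
1\tJan\tJan\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tbydlí\tbydlet\tVERB\t_\t_\t0\troot\t_\t_
3\tv\tv\tADP\t_\t_\t4\tcase\t_\t_
4\tPraze\tPraha\tPROPN\t_\t_\t2\tobl\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

1\tPraha\tPraha\tPROPN\t_\t_\t0\troot\t_\t_
"""


def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences, '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:  # column 4 is UPOS
            lemmas.add(cols[2])  # column 3 is LEMMA
            types.add(cols[1])   # column 2 is FORM (case-sensitive)
            tokens += 1
    return len(lemmas), len(types), tokens


print(upos_counts(SAMPLE, "PROPN"))  # (2, 3, 3)
```

In the sample, "Praze" and "Praha" share the lemma "Praha", so there are two lemmas but three types.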

The 10 most frequent PROPN lemmas: Praha, ČR, Evropa, LN, Jan, Jiří, Německo, Brno, ODS, USA

The 10 most frequent PROPN types: Praha, ČR, Praze, LN, ODS, USA, J, Jiří, Jan, OSN

The 10 most frequent ambiguous lemmas: J (PROPN 422, ADJ 30), M (PROPN 244, NOUN 8, ADJ 1), V (PROPN 210, NUM 23, NOUN 7, ADJ 5), A (PROPN 172, ADJ 8, NOUN 8), York (PROPN 165, ADJ 5), P (PROPN 136, ADJ 4, NOUN 2), čt (NOUN 4, PROPN 2), S (PROPN 116, ADJ 12, NOUN 2), Washington (PROPN 111, ADJ 1), r (NOUN 55, ADV 1, PROPN 1)

The 10 most frequent ambiguous types: J (PROPN 422, ADJ 30, NOUN 3), M (PROPN 244, NOUN 51, X 3, ADJ 1), V (ADP 3736, PROPN 210, NUM 23, NOUN 15, ADJ 6, ADV 2), A (CCONJ 1042, PROPN 172, NOUN 93, ADJ 19, X 4), Rusko (PROPN 163, ADJ 3), Německo (PROPN 144, ADJ 2), P (PROPN 136, NOUN 124, ADJ 17, ADP 1), S (ADP 470, PROPN 117, NOUN 38, ADJ 14, X 3), r (NOUN 433, ADV 1, PROPN 1), F (PROPN 99, NOUN 27, ADJ 10)

Morphology

The form / lemma ratio of PROPN is 1.440457 (the average of all parts of speech is 2.181849).
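The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given above. The following sketch reproduces that arithmetic. That the page's generator computes it this way is an assumption, and `form_lemma_ratio` is a name chosen only for this example.

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Average number of distinct word forms per lemma."""
    return n_types / n_lemmas


# Counts quoted on this page for PROPN in UD_Czech-PDT.
propn_types = 21954
propn_lemmas = 15241

ratio = form_lemma_ratio(propn_types, propn_lemmas)
print(f"{ratio:.6f}")  # 1.440457
```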

The 1st highest number of forms (11) was observed with the lemma “Čech”: ČECH, ČEŠI, Čech, Čecha, Čechem, Čechovi, Čechy, Čechů, Čechům, Češi, Češích.

The 2nd highest number of forms (10) was observed with the lemma “Jan”: JAN, JANA, Jan, Jana, Janem, Janovi, Janové, Janu, Jany, Janů.

The 3rd highest number of forms (10) was observed with the lemma “Němec”: NĚMCI, NĚMCŮ, NĚMEC, Němce, Němcem, Němci, Němcích, Němců, Němcům, Němec.

PROPN occurs with 9 features: NameType (84031; 100% instances), Polarity (84031; 100% instances), Gender (82083; 98% instances), Number (68761; 82% instances), Case (66478; 79% instances), Animacy (48949; 58% instances), Abbr (13042; 16% instances), Foreign (3684; 4% instances), Style (155; 0% instances)
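The percentages above appear to be each feature's token count divided by the 84031 PROPN tokens, rounded to a whole percent. That is why Style, with 155 tokens, is shown as 0%. The sketch below recomputes them from the counts quoted on this page. The rounding rule is an assumption about how the page was generated.

```python
TOTAL_PROPN_TOKENS = 84031

# Feature counts as quoted on this page.
feature_counts = {
    "NameType": 84031,
    "Polarity": 84031,
    "Gender": 82083,
    "Number": 68761,
    "Case": 66478,
    "Animacy": 48949,
    "Abbr": 13042,
    "Foreign": 3684,
    "Style": 155,
}


def share_percent(count, total):
    """Share of `count` in `total`, rounded to a whole percent (assumed rule)."""
    return round(100 * count / total)


shares = {name: share_percent(n, TOTAL_PROPN_TOKENS)
          for name, n in feature_counts.items()}
print(shares["Gender"], shares["Abbr"], shares["Style"])  # 98 16 0
```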

PROPN occurs with 46 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Com, NameType=Com,Geo, NameType=Com,Giv, NameType=Com,Giv,Sur, NameType=Com,Nat, NameType=Com,Pro, NameType=Com,Sur, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Sur, NameType=Geo,Oth, NameType=Geo,Pro, NameType=Geo,Sur, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Giv,Pro, NameType=Giv,Pro,Sur, NameType=Giv,Sur, NameType=Nat, NameType=Nat,Sur, NameType=Oth, NameType=Pro, NameType=Pro,Sur, NameType=Sur, Number=Plur, Number=Sing, Polarity=Pos, Style=Arch, Style=Coll, Style=Expr, Style=Rare

PROPN occurs with 613 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos (14098 tokens). Examples: Klaus, Havel, Svoboda, Mečiar, Jelcin, John, Zeman, Němec, Novák, Benda
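Feature combinations such as the one above follow the CoNLL-U FEATS convention: `Name=Value` pairs separated by `|`, with several values of one feature joined by commas (as in `NameType=Giv,Sur`). A minimal parsing sketch, where `parse_feats` is a hypothetical helper name:

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into {feature: [values]}."""
    if feats == "_":  # '_' marks an empty feature set
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        result[name] = values.split(",")  # multi-valued features use commas
    return result


combo = "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos"
feats = parse_feats(combo)
print(feats["NameType"])                              # ['Sur']
print(parse_feats("NameType=Giv,Sur")["NameType"])    # ['Giv', 'Sur']
```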

Relations

PROPN nodes are attached to their parents using 28 different relations: nmod (24792; 30% instances), flat (20529; 24% instances), nsubj (10925; 13% instances), conj (7216; 9% instances), obl (6323; 8% instances), root (5197; 6% instances), dep (2823; 3% instances), obj (1719; 2% instances), flat:foreign (1235; 1% instances), appos (1217; 1% instances), obl:arg (582; 1% instances), iobj (474; 1% instances), orphan (450; 1% instances), nsubj:pass (276; 0% instances), advcl (136; 0% instances), xcomp (29; 0% instances), amod (24; 0% instances), obl:agent (24; 0% instances), cc (18; 0% instances), vocative (15; 0% instances), ccomp (8; 0% instances), case (6; 0% instances), acl (4; 0% instances), parataxis (3; 0% instances), advmod (2; 0% instances), csubj (2; 0% instances), csubj:pass (1; 0% instances), punct (1; 0% instances)

Parents of PROPN nodes belong to 15 different parts of speech: NOUN (36011; 43% instances), PROPN (20466; 24% instances), VERB (17899; 21% instances), [no parent; the PROPN node is the root] (5197; 6% instances), ADJ (3333; 4% instances), NUM (365; 0% instances), ADV (349; 0% instances), ADP (173; 0% instances), DET (99; 0% instances), PRON (91; 0% instances), PART (23; 0% instances), CCONJ (15; 0% instances), SYM (7; 0% instances), PUNCT (2; 0% instances), INTJ (1; 0% instances)

40884 (49%) PROPN nodes are leaves.

26077 (31%) PROPN nodes have one child.

9694 (12%) PROPN nodes have two children.

7376 (9%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 29.
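The child-degree figures above can be thought of as a histogram over the number of dependents of each PROPN node. The sketch below groups nodes into the same four buckets (0, 1, 2, 3 or more children) for one hand-made sentence. The tuple layout and the name `child_degree_buckets` are assumptions for illustration. It also checks that the four counts reported on this page add up to the 84031 PROPN tokens.

```python
from collections import Counter


def child_degree_buckets(nodes, upos):
    """nodes: list of (id, head, upos) for one sentence.

    Returns a Counter mapping 0, 1, 2 or 3 (meaning "3 or more")
    to the number of nodes with that tag and that many children.
    """
    children = Counter(head for _, head, _ in nodes)
    buckets = Counter()
    for node_id, _, tag in nodes:
        if tag == upos:
            buckets[min(children[node_id], 3)] += 1
    return buckets


# Hand-made sentence "Jan bydlí v Praze ." (illustrative, not treebank data).
sentence = [
    (1, 2, "PROPN"),  # Jan
    (2, 0, "VERB"),   # bydlí
    (3, 4, "ADP"),    # v
    (4, 2, "PROPN"),  # Praze
    (5, 2, "PUNCT"),  # .
]
buckets = child_degree_buckets(sentence, "PROPN")
print(buckets[0], buckets[1])  # 1 1

# Counts reported on this page: leaves, one child, two children, three or more.
reported = [40884, 26077, 9694, 7376]
print(sum(reported))  # 84031
```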

Children of PROPN nodes are attached using 31 different relations: punct (18229; 24% instances), case (15507; 21% instances), flat (9336; 13% instances), conj (7712; 10% instances), nmod (5565; 7% instances), amod (5005; 7% instances), cc (3282; 4% instances), dep (3006; 4% instances), nummod (1539; 2% instances), appos (1166; 2% instances), acl (1117; 1% instances), advmod:emph (1037; 1% instances), orphan (473; 1% instances), flat:foreign (470; 1% instances), xcomp (339; 0% instances), mark (291; 0% instances), det (100; 0% instances), advmod (81; 0% instances), parataxis (72; 0% instances), cop (61; 0% instances), obl (61; 0% instances), nsubj (53; 0% instances), nummod:gov (40; 0% instances), advcl (9; 0% instances), obj (6; 0% instances), det:numgov (5; 0% instances), det:nummod (3; 0% instances), aux (1; 0% instances), ccomp (1; 0% instances), expl:pv (1; 0% instances), obl:arg (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (20466; 27% instances), PUNCT (18231; 24% instances), ADP (15513; 21% instances), NOUN (5640; 8% instances), ADJ (5612; 8% instances), CCONJ (3525; 5% instances), NUM (2423; 3% instances), VERB (1385; 2% instances), ADV (947; 1% instances), SCONJ (299; 0% instances), DET (226; 0% instances), PART (162; 0% instances), AUX (62; 0% instances), PRON (45; 0% instances), SYM (24; 0% instances), INTJ (9; 0% instances)