
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: PROPN

There are 15241 PROPN lemmas (25%), 21954 PROPN types (17%) and 84031 PROPN tokens (6%). Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 3 in number of types and 6 in number of tokens.
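Below is a minimal Python sketch of how counts like these can be derived from a CoNLL-U file. It is not the tool that produced this page. It assumes that lemmas and types are distinct LEMMA and FORM strings, compared case-sensitively, and that each percentage is the PROPN count divided by the same count over all tags. The function names and the two-sentence sample are invented for illustration.

```python
def read_tokens(text):
    """Hypothetical helper: yield 10-column token rows from CoNLL-U text,
    skipping comments, blank lines, multiword-token ranges (e.g. 3-4)
    and empty nodes (e.g. 5.1)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols


def tag_stats(text, tag):
    """Hypothetical helper: return (count for `tag`, count over all tags)
    for distinct lemmas, distinct types (forms) and tokens."""
    tag_lemmas, tag_types, tag_tokens = set(), set(), 0
    all_lemmas, all_types, all_tokens = set(), set(), 0
    for cols in read_tokens(text):
        form, lemma, upos = cols[1], cols[2], cols[3]
        all_lemmas.add(lemma)
        all_types.add(form)
        all_tokens += 1
        if upos == tag:
            tag_lemmas.add(lemma)
            tag_types.add(form)
            tag_tokens += 1
    return {
        "lemmas": (len(tag_lemmas), len(all_lemmas)),
        "types": (len(tag_types), len(all_types)),
        "tokens": (tag_tokens, all_tokens),
    }


# Invented sample, written space-separated for readability and then
# converted to the tab-separated CoNLL-U layout.
ROWS = """
1 Jan Jan PROPN _ _ 2 nsubj _ _
2 žije žít VERB _ _ 0 root _ _
3 v v ADP _ _ 4 case _ _
4 Praze Praha PROPN _ _ 2 obl _ _
5 . . PUNCT _ _ 2 punct _ _

1 Praha Praha PROPN _ _ 3 nsubj _ _
2 je být AUX _ _ 3 cop _ _
3 krásná krásný ADJ _ _ 0 root _ _
4 . . PUNCT _ _ 3 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

for name, (n, total) in tag_stats(SAMPLE, "PROPN").items():
    print(f"{name}: {n} PROPN of {total} ({100 * n / total:.0f}%)")
```

In the sample, Praze and Praha are two types of a single lemma. This is why the lemma count can be lower than the type count, as it is for PROPN here.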

The 10 most frequent PROPN lemmas: Praha, ČR, Evropa, LN, Jan, Jiří, Německo, Brno, ODS, USA

The 10 most frequent PROPN types: Praha, ČR, Praze, LN, ODS, USA, J, Jiří, Jan, OSN

The 10 most frequent ambiguous lemmas: J (PROPN 422, ADJ 30), M (PROPN 244, NOUN 8, ADJ 1), V (PROPN 210, NUM 23, NOUN 7, ADJ 5), A (PROPN 172, ADJ 8, NOUN 8), York (PROPN 165, ADJ 5), P (PROPN 136, ADJ 4, NOUN 2), čt (NOUN 4, PROPN 2), S (PROPN 116, ADJ 12, NOUN 2), Washington (PROPN 111, ADJ 1), r (NOUN 55, ADV 1, PROPN 1)

The 10 most frequent ambiguous types: J (PROPN 422, ADJ 30, NOUN 3), M (PROPN 244, NOUN 51, X 3, ADJ 1), V (ADP 3736, PROPN 210, NUM 23, NOUN 15, ADJ 6, ADV 2), A (CCONJ 1042, PROPN 172, NOUN 93, ADJ 19, X 4), Rusko (PROPN 163, ADJ 3), Německo (PROPN 144, ADJ 2), P (PROPN 136, NOUN 124, ADJ 17, ADP 1), S (ADP 470, PROPN 117, NOUN 38, ADJ 14, X 3), r (NOUN 433, ADV 1, PROPN 1), F (PROPN 99, NOUN 27, ADJ 10)
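A lemma or type counts as ambiguous when it occurs with more than one UPOS tag. The Python sketch below collects such items for a given tag. The function names and sample are hypothetical. Ranking by total frequency is also an assumption, because the page does not state how its top 10 is ordered.

```python
from collections import Counter, defaultdict


def read_tokens(text):
    """Hypothetical helper: yield 10-column token rows from CoNLL-U text,
    skipping comments, blank lines, multiword-token ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols


def ambiguous(text, column, tag, top=10):
    """Hypothetical helper: return up to `top` keys from `column`
    (1 = FORM, i.e. types; 2 = LEMMA) that occur with more than one UPOS,
    one of them `tag`, each with its per-UPOS counts. Ranked by total
    frequency (an assumption)."""
    tags_by_key = defaultdict(Counter)
    for cols in read_tokens(text):
        tags_by_key[cols[column]][cols[3]] += 1
    hits = [(key, counts) for key, counts in tags_by_key.items()
            if len(counts) > 1 and tag in counts]
    hits.sort(key=lambda kv: -sum(kv[1].values()))
    return [(key, counts.most_common()) for key, counts in hits[:top]]


# Invented sample: sentence-initial conjunction "A" vs. the name "A".
ROWS = """
1 A a CCONJ _ _ 3 cc _ _
2 Jan Jan PROPN _ _ 3 nsubj _ _
3 přišel přijít VERB _ _ 0 root _ _

1 A a CCONJ _ _ 2 cc _ _
2 vyhrála vyhrát VERB _ _ 0 root _ _
3 skupina skupina NOUN _ _ 2 nsubj _ _
4 A A PROPN _ _ 3 nmod _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

print(ambiguous(SAMPLE, 1, "PROPN"))  # ambiguous types
print(ambiguous(SAMPLE, 2, "PROPN"))  # ambiguous lemmas
```

In the sample, the type A is ambiguous, but the lemmas a and A are not. This presumably explains why A appears with CCONJ 1042 among the ambiguous types, while the lemma A shows no CCONJ reading.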

Morphology

The form / lemma ratio of PROPN is 1.440457 (the average over all parts of speech is 2.181849).
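The reported ratio equals the number of PROPN types divided by the number of PROPN lemmas given at the top of the page. In other words, each PROPN lemma has roughly 1.44 distinct forms on average. A quick check:

```python
# PROPN counts reported at the top of this page.
propn_types = 21954
propn_lemmas = 15241

# Average number of distinct forms per lemma.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.440457
```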

The highest number of forms (11) was observed with the lemma “Čech”: ČECH, ČEŠI, Čech, Čecha, Čechem, Čechovi, Čechy, Čechů, Čechům, Češi, Češích.

The 2nd highest number of forms (10) was observed with the lemma “Jan”: JAN, JANA, Jan, Jana, Janem, Janovi, Janové, Janu, Jany, Janů.

The 3rd highest number of forms (10) was observed with the lemma “Němec”: NĚMCI, NĚMCŮ, NĚMEC, Němce, Němcem, Němci, Němcích, Němců, Němcům, Němec.
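Counting forms per lemma means grouping the distinct FORM values of each LEMMA. The Python sketch below takes the forms listed above as its input. The function name is hypothetical. Breaking ties alphabetically (Jan and Němec both have 10 forms) is an assumption. Forms that differ only in case, such as ČECH and Čech, are counted separately, as the lists show.

```python
from collections import defaultdict


def forms_per_lemma(pairs):
    """Hypothetical helper: group (form, lemma) pairs and rank lemmas by
    their number of distinct forms. Forms are compared case-sensitively;
    ties are broken alphabetically by lemma (an assumption)."""
    forms = defaultdict(set)
    for form, lemma in pairs:
        forms[lemma].add(form)
    return sorted(forms.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Forms copied from the lists above; in practice the pairs would come from
# the FORM and LEMMA columns of every PROPN token.
CECH = "ČECH ČEŠI Čech Čecha Čechem Čechovi Čechy Čechů Čechům Češi Češích".split()
JAN = "JAN JANA Jan Jana Janem Janovi Janové Janu Jany Janů".split()
NEMEC = "NĚMCI NĚMCŮ NĚMEC Němce Němcem Němci Němcích Němců Němcům Němec".split()

pairs = ([(f, "Čech") for f in CECH] + [(f, "Němec") for f in NEMEC]
         + [(f, "Jan") for f in JAN])
ranking = forms_per_lemma(pairs)
for rank, (lemma, forms) in enumerate(ranking, start=1):
    print(rank, lemma, len(forms), ", ".join(sorted(forms)))
```

Sorting each set by code point reproduces the order in which the forms are listed above.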

PROPN occurs with 9 features: NameType (84031; 100% instances), Polarity (84031; 100% instances), Gender (82083; 98% instances), Number (68761; 82% instances), Case (66478; 79% instances), Animacy (48949; 58% instances), Abbr (13042; 16% instances), Foreign (3684; 4% instances), Style (155; 0% instances)
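Each per-feature count is the number of PROPN tokens that carry the feature at all, whatever its value. The following is a minimal Python sketch. It assumes the FEATS column of the PROPN tokens has already been extracted. The helper name and the four FEATS strings are invented.

```python
from collections import Counter


def feature_coverage(feats_column):
    """Hypothetical helper: count how many tokens carry each feature,
    regardless of value, with the rounded share of all tokens."""
    counts = Counter()
    for feats in feats_column:
        if feats == "_":  # CoNLL-U writes "_" when a token has no features
            continue
        for pair in feats.split("|"):
            counts[pair.split("=", 1)[0]] += 1
    total = len(feats_column)
    return [(name, n, round(100 * n / total)) for name, n in counts.most_common()]


# Invented FEATS values of four PROPN tokens.
FEATS = [
    "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos",
    "Case=Loc|Gender=Fem|NameType=Geo|Number=Sing|Polarity=Pos",
    "Abbr=Yes|Gender=Fem|NameType=Geo|Polarity=Pos",
    "NameType=Com|Polarity=Pos",
]
for name, n, pct in feature_coverage(FEATS):
    print(f"{name} ({n}; {pct}% instances)")
```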

PROPN occurs with 46 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Com, NameType=Com,Geo, NameType=Com,Giv, NameType=Com,Giv,Sur, NameType=Com,Nat, NameType=Com,Pro, NameType=Com,Sur, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Sur, NameType=Geo,Oth, NameType=Geo,Pro, NameType=Geo,Sur, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Giv,Pro, NameType=Giv,Pro,Sur, NameType=Giv,Sur, NameType=Nat, NameType=Nat,Sur, NameType=Oth, NameType=Pro, NameType=Pro,Sur, NameType=Sur, Number=Plur, Number=Sing, Polarity=Pos, Style=Arch, Style=Coll, Style=Expr, Style=Rare
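In CoNLL-U, a feature with multiple values is written as a single entry whose values are separated by commas (e.g. NameType=Giv,Sur). The list above counts each such entry as its own feature-value pair. The hypothetical helper below collects distinct pairs either that way or with the values split apart. The sample strings are invented.

```python
def feature_value_pairs(feats_column, split_multi=False):
    """Hypothetical helper: collect distinct Feature=Value pairs. By default
    an entry such as NameType=Com,Geo stays one pair, as in the list above;
    with split_multi=True it becomes NameType=Com and NameType=Geo."""
    pairs = set()
    for feats in feats_column:
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name, values = pair.split("=", 1)
            if split_multi:
                pairs.update(f"{name}={v}" for v in values.split(","))
            else:
                pairs.add(pair)
    return sorted(pairs)


# Invented FEATS values.
FEATS = [
    "NameType=Com,Geo|Polarity=Pos",
    "Gender=Fem|NameType=Geo|Polarity=Pos",
    "NameType=Giv,Sur|Polarity=Pos",
]
print(feature_value_pairs(FEATS))
print(feature_value_pairs(FEATS, split_multi=True))
```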

PROPN occurs with 613 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos (14098 tokens). Examples: Klaus, Havel, Svoboda, Mečiar, Jelcin, John, Zeman, Němec, Novák, Benda
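Here a feature combination is taken to be the complete FEATS string of a token. This is an assumption, though it is consistent with the example above. The Python sketch below counts distinct combinations and picks the most frequent one, together with example forms ordered by frequency. Both the ordering of examples and the names and tokens are hypothetical.

```python
from collections import Counter


def combination_stats(tokens, n_examples=10):
    """Hypothetical helper: from (form, feats) pairs of one tag, return the
    number of distinct FEATS strings, the most frequent one, its token
    count, and up to `n_examples` distinct forms carrying it, most
    frequent first."""
    combos = Counter(feats for _, feats in tokens)
    best, count = combos.most_common(1)[0]
    forms = Counter(form for form, feats in tokens if feats == best)
    examples = [form for form, _ in forms.most_common(n_examples)]
    return len(combos), best, count, examples


# Invented tokens.
SUR = "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos"
LOC = "Case=Loc|Gender=Fem|NameType=Geo|Number=Sing|Polarity=Pos"
TOKENS = [("Klaus", SUR), ("Havel", SUR), ("Klaus", SUR), ("Praze", LOC)]
print(combination_stats(TOKENS))
```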

Relations

PROPN nodes are attached to their parents using 28 different relations: nmod (26311; 31% instances), nsubj (14576; 17% instances), flat (13568; 16% instances), conj (7613; 9% instances), obl (6709; 8% instances), root (5339; 6% instances), dep (2846; 3% instances), obj (2041; 2% instances), appos (1326; 2% instances), flat:foreign (1256; 1% instances), obl:arg (745; 1% instances), iobj (557; 1% instances), orphan (484; 1% instances), nsubj:pass (350; 0% instances), advcl (154; 0% instances), obl:agent (33; 0% instances), xcomp (32; 0% instances), amod (24; 0% instances), vocative (21; 0% instances), cc (18; 0% instances), ccomp (8; 0% instances), case (6; 0% instances), acl (5; 0% instances), parataxis (3; 0% instances), advmod (2; 0% instances), csubj (2; 0% instances), csubj:pass (1; 0% instances), punct (1; 0% instances)
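Each token has exactly one DEPREL, so the relation counts above add up to the 84031 PROPN tokens. The root count (5339) is the number of PROPN tokens whose HEAD is 0. The following is a minimal Python sketch of the tally. Subtypes such as flat:foreign are kept distinct from flat. The names and sample are invented.

```python
from collections import Counter


def read_tokens(text):
    """Hypothetical helper: yield 10-column token rows from CoNLL-U text,
    skipping comments, blank lines, multiword-token ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols


def relation_counts(text, tag):
    """Hypothetical helper: count the DEPREL of every token with UPOS `tag`;
    subtyped labels such as flat:foreign are kept as they are."""
    return Counter(cols[7] for cols in read_tokens(text) if cols[3] == tag)


# Invented sample, converted from space- to tab-separated CoNLL-U.
ROWS = """
1 Václav Václav PROPN _ _ 3 nsubj _ _
2 Havel Havel PROPN _ _ 1 flat _ _
3 navštívil navštívit VERB _ _ 0 root _ _
4 Prahu Praha PROPN _ _ 3 obj _ _
5 . . PUNCT _ _ 3 punct _ _

1 Krásná krásný ADJ _ _ 2 amod _ _
2 Praha Praha PROPN _ _ 0 root _ _
3 . . PUNCT _ _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

counts = relation_counts(SAMPLE, "PROPN")
total = sum(counts.values())
for rel, n in counts.most_common():
    print(f"{rel} ({n}; {round(100 * n / total)}% instances)")
```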

Parents of PROPN nodes belong to 15 different parts of speech (including the artificial root node, which has no part of speech): PROPN (25837; 31% instances), NOUN (25811; 31% instances), VERB (22193; 26% instances), root (5339; 6% instances), ADJ (3633; 4% instances), NUM (390; 0% instances), ADV (368; 0% instances), ADP (198; 0% instances), DET (110; 0% instances), PRON (103; 0% instances), PART (23; 0% instances), CCONJ (15; 0% instances), SYM (7; 0% instances), INTJ (2; 0% instances), PUNCT (2; 0% instances)
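The parent counts also add up to 84031. The parent entry for the root has the same count (5339) as the root relation, which corresponds to HEAD 0, a node with no UPOS. The Python sketch below records such parents as an empty string. The names and sample are hypothetical.

```python
from collections import Counter


def read_sentences(text):
    """Hypothetical helper: yield each sentence as a list of 10-column rows,
    skipping comments, multiword-token ranges (e.g. 3-4) and empty nodes
    (e.g. 5.1)."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)
    if sent:
        yield sent


def parent_pos_counts(text, tag):
    """Hypothetical helper: count the UPOS of the parent of every `tag`
    token; HEAD 0 (the artificial root) is recorded as ""."""
    counts = Counter()
    for sent in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sent}
        for cols in sent:
            if cols[3] == tag:
                head = cols[6]
                counts["" if head == "0" else upos_by_id[head]] += 1
    return counts


# Invented sample, converted from space- to tab-separated CoNLL-U.
ROWS = """
1 Václav Václav PROPN _ _ 3 nsubj _ _
2 Havel Havel PROPN _ _ 1 flat _ _
3 navštívil navštívit VERB _ _ 0 root _ _
4 Prahu Praha PROPN _ _ 3 obj _ _
5 . . PUNCT _ _ 3 punct _ _

1 Krásná krásný ADJ _ _ 2 amod _ _
2 Praha Praha PROPN _ _ 0 root _ _
3 . . PUNCT _ _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

print(parent_pos_counts(SAMPLE, "PROPN"))
```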

34622 (41%) PROPN nodes are leaves.

26954 (32%) PROPN nodes have one child.

12634 (15%) PROPN nodes have two children.

9821 (12%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 29.
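The four child-count buckets above add up to the 84031 PROPN tokens. The following is a minimal Python sketch that derives the buckets and the highest child degree. It counts how often each token ID appears in the HEAD column of its sentence. The names and sample are invented.

```python
from collections import Counter


def read_sentences(text):
    """Hypothetical helper: yield each sentence as a list of 10-column rows,
    skipping comments, multiword-token ranges and empty nodes."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)
    if sent:
        yield sent


def child_degree_stats(text, tag):
    """Hypothetical helper: bucket `tag` nodes by number of children
    (0, 1, 2, 3+) and track the highest child count."""
    buckets, max_degree = Counter(), 0
    for sent in read_sentences(text):
        # How often a token ID occurs in the HEAD column = its child count.
        n_children = Counter(cols[6] for cols in sent)
        for cols in sent:
            if cols[3] == tag:
                k = n_children[cols[0]]
                buckets["3+" if k >= 3 else str(k)] += 1
                max_degree = max(max_degree, k)
    return buckets, max_degree


# Invented sample, converted from space- to tab-separated CoNLL-U.
ROWS = """
1 Václav Václav PROPN _ _ 3 nsubj _ _
2 Havel Havel PROPN _ _ 1 flat _ _
3 navštívil navštívit VERB _ _ 0 root _ _
4 Prahu Praha PROPN _ _ 3 obj _ _
5 . . PUNCT _ _ 3 punct _ _

1 Krásná krásný ADJ _ _ 2 amod _ _
2 Praha Praha PROPN _ _ 0 root _ _
3 . . PUNCT _ _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

buckets, max_degree = child_degree_stats(SAMPLE, "PROPN")
print(dict(buckets), max_degree)
```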

Children of PROPN nodes are attached using 31 different relations: punct (19174; 21% instances), case (16276; 18% instances), flat (13670; 15% instances), nmod (12946; 14% instances), conj (8101; 9% instances), amod (5142; 6% instances), cc (3600; 4% instances), dep (3354; 4% instances), nummod (1593; 2% instances), acl (1552; 2% instances), appos (1280; 1% instances), advmod:emph (1224; 1% instances), orphan (488; 1% instances), flat:foreign (470; 1% instances), xcomp (391; 0% instances), mark (320; 0% instances), det (110; 0% instances), advmod (83; 0% instances), parataxis (80; 0% instances), cop (67; 0% instances), obl (67; 0% instances), nsubj (62; 0% instances), nummod:gov (40; 0% instances), advcl (9; 0% instances), obj (8; 0% instances), det:numgov (5; 0% instances), aux (3; 0% instances), det:nummod (3; 0% instances), ccomp (2; 0% instances), obl:arg (2; 0% instances), expl:pv (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (25837; 29% instances), PUNCT (19176; 21% instances), ADP (16291; 18% instances), NOUN (12709; 14% instances), ADJ (5805; 6% instances), CCONJ (3935; 4% instances), NUM (2550; 3% instances), VERB (1833; 2% instances), ADV (1081; 1% instances), SCONJ (330; 0% instances), DET (252; 0% instances), PART (172; 0% instances), AUX (70; 0% instances), PRON (49; 0% instances), SYM (24; 0% instances), INTJ (9; 0% instances)
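Every child has exactly one relation and one part of speech, so the two lists above add up to the same total of 90123 child edges. The PROPN count among children (25837) also equals the PROPN count among parents, because both describe the same PROPN-to-PROPN edges. Here is a Python sketch of the tally. The names and sample are hypothetical.

```python
from collections import Counter


def read_sentences(text):
    """Hypothetical helper: yield each sentence as a list of 10-column rows,
    skipping comments, multiword-token ranges and empty nodes."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)
    if sent:
        yield sent


def child_stats(text, tag):
    """Hypothetical helper: count DEPREL and UPOS of all children of
    `tag` nodes."""
    rels, pos = Counter(), Counter()
    for sent in read_sentences(text):
        tag_ids = {cols[0] for cols in sent if cols[3] == tag}
        for cols in sent:
            if cols[6] in tag_ids:  # this token's parent is a `tag` node
                rels[cols[7]] += 1
                pos[cols[3]] += 1
    return rels, pos


# Invented sample, converted from space- to tab-separated CoNLL-U.
ROWS = """
1 Václav Václav PROPN _ _ 3 nsubj _ _
2 Havel Havel PROPN _ _ 1 flat _ _
3 navštívil navštívit VERB _ _ 0 root _ _
4 Prahu Praha PROPN _ _ 3 obj _ _
5 . . PUNCT _ _ 3 punct _ _

1 Krásná krásný ADJ _ _ 2 amod _ _
2 Praha Praha PROPN _ _ 0 root _ _
3 . . PUNCT _ _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS.strip().splitlines())

rels, pos = child_stats(SAMPLE, "PROPN")
print(rels.most_common(), pos.most_common())
```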