
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: PROPN

There are 15240 PROPN lemmas (25%), 21953 PROPN types (17%) and 84032 PROPN tokens (6%). Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 3 in number of types and 6 in number of tokens.
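The three counts above can be reproduced from a treebank file in CoNLL-U format. The sketch below is one possible way to do it. It assumes three things: lemmas and types (distinct word forms, compared case-sensitively) are counted over syntactic words only, tags are ranked by plain token frequency, and the token share is PROPN tokens over all syntactic words. The helper names `read_tokens`, `tag_profile` and `rank_by_tokens` are hypothetical and not part of any UD tool. The exact denominators behind the lemma and type percentages are not stated, so only the token share is computed.

```python
from collections import Counter


def read_tokens(conllu_text):
    """Return one dict per syntactic word in a CoNLL-U string.

    Comment lines, multiword-token ranges (IDs such as 3-4) and empty
    nodes (IDs such as 5.1) are skipped.
    """
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({"form": cols[1], "lemma": cols[2], "upos": cols[3]})
    return tokens


def tag_profile(tokens, tag="PROPN"):
    """Distinct lemmas, distinct forms (types) and tokens carrying `tag`."""
    tagged = [t for t in tokens if t["upos"] == tag]
    return {
        "lemmas": len({t["lemma"] for t in tagged}),
        "types": len({t["form"] for t in tagged}),  # case-sensitive: "ČECH" != "Čech"
        "tokens": len(tagged),
        "token_share": len(tagged) / len(tokens) if tokens else 0.0,
    }


def rank_by_tokens(tokens, tag="PROPN"):
    """1-based rank of `tag` among observed tags by token count (ties unresolved)."""
    counts = Counter(t["upos"] for t in tokens)
    ordered = sorted(counts, key=counts.get, reverse=True)
    return ordered.index(tag) + 1
```

Ranking by lemmas or by types works the same way. Replace the token counter with one that counts distinct lemmas or forms per tag.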

The 10 most frequent PROPN lemmas: Praha, ČR, Evropa, LN, Jan, Jiří, Německo, Brno, ODS, USA

The 10 most frequent PROPN types: Praha, ČR, Praze, LN, ODS, USA, J, Jiří, Jan, OSN

The 10 most frequent ambiguous lemmas: J (PROPN 422, ADJ 30, NOUN 2), M (PROPN 244, NOUN 8, ADJ 1), V (PROPN 210, NUM 23, NOUN 10, ADJ 5), A (PROPN 172, ADJ 8, NOUN 8), York (PROPN 165, ADJ 5), P (PROPN 136, ADJ 4, NOUN 2), čt (NOUN 4, PROPN 2), S (PROPN 116, ADJ 12, NOUN 2), Washington (PROPN 111, ADJ 1), r (NOUN 55, ADV 1, PROPN 1)

The 10 most frequent ambiguous types: J (PROPN 422, ADJ 30, NOUN 3), M (PROPN 244, NOUN 51, X 3, ADJ 1), V (ADP 3736, PROPN 210, NUM 23, NOUN 15, ADJ 6, ADV 2), A (CCONJ 1042, PROPN 172, NOUN 93, ADJ 19, X 4), Rusko (PROPN 163, ADJ 3), Německo (PROPN 144, ADJ 2), P (PROPN 136, NOUN 124, ADJ 17, ADP 1), S (ADP 470, PROPN 117, NOUN 38, ADJ 14, X 3), r (NOUN 433, ADV 1, PROPN 1), F (PROPN 99, NOUN 27, ADJ 10)
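A lemma or type is ambiguous here when it occurs with more than one UPOS tag. The sketch below shows that grouping under this reading. The function name `ambiguous_keys` is hypothetical. It does not try to reproduce the order in which the page lists the items, because the ranking criterion is not stated.

```python
from collections import Counter, defaultdict


def ambiguous_keys(pairs, tag="PROPN"):
    """Return {key: Counter of UPOS tags} for every key seen with `tag`
    and with at least one other tag.

    pairs: iterable of (key, upos); pass (lemma, upos) for ambiguous
    lemmas or (form, upos) for ambiguous types.
    """
    tags_by_key = defaultdict(Counter)
    for key, upos in pairs:
        tags_by_key[key][upos] += 1
    return {
        key: tags
        for key, tags in tags_by_key.items()
        if tag in tags and len(tags) > 1
    }
```

For example, the input `[("J", "PROPN"), ("J", "PROPN"), ("J", "ADJ"), ("Praha", "PROPN")]` yields only `"J"`, mapped to `Counter({"PROPN": 2, "ADJ": 1})`. `"Praha"` is excluded because it never occurs with another tag.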

Morphology

The form / lemma ratio of PROPN is 1.440486 (the average of all parts of speech is 2.181221).
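The PROPN ratio equals the number of distinct PROPN forms divided by the number of distinct PROPN lemmas: 21953 / 15240 ≈ 1.440486. The sketch below computes it with a hypothetical helper `form_lemma_ratio` that takes (form, lemma, upos) triples. How the all-POS average is computed is not stated here, so it is not reproduced.

```python
def form_lemma_ratio(tokens, tag="PROPN"):
    """Distinct forms divided by distinct lemmas among tokens tagged `tag`.

    tokens: iterable of (form, lemma, upos) triples.
    """
    forms, lemmas = set(), set()
    for form, lemma, upos in tokens:
        if upos == tag:
            forms.add(form)
            lemmas.add(lemma)
    return len(forms) / len(lemmas) if lemmas else 0.0


# Page totals for PROPN: 21953 types over 15240 lemmas.
print(round(21953 / 15240, 6))  # 1.440486
```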

The highest number of forms (11) was observed with the lemma “Čech”: ČECH, ČEŠI, Čech, Čecha, Čechem, Čechovi, Čechy, Čechů, Čechům, Češi, Češích.

The 2nd highest number of forms (10) was observed with the lemma “Jan”: JAN, JANA, Jan, Jana, Janem, Janovi, Janové, Janu, Jany, Janů.

The 3rd highest number of forms (10) was observed with the lemma “Němec”: NĚMCI, NĚMCŮ, NĚMEC, Němce, Němcem, Němci, Němcích, Němců, Němcům, Němec.
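The lemmas with the most forms can be found by collecting the set of distinct forms per lemma. The sketch below uses a hypothetical helper `lemmas_with_most_forms` with the same (form, lemma, upos) triples as input. Forms are compared case-sensitively, so all-caps variants such as ČECH count as separate forms. Plain code-point sorting of each set gives the same order as the three lists above. The page does not say how ties are broken (Jan and Němec both have 10 forms), so the sketch keeps tied lemmas in first-seen order.

```python
from collections import defaultdict


def lemmas_with_most_forms(tokens, tag="PROPN", n=3):
    """Return the n lemmas of `tag` with the most distinct forms,
    each as (lemma, sorted list of forms).

    tokens: iterable of (form, lemma, upos) triples.
    """
    forms_by_lemma = defaultdict(set)
    for form, lemma, upos in tokens:
        if upos == tag:
            forms_by_lemma[lemma].add(form)
    # sorted() is stable even with reverse=True, so ties keep first-seen order.
    ranked = sorted(forms_by_lemma.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(lemma, sorted(forms)) for lemma, forms in ranked[:n]]
```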

PROPN occurs with 9 features: Polarity (84032; 100% of instances), NameType (84030; 100% of instances), Gender (82084; 98% of instances), Number (68762; 82% of instances), Case (66479; 79% of instances), Animacy (48951; 58% of instances), Abbr (13042; 16% of instances), Foreign (3685; 4% of instances), Style (155; 0% of instances)

PROPN occurs with 46 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Com, NameType=Com,Geo, NameType=Com,Giv, NameType=Com,Giv,Sur, NameType=Com,Nat, NameType=Com,Pro, NameType=Com,Sur, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Sur, NameType=Geo,Oth, NameType=Geo,Pro, NameType=Geo,Sur, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Giv,Pro, NameType=Giv,Pro,Sur, NameType=Giv,Sur, NameType=Nat, NameType=Nat,Sur, NameType=Oth, NameType=Pro, NameType=Pro,Sur, NameType=Sur, Number=Plur, Number=Sing, Polarity=Pos, Style=Arch, Style=Coll, Style=Expr, Style=Rare

PROPN occurs with 615 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Sur|Number=Sing|Polarity=Pos (14098 tokens). Examples: Klaus, Havel, Svoboda, Mečiar, Jelcin, John, Zeman, Němec, Novák, Benda
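The feature counts, feature-value pairs and feature combinations above can all be read from the FEATS column of the PROPN tokens. The sketch below uses a hypothetical helper `feature_stats` and follows the conventions visible on this page. A multi-value such as NameType=Com,Geo is kept as a single pair, and a combination is the whole FEATS string. It assumes the input is the list of FEATS strings of the PROPN tokens. Each percentage above is then a feature's count divided by the number of PROPN tokens.

```python
from collections import Counter


def feature_stats(feats_column):
    """Summarise FEATS strings (one per token of the tag of interest).

    Returns three Counters:
      features - tokens carrying each feature name ("Case", "NameType", ...)
      pairs    - occurrences of each feature=value pair; a multi-value
                 such as "NameType=Com,Geo" is kept whole
      combos   - occurrences of each complete FEATS string
    """
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_column:
        if feats == "_":  # CoNLL-U marks an empty FEATS column with "_"
            continue
        combos[feats] += 1
        for pair in feats.split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos
```

`combos.most_common(1)` then gives the most frequent combination, and `len(pairs)` and `len(combos)` give the number of distinct pairs and combinations.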

Relations

PROPN nodes are attached to their parents using 24 different relations: nmod (24808; 30% of instances), flat (20538; 24% of instances), nsubj (10933; 13% of instances), conj (7219; 9% of instances), obl (6320; 8% of instances), root (5197; 6% of instances), dep (2820; 3% of instances), obj (1281; 2% of instances), obl:arg (1257; 1% of instances), flat:foreign (1244; 1% of instances), appos (1210; 1% of instances), orphan (448; 1% of instances), nsubj:pass (276; 0% of instances), iobj (259; 0% of instances), advcl (136; 0% of instances), xcomp (29; 0% of instances), amod (24; 0% of instances), vocative (15; 0% of instances), ccomp (8; 0% of instances), acl:relcl (3; 0% of instances), parataxis (3; 0% of instances), csubj (2; 0% of instances), acl (1; 0% of instances), csubj:pass (1; 0% of instances)

Parents of PROPN nodes belong to 15 different parts of speech, counting the artificial sentence root (which has no part of speech) as one: NOUN (36000; 43% of instances), PROPN (20466; 24% of instances), VERB (17351; 21% of instances), sentence root (5197; 6% of instances), ADJ (3340; 4% of instances), AUX (554; 1% of instances), NUM (369; 0% of instances), ADV (346; 0% of instances), ADP (173; 0% of instances), DET (99; 0% of instances), PRON (91; 0% of instances), PART (23; 0% of instances), CCONJ (15; 0% of instances), SYM (7; 0% of instances), INTJ (1; 0% of instances)
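Both lists come from looking up, for each PROPN node, its DEPREL and the UPOS of the node its HEAD points to. Nodes with HEAD 0 hang from the artificial root of the sentence and have no parent word, so their count equals the number of root relations (5197). The sketch below works under those assumptions and uses hypothetical helpers `parse_sentences` and `attachment_stats`. It uses the placeholder label "ROOT" for the artificial root, which is an illustrative choice.

```python
from collections import Counter


def parse_sentences(conllu_text):
    """Split a CoNLL-U string into sentences of (id, upos, head, deprel) rows,
    skipping comments, multiword-token ranges and empty nodes."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_stats(sentences, tag="PROPN"):
    """Count the relations attaching `tag` nodes and the UPOS of their parents.

    HEAD 0 means the node is attached to the artificial root, which has no
    UPOS; it is counted under the placeholder label "ROOT".
    """
    relations, parent_pos = Counter(), Counter()
    for sent in sentences:
        upos_by_id = {tid: upos for tid, upos, _, _ in sent}
        for tid, upos, head, deprel in sent:
            if upos != tag:
                continue
            relations[deprel] += 1  # subtypes such as obl:arg stay whole
            parent_pos[upos_by_id[head] if head else "ROOT"] += 1
    return relations, parent_pos
```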

40875 (49%) PROPN nodes are leaves.

26091 (31%) PROPN nodes have one child.

9697 (12%) PROPN nodes have two children.

7369 (9%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 29.
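The four buckets above (0, 1, 2 and 3 or more children) sum to 84032, the PROPN token total. The rounded percentages add up to 101% only because of rounding. The sketch below computes the buckets and the largest number of children. It uses a hypothetical helper `child_degree_stats` and assumes each sentence is given as (id, upos, head) triples for the syntactic words of the basic tree.

```python
from collections import Counter


def child_degree_stats(sentences, tag="PROPN"):
    """Bucket `tag` nodes by number of children and report the maximum.

    sentences: list of sentences, each a list of (id, upos, head) triples.
    Returns (buckets, max_children), where buckets maps 0, 1, 2 and 3
    (meaning "three or more") to node counts.
    """
    buckets = Counter()
    max_children = 0
    for sent in sentences:
        # How many words name each ID as their HEAD.
        n_children = Counter(head for _, _, head in sent)
        for tid, upos, _ in sent:
            if upos != tag:
                continue
            k = n_children[tid]
            buckets[min(k, 3)] += 1
            max_children = max(max_children, k)
    return buckets, max_children
```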

Children of PROPN nodes are attached using 31 different relations: punct (18207; 24% of instances), case (15500; 21% of instances), flat (9341; 13% of instances), conj (7719; 10% of instances), nmod (5566; 7% of instances), amod (5007; 7% of instances), cc (3240; 4% of instances), dep (2994; 4% of instances), nummod (1536; 2% of instances), appos (1157; 2% of instances), acl:relcl (1088; 1% of instances), advmod:emph (1036; 1% of instances), flat:foreign (487; 1% of instances), orphan (473; 1% of instances), xcomp (339; 0% of instances), mark (281; 0% of instances), det (108; 0% of instances), parataxis (95; 0% of instances), advmod (82; 0% of instances), cop (61; 0% of instances), obl (61; 0% of instances), nsubj (53; 0% of instances), nummod:gov (40; 0% of instances), acl (29; 0% of instances), advcl (9; 0% of instances), det:numgov (5; 0% of instances), aux (3; 0% of instances), det:nummod (3; 0% of instances), obl:arg (2; 0% of instances), ccomp (1; 0% of instances), expl:pv (1; 0% of instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (20466; 27% of instances), PUNCT (18207; 24% of instances), ADP (15513; 21% of instances), NOUN (5643; 8% of instances), ADJ (5606; 8% of instances), CCONJ (3534; 5% of instances), NUM (2409; 3% of instances), VERB (1320; 2% of instances), ADV (952; 1% of instances), SCONJ (290; 0% of instances), DET (223; 0% of instances), PART (162; 0% of instances), AUX (117; 0% of instances), SYM (38; 0% of instances), PRON (35; 0% of instances), INTJ (9; 0% of instances)
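The two children lists reverse the parent lookup: for every word whose HEAD is a PROPN node, its DEPREL and UPOS are counted. Both lists therefore have the same total, 74524 children. The PROPN entry (20466) equals the PROPN entry among the parents of PROPN nodes, because each PROPN-to-PROPN edge is counted once from each end. The sketch below uses a hypothetical helper `child_stats` with (id, upos, head, deprel) tuples per sentence as input.

```python
from collections import Counter


def child_stats(sentences, tag="PROPN"):
    """Count the relations and UPOS tags of the children of `tag` nodes.

    sentences: list of sentences, each a list of (id, upos, head, deprel)
    tuples for the syntactic words of the basic dependency tree.
    """
    relations, child_pos = Counter(), Counter()
    for sent in sentences:
        upos_by_id = {tid: upos for tid, upos, _, _ in sent}
        for tid, upos, head, deprel in sent:
            # HEAD 0 is the artificial root, never a `tag` node.
            if head and upos_by_id[head] == tag:
                relations[deprel] += 1
                child_pos[upos] += 1
    return relations, child_pos
```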