Treebank Statistics: UD_Czech-PDTC: POS Tags: PROPN
There are 21518 PROPN lemmas (24%), 31692 PROPN types (16%) and 130646 PROPN tokens (4%).
Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 4 in number of types and 9 in number of tokens.
The 10 most frequent PROPN lemmas: Praha, ČR, Evropa, USA, Německo, Plzeň, Brno, Jiří, Jan, LN
The 10 most frequent PROPN types: Praha, Praze, ČR, USA, LN, ODS, Prahy, Yorku, Jiří, Evropě
The 10 most frequent ambiguous lemmas: York (PROPN 492, X 298), John (PROPN 463, X 11), Robert (PROPN 360, X 8), Washington (PROPN 340, X 28), David (PROPN 305, X 5), James (PROPN 241, X 25), Izrael (PROPN 233, X 1), Martin (PROPN 224, X 5), Ivan (PROPN 221, X 4), Dow (PROPN 206, X 134)
The 10 most frequent ambiguous types: Plzni (PROPN 375, NOUN 6), John (PROPN 361, X 11), Robert (PROPN 271, X 8), David (PROPN 213, X 5), Plzně (PROPN 210, NOUN 3), Německo (PROPN 208, ADJ 2), Dow (PROPN 206, X 134), Ford (PROPN 191, X 63), Japonsko (PROPN 188, ADJ 2), Petra (PROPN 179, NOUN 6)
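As a rough illustration of how an "ambiguous type" list like the one above could be derived, the sketch below counts the UPOS tags seen with each word form and keeps the forms that occur as PROPN and at least one other tag. The function name `ambiguous_types` and the toy token list are hypothetical and not taken from the treebank tooling; the counts in the toy data are illustrative, not the treebank's.

```python
from collections import Counter, defaultdict


def ambiguous_types(tokens, focus="PROPN"):
    """Return (form, tag counts) pairs for forms tagged `focus` and some other tag.

    `tokens` is an iterable of (form, upos) pairs. The result is sorted by
    how often the form carries the `focus` tag, most frequent first.
    """
    by_form = defaultdict(Counter)
    for form, upos in tokens:
        by_form[form][upos] += 1
    # A form is ambiguous if it was seen with the focus tag and any other tag.
    ambiguous = {f: c for f, c in by_form.items() if focus in c and len(c) > 1}
    return sorted(ambiguous.items(), key=lambda item: -item[1][focus])


# Toy input (illustrative only, not real treebank counts).
toy_tokens = [
    ("Dow", "PROPN"), ("Dow", "X"), ("Dow", "PROPN"),
    ("Praha", "PROPN"),
    ("Ford", "PROPN"), ("Ford", "X"),
]

result = ambiguous_types(toy_tokens)
for form, counts in result:
    print(form, dict(counts))
```

The same function applied to (lemma, upos) pairs instead of (form, upos) pairs would give the ambiguous-lemma list.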
Morphology
The form / lemma ratio of PROPN is 1.472813 (the average of all parts of speech is 2.169184).
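The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of this page. The sketch below checks that arithmetic; it assumes this is how the figure is computed, and the helper name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms (types) per lemma."""
    return n_types / n_lemmas


# Counts reported for PROPN on this page: 31692 types, 21518 lemmas.
ratio = form_lemma_ratio(31692, 21518)
print(f"{ratio:.6f}")  # 1.472813
```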
Three lemmas tie for the highest number of forms (11 each). Forms that differ only in capitalization are counted separately.
The lemma “Martin” has the forms: MARTIN, MARTINA, Martin, Martina, Martine, Martinem, Martinovi, Martinu, Martiny, Martině, Martinů.
The lemma “Němec” has the forms: NĚMCI, NĚMCŮ, NĚMEC, Němce, Němcem, Němci, Němcovi, Němcích, Němců, Němcům, Němec.
The lemma “Čech” has the forms: ČECH, ČEŠI, Čech, Čecha, Čechem, Čechovi, Čechy, Čechů, Čechům, Češi, Češích.
PROPN occurs with 8 features: NameType (130644; 100% instances), Gender (121319; 93% instances), Case (117195; 90% instances), Number (117195; 90% instances), Animacy (79906; 61% instances), Abbr (7415; 6% instances), Style (286; 0% instances), Typo (80; 0% instances)
PROPN occurs with 30 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Oth, NameType=Geo,Nat, NameType=Geo,Oth, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Nat, NameType=Oth, Number=Plur, Number=Sing, Style=Coll, Style=Expr, Style=Slng, Style=Vrnc, Typo=Yes
PROPN occurs with 317 feature combinations.
The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing (38354 tokens).
Examples: Jiří, Jan, John, Václav, Petr, Robert, Josef, Pavel, Karel, Bush
Relations
PROPN nodes are attached to their parents using 24 different relations: nmod (38712; 30% instances), flat (30769; 24% instances), nsubj (19690; 15% instances), obl (12399; 9% instances), conj (10483; 8% instances), root (6823; 5% instances), obj (2702; 2% instances), obl:arg (2145; 2% instances), parataxis (2114; 2% instances), appos (1513; 1% instances), dep (924; 1% instances), nsubj:pass (696; 1% instances), orphan (452; 0% instances), advcl (310; 0% instances), iobj (270; 0% instances), amod (178; 0% instances), vocative (132; 0% instances), ccomp (130; 0% instances), acl:relcl (71; 0% instances), advcl:pred (47; 0% instances), acl (32; 0% instances), xcomp (31; 0% instances), csubj (18; 0% instances), csubj:pass (5; 0% instances)
Parents of PROPN nodes belong to 16 different parts of speech: NOUN (57404; 44% instances), VERB (32593; 25% instances), PROPN (26450; 20% instances), [no parent; root] (6823; 5% instances), ADJ (3745; 3% instances), ADV (1308; 1% instances), X (825; 1% instances), NUM (605; 0% instances), PRON (299; 0% instances), DET (256; 0% instances), AUX (144; 0% instances), PART (124; 0% instances), SYM (32; 0% instances), CCONJ (27; 0% instances), INTJ (6; 0% instances), ADP (5; 0% instances)
62223 (48%) PROPN nodes are leaves.
39294 (30%) PROPN nodes have one child.
18315 (14%) PROPN nodes have two children.
10814 (8%) PROPN nodes have three or more children.
The highest child degree of a PROPN node is 29.
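The leaf and child-count figures above can be reproduced in principle by counting, for every PROPN node, how many other nodes name it as their head. The following is a minimal sketch over CoNLL-U text using only the standard library; the function name `propn_child_degrees` and the sample sentence are illustrative assumptions, not part of the treebank or its tooling.

```python
from collections import Counter

# A made-up sample sentence in CoNLL-U format (10 tab-separated columns).
SAMPLE = """\
1\tJan\tJan\tPROPN\t_\t_\t3\tnsubj\t_\t_
2\tNovák\tNovák\tPROPN\t_\t_\t1\tflat\t_\t_
3\tbydlí\tbydlet\tVERB\t_\t_\t0\troot\t_\t_
4\tv\tv\tADP\t_\t_\t5\tcase\t_\t_
5\tPraze\tPraha\tPROPN\t_\t_\t3\tobl\t_\t_
6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def propn_child_degrees(conllu: str) -> Counter:
    """Map child count -> number of PROPN nodes with that many children."""
    degrees = Counter()
    for block in conllu.strip().split("\n\n"):
        rows = [
            line.split("\t")
            for line in block.splitlines()
            if line and not line.startswith("#")
        ]
        # Keep regular words only (skip multiword ranges "1-2" and empty nodes "1.1").
        rows = [r for r in rows if r[0].isdigit()]
        # Column 7 (index 6) is HEAD: count how often each ID is used as a head.
        children = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == "PROPN":  # column 4 (index 3) is UPOS
                degrees[children[r[0]]] += 1
    return degrees


degrees = propn_child_degrees(SAMPLE)
print(dict(degrees))
```

In the sample, "Jan" and "Praze" each have one child and "Novák" is a leaf; the maximum key of the returned counter corresponds to the highest child degree.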
Children of PROPN nodes are attached using 33 different relations: case (30331; 26% instances), punct (20781; 18% instances), flat (13655; 12% instances), conj (11378; 10% instances), nmod (10892; 9% instances), amod (6170; 5% instances), cc (4940; 4% instances), appos (4228; 4% instances), parataxis (2370; 2% instances), acl:relcl (2218; 2% instances), cop (2176; 2% instances), advmod:emph (2122; 2% instances), nsubj (1516; 1% instances), nummod (1354; 1% instances), dep (1014; 1% instances), mark (779; 1% instances), det (448; 0% instances), advmod (426; 0% instances), obl (417; 0% instances), orphan (385; 0% instances), aux (343; 0% instances), nummod:gov (104; 0% instances), advcl (87; 0% instances), acl (33; 0% instances), obl:arg (22; 0% instances), discourse (21; 0% instances), det:numgov (19; 0% instances), advcl:pred (15; 0% instances), csubj (7; 0% instances), obj (6; 0% instances), det:nummod (4; 0% instances), ccomp (2; 0% instances), vocative (2; 0% instances)
Children of PROPN nodes belong to 17 different parts of speech: ADP (30209; 26% instances), PROPN (26450; 22% instances), PUNCT (20781; 18% instances), NOUN (14163; 12% instances), ADJ (6820; 6% instances), CCONJ (5102; 4% instances), AUX (2540; 2% instances), VERB (2516; 2% instances), NUM (2416; 2% instances), X (1946; 2% instances), PART (1694; 1% instances), DET (1359; 1% instances), ADV (1320; 1% instances), SCONJ (713; 1% instances), PRON (208; 0% instances), SYM (22; 0% instances), INTJ (6; 0% instances)