
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: PROPN

There are 4725 PROPN lemmas (17%), 6531 PROPN types (12%) and 15741 PROPN tokens (5%). Out of 17 observed tags, the rank of PROPN is: 3 in number of lemmas, 4 in number of types and 7 in number of tokens.
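As a rough illustration of how counts like these could be obtained, the sketch below tallies lemmas, types and tokens for one UPOS tag in CoNLL-U text. The sample sentences and the function name `count_upos` are invented for illustration and are not taken from the treebank or from the script that generated this page. Treating word forms case-sensitively is an assumption, although it agrees with "KANADA" and "Kanada" being listed as separate forms further down.

```python
# Toy CoNLL-U fragment (hypothetical sentences, 10 tab-separated columns).
SAMPLE = "\n".join([
    "# text = V Praze žije Jan",
    "1\tV\tv\tADP\t_\t_\t2\tcase\t_\t_",
    "2\tPraze\tPraha\tPROPN\t_\t_\t3\tobl\t_\t_",
    "3\tžije\tžít\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tJan\tJan\tPROPN\t_\t_\t3\tnsubj\t_\t_",
    "",
    "# text = Praha roste",
    "1\tPraha\tPraha\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\troste\trůst\tVERB\t_\t_\t0\troot\t_\t_",
])


def count_upos(conllu_text, upos):
    """Return (distinct lemmas, distinct word forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM column, case-sensitive (assumption)
            lemmas.add(cols[2])  # LEMMA column
    return len(lemmas), len(types), tokens


print(count_upos(SAMPLE, "PROPN"))  # (2, 3, 3)
```

In the toy sample, "Praze" and "Praha" are two types of a single lemma, which is why the type count exceeds the lemma count.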

The 10 most frequent PROPN lemmas: Praha, ČR, Německo, ODS, Evropa, LN, Jan, Jiří, Brno, Slovensko

The 10 most frequent PROPN types: Praha, ČR, ODS, Praze, LN, USA, Jiří, Jan, OSN, Václav

The 10 most frequent ambiguous lemmas: Washington (PROPN 24, X 1), Fischer (PROPN 16, X 1), York (X 20, PROPN 16), Bohemia (PROPN 15, X 2), Brod (PROPN 9, X 1), Panton (PROPN 9, X 1), Benetton (PROPN 8, X 1), Inkatha (PROPN 8, X 1), Albert (PROPN 7, X 1), Ford (PROPN 7, X 1)

The 10 most frequent ambiguous types: Plzeň (PROPN 22, NOUN 2), Nováček (PROPN 15, NOUN 1), Maďarsko (PROPN 14, ADJ 1), Bohemia (PROPN 13, X 2), C (NOUN 23, PROPN 12), Fischer (PROPN 11, X 1), Plzni (PROPN 9, NOUN 1), Škoda (PROPN 9, NOUN 4), Albert (PROPN 7, X 1), Benetton (PROPN 6, X 1)

Morphology

The form / lemma ratio of PROPN is 1.382222 (the average of all parts of speech is 1.964432).
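The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of this page. That this is how the figure was computed is an assumption, but the arithmetic matches to six decimal places:

```python
# Counts quoted on this page for PROPN in UD_Czech-PDT.
propn_types = 6531
propn_lemmas = 4725

# Assumed definition: form / lemma ratio = distinct forms / distinct lemmas.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.382222
```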

The highest number of forms (8) was observed with the lemma “Američan”: Američan, Američana, Américanem, Američani, Američany, Američané, Američanů, Američanům.

The 2nd highest number of forms (8) was observed with the lemma “Čech”: Čech, Čecha, Čechem, Čechy, Čechů, Čechům, Češi, Češích.

The 3rd highest number of forms (7) was observed with the lemma “Kanada”: KANADA, Kan, Kanada, Kanadou, Kanadu, Kanady, Kanadě.

PROPN occurs with 9 features: NameType (15741; 100% instances), Polarity (15741; 100% instances), Gender (14282; 91% instances), Case (13840; 88% instances), Number (13840; 88% instances), Animacy (9109; 58% instances), Abbr (1457; 9% instances), Typo (14; 0% instances), Style (12; 0% instances)

PROPN occurs with 27 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Oth, NameType=Geo,Oth, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Nat, NameType=Oth, Number=Plur, Number=Sing, Polarity=Pos, Style=Coll, Typo=Yes

PROPN occurs with 169 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing|Polarity=Pos (4546 tokens). Examples: Jiří, Jan, Václav, Vladimír, Klaus, Petr, Pavel, Josef, John, Havel
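A feature combination such as the one above is a `|`-separated list of `Name=Value` pairs, and a value may itself be a comma-separated list (as in `NameType=Geo,Giv`). A minimal sketch of splitting such a string follows; the helper name `parse_feats` is hypothetical.

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into {feature name: list of values}."""
    if feats == "_":
        return {}  # "_" marks an empty feature set in CoNLL-U
    parsed = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        parsed[name] = value.split(",")  # multi-valued features use commas
    return parsed


combo = "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing|Polarity=Pos"
feats = parse_feats(combo)
print(feats["Case"])                               # ['Nom']
print(parse_feats("NameType=Geo,Giv")["NameType"])  # ['Geo', 'Giv']
```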

Relations

PROPN nodes are attached to their parents using 22 different relations: nmod (4423; 28% instances), flat (4004; 25% instances), nsubj (2089; 13% instances), conj (1457; 9% instances), obl (1262; 8% instances), root (1041; 7% instances), dep (576; 4% instances), obl:arg (257; 2% instances), obj (211; 1% instances), appos (177; 1% instances), orphan (101; 1% instances), nsubj:pass (48; 0% instances), iobj (47; 0% instances), advcl (20; 0% instances), xcomp (8; 0% instances), ccomp (6; 0% instances), vocative (5; 0% instances), acl (3; 0% instances), acl:relcl (3; 0% instances), amod (1; 0% instances), csubj (1; 0% instances), parataxis (1; 0% instances)

Parents of PROPN nodes belong to 15 different parts of speech: NOUN (6875; 44% instances), PROPN (3712; 24% instances), VERB (3391; 22% instances), no part of speech, i.e. the artificial root node (1041; 7% instances), ADJ (424; 3% instances), NUM (82; 1% instances), ADV (76; 0% instances), X (69; 0% instances), DET (38; 0% instances), PRON (16; 0% instances), AUX (8; 0% instances), PART (5; 0% instances), ADP (2; 0% instances), CCONJ (1; 0% instances), SYM (1; 0% instances)
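The parent of a node is identified by the HEAD column of its CoNLL-U row, so the parent's part of speech can be looked up within the same sentence. The sketch below, using invented sample sentences and a hypothetical function name, counts parent tags for one UPOS. Tokens whose HEAD is 0 have no parent word, so they end up under an empty label; the count of 1041 for that entry matches the 1041 `root` attachments reported above.

```python
from collections import Counter

# Toy CoNLL-U fragment (hypothetical sentences, 10 tab-separated columns).
SAMPLE = "\n".join([
    "# text = Jan Novák přijel",
    "1\tJan\tJan\tPROPN\t_\t_\t3\tnsubj\t_\t_",
    "2\tNovák\tNovák\tPROPN\t_\t_\t1\tflat\t_\t_",
    "3\tpřijel\tpřijet\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "# text = Praha",
    "1\tPraha\tPraha\tPROPN\t_\t_\t0\troot\t_\t_",
])


def parent_upos_counts(conllu_text, upos):
    """Count the UPOS tags of the parents of all nodes tagged `upos`."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if r[0].isdigit()]   # basic tokens only
        pos_by_id = {r[0]: r[3] for r in rows}
        for r in rows:
            if r[3] == upos:
                # HEAD "0" is the artificial root, which has no UPOS: use "".
                counts[pos_by_id.get(r[6], "")] += 1
    return counts


counts = parent_upos_counts(SAMPLE, "PROPN")
print(counts["VERB"], counts["PROPN"], counts[""])  # 1 1 1
```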

7462 (47%) PROPN nodes are leaves.

5062 (32%) PROPN nodes have one child.

1802 (11%) PROPN nodes have two children.

1415 (9%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 29.
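As a consistency check on the four child-count figures above, they should add up to the 15741 PROPN tokens, and each percentage should follow from dividing by that total. A short sketch using only the numbers quoted on this page:

```python
total_propn_tokens = 15741

# Child-count buckets as reported above.
buckets = {
    "leaf": 7462,
    "one child": 5062,
    "two children": 1802,
    "three or more children": 1415,
}

# The buckets partition all PROPN nodes.
bucket_sum = sum(buckets.values())
print(bucket_sum == total_propn_tokens)  # True

# Percentages rounded to whole numbers, as on this page.
shares = {name: round(100 * n / total_propn_tokens) for name, n in buckets.items()}
print(shares)
```

The rounded shares come out as 47, 32, 11 and 9, which sum to 99 rather than 100 because each value is rounded independently.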

Children of PROPN nodes are attached using 29 different relations: case (3218; 22% instances), punct (3058; 21% instances), flat (1742; 12% instances), conj (1572; 11% instances), nmod (1289; 9% instances), amod (802; 6% instances), cc (716; 5% instances), dep (589; 4% instances), appos (249; 2% instances), advmod:emph (237; 2% instances), acl:relcl (232; 2% instances), nummod (179; 1% instances), orphan (83; 1% instances), xcomp (75; 1% instances), cop (61; 0% instances), mark (53; 0% instances), nsubj (52; 0% instances), obl (30; 0% instances), advmod (22; 0% instances), parataxis (21; 0% instances), det (18; 0% instances), acl (7; 0% instances), nummod:gov (5; 0% instances), advcl (3; 0% instances), aux (2; 0% instances), det:numgov (2; 0% instances), ccomp (1; 0% instances), csubj (1; 0% instances), expl:pv (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (3712; 26% instances), ADP (3200; 22% instances), PUNCT (3058; 21% instances), NOUN (1457; 10% instances), ADJ (867; 6% instances), CCONJ (775; 5% instances), NUM (364; 3% instances), VERB (310; 2% instances), ADV (160; 1% instances), X (148; 1% instances), PART (89; 1% instances), AUX (64; 0% instances), SCONJ (53; 0% instances), DET (48; 0% instances), PRON (9; 0% instances), SYM (6; 0% instances)