This page pertains to UD version 2.

Treebank Statistics: UD_Polish-PDB: POS Tags: PROPN

There are 5835 PROPN lemmas (19% of all lemmas), 7030 PROPN types (11% of all types) and 12000 PROPN tokens (3% of all tokens). Out of 17 observed tags, PROPN ranks 3rd in number of lemmas, 4th in number of types and 7th in number of tokens.
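As a rough illustration of how lemma, type and token counts like these could be derived from a CoNLL-U file, here is a minimal sketch. The sample sentences and the function name `upos_counts` are hypothetical, and treating word forms as case-sensitive types is an assumption, not a documented property of the tool that generated this page.

```python
# Tiny hypothetical CoNLL-U fragment (not taken from UD_Polish-PDB).
SAMPLE = """\
1\tJan\tJan\tPROPN\t_\tCase=Nom\t2\tnsubj\t_\t_
2\tmieszka\tmieszkać\tVERB\t_\t_\t0\troot\t_\t_
3\tw\tw\tADP\t_\t_\t4\tcase\t_\t_
4\tPolsce\tPolska\tPROPN\t_\tCase=Loc\t2\tobl\t_\t_

1\tPolska\tPolska\tPROPN\t_\tCase=Nom\t2\tnsubj\t_\t_
2\tgra\tgrać\tVERB\t_\t_\t0\troot\t_\t_
"""


def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types = set(), set()
    tokens = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens


print(upos_counts(SAMPLE, "PROPN"))  # (2, 3, 3)
```

In the sample, "Polsce" and "Polska" are two types of the single lemma "Polska", which is why the type count exceeds the lemma count.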

The 10 most frequent PROPN lemmas: Polska, Europa, Warszawa, Andrzej, Polak, UE, Jerzy, Bóg, Jan, Piotr

The 10 most frequent PROPN types: Polsce, Polski, UE, Europy, Andrzej, Polska, Europie, Warszawie, Jerzy, SLD

The 10 most frequent ambiguous lemmas: A (PROPN 30, ADJ 1), Solidarność (NOUN 4, PROPN 3), The (PROPN 3, X 1), Blue (PROPN 2, ADJ 1), Czarny (ADJ 2, PROPN 2), KK (PROPN 2, ADV 1), Las (PROPN 2, X 1), PPE-DE (PROPN 2, ADV 1), Arabski (ADJ 1, PROPN 1), Celsjusz (NOUN 1, PROPN 1)

The 10 most frequent ambiguous types: Polski (PROPN 92, ADJ 4), Polska (PROPN 52, ADJ 13), A (CCONJ 153, PART 117, PROPN 30, ADJ 1, ADP 1, ADV 1, INTJ 1, NOUN 1, X 1), S (PROPN 29, NOUN 4, ADV 2), M (PROPN 24, AUX 3), b (ADJ 3, ADV 3, PROPN 2), c (NOUN 2, ADV 1, PROPN 1), Bóg (PROPN 17, NOUN 4), SA (PROPN 17, NOUN 1), Boga (PROPN 16, NOUN 3)
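The idea of an ambiguous type, a word form that occurs with more than one tag, can be sketched as follows. The input pairs and the function name `ambiguous_types` are hypothetical. This sketch compares forms case-sensitively, which may differ from how the listing above was produced.

```python
from collections import Counter, defaultdict


def ambiguous_types(pairs, upos):
    """Map each form seen with `upos` and at least one other tag to its tag counts."""
    by_form = defaultdict(Counter)
    for form, tag in pairs:
        by_form[form][tag] += 1
    return {
        form: tags.most_common()  # most frequent tag first
        for form, tags in by_form.items()
        if upos in tags and len(tags) > 1
    }


# Hypothetical (form, UPOS) observations.
pairs = [
    ("Polski", "PROPN"), ("Polski", "PROPN"), ("Polski", "ADJ"),
    ("Jan", "PROPN"), ("a", "CCONJ"),
]
print(ambiguous_types(pairs, "PROPN"))  # {'Polski': [('PROPN', 2), ('ADJ', 1)]}
```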

Morphology

The form / lemma ratio of PROPN is 1.204799 (the average of all parts of speech is 1.966055).
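The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of this page. The check below is only a sanity calculation under that assumption; the exact definition used by the generating tool is not stated here.

```python
propn_types = 7030   # PROPN types reported on this page
propn_lemmas = 5835  # PROPN lemmas reported on this page

ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.204799
```

A ratio this close to 1 means that most proper-noun lemmas occur in only one inflected form in the treebank, well below the all-tag average of 1.966055.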

The highest number of forms (7) was observed with the lemma “Niemiec”: Niemca, Niemcami, Niemcem, Niemcom, Niemcy, Niemców, Niemiec.

The same number of forms (7) was observed with the lemma “Polak”: Polacy, Polak, Polaka, Polakami, Polakiem, Polakom, Polaków.

The next highest number of forms (6) was observed with the lemma “Agnieszka”: AGNIESZKA, Agnieszce, Agnieszka, Agnieszki, Agnieszką, Agnieszkę.

PROPN occurs with 7 features: Case (11729; 98% instances), Gender (11729; 98% instances), Number (11729; 98% instances), Animacy (7283; 61% instances), Abbr (271; 2% instances), NumType (6; 0% instances), Polite (2; 0% instances)
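Feature coverage of this kind can be computed by splitting the CoNLL-U FEATS column on `|` and `=`. The following is a minimal sketch; the function names and the sample values are hypothetical, and percentages are rounded to whole numbers as in the figures above.

```python
from collections import Counter


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if feats == "_":  # underscore marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def feature_coverage(feats_values):
    """Map each feature name to (token count, rounded percent of tokens)."""
    counts = Counter()
    for feats in feats_values:
        counts.update(parse_feats(feats).keys())
    total = len(feats_values)
    return {name: (n, round(100 * n / total)) for name, n in counts.items()}


# Hypothetical FEATS values of four PROPN tokens.
sample = [
    "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing",
    "Case=Loc|Gender=Fem|Number=Sing",
    "Abbr=Yes",
    "_",
]
print(feature_coverage(sample)["Case"])  # (2, 50)
```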

PROPN occurs with 19 feature-value pairs: Abbr=Yes, Animacy=Hum, Animacy=Inan, Animacy=Nhum, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Sets, Number=Plur, Number=Ptan, Number=Sing, Polite=Depr

PROPN occurs with 72 feature combinations. The most frequent feature combination is Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing (3221 tokens). Examples: Andrzej, Jerzy, Marek, Piotr, Jan, Krzysztof, Jacek, Janusz, Józef, Tomasz

Relations

PROPN nodes are attached to their parents using 28 different relations: nmod (2448; 20% instances), nsubj (2436; 20% instances), flat (2171; 18% instances), appos (1259; 10% instances), obl (942; 8% instances), conj (843; 7% instances), nmod:arg (390; 3% instances), obj (325; 3% instances), obl:arg (235; 2% instances), iobj (206; 2% instances), root (205; 2% instances), nmod:poss (133; 1% instances), vocative (111; 1% instances), obl:agent (92; 1% instances), obl:cmpr (71; 1% instances), nsubj:pass (46; 0% instances), parataxis:insert (27; 0% instances), flat:foreign (23; 0% instances), parataxis:obj (9; 0% instances), advcl (6; 0% instances), xcomp:pred (6; 0% instances), list (4; 0% instances), orphan (4; 0% instances), acl:relcl (3; 0% instances), ccomp (2; 0% instances), ccomp:cleft (1; 0% instances), nmod:pred (1; 0% instances), xcomp (1; 0% instances)

Parents of PROPN nodes belong to 12 different parts of speech (counting the artificial root, which has no part of speech, as one): NOUN (4541; 38% instances), VERB (3771; 31% instances), PROPN (2847; 24% instances), ADJ (445; 4% instances), root, no parent (205; 2% instances), X (80; 1% instances), ADV (44; 0% instances), DET (32; 0% instances), PRON (31; 0% instances), INTJ (2; 0% instances), NUM (1; 0% instances), PART (1; 0% instances)

5662 (47%) PROPN nodes are leaves.

4109 (34%) PROPN nodes have one child.

1432 (12%) PROPN nodes have two children.

797 (7%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 14.
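The child-degree breakdown above (leaves, one child, two children, three or more) can be derived from the HEAD column of each sentence. The sketch below assumes a sentence is given as a list of hypothetical `(id, upos, head)` tuples; the helper names are illustrative only.

```python
from collections import Counter


def child_degrees(sentence, upos):
    """Return the number of children of each node tagged `upos`."""
    n_children = Counter(head for _, _, head in sentence)
    return [n_children[node_id] for node_id, pos, _ in sentence if pos == upos]


def bucket(degrees):
    """Group degrees into 0, 1, 2 and 3 (meaning three or more)."""
    buckets = Counter()
    for d in degrees:
        buckets[min(d, 3)] += 1
    return buckets


# Hypothetical tree for "mieszka w Polsce i Europie".
sentence = [
    (1, "VERB", 0),   # mieszka, root
    (2, "ADP", 3),    # w, attached to Polsce
    (3, "PROPN", 1),  # Polsce
    (4, "CCONJ", 5),  # i, attached to Europie
    (5, "PROPN", 3),  # Europie, conjunct of Polsce
]
print(child_degrees(sentence, "PROPN"))  # [2, 1]
```

As a consistency check on the page itself, the four reported groups (5662, 4109, 1432 and 797) add up to the 12000 PROPN tokens.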

Children of PROPN nodes are attached using 39 different relations: case (2348; 24% instances), punct (2042; 20% instances), flat (2024; 20% instances), conj (849; 9% instances), cc (432; 4% instances), amod (402; 4% instances), appos (346; 3% instances), nmod (322; 3% instances), amod:flat (319; 3% instances), parataxis:obj (150; 2% instances), advmod:emph (126; 1% instances), acl:relcl (125; 1% instances), acl (112; 1% instances), mark (90; 1% instances), nmod:flat (35; 0% instances), det (31; 0% instances), cop (28; 0% instances), flat:foreign (23; 0% instances), parataxis:insert (21; 0% instances), det:poss (18; 0% instances), advmod (15; 0% instances), orphan (14; 0% instances), nsubj (13; 0% instances), advmod:neg (12; 0% instances), nummod:gov (12; 0% instances), cc:preconj (10; 0% instances), nmod:arg (10; 0% instances), det:numgov (8; 0% instances), advcl (4; 0% instances), discourse:intj (4; 0% instances), list (4; 0% instances), nmod:poss (4; 0% instances), nummod (4; 0% instances), obj (3; 0% instances), obl (3; 0% instances), obl:cmpr (3; 0% instances), aux (2; 0% instances), det:nummod (2; 0% instances), nummod:flat (1; 0% instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (2847; 29% instances), ADP (2351; 24% instances), PUNCT (2042; 20% instances), ADJ (804; 8% instances), NOUN (717; 7% instances), CCONJ (436; 4% instances), VERB (255; 3% instances), X (125; 1% instances), PART (117; 1% instances), SCONJ (90; 1% instances), DET (63; 1% instances), ADV (49; 0% instances), AUX (30; 0% instances), PRON (22; 0% instances), NUM (19; 0% instances), INTJ (4; 0% instances)