
This page pertains to UD version 2.

Treebank Statistics: UD_Polish-PDB: POS Tags: PROPN

There are 5806 PROPN lemmas (19%), 6999 PROPN types (11%) and 11956 PROPN tokens (3%). Among the 17 observed tags, PROPN ranks 3rd by number of lemmas, 4th by number of types and 7th by number of tokens.
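The counts above can be reproduced, in outline, by collecting distinct lemmas, distinct forms (types) and tokens for each UPOS tag and then ranking the tags. The sketch below is a hypothetical illustration: the toy triples are invented, `tag_profile` and `rank` are helper names introduced here, and forms are assumed to be compared case-sensitively. The page does not state how its lemma and type percentages are computed, so only the token share is shown.

```python
from collections import defaultdict

# Invented (FORM, LEMMA, UPOS) triples; a real run would take them from
# columns 2, 3 and 4 of every token line in the treebank's CoNLL-U files.
TOKENS = [
    ("Andrzej", "Andrzej", "PROPN"),
    ("mieszka", "mieszkać", "VERB"),
    ("w", "w", "ADP"),
    ("Warszawie", "Warszawa", "PROPN"),
    ("Warszawa", "Warszawa", "PROPN"),
    ("jest", "być", "AUX"),
    ("duża", "duży", "ADJ"),
]


def tag_profile(tokens):
    """Return {upos: (distinct lemmas, distinct forms, tokens)} for every tag."""
    lemmas, forms, count = defaultdict(set), defaultdict(set), defaultdict(int)
    for form, lemma, upos in tokens:
        lemmas[upos].add(lemma)
        forms[upos].add(form)  # case-sensitive: "UE" and "Ue" would be two types
        count[upos] += 1
    return {u: (len(lemmas[u]), len(forms[u]), count[u]) for u in count}


def rank(profile, upos, column):
    """1-based rank of `upos` when tags are sorted by `column`, descending.

    column 0 = lemmas, 1 = types, 2 = tokens; tied tags keep insertion order.
    """
    ordered = sorted(profile, key=lambda u: profile[u][column], reverse=True)
    return ordered.index(upos) + 1


profile = tag_profile(TOKENS)
lemmas, types, tokens = profile["PROPN"]
print(lemmas, types, tokens)  # 2 3 3
print(f"token share: {tokens / len(TOKENS):.0%}")  # token share: 43%
print([rank(profile, "PROPN", c) for c in range(3)])  # [1, 1, 1]
```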

The 10 most frequent PROPN lemmas: Polska, Europa, Warszawa, Andrzej, Polak, UE, Jerzy, Bóg, Jan, Piotr

The 10 most frequent PROPN types: Polsce, Polski, UE, Europy, Andrzej, Polska, Europie, Warszawie, Jerzy, SLD

The 10 most frequent ambiguous lemmas: A (PROPN 30, ADJ 1), Solidarność (NOUN 4, PROPN 3), Street (PROPN 3, X 1), The (PROPN 3, X 1), Blue (PROPN 2, ADJ 1), Charles (PROPN 2, X 1), City (PROPN 2, X 1), Club (PROPN 2, X 1), Czarny (ADJ 2, PROPN 2), KK (PROPN 2, ADV 1)

The 10 most frequent ambiguous types: Polski (PROPN 92, ADJ 4), Polska (PROPN 52, ADJ 13), A (CCONJ 149, PART 121, PROPN 30, ADJ 1, ADP 1, ADV 1, INTJ 1, NOUN 1, X 1), S (PROPN 29, NOUN 4, X 2), M (PROPN 24, AUX 3), b (ADJ 3, ADV 3, PROPN 2), c (NOUN 2, ADV 1, PROPN 1), Bóg (PROPN 17, NOUN 4), SA (PROPN 17, NOUN 1), Boga (PROPN 16, NOUN 3)
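Ambiguity here means that one lemma or form is tagged PROPN in some occurrences and with another UPOS in others. A minimal sketch of the detection step on invented data follows; `ambiguous` and `describe` are hypothetical helper names, and the final sort by total frequency is only one possible ranking, since the page does not state how it orders these lists.

```python
from collections import Counter, defaultdict

# Invented (FORM, LEMMA, UPOS) triples for illustration only.
TOKENS = (
    [("A", "a", "CCONJ")] * 3
    + [("A", "a", "PART")] * 2
    + [("A", "A", "PROPN")]
    + [("Polski", "Polska", "PROPN")] * 2
    + [("Polski", "polski", "ADJ")]
    + [("Warszawa", "Warszawa", "PROPN")]
)


def ambiguous(tokens, key="form", upos="PROPN"):
    """Values (forms or lemmas) seen with `upos` and at least one other tag.

    Returns {value: Counter({tag: count})}.
    """
    index = 0 if key == "form" else 1
    tags = defaultdict(Counter)
    for token in tokens:
        tags[token[index]][token[2]] += 1
    return {v: c for v, c in tags.items() if upos in c and len(c) > 1}


def describe(value, counter):
    """Format an entry in the page's style, e.g. 'Polski (PROPN 2, ADJ 1)'."""
    parts = ", ".join(f"{tag} {n}" for tag, n in counter.most_common())
    return f"{value} ({parts})"


types = ambiguous(TOKENS, key="form")
for value, counter in sorted(types.items(), key=lambda kv: -sum(kv[1].values())):
    print(describe(value, counter))
```

With `key="lemma"` the same toy data yields no ambiguous entries: the conjunction and particle readings of *A* carry the lowercase lemma *a*, and *Polski* splits into the lemmas *Polska* and *polski*. This is how a form can be ambiguous while none of its lemmas is.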

Morphology

The form/lemma ratio of PROPN is 1.205477 (the average over all parts of speech is 1.965463).
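The ratio can be checked against the counts given at the top of the page: it equals the number of distinct PROPN forms (types) divided by the number of distinct PROPN lemmas. A short check using only the published figures:

```python
# PROPN counts reported for UD_Polish-PDB on this page.
propn_types = 6999
propn_lemmas = 5806

# Form/lemma ratio = distinct forms per distinct lemma.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.205477
```

A value this close to 1 means that at most about a tenth of PROPN lemmas can occur in three or more forms (the 1193 surplus forms allow at most 596 such lemmas), even though Polish proper nouns inflect for case; the all-POS average of 1.965463 is noticeably higher.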

The highest number of forms (7) was observed with the lemma “Niemiec”: Niemca, Niemcami, Niemcem, Niemcom, Niemcy, Niemców, Niemiec.

The second-highest number of forms (also 7) was observed with the lemma “Polak”: Polacy, Polak, Polaka, Polakami, Polakiem, Polakom, Polaków.

The third-highest number of forms (6) was observed with the lemma “Agnieszka”: AGNIESZKA, Agnieszce, Agnieszka, Agnieszki, Agnieszką, Agnieszkę.
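Finding the lemmas with the most distinct forms is a group-by over PROPN tokens. The sketch uses invented data and hypothetical helper names (`forms_per_lemma`, `top_lemmas`). It treats forms case-sensitively, consistent with the page listing both AGNIESZKA and Agnieszka, and it breaks ties (such as Niemiec and Polak at 7 forms) alphabetically, which is an assumption that happens to agree with the order above.

```python
from collections import defaultdict

# Invented (FORM, LEMMA, UPOS) triples for illustration only.
TOKENS = [
    ("Polak", "Polak", "PROPN"),
    ("Polacy", "Polak", "PROPN"),
    ("Polaka", "Polak", "PROPN"),
    ("Polakami", "Polak", "PROPN"),
    ("Polaka", "Polak", "PROPN"),  # repeated form, counted once
    ("Agnieszka", "Agnieszka", "PROPN"),
    ("AGNIESZKA", "Agnieszka", "PROPN"),  # forms are case-sensitive
    ("Agnieszkę", "Agnieszka", "PROPN"),
    ("kot", "kot", "NOUN"),  # other tags are ignored
]


def forms_per_lemma(tokens, upos="PROPN"):
    """Map each lemma of tag `upos` to the sorted list of its distinct forms."""
    forms = defaultdict(set)
    for form, lemma, tag in tokens:
        if tag == upos:
            forms[lemma].add(form)
    # sorted() orders by code point, not by Polish collation.
    return {lemma: sorted(fs) for lemma, fs in forms.items()}


def top_lemmas(tokens, n=3, upos="PROPN"):
    """The n lemmas with the most distinct forms; ties broken alphabetically."""
    table = forms_per_lemma(tokens, upos)
    return sorted(table.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:n]


for lemma, forms in top_lemmas(TOKENS):
    print(f"{lemma} ({len(forms)}): {', '.join(forms)}")
```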

PROPN occurs with 7 features: Case (11685; 98% of instances), Gender (11685; 98% of instances), Number (11685; 98% of instances), Animacy (7256; 61% of instances), Abbr (271; 2% of instances), NumType (6; 0% of instances), Polite (2; 0% of instances)

PROPN occurs with 19 feature-value pairs: Abbr=Yes, Animacy=Hum, Animacy=Inan, Animacy=Nhum, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Sets, Number=Plur, Number=Ptan, Number=Sing, Polite=Depr

PROPN occurs with 72 feature combinations. The most frequent feature combination is Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing (3220 tokens). Examples: Andrzej, Jerzy, Marek, Piotr, Jan, Krzysztof, Jacek, Janusz, Józef, Tomasz
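The FEATS column stores features as `Name=Value` pairs joined by `|`, with `_` for an empty set, so all three statistics above (features, feature-value pairs and full combinations) can come from one pass over that column. A minimal sketch on invented FEATS values; `parse_feats` and `feature_stats` are hypothetical names, tokens without features are assumed not to count as a combination, and multi-value features such as `Case=Acc,Gen` are kept as a single value string.

```python
from collections import Counter

# Invented FEATS values (CoNLL-U column 6) of five PROPN tokens.
FEATS = [
    "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing",
    "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing",
    "Case=Loc|Gender=Fem|Number=Sing",
    "Case=Gen|Gender=Fem|Number=Sing",
    "_",
]


def parse_feats(field):
    """'A=B|C=D' -> {'A': 'B', 'C': 'D'}; '_' -> {}.

    Multi-value features such as 'Case=Acc,Gen' keep 'Acc,Gen' as one value.
    """
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def feature_stats(column):
    """Count tokens per feature, per feature-value pair and per full combination."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for field in column:
        feats = parse_feats(field)
        features.update(feats.keys())
        pairs.update(f"{name}={value}" for name, value in feats.items())
        if feats:  # tokens without features are not counted as a combination
            combos["|".join(f"{n}={v}" for n, v in sorted(feats.items()))] += 1
    return features, pairs, combos


features, pairs, combos = feature_stats(FEATS)
for name, n in features.most_common():
    print(f"{name} ({n}; {n / len(FEATS):.0%} of instances)")
print(combos.most_common(1))
```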

Relations

PROPN nodes are attached to their parents using 31 different relations: nmod (2433; 20% of instances), nsubj (2432; 20% of instances), flat (2167; 18% of instances), appos (1254; 10% of instances), obl (937; 8% of instances), conj (837; 7% of instances), nmod:arg (446; 4% of instances), obj (324; 3% of instances), obl:arg (214; 2% of instances), iobj (205; 2% of instances), root (204; 2% of instances), nmod:poss (156; 1% of instances), vocative (111; 1% of instances), obl:agent (56; 0% of instances), nsubj:pass (46; 0% of instances), acl:cmp (25; 0% of instances), advcl:cmp (25; 0% of instances), parataxis:insert (25; 0% of instances), ccomp (17; 0% of instances), parataxis:obj (9; 0% of instances), advcl (6; 0% of instances), xcomp:pred (6; 0% of instances), obl:cmp (5; 0% of instances), list (4; 0% of instances), orphan (4; 0% of instances), acl:relcl (3; 0% of instances), ccomp:cleft (1; 0% of instances), nmod:cmp (1; 0% of instances), nmod:pred (1; 0% of instances), parataxis (1; 0% of instances), xcomp (1; 0% of instances)

Parents of PROPN nodes belong to 12 different parts of speech: NOUN (4543; 38% of instances), VERB (3770; 32% of instances), PROPN (2832; 24% of instances), ADJ (443; 4% of instances), no parent [root] (204; 2% of instances), X (69; 1% of instances), ADV (32; 0% of instances), PRON (32; 0% of instances), DET (27; 0% of instances), INTJ (2; 0% of instances), NUM (1; 0% of instances), PART (1; 0% of instances)
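Both lists above come from looking up each PROPN node's HEAD. A node attached as `root` has HEAD 0, an artificial node with no part of speech; this is why one entry in the parent list is not a part of speech at all, and its count (204) equals the `root` count in the relation list. The sketch below uses two invented, hand-annotated toy sentences (following general UD conventions, not copied from the treebank), a hypothetical `parent_stats` helper, and a placeholder label chosen here for the missing parent tag.

```python
from collections import Counter

# Two invented sentences, hand-annotated for illustration (not treebank data).
# Token = (ID, FORM, UPOS, HEAD, DEPREL); HEAD 0 marks the root.
SENTENCES = [
    [
        (1, "Andrzej", "PROPN", 3, "nsubj"),
        (2, "Nowak", "PROPN", 1, "flat"),
        (3, "mieszka", "VERB", 0, "root"),
        (4, "w", "ADP", 5, "case"),
        (5, "Warszawie", "PROPN", 3, "obl"),
        (6, ".", "PUNCT", 3, "punct"),
    ],
    [
        (1, "Polska", "PROPN", 0, "root"),
        (2, ".", "PUNCT", 1, "punct"),
    ],
]

NO_PARENT = "(root)"  # placeholder label: HEAD 0 has no UPOS


def parent_stats(sentences, upos="PROPN"):
    """Count DEPRELs of `upos` nodes and the UPOS of their parents."""
    relations, parent_tags = Counter(), Counter()
    for sentence in sentences:
        tag_of = {tid: tag for tid, _, tag, _, _ in sentence}
        for _, _, tag, head, deprel in sentence:
            if tag == upos:
                relations[deprel] += 1
                parent_tags[tag_of.get(head, NO_PARENT)] += 1
    return relations, parent_tags


relations, parent_tags = parent_stats(SENTENCES)
print(relations.most_common())
print(parent_tags.most_common())
```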

5643 (47%) PROPN nodes are leaves.

4098 (34%) PROPN nodes have one child.

1481 (12%) PROPN nodes have two children.

734 (6%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 14.
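The four buckets partition all PROPN tokens: 5643 + 4098 + 1481 + 734 = 11956, the PROPN token count given at the top of the page. A sketch of the degree computation on invented toy sentences, with hypothetical helpers `child_degrees` and `degree_summary`:

```python
from collections import Counter

# Two invented sentences, hand-annotated for illustration (not treebank data).
# Token = (ID, FORM, UPOS, HEAD, DEPREL); HEAD 0 marks the root.
SENTENCES = [
    [
        (1, "Andrzej", "PROPN", 3, "nsubj"),
        (2, "Nowak", "PROPN", 1, "flat"),
        (3, "mieszka", "VERB", 0, "root"),
        (4, "w", "ADP", 5, "case"),
        (5, "Warszawie", "PROPN", 3, "obl"),
        (6, ".", "PUNCT", 3, "punct"),
    ],
    [
        (1, "Polska", "PROPN", 0, "root"),
        (2, ".", "PUNCT", 1, "punct"),
    ],
]


def child_degrees(sentences, upos="PROPN"):
    """Number of direct dependents of every node tagged `upos`."""
    degrees = []
    for sentence in sentences:
        n_children = Counter(head for _, _, _, head, _ in sentence)
        degrees += [n_children[tid] for tid, _, tag, _, _ in sentence if tag == upos]
    return degrees


def degree_summary(degrees):
    """Bucket degrees as on the page: 0, 1, 2 and 3+ children."""
    buckets = Counter(min(d, 3) for d in degrees)
    return {
        "leaves": buckets[0],
        "one child": buckets[1],
        "two children": buckets[2],
        "three or more": buckets[3],
        "highest degree": max(degrees, default=0),
    }


degrees = child_degrees(SENTENCES)
print(degrees)  # [1, 0, 1, 1]
print(degree_summary(degrees))
```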

Children of PROPN nodes are attached using 41 different relations: case (2342; 24% of instances), flat (2023; 21% of instances), punct (1897; 19% of instances), conj (842; 9% of instances), amod (509; 5% of instances), cc (432; 4% of instances), appos (345; 4% of instances), nmod (320; 3% of instances), amod:flat (319; 3% of instances), parataxis:obj (152; 2% of instances), advmod:emph (126; 1% of instances), acl:relcl (125; 1% of instances), mark (90; 1% of instances), nmod:flat (35; 0% of instances), det (31; 0% of instances), cop (28; 0% of instances), parataxis:insert (20; 0% of instances), det:poss (18; 0% of instances), advmod (15; 0% of instances), nsubj (13; 0% of instances), orphan (13; 0% of instances), advmod:neg (12; 0% of instances), nummod:gov (12; 0% of instances), cc:preconj (10; 0% of instances), nmod:arg (10; 0% of instances), acl (8; 0% of instances), det:numgov (8; 0% of instances), advcl (4; 0% of instances), discourse:intj (4; 0% of instances), list (4; 0% of instances), nmod:poss (4; 0% of instances), nummod (4; 0% of instances), parataxis (4; 0% of instances), acl:cmp (3; 0% of instances), obj (3; 0% of instances), obl (3; 0% of instances), aux (2; 0% of instances), det:nummod (2; 0% of instances), advmod:cmp (1; 0% of instances), nummod:flat (1; 0% of instances), obl:cmp (1; 0% of instances)

Children of PROPN nodes belong to 16 different parts of speech: PROPN (2832; 29% of instances), ADP (2345; 24% of instances), PUNCT (1897; 19% of instances), ADJ (804; 8% of instances), NOUN (710; 7% of instances), CCONJ (436; 4% of instances), VERB (258; 3% of instances), X (119; 1% of instances), PART (117; 1% of instances), SCONJ (90; 1% of instances), DET (63; 1% of instances), ADV (48; 0% of instances), AUX (30; 0% of instances), PRON (23; 0% of instances), NUM (19; 0% of instances), INTJ (4; 0% of instances)
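Children are counted from the other end of the same edges: every token whose HEAD is a PROPN node contributes its own DEPREL and UPOS. PROPN-to-PROPN edges are therefore seen from both sides, which is why the figure 2832 appears both among the parents of PROPN nodes and among their children. The sketch reuses invented toy sentences and a hypothetical `child_stats` helper.

```python
from collections import Counter

# Two invented sentences, hand-annotated for illustration (not treebank data).
# Token = (ID, FORM, UPOS, HEAD, DEPREL); HEAD 0 marks the root.
SENTENCES = [
    [
        (1, "Andrzej", "PROPN", 3, "nsubj"),
        (2, "Nowak", "PROPN", 1, "flat"),
        (3, "mieszka", "VERB", 0, "root"),
        (4, "w", "ADP", 5, "case"),
        (5, "Warszawie", "PROPN", 3, "obl"),
        (6, ".", "PUNCT", 3, "punct"),
    ],
    [
        (1, "Polska", "PROPN", 0, "root"),
        (2, ".", "PUNCT", 1, "punct"),
    ],
]


def child_stats(sentences, upos="PROPN"):
    """Count DEPRELs and UPOS tags of the dependents of `upos` nodes."""
    relations, child_tags = Counter(), Counter()
    for sentence in sentences:
        tag_of = {tid: tag for tid, _, tag, _, _ in sentence}
        for _, _, tag, head, deprel in sentence:
            # HEAD 0 is the artificial root; .get() returns None for it.
            if tag_of.get(head) == upos:
                relations[deprel] += 1
                child_tags[tag] += 1
    return relations, child_tags


relations, child_tags = child_stats(SENTENCES)
print(relations.most_common())
print(child_tags.most_common())
```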