
This page pertains to UD version 2.

Treebank Statistics: UD_English-EWT: POS Tags: PROPN

There are 4834 PROPN lemmas (26%), 4995 PROPN types (22%) and 16115 PROPN tokens (6%). Out of 17 observed tags, PROPN ranks 2nd in number of lemmas, 2nd in number of types and 8th in number of tokens.
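The following is a minimal Python sketch of how per-tag lemma, type and token counts, and a tag's rank among all tags, could be derived from a CoNLL-U file. It makes two assumptions that the page does not state: a "type" is a distinct FORM value, and each percentage divides by the sum of that measure over all tags. The function names and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict

def read_words(conllu_text):
    """Yield (form, lemma, upos) for each syntactic word of a CoNLL-U string."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or comment line
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("3-4") and empty nodes ("5.1")
        yield cols[1], cols[2], cols[3]

def tag_profile(conllu_text):
    """Map each UPOS tag to (distinct lemmas, distinct forms, tokens)."""
    lemmas, forms, tokens = defaultdict(set), defaultdict(set), Counter()
    for form, lemma, upos in read_words(conllu_text):
        lemmas[upos].add(lemma)
        forms[upos].add(form)
        tokens[upos] += 1
    return {tag: (len(lemmas[tag]), len(forms[tag]), n) for tag, n in tokens.items()}

def rank_and_share(profile, tag, measure):
    """1-based rank of `tag` and its share of the total; measure 0=lemmas, 1=types, 2=tokens.

    Tied tags keep their first-seen order, so ranks among ties are arbitrary here.
    """
    ordered = sorted(profile, key=lambda t: profile[t][measure], reverse=True)
    total = sum(v[measure] for v in profile.values())
    return ordered.index(tag) + 1, profile[tag][measure] / total

# Hypothetical toy sentence, written with spaces and converted to tab-separated CoNLL-U.
ROWS = [
    "1 Bush Bush PROPN NNP Number=Sing 2 nsubj _ _",
    "2 met meet VERB VBD _ 0 root _ _",
    "3 Blair Blair PROPN NNP Number=Sing 2 obj _ _",
    "4 in in ADP IN _ 5 case _ _",
    "5 London London PROPN NNP Number=Sing 2 obl _ _",
    "6 . . PUNCT . _ 2 punct _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)

profile = tag_profile(SAMPLE)
print(profile["PROPN"])                     # (3, 3, 3)
print(rank_and_share(profile, "PROPN", 2))  # (1, 0.5)
```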

The 10 most frequent PROPN lemmas: Bush, US, al, Iraq, enron, State, Iran, China, September, Qaeda

The 10 most frequent PROPN types: bush, US, al, Iraq, enron, Iran, China, states, Qaeda, John

The 10 most frequent ambiguous lemmas: al (PROPN 67, X 1), enron (PROPN 7, NOUN 5), president (NOUN 30, PROPN 1), American (ADJ 88, PROPN 54), google (VERB 8, PROPN 2), mark (NOUN 13, VERB 12, PROPN 2, X 1), north (NOUN 6, ADV 5, ADJ 2, PROPN 2), god (PROPN 5, NOUN 4), war (NOUN 74, PROPN 2), world (NOUN 140, PROPN 2)

The 10 most frequent ambiguous types: al (PROPN 67, X 1), states (NOUN 10, PROPN 6, VERB 5), John (PROPN 75, X 4), president (NOUN 24, PROPN 1), may (AUX 221, PROPN 1), google (PROPN 3, VERB 2), Vince (PROPN 45, X 1), mark (NOUN 10, VERB 6, PROPN 2), Paul (PROPN 35, X 1), north (NOUN 6, ADV 5, ADJ 2, PROPN 2)
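An ambiguous lemma or type is one that occurs with PROPN and with at least one other tag. The sketch below, in Python, finds such items and formats them in the page's "key (TAG count, ...)" style, most frequent tag first. It does not try to reproduce the order of the ten entries, because the page does not state that criterion (the entries are not sorted by total frequency). The function names are hypothetical, and the input pairs reuse the "god" counts from the list above as synthetic data.

```python
from collections import Counter, defaultdict

def ambiguous_keys(pairs, tag="PROPN"):
    """Return {key: Counter of UPOS} for keys seen with `tag` and at least one other tag.

    `pairs` are (key, upos) tuples: key is the LEMMA column for ambiguous lemmas,
    or the FORM column for ambiguous types.
    """
    dist = defaultdict(Counter)
    for key, upos in pairs:
        dist[key][upos] += 1
    return {k: c for k, c in dist.items() if tag in c and len(c) > 1}

def describe(key, counts):
    """Render one entry, most frequent tag first, e.g. 'god (PROPN 5, NOUN 4)'."""
    return f"{key} (" + ", ".join(f"{t} {n}" for t, n in counts.most_common()) + ")"

pairs = [("god", "PROPN")] * 5 + [("god", "NOUN")] * 4 + [("Iraq", "PROPN")] * 3
amb = ambiguous_keys(pairs)
print([describe(k, c) for k, c in amb.items()])  # ['god (PROPN 5, NOUN 4)']
```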

Morphology

The form / lemma ratio of PROPN is 1.033306 (the average of all parts of speech is 1.228673).
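The reported ratio equals the PROPN type count divided by the PROPN lemma count given earlier (4995 / 4834). Below is a minimal Python sketch, assuming the ratio is defined as distinct forms over distinct lemmas for one tag; `form_lemma_ratio` and the toy pairs are hypothetical.

```python
def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas, for (form, lemma) pairs of one tag."""
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    return len(forms) / len(lemmas)

# Counts reported for PROPN on this page: 4995 types, 4834 lemmas.
print(f"{4995 / 4834:.6f}")   # 1.033306

toy = [("Fri", "Friday"), ("friday", "Friday"), ("March", "March"), ("March", "March")]
print(form_lemma_ratio(toy))  # 1.5
```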

The highest number of forms (4) was observed with three lemmas: “Friday” (Fri, Fri., Fridays, friday), “March” (MARCH, Mar, March, Marches) and “McDonald” (Mc.Donald, McDonal, McDonald, mc.).
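To find the lemmas with the most forms, one approach is to group distinct forms by lemma and sort by group size. The Python sketch below does this, breaking ties alphabetically by lemma; that tie-break rule is an assumption, because the page does not state one. The function name is hypothetical, and the input reuses the forms listed above as synthetic data.

```python
from collections import defaultdict

def lemmas_with_most_forms(pairs, top=3):
    """Return [(lemma, sorted distinct forms)] for the `top` lemmas with the most forms.

    Ties are broken alphabetically by lemma (an assumed rule).
    """
    forms = defaultdict(set)
    for form, lemma in pairs:
        forms[lemma].add(form)
    ranked = sorted(forms.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(lemma, sorted(fs)) for lemma, fs in ranked[:top]]

pairs = [(f, "Friday") for f in ("Fri", "Fri.", "Fridays", "friday", "Fri")]
pairs += [(f, "March") for f in ("MARCH", "Mar", "March", "Marches")]
pairs += [(f, "McDonald") for f in ("Mc.Donald", "McDonal", "McDonald", "mc.")]
pairs += [("Bush", "Bush"), ("bush", "Bush")]

for lemma, forms in lemmas_with_most_forms(pairs):
    print(lemma, forms)
# Friday ['Fri', 'Fri.', 'Fridays', 'friday']
# March ['MARCH', 'Mar', 'March', 'Marches']
# McDonald ['Mc.Donald', 'McDonal', 'McDonald', 'mc.']
```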

PROPN occurs with 4 features: Number (16114; 100% of instances), Abbr (117; 1% of instances), Typo (27; 0% of instances), Style (2; 0% of instances)

PROPN occurs with 5 feature-value pairs: Abbr=Yes, Number=Plur, Number=Sing, Style=Expr, Typo=Yes

PROPN occurs with 8 feature combinations. The most frequent feature combination is Number=Sing (15245 tokens). Examples: bush, US, al, Iraq, enron, Iran, China, Qaeda, John, india
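In CoNLL-U, the FEATS column holds `|`-separated `Name=Value` pairs, or `_` when a token has no features. Features, feature-value pairs and whole feature combinations can therefore be counted directly from that string. The Python sketch below is hypothetical. It assumes that tokens with `_` are not counted as a combination, because the page does not say how it treats them.

```python
from collections import Counter

def feature_stats(feats_values):
    """Count features, feature=value pairs and full combinations from FEATS strings."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_values:
        if feats == "_":
            continue  # no features on this token (assumed not to be a combination)
        combos[feats] += 1
        for pair in feats.split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos

feats = ["Number=Sing", "Number=Sing", "Abbr=Yes|Number=Sing", "Number=Plur", "_"]
features, pairs, combos = feature_stats(feats)
print(features)               # Counter({'Number': 4, 'Abbr': 1})
print(combos.most_common(1))  # [('Number=Sing', 2)]
```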

Relations

PROPN nodes are attached to their parents using 27 different relations: compound (3401; 21% of instances), nsubj (2015; 13% of instances), nmod (1979; 12% of instances), flat (1849; 11% of instances), obl (1811; 11% of instances), root (1418; 9% of instances), conj (1019; 6% of instances), obj (673; 4% of instances), appos (640; 4% of instances), nmod:poss (482; 3% of instances), list (200; 1% of instances), vocative (131; 1% of instances), nsubj:pass (114; 1% of instances), xcomp (65; 0% of instances), obl:tmod (64; 0% of instances), parataxis (63; 0% of instances), nmod:tmod (38; 0% of instances), ccomp (32; 0% of instances), advcl (27; 0% of instances), iobj (26; 0% of instances), nmod:npmod (25; 0% of instances), obl:npmod (23; 0% of instances), acl:relcl (8; 0% of instances), acl (5; 0% of instances), discourse (3; 0% of instances), csubj (2; 0% of instances), reparandum (2; 0% of instances)
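Each entry in the relation list pairs a count with its share of all 16115 PROPN tokens, rounded to a whole percent. The Python sketch below, with the hypothetical helper `distribution`, computes such a list from the DEPREL values of the PROPN tokens. It uses Python's `round()`, which rounds halves to even; the page's exact rounding rule is an assumption. The spot check reproduces two of the percentages above from their counts.

```python
from collections import Counter

def distribution(labels):
    """Return [(label, count, percent)], most frequent first, percent rounded to an int."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, round(100 * n / total)) for label, n in counts.most_common()]

print(distribution(["nsubj", "obj", "nsubj", "compound"]))
# [('nsubj', 2, 50), ('obj', 1, 25), ('compound', 1, 25)]

# Spot check against two counts reported above (16115 PROPN tokens in total).
for rel, n in [("compound", 3401), ("nsubj", 2015)]:
    print(rel, round(100 * n / 16115))  # compound 21, then nsubj 13
```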

Parents of PROPN nodes belong to 13 different parts of speech, plus the artificial root for the 1418 PROPN nodes attached with the root relation: PROPN (5845; 36% of instances), VERB (4260; 26% of instances), NOUN (3946; 24% of instances), root (1418; 9% of instances), ADJ (365; 2% of instances), ADV (105; 1% of instances), PRON (80; 0% of instances), NUM (45; 0% of instances), INTJ (18; 0% of instances), SYM (14; 0% of instances), AUX (8; 0% of instances), DET (7; 0% of instances), X (3; 0% of instances), ADP (1; 0% of instances)
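In CoNLL-U, each word's HEAD column gives the ID of its parent within the sentence, and HEAD 0 marks the root. Tokens attached to the root therefore have no parent part of speech; their count here, 1418, matches the root relation in the list of relations above. The Python sketch below counts parent tags per sentence and labels root attachments "root". The function names and the toy data are hypothetical.

```python
from collections import Counter

def sentences(conllu_text):
    """Yield each sentence as a list of column lists, keeping only integer-ID words."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
            rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword ranges and empty nodes
                rows.append(cols)
    if rows:
        yield rows

def parent_pos(conllu_text, tag="PROPN"):
    """Count the UPOS of the parent of every `tag` token; HEAD 0 is counted as 'root'."""
    counts = Counter()
    for rows in sentences(conllu_text):
        upos = {cols[0]: cols[3] for cols in rows}  # word ID -> UPOS in this sentence
        for cols in rows:
            if cols[3] == tag:
                head = cols[6]
                counts["root" if head == "0" else upos[head]] += 1
    return counts

ROWS = [
    "1 Bush Bush PROPN NNP Number=Sing 2 nsubj _ _",
    "2 met meet VERB VBD _ 0 root _ _",
    "3 Blair Blair PROPN NNP Number=Sing 2 obj _ _",
    "",
    "1 George George PROPN NNP Number=Sing 0 root _ _",
    "2 Bush Bush PROPN NNP Number=Sing 1 flat _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)
print(parent_pos(SAMPLE))  # Counter({'VERB': 2, 'root': 1, 'PROPN': 1})
```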

6665 (41%) PROPN nodes are leaves.

4559 (28%) PROPN nodes have one child.

2439 (15%) PROPN nodes have two children.

2452 (15%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 18.
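The four degree buckets partition the PROPN tokens: 6665 + 4559 + 2439 + 2452 = 16115, the PROPN token count given at the top of the page. The Python sketch below computes the same buckets and the maximum degree, counting a node's children as the words in its sentence whose HEAD is that node's ID. The function names and the toy sentences are hypothetical.

```python
from collections import Counter

def sentences(conllu_text):
    """Yield each sentence as a list of column lists, keeping only integer-ID words."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
            rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows

def degree_profile(conllu_text, tag="PROPN"):
    """Bucket `tag` tokens by number of children (0, 1, 2, 3+) and track the maximum."""
    buckets, max_degree = Counter(), 0
    for rows in sentences(conllu_text):
        n_children = Counter(cols[6] for cols in rows)  # head ID -> number of dependents
        for cols in rows:
            if cols[3] == tag:
                degree = n_children[cols[0]]
                buckets[min(degree, 3)] += 1  # bucket 3 means "three or more"
                max_degree = max(max_degree, degree)
    return buckets, max_degree

ROWS = [
    "1 Bush Bush PROPN NNP Number=Sing 2 nsubj _ _",
    "2 met meet VERB VBD _ 0 root _ _",
    "3 Blair Blair PROPN NNP Number=Sing 2 obj _ _",
    "4 in in ADP IN _ 5 case _ _",
    "5 London London PROPN NNP Number=Sing 2 obl _ _",
    "6 . . PUNCT . _ 2 punct _ _",
    "",
    "1 George George PROPN NNP Number=Sing 0 root _ _",
    "2 Bush Bush PROPN NNP Number=Sing 1 flat _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)

buckets, top = degree_profile(SAMPLE)
print(dict(buckets), top)          # {0: 3, 1: 2} 1
print(6665 + 4559 + 2439 + 2452)   # 16115
```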

Children of PROPN nodes are attached using 35 different relations: case (4545; 23% of instances), punct (2661; 14% of instances), compound (2224; 11% of instances), flat (1902; 10% of instances), det (1463; 8% of instances), conj (1170; 6% of instances), amod (1133; 6% of instances), cc (731; 4% of instances), appos (630; 3% of instances), nummod (589; 3% of instances), list (573; 3% of instances), nmod (516; 3% of instances), cop (190; 1% of instances), nsubj (180; 1% of instances), advmod (144; 1% of instances), acl:relcl (131; 1% of instances), parataxis (126; 1% of instances), nmod:poss (117; 1% of instances), nmod:tmod (55; 0% of instances), acl (50; 0% of instances), mark (47; 0% of instances), discourse (35; 0% of instances), cc:preconj (33; 0% of instances), aux (30; 0% of instances), obl (24; 0% of instances), nmod:npmod (20; 0% of instances), advcl (11; 0% of instances), expl (4; 0% of instances), obl:tmod (4; 0% of instances), vocative (4; 0% of instances), orphan (3; 0% of instances), reparandum (3; 0% of instances), goeswith (2; 0% of instances), det:predet (1; 0% of instances), obl:npmod (1; 0% of instances)

Children of PROPN nodes belong to 17 different parts of speech: PROPN (5845; 30% of instances), ADP (3939; 20% of instances), PUNCT (2661; 14% of instances), DET (1469; 8% of instances), ADJ (1138; 6% of instances), NOUN (1042; 5% of instances), NUM (857; 4% of instances), CCONJ (727; 4% of instances), PART (584; 3% of instances), VERB (323; 2% of instances), AUX (221; 1% of instances), PRON (174; 1% of instances), ADV (140; 1% of instances), X (106; 1% of instances), SYM (60; 0% of instances), SCONJ (40; 0% of instances), INTJ (26; 0% of instances)
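PROPN has the same count (5845) here as among the parents of PROPN nodes, because each PROPN-to-PROPN edge is counted once in each list. The Python sketch below counts the relation and part of speech of every word whose parent is a PROPN token. The function names and the toy sentences are hypothetical.

```python
from collections import Counter

def sentences(conllu_text):
    """Yield each sentence as a list of column lists, keeping only integer-ID words."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
            rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows

def child_profile(conllu_text, tag="PROPN"):
    """Count DEPREL and UPOS of every word whose parent has UPOS `tag`."""
    rels, pos = Counter(), Counter()
    for rows in sentences(conllu_text):
        upos = {cols[0]: cols[3] for cols in rows}  # word ID -> UPOS in this sentence
        for cols in rows:
            head = cols[6]
            if head != "0" and upos[head] == tag:
                rels[cols[7]] += 1  # DEPREL, including subtypes such as nmod:poss
                pos[cols[3]] += 1
    return rels, pos

ROWS = [
    "1 Bush Bush PROPN NNP Number=Sing 2 nsubj _ _",
    "2 met meet VERB VBD _ 0 root _ _",
    "3 Blair Blair PROPN NNP Number=Sing 2 obj _ _",
    "4 in in ADP IN _ 5 case _ _",
    "5 London London PROPN NNP Number=Sing 2 obl _ _",
    "6 . . PUNCT . _ 2 punct _ _",
    "",
    "1 George George PROPN NNP Number=Sing 0 root _ _",
    "2 Bush Bush PROPN NNP Number=Sing 1 flat _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)

rels, pos = child_profile(SAMPLE)
print(dict(rels), dict(pos))  # {'case': 1, 'flat': 1} {'ADP': 1, 'PROPN': 1}
```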