
This page pertains to UD version 2.

Treebank Statistics: UD_English-GUM: POS Tags: NUM

There are 704 NUM lemmas (4%), 713 NUM types (3%) and 4274 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 13 in number of tokens.

The 10 most frequent NUM lemmas: one, two, 1, three, 2, 3, four, five, 4, 10

The 10 most frequent NUM types: one, two, 1, three, 2, 3, four, five, 4, 10

The 10 most frequent ambiguous lemmas: one (NUM 457, NOUN 134, PRON 49), two (NUM 319, NOUN 1), 1 (NUM 150, X 7), 2 (NUM 120, X 5), 3 (NUM 79, X 5), four (NUM 70, NOUN 1), five (NUM 69, NOUN 1), 4 (NUM 62, X 4), 10 (NUM 59, X 1), 6 (NUM 55, X 1)

The 10 most frequent ambiguous types: one (NUM 392, NOUN 112, PRON 48), 1 (NUM 151, X 7), 2 (NUM 120, X 5), 3 (NUM 79, X 5), five (NUM 63, NOUN 1), 4 (NUM 62, X 4), 10 (NUM 59, X 1), 6 (NUM 55, X 1), 5 (NUM 53, X 2), 7 (NUM 42, X 1)

Morphology

The form / lemma ratio of NUM is 1.012784 (the average of all parts of speech is 1.243967).
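The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given at the top of this page. The page does not state the formula, so the definition used below is an assumption; the snippet is only a quick arithmetic check.

```python
# Counts taken from the summary line at the top of this page.
num_types = 713   # distinct NUM word forms
num_lemmas = 704  # distinct NUM lemmas

# Assumption: form / lemma ratio = number of types / number of lemmas.
form_lemma_ratio = num_types / num_lemmas
print(f"{form_lemma_ratio:.6f}")  # 1.012784
```

A value this close to 1 means that almost every NUM lemma appears with a single written form.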

The 1st highest number of forms (2) was observed with the lemma “1000”: 1,000, 1000.

The 2nd highest number of forms (2) was observed with the lemma “1400”: 1, 1,400.

The 3rd highest number of forms (2) was observed with the lemma “1970”: 19, 1970.

NUM occurs with 4 features: NumForm (4272; 100% instances), NumType (4272; 100% instances), Typo (7; 0% instances), Number (4; 0% instances)

NUM occurs with 7 feature-value pairs: NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Sing, Typo=Yes

NUM occurs with 9 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (2646 tokens). Examples: 1, 2, 3, 4, 10, 6, 20, 5, 15, 7
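Feature counts of this kind can be derived from the FEATS column of a CoNLL-U file, where feature-value pairs are joined with "|" and "_" marks a token without features. The following is a minimal sketch, not the script that generated this page; the function name and the sample values are hypothetical, and whether the page counts the empty feature set as a combination is not stated.

```python
from collections import Counter

def feature_stats(feats_values):
    """Tally features, feature-value pairs and full combinations from FEATS strings."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_values:
        if feats == "_":
            continue  # "_" marks a token with no features (assumption: not counted)
        combos[feats] += 1
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos

# Hypothetical FEATS values for five NUM tokens, not taken from the treebank.
sample = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Digit|NumType=Card",
    "NumForm=Word|NumType=Card",
    "NumForm=Roman|NumType=Card",
    "_",
]
features, pairs, combos = feature_stats(sample)
print(features["NumForm"])    # 4
print(len(pairs))             # 4
print(combos.most_common(1))  # [('NumForm=Digit|NumType=Card', 2)]
```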

Relations

NUM nodes are attached to their parents using 30 different relations: nummod (1444; 34% instances), obl (404; 9% instances), parataxis (395; 9% instances), nmod:unmarked (363; 8% instances), flat (315; 7% instances), nmod (287; 7% instances), conj (281; 7% instances), compound (220; 5% instances), root (148; 3% instances), obj (81; 2% instances), appos (78; 2% instances), nsubj (57; 1% instances), obl:unmarked (53; 1% instances), list (30; 1% instances), xcomp (25; 1% instances), dep (19; 0% instances), advcl (13; 0% instances), ccomp (12; 0% instances), orphan (10; 0% instances), nsubj:pass (8; 0% instances), nsubj:outer (7; 0% instances), dislocated (6; 0% instances), nmod:poss (4; 0% instances), acl:relcl (3; 0% instances), amod (3; 0% instances), reparandum (3; 0% instances), advcl:relcl (2; 0% instances), acl (1; 0% instances), discourse (1; 0% instances), obl:agent (1; 0% instances)
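A tally like the one above can be reproduced from the UPOS and DEPREL columns of a CoNLL-U file. The sketch below assumes the standard 10-column tab-separated layout; the function names and the sample sentence are hypothetical, and the rounding rule used for the percentages is an assumption rather than the one used by the page generator.

```python
from collections import Counter

def relation_counts(conllu_text, upos="NUM"):
    """Count the DEPREL values of all tokens carrying the given UPOS tag."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            counts[cols[7]] += 1
    return counts

def format_counts(counts):
    """Render counts in the 'label (n; p% instances)' style used on this page."""
    total = sum(counts.values())
    return ", ".join(
        f"{rel} ({n}; {round(100 * n / total)}% instances)"
        for rel, n in counts.most_common()
    )

# Hypothetical sentence, not taken from the treebank.
ROWS = [
    ("1", "Two", "two", "NUM", "CD", "NumForm=Word|NumType=Card", "2", "nummod", "_", "_"),
    ("2", "cats", "cat", "NOUN", "NNS", "Number=Plur", "3", "nsubj", "_", "_"),
    ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
    ("4", "three", "three", "NUM", "CD", "NumForm=Word|NumType=Card", "5", "nummod", "_", "_"),
    ("5", "fish", "fish", "NOUN", "NNS", "Number=Plur", "3", "obj", "_", "_"),
    ("6", "in", "in", "ADP", "IN", "_", "7", "case", "_", "_"),
    ("7", "1970", "1970", "NUM", "CD", "NumForm=Digit|NumType=Card", "3", "obl", "_", "_"),
    ("8", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"),
]
SAMPLE = "# text = Two cats ate three fish in 1970.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

print(format_counts(relation_counts(SAMPLE)))
# nummod (2; 67% instances), obl (1; 33% instances)
```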

Parents of NUM nodes fall into 14 categories, namely 13 parts of speech plus the artificial root (which has no POS tag): NOUN (1759; 41% instances), VERB (840; 20% instances), NUM (683; 16% instances), PROPN (621; 15% instances), root (148; 3% instances), SYM (125; 3% instances), ADJ (65; 2% instances), ADV (14; 0% instances), X (6; 0% instances), AUX (5; 0% instances), INTJ (3; 0% instances), PRON (3; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances)

1658 (39%) NUM nodes are leaves.

1386 (32%) NUM nodes have one child.

716 (17%) NUM nodes have two children.

514 (12%) NUM nodes have three or more children.

The highest child degree of a NUM node is 13.
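The four degree buckets above sum to 1658 + 1386 + 716 + 514 = 4274, the total number of NUM tokens. A distribution of this kind can be computed from the ID, UPOS and HEAD columns of a CoNLL-U file. The sketch below assumes the standard 10-column layout; the function names and the sample sentence are hypothetical and do not come from the treebank or from the script behind this page.

```python
from collections import Counter

def child_degrees(conllu_text, upos="NUM"):
    """Return the number of children of every node tagged `upos`, in file order."""
    degrees = []
    sentence = []  # (id, upos, head) for the tokens of the current sentence

    def flush():
        # Count how often each token ID appears as somebody's HEAD.
        n_children = Counter(head for _, _, head in sentence)
        degrees.extend(n_children[tid] for tid, tag, _ in sentence if tag == upos)
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue  # comment line
        cols = line.split("\t")
        # Keep only basic tokens: integer ID and integer HEAD.
        if len(cols) == 10 and cols[0].isdigit() and cols[6].isdigit():
            sentence.append((int(cols[0]), cols[3], int(cols[6])))
    flush()  # handle a final sentence without a trailing blank line
    return degrees

def bucket(degrees):
    """Group degrees into the four buckets reported on this page."""
    b = Counter(min(d, 3) for d in degrees)
    return {"leaves": b[0], "one": b[1], "two": b[2], "three_or_more": b[3]}

# Hypothetical sentence, not taken from the treebank.
ROWS = [
    ("1", "Two", "two", "NUM", "CD", "NumForm=Word|NumType=Card", "2", "nummod", "_", "_"),
    ("2", "cats", "cat", "NOUN", "NNS", "Number=Plur", "3", "nsubj", "_", "_"),
    ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
    ("4", "three", "three", "NUM", "CD", "NumForm=Word|NumType=Card", "5", "nummod", "_", "_"),
    ("5", "fish", "fish", "NOUN", "NNS", "Number=Plur", "3", "obj", "_", "_"),
    ("6", "in", "in", "ADP", "IN", "_", "7", "case", "_", "_"),
    ("7", "1970", "1970", "NUM", "CD", "NumForm=Digit|NumType=Card", "3", "obl", "_", "_"),
    ("8", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"),
]
SAMPLE = "\n".join("\t".join(r) for r in ROWS) + "\n"

degrees = child_degrees(SAMPLE)
print(degrees)          # [0, 0, 1]
print(bucket(degrees))  # {'leaves': 2, 'one': 1, 'two': 0, 'three_or_more': 0}
print(max(degrees))     # 1
```

In the sample, "Two" and "three" are leaves, while "1970" has one child, the preposition "in" attached as case.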

Children of NUM nodes are attached using 34 different relations: punct (1909; 40% instances), case (724; 15% instances), nmod (353; 7% instances), conj (297; 6% instances), advmod (291; 6% instances), nmod:unmarked (274; 6% instances), compound (248; 5% instances), cc (116; 2% instances), nsubj (115; 2% instances), cop (104; 2% instances), det (57; 1% instances), discourse (34; 1% instances), parataxis (32; 1% instances), acl:relcl (26; 1% instances), mark (21; 0% instances), flat (20; 0% instances), amod (19; 0% instances), appos (13; 0% instances), advcl (11; 0% instances), reparandum (11; 0% instances), acl (9; 0% instances), obl (6; 0% instances), aux (4; 0% instances), obl:unmarked (4; 0% instances), nummod (3; 0% instances), csubj (2; 0% instances), dep (2; 0% instances), det:predet (2; 0% instances), goeswith (2; 0% instances), list (2; 0% instances), cc:preconj (1; 0% instances), dislocated (1; 0% instances), nmod:desc (1; 0% instances), nsubj:outer (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (1909; 40% instances), NUM (683; 14% instances), ADP (637; 14% instances), ADV (280; 6% instances), NOUN (274; 6% instances), PROPN (246; 5% instances), CCONJ (114; 2% instances), AUX (109; 2% instances), SYM (104; 2% instances), PRON (92; 2% instances), ADJ (78; 2% instances), DET (60; 1% instances), VERB (59; 1% instances), INTJ (36; 1% instances), PART (16; 0% instances), SCONJ (15; 0% instances), X (3; 0% instances)