This page pertains to UD version 2.

Treebank Statistics: UD_English-GUM: POS Tags: NUM

There are 737 NUM lemmas (4%), 747 NUM types (3%) and 4626 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 13 in number of tokens.

The 10 most frequent NUM lemmas: one, two, 1, three, 2, 3, five, four, 4, 10

The 10 most frequent NUM types: one, two, 1, three, 2, 3, five, four, 4, 10

The 10 most frequent ambiguous lemmas: one (NUM 504, NOUN 137, PRON 53, X 1), two (NUM 338, NOUN 1, X 1), 1 (NUM 153, X 7), three (NUM 144, X 1), 2 (NUM 130, X 5), 3 (NUM 80, X 5), five (NUM 76, NOUN 1), four (NUM 74, NOUN 1), 4 (NUM 66, X 4), 10 (NUM 61, X 1)

The 10 most frequent ambiguous types: one (NUM 432, NOUN 114, PRON 52, X 1), 1 (NUM 154, X 7), three (NUM 129, X 1), 2 (NUM 130, X 5), 3 (NUM 80, X 5), five (NUM 70, NOUN 1), 4 (NUM 66, X 4), 10 (NUM 61, X 1), 6 (NUM 55, X 1), 5 (NUM 54, X 2)
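The ambiguity lists above can be reproduced in outline by grouping tag counts per lemma (or per type) and keeping the entries that occur with NUM and at least one other tag. The following is a minimal sketch, not the script that generated this page; the `pairs` data and the `ambiguous_lemmas` helper are hypothetical and only illustrate the counting.

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, UPOS) pairs; in practice these would be read
# from the LEMMA and UPOS columns of the treebank's CoNLL-U files.
pairs = [
    ("one", "NUM"), ("one", "NUM"), ("one", "PRON"),
    ("two", "NUM"), ("one", "NOUN"), ("two", "NUM"),
]

def ambiguous_lemmas(pairs, upos="NUM"):
    """Return lemmas seen with `upos` and at least one other tag,
    sorted by how often they carry `upos` (most frequent first)."""
    by_lemma = defaultdict(Counter)
    for lemma, tag in pairs:
        by_lemma[lemma][tag] += 1
    ambiguous = {
        lemma: tags
        for lemma, tags in by_lemma.items()
        if upos in tags and len(tags) > 1
    }
    return sorted(ambiguous.items(), key=lambda item: -item[1][upos])

result = ambiguous_lemmas(pairs)
for lemma, tags in result:
    print(lemma, dict(tags))
```

In this sample only "one" is reported, because "two" occurs with a single tag. Passing word forms instead of lemmas would give the per-type variant of the list.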

Morphology

The form / lemma ratio of NUM is 1.013569 (the average of all parts of speech is 1.248450).
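The reported ratio is consistent with dividing the number of NUM types (747) by the number of NUM lemmas (737) given at the top of this page. This reading is inferred from the figures here, not from a documented formula. A quick check, with `form_lemma_ratio` as a hypothetical helper name:

```python
# Counts taken from the summary line of this page.
num_types = 747   # distinct NUM word forms
num_lemmas = 737  # distinct NUM lemmas

def form_lemma_ratio(types: int, lemmas: int) -> float:
    """Average number of distinct forms per distinct lemma."""
    return types / lemmas

ratio = form_lemma_ratio(num_types, num_lemmas)
print(f"{ratio:.6f}")
```

A value this close to 1 means that almost every NUM lemma appears in a single written form, in contrast with the treebank-wide average of about 1.25.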

The 1st highest number of forms (2) was observed with the lemma “1000”: 1,000, 1000.

The 2nd highest number of forms (2) was observed with the lemma “1400”: 1, 1,400.

The 3rd highest number of forms (2) was observed with the lemma “1970”: 19, 1970.

NUM occurs with 5 features: NumForm (4624; 100% instances), NumType (4624; 100% instances), Typo (8; 0% instances), Number (4; 0% instances), Foreign (1; 0% instances)

NUM occurs with 8 feature-value pairs: Foreign=Yes, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Sing, Typo=Yes

NUM occurs with 10 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (2850 tokens). Examples: 1, 2, 3, 4, 10, 20, 6, 5, 15, 7
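Feature, feature-value, and feature-combination counts such as those above can be derived from the FEATS column of a CoNLL-U file, where pairs are written as `Name=Value` and joined with `|`. The sketch below assumes a small hypothetical list of FEATS strings rather than the real treebank, and `parse_feats` is an illustrative helper, not part of any UD tool.

```python
from collections import Counter

def parse_feats(feats: str) -> dict:
    """Split a CoNLL-U FEATS string into a {name: value} dict.
    An underscore means the token has no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# Hypothetical FEATS values for a handful of NUM tokens.
sample = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Word|NumType=Card",
    "NumForm=Digit|NumType=Card",
    "NumForm=Roman|NumType=Card",
]

# How often each full combination occurs.
combos = Counter(sample)
# How many tokens carry each feature name.
features = Counter(name for feats in sample for name in parse_feats(feats))
# How many tokens carry each feature-value pair.
pairs = Counter(
    f"{name}={value}"
    for feats in sample
    for name, value in parse_feats(feats).items()
)

print(combos.most_common(1))
print(dict(features))
print(len(pairs))
```

The three counters correspond to the three lines above: features, feature-value pairs, and feature combinations.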

Relations

NUM nodes are attached to their parents using 30 different relations: nummod (1604; 35% instances), obl (436; 9% instances), parataxis (400; 9% instances), nmod:unmarked (374; 8% instances), flat (358; 8% instances), nmod (306; 7% instances), conj (290; 6% instances), compound (246; 5% instances), root (160; 3% instances), obj (88; 2% instances), appos (80; 2% instances), nsubj (68; 1% instances), obl:unmarked (53; 1% instances), list (36; 1% instances), xcomp (27; 1% instances), dep (19; 0% instances), advcl (14; 0% instances), ccomp (13; 0% instances), orphan (12; 0% instances), nsubj:outer (10; 0% instances), nsubj:pass (9; 0% instances), dislocated (6; 0% instances), nmod:poss (4; 0% instances), acl:relcl (3; 0% instances), reparandum (3; 0% instances), advcl:relcl (2; 0% instances), amod (2; 0% instances), discourse (1; 0% instances), iobj (1; 0% instances), obl:agent (1; 0% instances)

Parents of NUM nodes belong to 14 different categories (13 parts of speech, plus an unlabeled entry whose count matches the 160 root attachments above, i.e. NUM nodes with no parent): NOUN (1930; 42% instances), VERB (892; 19% instances), NUM (723; 16% instances), PROPN (661; 14% instances), [no parent] (160; 3% instances), SYM (155; 3% instances), ADJ (69; 1% instances), ADV (17; 0% instances), X (6; 0% instances), AUX (5; 0% instances), INTJ (3; 0% instances), PRON (3; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances)

1838 (40%) NUM nodes are leaves.

1501 (32%) NUM nodes have one child.

753 (16%) NUM nodes have two children.

534 (12%) NUM nodes have three or more children.

The highest child degree of a NUM node is 13.
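The four child-count figures above add up to the 4626 NUM tokens (1838 + 1501 + 753 + 534). As an illustration of how such a distribution might be computed, the sketch below counts the children of every NUM node in a CoNLL-U fragment by tallying the HEAD column. The two sample sentences and the `row` and `num_child_degrees` helpers are hypothetical; this is not the script that produced the page.

```python
from collections import Counter

def row(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join(
        [str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"]
    )

# Hypothetical sample: "I bought three books ." and "Two of them left ."
SAMPLE = "\n".join([
    row(1, "I", "I", "PRON", 2, "nsubj"),
    row(2, "bought", "buy", "VERB", 0, "root"),
    row(3, "three", "three", "NUM", 4, "nummod"),
    row(4, "books", "book", "NOUN", 2, "obj"),
    row(5, ".", ".", "PUNCT", 2, "punct"),
    "",
    row(1, "Two", "two", "NUM", 4, "nsubj"),
    row(2, "of", "of", "ADP", 3, "case"),
    row(3, "them", "they", "PRON", 1, "nmod"),
    row(4, "left", "leave", "VERB", 0, "root"),
    row(5, ".", ".", "PUNCT", 4, "punct"),
])

def num_child_degrees(conllu_text, upos="NUM"):
    """Map child count -> number of `upos` nodes with that many children."""
    hist = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [
            line.split("\t")
            for line in block.splitlines()
            if line and not line.startswith("#")
        ]
        # Keep plain word lines; skip multiword ranges (1-2) and empty nodes (1.1).
        rows = [r for r in rows if r[0].isdigit()]
        children = Counter(r[6] for r in rows)  # HEAD id -> number of dependents
        for r in rows:
            if r[3] == upos:
                hist[children[r[0]]] += 1
    return hist

hist = num_child_degrees(SAMPLE)
print(dict(hist))
```

Here "three" is a leaf and "Two" has one child ("them"), so each of the degrees 0 and 1 is counted once.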

Children of NUM nodes are attached using 33 different relations: punct (1957; 39% instances), case (782; 16% instances), nmod (383; 8% instances), advmod (327; 7% instances), conj (309; 6% instances), nmod:unmarked (299; 6% instances), compound (265; 5% instances), nsubj (123; 2% instances), cc (122; 2% instances), cop (114; 2% instances), det (66; 1% instances), discourse (36; 1% instances), parataxis (35; 1% instances), acl:relcl (27; 1% instances), flat (24; 0% instances), mark (24; 0% instances), amod (21; 0% instances), appos (13; 0% instances), advcl (11; 0% instances), reparandum (11; 0% instances), acl (10; 0% instances), obl (6; 0% instances), aux (5; 0% instances), obl:unmarked (4; 0% instances), csubj (3; 0% instances), det:predet (3; 0% instances), cc:preconj (2; 0% instances), dep (2; 0% instances), goeswith (2; 0% instances), list (2; 0% instances), nummod (2; 0% instances), dislocated (1; 0% instances), nsubj:outer (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (1957; 39% instances), NUM (723; 14% instances), ADP (695; 14% instances), ADV (317; 6% instances), NOUN (297; 6% instances), PROPN (262; 5% instances), CCONJ (121; 2% instances), AUX (120; 2% instances), SYM (104; 2% instances), PRON (100; 2% instances), ADJ (87; 2% instances), DET (70; 1% instances), VERB (63; 1% instances), INTJ (37; 1% instances), PART (19; 0% instances), SCONJ (17; 0% instances), X (3; 0% instances)