
This page pertains to UD version 2.

Treebank Statistics: UD_English-GUM: POS Tags: NUM

There are 586 NUM lemmas (4%), 592 NUM types (3%) and 3175 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 13 in number of tokens.

The 10 most frequent NUM lemmas: one, two, 1, 2, three, 3, four, 6, 4, 10

The 10 most frequent NUM types: one, two, 1, 2, three, 3, four, 6, 4, 10

The 10 most frequent ambiguous lemmas: one (NUM 295, NOUN 65, PRON 35), 1 (NUM 124, X 5), 2 (NUM 100, X 4), 3 (NUM 63, X 4), four (NUM 53, NOUN 1), 6 (NUM 49, PROPN 1, X 1), 4 (NUM 48, X 4, NOUN 1), 5 (NUM 44, X 2), five (NUM 40, NOUN 1), 7 (NUM 36, X 1)

The 10 most frequent ambiguous types: one (NUM 248, NOUN 52, PRON 34), 1 (NUM 124, X 5), 2 (NUM 100, X 4), 3 (NUM 63, X 4), 6 (NUM 49, PROPN 1, X 1), 4 (NUM 48, X 4, NOUN 1), 5 (NUM 44, X 2), five (NUM 35, NOUN 1), 7 (NUM 36, X 1), 8 (NUM 29, PROPN 2, X 1)

Morphology

The form / lemma ratio of NUM is 1.010239 (the average of all parts of speech is 1.226279).
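The reported ratio is consistent with dividing the number of distinct NUM forms (592 types) by the number of distinct NUM lemmas (586). The following is a minimal sketch of that calculation, not the script that generated this page; the function name `form_lemma_ratio` and the toy data are hypothetical.

```python
def form_lemma_ratio(pairs):
    """Return (distinct forms) / (distinct lemmas) for (form, lemma) pairs."""
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    return len(forms) / len(lemmas)


# Toy data (hypothetical): lemma "1000" has two forms, lemma "two" has one.
sample = [("1,000", "1000"), ("1000", "1000"), ("two", "two"), ("two", "two")]
print(form_lemma_ratio(sample))  # 3 forms / 2 lemmas = 1.5

# Figures reported on this page: 592 NUM types and 586 NUM lemmas.
print(round(592 / 586, 6))  # 1.010239
```

A ratio this close to 1 means that almost every NUM lemma appears in a single written form; the few exceptions are spelling variants such as 1,000 and 1000.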

The 1st highest number of forms (2) was observed with the lemma “1000”: 1,000, 1000.

The 2nd highest number of forms (2) was observed with the lemma “2000”: 2,000, 2000.

The 3rd highest number of forms (2) was observed with the lemma “20000”: 20,000, 20000.

NUM occurs with 4 features: NumForm (3175; 100% instances), NumType (3175; 100% instances), Number (6; 0% instances), Typo (2; 0% instances)

NUM occurs with 7 feature-value pairs: NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Sing, Typo=Yes

NUM occurs with 7 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (2115 tokens). Examples: 1, 2, 3, 6, 4, 10, 5, 15, 7, 12
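Counts like the ones above can be reproduced from the FEATS column of a CoNLL-U file. The sketch below assumes the standard ten-column, tab-separated CoNLL-U layout; the function name `num_feature_stats` and the sample sentence are hypothetical, and this is not necessarily how the page itself was generated.

```python
from collections import Counter


def num_feature_stats(conllu_text, upos="NUM"):
    """Count features, feature-value pairs and feature combinations for one UPOS tag."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        # Keep regular tokens only (IDs like "3-4" or "5.1" are skipped).
        if len(cols) != 10 or not cols[0].isdigit() or cols[3] != upos:
            continue
        feats = cols[5]
        combos[feats] += 1
        if feats == "_":
            continue  # token carries no features
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


# Hypothetical toy sentence, written with spaces and converted to tabs.
sample = """\
1 two two NUM CD NumForm=Word|NumType=Card 2 nummod _ _
2 dogs dog NOUN NNS Number=Plur 0 root _ _
3 in in ADP IN _ 4 case _ _
4 2019 2019 NUM CD NumForm=Digit|NumType=Card 2 nmod _ _
"""
text = "\n".join("\t".join(l.split()) for l in sample.splitlines())

features, pairs, combos = num_feature_stats(text)
print(dict(features))           # {'NumForm': 2, 'NumType': 2}
print(len(pairs), len(combos))  # 3 2
```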

Relations

NUM nodes are attached to their parents using 32 different relations: nummod (1069; 34% instances), dep (557; 18% instances), obl (343; 11% instances), nmod:tmod (322; 10% instances), nmod (219; 7% instances), conj (130; 4% instances), compound (122; 4% instances), root (93; 3% instances), appos (65; 2% instances), obj (49; 2% instances), nsubj (38; 1% instances), obl:tmod (30; 1% instances), parataxis (23; 1% instances), flat (21; 1% instances), list (16; 1% instances), xcomp (15; 0% instances), advcl (12; 0% instances), obl:npmod (12; 0% instances), nsubj:pass (7; 0% instances), dislocated (6; 0% instances), orphan (5; 0% instances), ccomp (4; 0% instances), amod (3; 0% instances), nmod:poss (3; 0% instances), acl:relcl (2; 0% instances), nmod:npmod (2; 0% instances), nsubj:outer (2; 0% instances), acl (1; 0% instances), advcl:relcl (1; 0% instances), discourse (1; 0% instances), obl:agent (1; 0% instances), reparandum (1; 0% instances)

Parents of NUM nodes belong to 13 different parts of speech: NOUN (1284; 40% instances), VERB (734; 23% instances), PROPN (495; 16% instances), NUM (437; 14% instances), no parent (root) (93; 3% instances), SYM (59; 2% instances), ADJ (48; 2% instances), ADV (12; 0% instances), X (6; 0% instances), PRON (3; 0% instances), INTJ (2; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances)
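The relation and parent statistics can be tallied from the HEAD and DEPREL columns of a CoNLL-U file. The sketch below assumes the standard ten-column format; `attachment_stats` and the toy sentences are hypothetical. NUM nodes attached as root have no parent token, so the sketch records them under an empty part-of-speech label, which matches the 93 root attachments reported above.

```python
from collections import Counter


def attachment_stats(conllu_text, upos="NUM"):
    """Count incoming relations and parent UPOS tags for nodes with the given UPOS."""
    deprels, parent_pos = Counter(), Counter()
    sentence = {}  # token id -> (upos, head id, deprel) for the current sentence

    def flush():
        for tok_upos, head, deprel in sentence.values():
            if tok_upos != upos:
                continue
            deprels[deprel] += 1
            # HEAD "0" is the artificial root, which has no part of speech.
            parent_pos[sentence[head][0] if head in sentence else ""] += 1
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence[cols[0]] = (cols[3], cols[6], cols[7])
    flush()  # handle a final sentence without a trailing blank line
    return deprels, parent_pos


# Hypothetical toy sentences, written with spaces and converted to tabs.
sample = """\
1 two two NUM CD NumForm=Word|NumType=Card 2 nummod _ _
2 dogs dog NOUN NNS Number=Plur 0 root _ _

1 Three three NUM CD NumForm=Word|NumType=Card 0 root _ _
2 . . PUNCT . _ 1 punct _ _
"""
text = "\n".join("\t".join(l.split()) for l in sample.splitlines())

deprels, parent_pos = attachment_stats(text)
print(dict(deprels))     # {'nummod': 1, 'root': 1}
print(dict(parent_pos))  # {'NOUN': 1, '': 1}
```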

1177 (37%) NUM nodes are leaves.

988 (31%) NUM nodes have one child.

620 (20%) NUM nodes have two children.

390 (12%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
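The four child-count figures above sum to the 3175 NUM tokens (1177 + 988 + 620 + 390). A child-degree histogram of this kind can be sketched as follows, assuming standard ten-column CoNLL-U input; `child_degree_histogram` and the toy sentences are hypothetical.

```python
from collections import Counter


def child_degree_histogram(conllu_text, upos="NUM"):
    """Map number of children -> number of nodes with the given UPOS."""
    histogram = Counter()
    tokens = []  # (id, upos, head id) for the current sentence

    def flush():
        n_children = Counter(head for _, _, head in tokens)
        for tok_id, tok_upos, _ in tokens:
            if tok_upos == upos:
                histogram[n_children[tok_id]] += 1  # missing ids count as 0
        tokens.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            tokens.append((cols[0], cols[3], cols[6]))
    flush()  # handle a final sentence without a trailing blank line
    return histogram


# Hypothetical toy sentences, written with spaces and converted to tabs.
sample = """\
1 two two NUM CD NumForm=Word|NumType=Card 2 nummod _ _
2 dogs dog NOUN NNS Number=Plur 0 root _ _

1 Three three NUM CD NumForm=Word|NumType=Card 0 root _ _
2 . . PUNCT . _ 1 punct _ _
"""
text = "\n".join("\t".join(l.split()) for l in sample.splitlines())

histogram = child_degree_histogram(text)
print(dict(histogram))  # {0: 1, 1: 1}: one leaf, one node with one child

# Consistency check on the figures reported on this page.
print(1177 + 988 + 620 + 390)  # 3175
```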

Children of NUM nodes are attached using 31 different relations: punct (1617; 45% instances), case (584; 16% instances), nmod (259; 7% instances), advmod (230; 6% instances), compound (208; 6% instances), nmod:tmod (164; 5% instances), conj (133; 4% instances), nsubj (76; 2% instances), cc (71; 2% instances), cop (62; 2% instances), det (34; 1% instances), discourse (17; 0% instances), acl:relcl (16; 0% instances), mark (16; 0% instances), parataxis (16; 0% instances), flat (14; 0% instances), appos (12; 0% instances), dep (12; 0% instances), amod (11; 0% instances), nummod (11; 0% instances), acl (8; 0% instances), advcl (8; 0% instances), obl (7; 0% instances), obl:npmod (5; 0% instances), reparandum (4; 0% instances), aux (2; 0% instances), cc:preconj (1; 0% instances), csubj (1; 0% instances), det:predet (1; 0% instances), dislocated (1; 0% instances), nmod:npmod (1; 0% instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (1617; 45% instances), ADP (496; 14% instances), NUM (437; 12% instances), ADV (231; 6% instances), PROPN (192; 5% instances), NOUN (185; 5% instances), SYM (95; 3% instances), CCONJ (70; 2% instances), AUX (64; 2% instances), ADJ (49; 1% instances), PRON (49; 1% instances), VERB (39; 1% instances), DET (36; 1% instances), INTJ (17; 0% instances), SCONJ (13; 0% instances), PART (10; 0% instances), X (2; 0% instances)