
This page pertains to UD version 2.

Treebank Statistics: UD_English-GUM: POS Tags: NUM

There are 684 NUM lemmas (4%), 692 NUM types (3%) and 3994 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 13 in number of tokens.
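The three counts above measure different things. Lemmas are distinct LEMMA values, types are distinct FORM values, and tokens are individual occurrences. Below is a minimal Python sketch of how such counts, shares and ranks could be computed from CoNLL-U text. It is not the script that generated this page. The function names and the toy sentence are illustrative. The sketch also makes three assumptions: a share is the tag's count divided by the sum of that measure over all tags, types are exact (case-sensitive) FORM strings, and ties in the ranking are broken arbitrarily.

```python
from collections import Counter, defaultdict

# Hypothetical toy sentence "Two cats ate 3 or 4 fish" in CoNLL-U (not from GUM).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
ROWS = [
    ("1", "Two", "two", "NUM", "CD", "NumForm=Word|NumType=Card", "2", "nummod", "_", "_"),
    ("2", "cats", "cat", "NOUN", "NNS", "Number=Plur", "3", "nsubj", "_", "_"),
    ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
    ("4", "3", "3", "NUM", "CD", "NumForm=Digit|NumType=Card", "7", "nummod", "_", "_"),
    ("5", "or", "or", "CCONJ", "CC", "_", "6", "cc", "_", "_"),
    ("6", "4", "4", "NUM", "CD", "NumForm=Digit|NumType=Card", "4", "conj", "_", "_"),
    ("7", "fish", "fish", "NOUN", "NN", "Number=Plur", "3", "obj", "_", "_"),
]
SAMPLE = "# text = Two cats ate 3 or 4 fish\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"


def tokens(conllu):
    """Yield (form, lemma, upos) for each basic token, skipping comments,
    multiword-token ranges (IDs like 3-4) and empty nodes (IDs like 5.1)."""
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[2], cols[3]


def tag_stats(conllu, tag):
    """Return {measure: (count, share_percent, rank)} for one UPOS tag."""
    lemmas, types, toks = defaultdict(set), defaultdict(set), Counter()
    for form, lemma, upos in tokens(conllu):
        lemmas[upos].add(lemma)
        types[upos].add(form)  # types are exact FORM strings (case-sensitive)
        toks[upos] += 1
    measures = {
        "lemmas": {t: len(s) for t, s in lemmas.items()},
        "types": {t: len(s) for t, s in types.items()},
        "tokens": dict(toks),
    }
    result = {}
    for name, per_tag in measures.items():
        ranking = sorted(per_tag, key=per_tag.get, reverse=True)  # ties: arbitrary order
        count = per_tag.get(tag, 0)
        share = round(100 * count / sum(per_tag.values()))  # assumed denominator
        rank = ranking.index(tag) + 1 if tag in per_tag else None
        result[name] = (count, share, rank)
    return result


print(tag_stats(SAMPLE, "NUM"))
```

In the toy sentence, NUM leads all three measures: 3 of 7 lemmas, types and tokens, which rounds to a 43% share and rank 1.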

The 10 most frequent NUM lemmas: one, two, 1, three, 2, 3, four, 4, five, 10

The 10 most frequent NUM types: one, two, 1, three, 2, 3, four, 4, five, 10

The 10 most frequent ambiguous lemmas: one (NUM 397, NOUN 115, PRON 46), two (NUM 291, NOUN 1), 1 (NUM 148, X 6), 2 (NUM 115, X 4), 3 (NUM 76, X 4), four (NUM 68, NOUN 1), 4 (NUM 61, X 4), five (NUM 60, NOUN 1), 10 (NUM 56, X 1), 6 (NUM 53, X 1)

The 10 most frequent ambiguous types: one (NUM 337, NOUN 98, PRON 45), 1 (NUM 149, X 6), 2 (NUM 115, X 4), 3 (NUM 76, X 4), 4 (NUM 61, X 4), five (NUM 54, NOUN 1), 10 (NUM 56, X 1), 6 (NUM 53, X 1), 5 (NUM 51, X 2), 7 (NUM 42, X 1)
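A lemma or type counts as ambiguous here when it is tagged NUM in some places and with another UPOS elsewhere. “one”, for example, occurs as NUM, NOUN and PRON. The sketch below tallies this and assumes the input is already a list of (lemma or form, UPOS) pairs. The function name and the toy counts are illustrative. The sketch ranks items by their NUM count, which may not reproduce the exact ordering used on this page.

```python
from collections import Counter, defaultdict


def ambiguous_items(pairs, tag, top=10):
    """pairs: iterable of (item, upos), where item is a lemma (for ambiguous
    lemmas) or a word form (for ambiguous types). Returns items observed with
    `tag` and with at least one other UPOS, ranked by their count under `tag`."""
    by_item = defaultdict(Counter)
    for item, upos in pairs:
        by_item[item][upos] += 1
    amb = [(item, c) for item, c in by_item.items() if tag in c and len(c) > 1]
    amb.sort(key=lambda pair: pair[1][tag], reverse=True)
    lines = []
    for item, counts in amb[:top]:
        detail = ", ".join(f"{upos} {n}" for upos, n in counts.most_common())
        lines.append(f"{item} ({detail})")
    return lines


# Hypothetical toy counts, not GUM data: "three" is never ambiguous.
toy = ([("one", "NUM")] * 3 + [("one", "NOUN")] * 2 + [("one", "PRON")]
       + [("two", "NUM")] * 2 + [("two", "NOUN")] + [("three", "NUM")] * 4)
print(ambiguous_items(toy, "NUM"))
```

The same function handles both lists above: pass (LEMMA, UPOS) pairs for ambiguous lemmas and (FORM, UPOS) pairs for ambiguous types.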

Morphology

The form / lemma ratio of NUM is 1.011696 (the average of all parts of speech is 1.237215).
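The ratio appears to be the number of NUM types divided by the number of NUM lemmas given at the top of the page, and the arithmetic matches:

```latex
\frac{\text{NUM types}}{\text{NUM lemmas}} = \frac{692}{684} = 1 + \frac{8}{684} \approx 1.011696
```

A value this close to 1 means that almost every NUM lemma appears in only one form.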

The lemmas with the highest number of forms (2 each) include “1000” (1,000, 1000), “1400” (1, 1,400) and “2000” (2,000, 2000).

NUM occurs with 4 features: NumForm (3993; 100% of instances), NumType (3993; 100% of instances), Number (7; 0% of instances), Typo (5; 0% of instances)

NUM occurs with 7 feature-value pairs: NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Sing, Typo=Yes

NUM occurs with 10 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (2517 tokens). Examples: 1, 2, 3, 4, 10, 6, 5, 20, 15, 7
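The feature figures above come from the FEATS column. “NumForm (3993; 100% of instances)” counts the NUM tokens that carry the feature; 3993 of 3994 rounds to 100%. Below is a minimal sketch of the three tallies (feature names, feature=value pairs and full combinations). It assumes the FEATS strings for NUM tokens have already been collected. The function name and the toy values are illustrative. This sketch skips tokens with no features ("_") and does not split multi-valued pairs.

```python
from collections import Counter


def feature_stats(feats_column):
    """feats_column: FEATS values of the tokens of one UPOS, e.g.
    'NumForm=Digit|NumType=Card', or '_' when a token has no features.
    Returns Counters for feature names, feature=value pairs and full combinations."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_column:
        if feats == "_":
            continue  # no features on this token
        combos[feats] += 1
        for pair in feats.split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            pairs[pair] += 1  # a multi-valued pair such as 'A=x,y' stays one string here
    return features, pairs, combos


# Hypothetical FEATS values for five NUM tokens (not GUM data).
toy = ["NumForm=Digit|NumType=Card", "NumForm=Digit|NumType=Card",
       "NumForm=Word|NumType=Card", "NumForm=Word|NumType=Frac", "_"]
features, pairs, combos = feature_stats(toy)
print(combos.most_common(1))  # [('NumForm=Digit|NumType=Card', 2)]
print(round(100 * features["NumForm"] / len(toy)))  # share of tokens with NumForm: 80
```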

Relations

NUM nodes are attached to their parents using 30 different relations: nummod (1371; 34% of instances), dep (655; 16% of instances), obl (379; 9% of instances), nmod:unmarked (351; 9% of instances), nmod (277; 7% of instances), conj (228; 6% of instances), compound (182; 5% of instances), root (135; 3% of instances), obj (75; 2% of instances), appos (72; 2% of instances), obl:unmarked (50; 1% of instances), nsubj (48; 1% of instances), parataxis (36; 1% of instances), flat (28; 1% of instances), xcomp (22; 1% of instances), list (16; 0% of instances), advcl (12; 0% of instances), orphan (10; 0% of instances), ccomp (9; 0% of instances), dislocated (8; 0% of instances), nsubj:pass (7; 0% of instances), nsubj:outer (5; 0% of instances), amod (4; 0% of instances), nmod:poss (4; 0% of instances), acl:relcl (3; 0% of instances), reparandum (3; 0% of instances), acl (1; 0% of instances), advcl:relcl (1; 0% of instances), discourse (1; 0% of instances), obl:agent (1; 0% of instances)

Parents of NUM nodes belong to 15 different categories (14 parts of speech plus the artificial root): NOUN (1632; 41% of instances), VERB (821; 21% of instances), NUM (608; 15% of instances), PROPN (594; 15% of instances), none, i.e. the node is the sentence root (135; 3% of instances), SYM (113; 3% of instances), ADJ (57; 1% of instances), ADV (14; 0% of instances), X (6; 0% of instances), AUX (4; 0% of instances), INTJ (4; 0% of instances), PRON (3; 0% of instances), ADP (1; 0% of instances), CCONJ (1; 0% of instances), DET (1; 0% of instances)
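Both lists above are taken over all 3,994 NUM tokens, so each percentage is a count divided by 3,994. For example, nummod gives 1,371 / 3,994 ≈ 34%. Below is a minimal sketch of how the relation and parent-POS counts could be read off the HEAD and DEPREL columns of CoNLL-U text. The function names, the toy sentences and the label "ROOT" for HEAD 0 are illustrative assumptions, not the page's own code.

```python
from collections import Counter

# Hypothetical toy treebank (not GUM): "Two cats ate 3 or 4 fish" and "42".
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SENTS = [
    [("1", "Two", "two", "NUM", "CD", "_", "2", "nummod", "_", "_"),
     ("2", "cats", "cat", "NOUN", "NNS", "_", "3", "nsubj", "_", "_"),
     ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
     ("4", "3", "3", "NUM", "CD", "_", "7", "nummod", "_", "_"),
     ("5", "or", "or", "CCONJ", "CC", "_", "6", "cc", "_", "_"),
     ("6", "4", "4", "NUM", "CD", "_", "4", "conj", "_", "_"),
     ("7", "fish", "fish", "NOUN", "NN", "_", "3", "obj", "_", "_")],
    [("1", "42", "42", "NUM", "CD", "_", "0", "root", "_", "_")],
]
SAMPLE = "\n\n".join("\n".join("\t".join(row) for row in s) for s in SENTS) + "\n"


def sentences(conllu):
    """Yield each sentence as a list of (id, upos, head, deprel) for basic tokens."""
    sent = []
    for line in conllu.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:  # skip ranges / empty nodes
                sent.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))


def parent_stats(conllu, tag):
    """Count DEPREL and parent UPOS of every `tag` node, as (count, percent of tag nodes)."""
    rels, parents, n = Counter(), Counter(), 0
    for sent in sentences(conllu):
        upos_of = {tid: upos for tid, upos, _, _ in sent}
        for _, upos, head, deprel in sent:
            if upos == tag:
                n += 1
                rels[deprel] += 1
                parents[upos_of.get(head, "ROOT")] += 1  # HEAD 0 has no UPOS: label it ROOT

    def with_percent(counter):
        return {k: (v, round(100 * v / n)) for k, v in counter.most_common()}

    return with_percent(rels), with_percent(parents)


print(parent_stats(SAMPLE, "NUM"))
```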

1542 (39%) NUM nodes are leaves.

1284 (32%) NUM nodes have one child.

718 (18%) NUM nodes have two children.

450 (11%) NUM nodes have three or more children.

The highest child degree of a NUM node is 13.
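The four groups above (leaves, one child, two children, three or more) partition the NUM nodes: 1542 + 1284 + 718 + 450 = 3994. Below is a minimal sketch of that bucketing and of the maximum child degree, computed by counting how often each token ID appears in the HEAD column. The function names and toy data are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy treebank (not GUM): "Two cats ate 3 or 4 fish" and "42".
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SENTS = [
    [("1", "Two", "two", "NUM", "CD", "_", "2", "nummod", "_", "_"),
     ("2", "cats", "cat", "NOUN", "NNS", "_", "3", "nsubj", "_", "_"),
     ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
     ("4", "3", "3", "NUM", "CD", "_", "7", "nummod", "_", "_"),
     ("5", "or", "or", "CCONJ", "CC", "_", "6", "cc", "_", "_"),
     ("6", "4", "4", "NUM", "CD", "_", "4", "conj", "_", "_"),
     ("7", "fish", "fish", "NOUN", "NN", "_", "3", "obj", "_", "_")],
    [("1", "42", "42", "NUM", "CD", "_", "0", "root", "_", "_")],
]
SAMPLE = "\n\n".join("\n".join("\t".join(row) for row in s) for s in SENTS) + "\n"


def sentences(conllu):
    """Yield each sentence as a list of (id, upos, head, deprel) for basic tokens."""
    sent = []
    for line in conllu.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:  # skip ranges / empty nodes
                sent.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))


def degree_stats(conllu, tag):
    """Bucket `tag` nodes by number of dependents: 0 (leaf), 1, 2, 3 (= three or more).
    Also return the highest number of dependents seen on any `tag` node."""
    buckets, max_degree = Counter(), 0
    for sent in sentences(conllu):
        n_children = Counter(head for _, _, head, _ in sent)  # dependents per head ID
        for tid, upos, _, _ in sent:
            if upos == tag:
                degree = n_children[tid]
                buckets[min(degree, 3)] += 1
                max_degree = max(max_degree, degree)
    return dict(buckets), max_degree


print(degree_stats(SAMPLE, "NUM"))
```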

Children of NUM nodes are attached using 31 different relations: punct (1849; 42% of instances), case (685; 16% of instances), nmod (314; 7% of instances), advmod (284; 6% of instances), compound (259; 6% of instances), conj (243; 6% of instances), nmod:unmarked (186; 4% of instances), cc (104; 2% of instances), nsubj (103; 2% of instances), cop (91; 2% of instances), det (52; 1% of instances), discourse (28; 1% of instances), flat (22; 1% of instances), parataxis (21; 0% of instances), acl:relcl (20; 0% of instances), mark (18; 0% of instances), nummod (18; 0% of instances), amod (15; 0% of instances), appos (13; 0% of instances), dep (13; 0% of instances), advcl (11; 0% of instances), reparandum (10; 0% of instances), acl (9; 0% of instances), obl (6; 0% of instances), obl:unmarked (5; 0% of instances), aux (4; 0% of instances), det:predet (2; 0% of instances), cc:preconj (1; 0% of instances), csubj (1; 0% of instances), dislocated (1; 0% of instances), goeswith (1; 0% of instances)

Children of NUM nodes belong to 17 different parts of speech: PUNCT (1849; 42% of instances), NUM (608; 14% of instances), ADP (604; 14% of instances), ADV (269; 6% of instances), NOUN (236; 5% of instances), PROPN (215; 5% of instances), SYM (103; 2% of instances), CCONJ (102; 2% of instances), AUX (96; 2% of instances), PRON (77; 2% of instances), ADJ (65; 1% of instances), DET (55; 1% of instances), VERB (50; 1% of instances), INTJ (30; 1% of instances), PART (14; 0% of instances), SCONJ (13; 0% of instances), X (3; 0% of instances)
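The percentages in the two lists of children are relative to the total number of dependents of NUM nodes, not to the 3,994 NUM tokens. The counts in each list sum to 4,389, so case gives 685 / 4,389 ≈ 16%. Below is a minimal sketch of this tally. It selects every token whose HEAD is a NUM node. The function names and toy data are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy treebank (not GUM): "Two cats ate 3 or 4 fish" and "42".
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SENTS = [
    [("1", "Two", "two", "NUM", "CD", "_", "2", "nummod", "_", "_"),
     ("2", "cats", "cat", "NOUN", "NNS", "_", "3", "nsubj", "_", "_"),
     ("3", "ate", "eat", "VERB", "VBD", "_", "0", "root", "_", "_"),
     ("4", "3", "3", "NUM", "CD", "_", "7", "nummod", "_", "_"),
     ("5", "or", "or", "CCONJ", "CC", "_", "6", "cc", "_", "_"),
     ("6", "4", "4", "NUM", "CD", "_", "4", "conj", "_", "_"),
     ("7", "fish", "fish", "NOUN", "NN", "_", "3", "obj", "_", "_")],
    [("1", "42", "42", "NUM", "CD", "_", "0", "root", "_", "_")],
]
SAMPLE = "\n\n".join("\n".join("\t".join(row) for row in s) for s in SENTS) + "\n"


def sentences(conllu):
    """Yield each sentence as a list of (id, upos, head, deprel) for basic tokens."""
    sent = []
    for line in conllu.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:  # skip ranges / empty nodes
                sent.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))


def child_stats(conllu, tag):
    """Count DEPREL and UPOS of every dependent of a `tag` node,
    as (count, percent of all such dependents)."""
    rels, child_pos = Counter(), Counter()
    for sent in sentences(conllu):
        upos_of = {tid: upos for tid, upos, _, _ in sent}
        for _, upos, head, deprel in sent:
            if upos_of.get(head) == tag:  # parent is a `tag` node (HEAD 0 gives None)
                rels[deprel] += 1
                child_pos[upos] += 1
    total = sum(rels.values())  # denominator: all dependents, not all tag nodes

    def with_percent(counter):
        return {k: (v, round(100 * v / total)) for k, v in counter.most_common()}

    return with_percent(rels), with_percent(child_pos)


print(child_stats(SAMPLE, "NUM"))
```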