
This page pertains to UD version 2.

Treebank Statistics: UD_English-GENTLE: POS Tags: NUM

There are 139 NUM lemmas (4%), 139 NUM types (3%) and 386 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: one, 1, 5, 2, two, 4, 3, X, 10, three

The 10 most frequent NUM types: one, 1, 5, 2, two, 4, 3, X, 10, three

The 10 most frequent ambiguous lemmas: one (NUM 29, NOUN 6, PRON 3), X (NOUN 14, NUM 10), million (NUM 2, NOUN 1), nil (NUM 2, NOUN 1), thousand (NUM 2, NOUN 1)

The 10 most frequent ambiguous types: one (NUM 25, NOUN 6, PRON 3, X 1), X (NOUN 14, NUM 10), nil (NUM 2, NOUN 1)

Morphology

The form / lemma ratio of NUM is 1.000000 (the average of all parts of speech is 1.147634).
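As a rough illustration of how such a ratio can be computed, the sketch below assumes it is the number of distinct forms per lemma, averaged over all lemmas of one part of speech. That definition is an assumption (the page does not spell it out), and the function name and toy data are hypothetical.

```python
from collections import defaultdict

def form_lemma_ratio(pairs):
    """Average number of distinct word forms per distinct lemma.

    `pairs` is an iterable of (form, lemma) tuples for one part of speech.
    """
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    total_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return total_forms / len(forms_by_lemma)

# Hypothetical toy data, not taken from the treebank.
num_pairs = [("one", "one"), ("1", "1"), ("one", "one"), ("5", "5")]
verb_pairs = [("runs", "run"), ("ran", "run"), ("is", "be")]

num_ratio = form_lemma_ratio(num_pairs)    # each lemma has a single form
verb_ratio = form_lemma_ratio(verb_pairs)  # "run" has two forms, "be" has one
```

Under this reading, a ratio of exactly 1 means that no lemma of the class appears with more than one form, which is what one would expect for numerals in English.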

Every NUM lemma has exactly one form. The first three lemmas in the listing are “0”, “07” and “08”, each with a single form identical to the lemma.

NUM occurs with 2 features: NumForm (386; 100% instances), NumType (386; 100% instances)

NUM occurs with 5 feature-value pairs: NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac

NUM occurs with 4 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (279 tokens). Examples: 1, 5, 2, 4, 3, 10, 15, 20, 30, 6
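The three counts above (features, feature-value pairs, feature combinations) can all be derived from the FEATS column of a CoNLL-U file. The following is a minimal sketch under that assumption; the helper name `parse_feats` and the sample values are hypothetical and are not taken from the treebank.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# Hypothetical FEATS values of four NUM tokens, for illustration only.
sample_feats = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Word|NumType=Card",
    "NumForm=Digit|NumType=Card",
    "NumForm=Roman|NumType=Card",
]

# Whole FEATS strings are the "feature combinations".
combo_counts = Counter(sample_feats)
# Feature names alone, e.g. NumForm.
feature_counts = Counter(name for f in sample_feats for name in parse_feats(f))
# Feature-value pairs, e.g. NumForm=Digit.
pair_counts = Counter(
    f"{name}={value}"
    for f in sample_feats
    for name, value in parse_feats(f).items()
)
most_common_combo, most_common_n = combo_counts.most_common(1)[0]
```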

Relations

NUM nodes are attached to their parents using 16 different relations: nummod (147; 38% instances), dep (128; 33% instances), nmod (31; 8% instances), conj (17; 4% instances), compound (12; 3% instances), obl:unmarked (11; 3% instances), root (9; 2% instances), obj (6; 2% instances), obl (6; 2% instances), nmod:unmarked (5; 1% instances), nsubj (5; 1% instances), appos (4; 1% instances), parataxis (2; 1% instances), amod (1; 0% instances), flat (1; 0% instances), xcomp (1; 0% instances)

Parents of NUM nodes belong to 9 different parts of speech: NOUN (218; 56% instances), PROPN (69; 18% instances), NUM (44; 11% instances), SYM (23; 6% instances), VERB (16; 4% instances), [root, no parent] (9; 2% instances), ADJ (5; 1% instances), DET (1; 0% instances), PRON (1; 0% instances)
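Counts of this kind can be reproduced by reading the HEAD and DEPREL columns of a CoNLL-U file. The sketch below is one possible way to do it, not the script that generated this page. The sample sentences, the function names and the "(no parent)" label are hypothetical. A node with HEAD 0 is attached as root and has no parent part of speech.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment (10 tab-separated columns), not from the treebank.
SAMPLE = "\n".join([
    "# text = Two cats saw 3",
    "1\tTwo\ttwo\tNUM\tCD\tNumForm=Word|NumType=Card\t2\tnummod\t_\t_",
    "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_",
    "3\tsaw\tsee\tVERB\tVBD\tTense=Past\t0\troot\t_\t_",
    "4\t3\t3\tNUM\tCD\tNumForm=Digit|NumType=Card\t3\tobj\t_\t_",
    "",
    "# text = 42",
    "1\t42\t42\tNUM\tCD\tNumForm=Digit|NumType=Card\t0\troot\t_\t_",
    "",
])

def read_sentences(text):
    """Yield each sentence as a list of column lists."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain word lines; skip multiword ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence

def num_attachment_stats(text):
    """Count incoming relations and parent UPOS tags of NUM nodes."""
    relations, parent_pos = Counter(), Counter()
    for sentence in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            if cols[3] != "NUM":
                continue
            relations[cols[7]] += 1
            # HEAD 0 is not a real token, so the lookup falls back to the label.
            parent_pos[upos_by_id.get(cols[6], "(no parent)")] += 1
    return relations, parent_pos

relations, parent_pos = num_attachment_stats(SAMPLE)
```

On the toy input, the root-attached numeral "42" is counted under the relation root but contributes no real parent tag, which is why the number of root attachments and the number of parentless nodes coincide.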

244 (63%) NUM nodes are leaves.

101 (26%) NUM nodes have one child.

25 (6%) NUM nodes have two children.

16 (4%) NUM nodes have three or more children.

The highest child degree of a NUM node is 6.
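The child-degree figures above can be checked with simple arithmetic: the four buckets should add up to the 386 NUM tokens, and the percentages follow from rounding. The sketch below does that check and also shows one way to bucket nodes from (id, head, UPOS) triples. The function names and the toy sentence are hypothetical.

```python
from collections import Counter

# Counts of NUM nodes by number of children, as reported above.
degree_counts = {"leaf": 244, "one child": 101, "two children": 25, "three or more": 16}

total = sum(degree_counts.values())
percentages = {k: round(100 * n / total) for k, n in degree_counts.items()}

def bucket(n_children):
    """Map a child count to the bucket names used above."""
    if n_children == 0:
        return "leaf"
    if n_children == 1:
        return "one child"
    if n_children == 2:
        return "two children"
    return "three or more"

def num_degree_counts(nodes):
    """nodes: list of (id, head, upos) tuples for one sentence."""
    children = Counter(head for _, head, _ in nodes)
    return Counter(
        bucket(children[node_id]) for node_id, _, upos in nodes if upos == "NUM"
    )

# Hypothetical toy sentence: node 4 (NUM) governs node 3, node 1 (NUM) is a leaf.
toy = [(1, 2, "NUM"), (2, 0, "NOUN"), (3, 4, "ADP"), (4, 2, "NUM")]
toy_counts = num_degree_counts(toy)
```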

Children of NUM nodes are attached using 23 different relations: punct (40; 18% instances), case (38; 17% instances), nmod (36; 17% instances), conj (31; 14% instances), nsubj (13; 6% instances), compound (10; 5% instances), advmod (9; 4% instances), nmod:unmarked (7; 3% instances), dep (6; 3% instances), cc (5; 2% instances), cop (5; 2% instances), obl (3; 1% instances), acl (2; 1% instances), acl:relcl (2; 1% instances), amod (2; 1% instances), appos (2; 1% instances), det (1; 0% instances), discourse (1; 0% instances), flat (1; 0% instances), mark (1; 0% instances), nmod:poss (1; 0% instances), obl:unmarked (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: NUM (44; 20% instances), PUNCT (40; 18% instances), NOUN (36; 17% instances), ADP (25; 11% instances), SYM (22; 10% instances), ADV (14; 6% instances), PROPN (12; 6% instances), ADJ (7; 3% instances), AUX (5; 2% instances), VERB (5; 2% instances), CCONJ (2; 1% instances), DET (2; 1% instances), PART (2; 1% instances), PRON (1; 0% instances), X (1; 0% instances)