Treebank Statistics: UD_English-GUM: POS Tags: NUM
There are 652 NUM lemmas (4%), 658 NUM types (3%) and 3685 NUM tokens (2%).
Out of 17 observed tags, the rank of NUM is: 5 in number of lemmas, 6 in number of types and 13 in number of tokens.
The 10 most frequent NUM lemmas: one, two, 1, 2, three, 3, four, 10, 4, 6
The 10 most frequent NUM types: one, two, 1, 2, three, 3, four, 10, 4, 6
The 10 most frequent ambiguous lemmas: one (NUM 348, NOUN 103, PRON 41), two (NUM 267, NOUN 1), 1 (NUM 135, X 6), 2 (NUM 113, X 4), 3 (NUM 72, X 4), four (NUM 60, NOUN 1), 4 (NUM 53, X 4), 6 (NUM 53, X 1), five (NUM 50, NOUN 1), 5 (NUM 47, X 2)
The 10 most frequent ambiguous types: one (NUM 295, NOUN 88, PRON 40), 1 (NUM 135, X 6), 2 (NUM 113, X 4), 3 (NUM 72, X 4), 4 (NUM 53, X 4), 6 (NUM 53, X 1), five (NUM 44, NOUN 1), 5 (NUM 47, X 2), 7 (NUM 40, X 1), 8 (NUM 33, X 2)
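The per-tag counts listed for an ambiguous lemma can be turned into shares. The following is a minimal sketch using the counts reported above for the lemma "one"; the variable names are illustrative and not part of the statistics page.

```python
# Tag counts for the lemma "one", copied from the ambiguous-lemma list above.
one_counts = {"NUM": 348, "NOUN": 103, "PRON": 41}

total = sum(one_counts.values())  # all occurrences of the lemma, any tag
# Percentage of occurrences carrying each tag, rounded to one decimal place.
shares = {tag: round(100 * n / total, 1) for tag, n in one_counts.items()}

print(total)
print(shares["NUM"])
```

Under these counts, roughly seven in ten occurrences of "one" are tagged NUM.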
Morphology
The form / lemma ratio of NUM is 1.009202 (the average of all parts of speech is 1.229167).
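The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given at the top of this page. A short check, assuming that this is how the ratio is defined:

```python
# Counts taken from the summary at the top of this page.
num_types = 658   # distinct NUM word forms
num_lemmas = 652  # distinct NUM lemmas

# Assumption: form / lemma ratio = types / lemmas.
form_lemma_ratio = num_types / num_lemmas
print(f"{form_lemma_ratio:.6f}")  # prints 1.009202
```

A value this close to 1 means that almost every NUM lemma appears in a single written form.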
The 1st highest number of forms (2) was observed with the lemma “1000”: 1,000, 1000.
The 2nd highest number of forms (2) was observed with the lemma “2000”: 2,000, 2000.
The 3rd highest number of forms (2) was observed with the lemma “20000”: 20,000, 20000.
NUM occurs with 4 features: NumForm (3685; 100% instances), NumType (3685; 100% instances), Number (6; 0% instances), Typo (3; 0% instances)
NUM occurs with 7 feature-value pairs: NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Sing, Typo=Yes
NUM occurs with 9 feature combinations.
The most frequent feature combination is NumForm=Digit|NumType=Card (2382 tokens).
Examples: 1, 2, 3, 10, 4, 6, 5, 15, 7, 20
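A feature combination such as NumForm=Digit|NumType=Card follows the CoNLL-U FEATS convention: pairs are separated by "|" and each pair is written Feature=Value. The sketch below splits such a string into a mapping; the helper name parse_feats is hypothetical and not part of any UD tool.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS string into a feature -> value mapping."""
    if feats == "_":  # CoNLL-U writes "_" when a token has no features
        return {}
    # Each pair looks like "NumForm=Digit"; split on the first "=" only.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


combo = parse_feats("NumForm=Digit|NumType=Card")
print(combo)
```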
Relations
NUM nodes are attached to their parents using 32 different relations: nummod (1224; 33% instances), dep (604; 16% instances), obl (372; 10% instances), nmod:tmod (337; 9% instances), nmod (262; 7% instances), conj (208; 6% instances), compound (164; 4% instances), root (125; 3% instances), appos (68; 2% instances), obj (67; 2% instances), nsubj (42; 1% instances), parataxis (33; 1% instances), obl:tmod (30; 1% instances), flat (25; 1% instances), xcomp (21; 1% instances), obl:npmod (17; 0% instances), list (16; 0% instances), advcl (12; 0% instances), orphan (10; 0% instances), ccomp (9; 0% instances), dislocated (7; 0% instances), nsubj:pass (7; 0% instances), nmod:npmod (6; 0% instances), amod (4; 0% instances), nmod:poss (3; 0% instances), nsubj:outer (3; 0% instances), reparandum (3; 0% instances), acl:relcl (2; 0% instances), acl (1; 0% instances), advcl:relcl (1; 0% instances), discourse (1; 0% instances), obl:agent (1; 0% instances)
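Relation counts of this kind can be gathered by reading the UPOS and DEPREL columns of a CoNLL-U file. The following is a minimal sketch, not the script that produced this page; the function relation_counts and the three-token sample sentence are hypothetical and the sample is not taken from GUM.

```python
from collections import Counter

# Hypothetical three-token sentence in CoNLL-U column order (not from GUM).
ROWS = [
    ["1", "Two", "two", "NUM", "CD", "NumForm=Word|NumType=Card", "2", "nummod", "_", "_"],
    ["2", "dogs", "dog", "NOUN", "NNS", "Number=Plur", "3", "nsubj", "_", "_"],
    ["3", "barked", "bark", "VERB", "VBD", "Tense=Past", "0", "root", "_", "_"],
]
SAMPLE = "# text = Two dogs barked\n" + "\n".join("\t".join(r) for r in ROWS)


def relation_counts(conllu: str, upos: str = "NUM") -> Counter:
    """Count the DEPREL values of all tokens carrying the given UPOS tag."""
    counts = Counter()
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:       # column 4 is UPOS
            counts[cols[7]] += 1  # column 8 is DEPREL
    return counts


print(relation_counts(SAMPLE))
```

Run over a full treebank file instead of the sample, counts[...] divided by the total number of NUM tokens would give percentages like those listed above.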
Parents of NUM nodes belong to 14 different parts of speech: NOUN (1475; 40% instances), VERB (801; 22% instances), NUM (567; 15% instances), PROPN (548; 15% instances), (125; 3% instances), SYM (85; 2% instances), ADJ (53; 1% instances), ADV (14; 0% instances), X (6; 0% instances), INTJ (4; 0% instances), PRON (3; 0% instances), AUX (2; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances)
1366 (37%) NUM nodes are leaves.
1195 (32%) NUM nodes have one child.
687 (19%) NUM nodes have two children.
437 (12%) NUM nodes have three or more children.
The highest child degree of a NUM node is 13.
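The four degree classes partition the NUM tokens, so their counts should add up to the 3685 tokens reported at the top of the page. A short consistency check on the figures above (the dictionary keys are illustrative labels):

```python
# Counts of NUM nodes by number of children, copied from the lines above.
degree_counts = {
    "leaf": 1366,
    "one child": 1195,
    "two children": 687,
    "three or more": 437,
}

total = sum(degree_counts.values())
# Whole-number percentages, as shown on this page.
shares = {label: round(100 * n / total) for label, n in degree_counts.items()}

print(total)
print(shares)
```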
Children of NUM nodes are attached using 32 different relations: punct (1803; 43% instances), case (658; 16% instances), nmod (291; 7% instances), advmod (265; 6% instances), compound (246; 6% instances), conj (223; 5% instances), nmod:tmod (178; 4% instances), nsubj (96; 2% instances), cc (93; 2% instances), cop (84; 2% instances), det (44; 1% instances), discourse (26; 1% instances), mark (19; 0% instances), parataxis (19; 0% instances), acl:relcl (18; 0% instances), flat (18; 0% instances), dep (14; 0% instances), nummod (14; 0% instances), appos (12; 0% instances), amod (11; 0% instances), advcl (10; 0% instances), acl (8; 0% instances), reparandum (7; 0% instances), obl (6; 0% instances), obl:npmod (5; 0% instances), aux (4; 0% instances), cc:preconj (1; 0% instances), csubj (1; 0% instances), det:predet (1; 0% instances), dislocated (1; 0% instances), nmod:npmod (1; 0% instances), obl:tmod (1; 0% instances)
Children of NUM nodes belong to 17 different parts of speech: PUNCT (1803; 43% instances), ADP (577; 14% instances), NUM (567; 14% instances), ADV (253; 6% instances), NOUN (221; 5% instances), PROPN (206; 5% instances), SYM (100; 2% instances), CCONJ (91; 2% instances), AUX (88; 2% instances), PRON (69; 2% instances), ADJ (57; 1% instances), DET (46; 1% instances), VERB (43; 1% instances), INTJ (28; 1% instances), SCONJ (14; 0% instances), PART (13; 0% instances), X (2; 0% instances)