Treebank Statistics: UD_English-ParTUT: POS Tags: NUM
There are 248 NUM lemmas (4% of all lemmas), 248 NUM types (3% of all types) and 832 NUM tokens (2% of all tokens).
Out of 17 observed tags, NUM ranks 6th by number of lemmas, 6th by number of types and 13th by number of tokens.
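The counts above can be reproduced from the treebank's CoNLL-U files. The Python sketch below assumes that a lemma is a distinct LEMMA value, a type is a distinct, case-sensitive FORM value, and each percentage is the NUM count divided by the same count over all tags. The helper names and the two-sentence sample are hypothetical and are not part of any UD tool.

```python
# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_tokens(text):
    """Yield one dict per syntactic word, skipping blank lines, comments,
    multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield dict(zip(FIELDS, cols))


def tag_profile(tokens, tag="NUM"):
    """Count distinct lemmas, distinct (case-sensitive) forms and tokens for
    `tag`, plus each count's share of the same count over all tags."""
    tokens = list(tokens)
    tagged = [t for t in tokens if t["upos"] == tag]

    def profile(toks):
        return {
            "lemmas": len({t["lemma"] for t in toks}),
            "types": len({t["form"] for t in toks}),
            "tokens": len(toks),
        }

    counts, totals = profile(tagged), profile(tokens)
    shares = {k: counts[k] / totals[k] for k in counts}
    return counts, shares


# Hypothetical toy sample, written with spaces for readability and then
# converted to the tab-separated layout that CoNLL-U requires.
RAW = """
# text = Two cats saw one dog.
1 Two two NUM CD NumType=Card 2 nummod _ _
2 cats cat NOUN NNS Number=Plur 3 nsubj _ _
3 saw see VERB VBD Tense=Past 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 dog dog NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

counts, shares = tag_profile(read_tokens(SAMPLE))
print(counts)  # {'lemmas': 2, 'types': 3, 'tokens': 4}
```

Because types are case-sensitive here, Two and two are separate types of the single lemma two, which is why the type count can differ from the lemma count.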
The 10 most frequent NUM lemmas: two, one, 1, three, 2, four, 3, 6, five, 18
The 10 most frequent NUM types: two, one, 1, three, 2, four, 3, 6, five, 18
The most frequent ambiguous lemmas (lemmas that occur both as NUM and with another tag), ordered by NUM count; only 7 exist: two (NUM 60, NOUN 2), one (NUM 57, PRON 33, DET 2), three (NUM 27, NOUN 1), million (NUM 9, NOUN 2), ten (NUM 3, NOUN 2), I (PRON 191, NUM 2), VIII (NOUN 1, NUM 1)
The most frequent ambiguous types, ordered by NUM count; only 6 exist: two (NUM 50, NOUN 1), one (NUM 50, PRON 29, DET 2), three (NUM 27, NOUN 1), ten (NUM 3, NOUN 1), I (PRON 191, NUM 2), VIII (NOUN 1, NUM 1)
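Ambiguity can be detected by tallying the UPOS tags of each lemma (or form) and keeping the items that occur both as NUM and with at least one other tag. A minimal sketch under that assumption, sorting by NUM count as the lists above appear to be sorted; the names and the toy sentences are hypothetical:

```python
from collections import Counter, defaultdict

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_tokens(text):
    """Yield one dict per syntactic word, skipping blank lines, comments,
    multiword-token ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield dict(zip(FIELDS, cols))


def ambiguous(tokens, key, tag="NUM"):
    """Map each value of `key` ("lemma" or "form") that occurs with `tag` and
    with at least one other UPOS to its per-tag counts, highest `tag` count first."""
    by_item = defaultdict(Counter)
    for t in tokens:
        by_item[t[key]][t["upos"]] += 1
    hits = {w: c for w, c in by_item.items() if tag in c and len(c) > 1}
    return dict(sorted(hits.items(), key=lambda kv: -kv[1][tag]))


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = I saw one.
1 I I PRON PRP Case=Nom 2 nsubj _ _
2 saw see VERB VBD Tense=Past 0 root _ _
3 one one PRON PRP _ 2 obj _ _
4 . . PUNCT . _ 2 punct _ _

# text = Article I has one rule.
1 Article article NOUN NN Number=Sing 3 nsubj _ _
2 I I NUM CD NumForm=Roman|NumType=Card 1 nummod _ _
3 has have VERB VBZ Tense=Pres 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 rule rule NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _

# text = One barked.
1 One one NUM CD NumType=Card 2 nsubj _ _
2 barked bark VERB VBD Tense=Past 0 root _ _
3 . . PUNCT . _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

tokens = list(read_tokens(SAMPLE))
for lemma, tags in ambiguous(tokens, "lemma").items():
    print(lemma, "(" + ", ".join(f"{t} {n}" for t, n in tags.most_common()) + ")")
# one (NUM 2, PRON 1)
# I (PRON 1, NUM 1)
```

Run with key="form", the same function treats One and one separately, so One (NUM only) drops out; this mirrors how the type list above can differ from the lemma list.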
Morphology
The form/lemma ratio of NUM is 1.000000 (the average over all parts of speech is 1.205397): every NUM lemma occurs with exactly one form, so the lemmas below are tied.
The highest number of forms (1) was observed with the lemma “-20”: -20.
The second-highest number of forms (1) was observed with the lemma “-40”: -40.
The third-highest number of forms (1) was observed with the lemma “1”: 1.
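A sketch of the form/lemma statistics. It assumes the ratio is the mean number of distinct forms per lemma, that forms are compared case-insensitively (so Two and two count as one form), and that ties are broken by lemma string, an ordering that matches -20, -40, 1 above but is not documented by the source. Names and sample data are hypothetical:

```python
from collections import defaultdict

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_tokens(text):
    """Yield one dict per syntactic word, skipping blank lines, comments,
    multiword-token ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield dict(zip(FIELDS, cols))


def forms_per_lemma(tokens, tag="NUM"):
    """Collect the distinct forms of each `tag` lemma.
    Assumption: forms are lowercased before comparison."""
    forms = defaultdict(set)
    for t in tokens:
        if t["upos"] == tag:
            forms[t["lemma"]].add(t["form"].lower())
    return forms


def form_lemma_ratio(forms):
    """Mean number of distinct forms per lemma."""
    return sum(len(f) for f in forms.values()) / len(forms)


def most_forms(forms, n=3):
    """Lemmas with the most forms; ties broken by lemma string (assumption)."""
    return sorted(forms, key=lambda lemma: (-len(forms[lemma]), lemma))[:n]


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = It fell from 1 to -20.
1 It it PRON PRP Case=Nom 2 nsubj _ _
2 fell fall VERB VBD Tense=Past 0 root _ _
3 from from ADP IN _ 4 case _ _
4 1 1 NUM CD NumType=Card 2 obl _ _
5 to to ADP IN _ 6 case _ _
6 -20 -20 NUM CD NumType=Card 2 obl _ _
7 . . PUNCT . _ 2 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

forms = forms_per_lemma(read_tokens(SAMPLE))
print(form_lemma_ratio(forms))  # 1.0
print(most_forms(forms))        # ['-20', '1', 'two']
```

In plain string order the hyphen sorts before digits and digits before letters, which is why -20 precedes 1 here.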
NUM occurs with 2 features: NumType (832; 100% of instances), NumForm (15; 2% of instances)
NUM occurs with 2 feature-value pairs: NumForm=Roman, NumType=Card
NUM occurs with 2 feature combinations.
The most frequent feature combination is NumType=Card (817 tokens); the other, NumForm=Roman|NumType=Card, covers the remaining 15 tokens.
Examples: two, one, 1, three, 2, four, 3, 6, five, 18
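Feature, feature-value and combination counts all come from splitting the FEATS column on | and then on =. A minimal sketch; the function names and toy data are hypothetical:

```python
from collections import Counter

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_tokens(text):
    """Yield one dict per syntactic word, skipping blank lines, comments,
    multiword-token ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield dict(zip(FIELDS, cols))


def feature_stats(tokens, tag="NUM"):
    """Count feature names, Name=Value pairs and whole FEATS strings on `tag` tokens."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for t in tokens:
        if t["upos"] != tag or t["feats"] == "_":
            continue  # "_" means the token has no features
        combos[t["feats"]] += 1
        for pair in t["feats"].split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = Article I has one rule.
1 Article article NOUN NN Number=Sing 3 nsubj _ _
2 I I NUM CD NumForm=Roman|NumType=Card 1 nummod _ _
3 has have VERB VBZ Tense=Pres 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 rule rule NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

features, pairs, combos = feature_stats(read_tokens(SAMPLE))
print(features)              # Counter({'NumType': 4, 'NumForm': 1})
print(combos.most_common())  # [('NumType=Card', 3), ('NumForm=Roman|NumType=Card', 1)]
```

As in the statistics above, the combination counts add up to the NumType count whenever every token carries NumType.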
Relations
NUM nodes are attached to their parents using 18 different relations: nummod (299; 36% of instances), obl (139; 17% of instances), nmod:unmarked (102; 12% of instances), nmod (66; 8% of instances), discourse (64; 8% of instances), flat (58; 7% of instances), conj (33; 4% of instances), compound (20; 2% of instances), root (17; 2% of instances), obj (10; 1% of instances), nsubj (7; 1% of instances), appos (6; 1% of instances), parataxis (5; 1% of instances), ccomp (2; <1% of instances), advcl (1; <1% of instances), amod (1; <1% of instances), obl:unmarked (1; <1% of instances), orphan (1; <1% of instances)
Parents of NUM nodes belong to 9 different categories, 8 parts of speech plus the artificial root node: NOUN (365; 44% of instances), VERB (212; 25% of instances), NUM (128; 15% of instances), PROPN (51; 6% of instances), SYM (42; 5% of instances), the artificial root (17; 2% of instances), ADJ (13; 2% of instances), X (3; <1% of instances), ADV (1; <1% of instances)
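Parent relations and parent POS can be read off the DEPREL and HEAD columns. HEAD 0 denotes the artificial root, which has no POS; this sketch reports it as ROOT, a label chosen only for illustration. Sentences are kept separate because HEAD values are sentence-local. Function names and toy data are hypothetical:

```python
from collections import Counter

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_sentences(text):
    """Yield sentences as lists of token dicts, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(dict(zip(FIELDS, cols)))
    if sent:
        yield sent


def parent_stats(sentences, tag="NUM"):
    """Count the DEPREL of each `tag` node and the UPOS of its parent."""
    rels, parent_pos = Counter(), Counter()
    for sent in sentences:
        by_id = {t["id"]: t for t in sent}
        for t in sent:
            if t["upos"] != tag:
                continue
            rels[t["deprel"]] += 1
            head = by_id.get(t["head"])  # None when HEAD is 0 (artificial root)
            parent_pos[head["upos"] if head else "ROOT"] += 1
    return rels, parent_pos


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = It fell from 1 to -20.
1 It it PRON PRP Case=Nom 2 nsubj _ _
2 fell fall VERB VBD Tense=Past 0 root _ _
3 from from ADP IN _ 4 case _ _
4 1 1 NUM CD NumType=Card 2 obl _ _
5 to to ADP IN _ 6 case _ _
6 -20 -20 NUM CD NumType=Card 2 obl _ _
7 . . PUNCT . _ 2 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _

# text = Two cats saw one dog.
1 Two two NUM CD NumType=Card 2 nummod _ _
2 cats cat NOUN NNS Number=Plur 3 nsubj _ _
3 saw see VERB VBD Tense=Past 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 dog dog NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

rels, parents = parent_stats(read_sentences(SAMPLE))
print(rels.most_common())     # [('obl', 2), ('nummod', 2), ('root', 1), ('conj', 1)]
print(parents.most_common())  # [('VERB', 2), ('NOUN', 2), ('ROOT', 1), ('NUM', 1)]
```

Every node attached with root has the artificial root as its parent, so the ROOT count always equals the root relation count, as with the 17 instances above.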
404 (49%) NUM nodes are leaves.
182 (22%) NUM nodes have one child.
151 (18%) NUM nodes have two children.
95 (11%) NUM nodes have three or more children.
The highest child degree of a NUM node is 7.
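A node's child degree is the number of tokens whose HEAD points at it, counted within its own sentence. A minimal sketch of the bucketing above, with hypothetical names and toy data:

```python
from collections import Counter

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_sentences(text):
    """Yield sentences as lists of token dicts, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(dict(zip(FIELDS, cols)))
    if sent:
        yield sent


def child_degrees(sentences, tag="NUM"):
    """Number of dependents of every `tag` node, in corpus order."""
    degrees = []
    for sent in sentences:
        n_children = Counter(t["head"] for t in sent)  # HEAD id -> dependents
        degrees.extend(n_children[t["id"]] for t in sent if t["upos"] == tag)
    return degrees


def degree_summary(degrees):
    """Bucket degrees the way the statistics above do."""
    return {
        "leaves": sum(d == 0 for d in degrees),
        "one child": sum(d == 1 for d in degrees),
        "two children": sum(d == 2 for d in degrees),
        "three or more": sum(d >= 3 for d in degrees),
        "max degree": max(degrees),
    }


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = It fell from 1 to -20.
1 It it PRON PRP Case=Nom 2 nsubj _ _
2 fell fall VERB VBD Tense=Past 0 root _ _
3 from from ADP IN _ 4 case _ _
4 1 1 NUM CD NumType=Card 2 obl _ _
5 to to ADP IN _ 6 case _ _
6 -20 -20 NUM CD NumType=Card 2 obl _ _
7 . . PUNCT . _ 2 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _

# text = Two cats saw one dog.
1 Two two NUM CD NumType=Card 2 nummod _ _
2 cats cat NOUN NNS Number=Plur 3 nsubj _ _
3 saw see VERB VBD Tense=Past 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 dog dog NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

print(degree_summary(child_degrees(read_sentences(SAMPLE))))
# {'leaves': 2, 'one child': 3, 'two children': 1, 'three or more': 0, 'max degree': 2}
```

The four buckets partition all NUM nodes, so their counts sum to the NUM token count (404 + 182 + 151 + 95 = 832 above).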
Children of NUM nodes are attached using 21 different relations: punct (274; 34% of instances), case (210; 26% of instances), nmod:unmarked (135; 17% of instances), nmod (54; 7% of instances), conj (37; 5% of instances), cc (23; 3% of instances), advmod (17; 2% of instances), compound (16; 2% of instances), cop (12; 1% of instances), nsubj (11; 1% of instances), amod (5; 1% of instances), appos (5; 1% of instances), advcl (3; <1% of instances), mark (3; <1% of instances), nummod (3; <1% of instances), aux (2; <1% of instances), csubj (2; <1% of instances), det (2; <1% of instances), ccomp (1; <1% of instances), discourse (1; <1% of instances), obj (1; <1% of instances)
Children of NUM nodes belong to 15 different parts of speech: PUNCT (274; 34% of instances), ADP (199; 24% of instances), NUM (128; 16% of instances), PROPN (77; 9% of instances), NOUN (52; 6% of instances), CCONJ (23; 3% of instances), AUX (14; 2% of instances), ADJ (13; 2% of instances), SYM (11; 1% of instances), ADV (10; 1% of instances), VERB (6; 1% of instances), SCONJ (4; <1% of instances), DET (3; <1% of instances), PRON (2; <1% of instances), PART (1; <1% of instances)
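Children are collected from the other side of each edge. Every NUM-to-NUM edge has a NUM parent and a NUM child, so it is counted once in each direction; this is why NUM shows 128 instances both among the parents of NUM nodes and among their children. A minimal sketch with hypothetical names and toy data:

```python
from collections import Counter

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_sentences(text):
    """Yield sentences as lists of token dicts, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(dict(zip(FIELDS, cols)))
    if sent:
        yield sent


def child_stats(sentences, tag="NUM"):
    """Count DEPREL and UPOS of every dependent whose parent is a `tag` node."""
    rels, child_pos = Counter(), Counter()
    for sent in sentences:
        tag_ids = {t["id"] for t in sent if t["upos"] == tag}
        for t in sent:
            if t["head"] in tag_ids:
                rels[t["deprel"]] += 1
                child_pos[t["upos"]] += 1
    return rels, child_pos


# Hypothetical toy sample (spaces converted to tabs below).
RAW = """
# text = It fell from 1 to -20.
1 It it PRON PRP Case=Nom 2 nsubj _ _
2 fell fall VERB VBD Tense=Past 0 root _ _
3 from from ADP IN _ 4 case _ _
4 1 1 NUM CD NumType=Card 2 obl _ _
5 to to ADP IN _ 6 case _ _
6 -20 -20 NUM CD NumType=Card 2 obl _ _
7 . . PUNCT . _ 2 punct _ _

# text = Two plus two.
1 Two two NUM CD NumType=Card 0 root _ _
2 plus plus CCONJ CC _ 3 cc _ _
3 two two NUM CD NumType=Card 1 conj _ _
4 . . PUNCT . _ 1 punct _ _

# text = Two cats saw one dog.
1 Two two NUM CD NumType=Card 2 nummod _ _
2 cats cat NOUN NNS Number=Plur 3 nsubj _ _
3 saw see VERB VBD Tense=Past 0 root _ _
4 one one NUM CD NumType=Card 5 nummod _ _
5 dog dog NOUN NN Number=Sing 3 obj _ _
6 . . PUNCT . _ 3 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in RAW.strip().splitlines())

rels, child_pos = child_stats(read_sentences(SAMPLE))
print(rels.most_common())       # [('case', 2), ('cc', 1), ('conj', 1), ('punct', 1)]
print(child_pos.most_common())  # [('ADP', 2), ('CCONJ', 1), ('NUM', 1), ('PUNCT', 1)]
```

In this sample the single NUM child (two, attached via conj to Two) is the same edge that gives one NUM parent in the parent view.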