This page pertains to UD version 2.

Treebank Statistics: UD_English-PUD: POS Tags: NUM

There are 214 NUM lemmas (4%), 216 NUM types (4%) and 464 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: one, two, three, million, 10, four, 1, six, 3, I

The 10 most frequent NUM types: one, two, three, million, 10, four, 1, six, 3, I

The most frequent ambiguous lemmas (only 7 were observed): one (NUM 39, NOUN 7, PRON 1), million (NUM 13, NOUN 1), I (PRON 53, NUM 6), billion (NUM 6, NOUN 2), five (NUM 4, ADJ 1), ten (NUM 4, NOUN 1), thousand (NOUN 1, NUM 1)

The most frequent ambiguous types (only 3 were observed): one (NUM 36, NOUN 4), I (PRON 48, NUM 6), five (NUM 3, ADJ 1)

Morphology

The form / lemma ratio of NUM is 1.009346 (the average of all parts of speech is 1.151116).
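The NUM ratio is consistent with the counts reported at the top of the page: 216 types divided by 214 lemmas. A minimal sketch of that arithmetic, assuming "form" here means a distinct word type (the function name `form_lemma_ratio` is hypothetical and not part of any UD tool):

```python
def form_lemma_ratio(n_forms: int, n_lemmas: int) -> float:
    """Return the average number of distinct forms per distinct lemma."""
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_forms / n_lemmas


# Counts taken from this page: 216 NUM types, 214 NUM lemmas.
ratio = form_lemma_ratio(216, 214)
print(f"{ratio:.6f}")  # 1.009346
```

A ratio this close to 1 means that almost every NUM lemma appears in a single written form, which fits the two exceptions listed below (“3000” and “billion”).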

The 1st highest number of forms (2) was observed with the lemma “3000”: 3,000, 3000.

The 2nd highest number of forms (2) was observed with the lemma “billion”: billion, bn.

The 3rd highest number of forms (1) was observed with the lemma “1”: 1.

NUM occurs with 3 features: NumForm (464; 100% instances), NumType (464; 100% instances), Abbr (4; 1% instances)

NUM occurs with 6 feature-value pairs: Abbr=Yes, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac

NUM occurs with 5 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (288 tokens). Examples: 10, 1, 3, 2014, 2015, 100, 1492, 20, 2010, 2012
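Feature combinations such as NumForm=Digit|NumType=Card are stored in the FEATS column of a CoNLL-U file as `Name=Value` pairs separated by `|`, with `_` standing for "no features". The sketch below shows one way such strings could be split and tallied. The helper name `parse_feats` and the three sample strings are assumptions made for illustration; they are not treebank data.

```python
from collections import Counter
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a CoNLL-U FEATS string into a feature -> value mapping."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical FEATS values for three NUM tokens (illustration only).
samples = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Word|NumType=Card",
    "NumForm=Digit|NumType=Card",
]

print(parse_feats(samples[0]))          # {'NumForm': 'Digit', 'NumType': 'Card'}
print(Counter(samples).most_common(1))  # [('NumForm=Digit|NumType=Card', 2)]
```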

Relations

NUM nodes are attached to their parents using 17 different relations: nummod (195; 42% instances), obl (89; 19% instances), nmod:unmarked (38; 8% instances), nmod (34; 7% instances), compound (32; 7% instances), flat (26; 6% instances), conj (13; 3% instances), nsubj (10; 2% instances), appos (8; 2% instances), obj (7; 2% instances), root (4; 1% instances), nsubj:pass (2; 0% instances), orphan (2; 0% instances), advcl (1; 0% instances), amod (1; 0% instances), parataxis (1; 0% instances), xcomp (1; 0% instances)
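The percentages above can be checked against the raw counts, which add up to the 464 NUM tokens. The following sketch copies the counts from the list and assumes the percentages are each count's share of the total, rounded to the nearest integer; the rounding rule is an assumption, not something the page states.

```python
# Counts copied from the relation list above.
relation_counts = {
    "nummod": 195, "obl": 89, "nmod:unmarked": 38, "nmod": 34,
    "compound": 32, "flat": 26, "conj": 13, "nsubj": 10, "appos": 8,
    "obj": 7, "root": 4, "nsubj:pass": 2, "orphan": 2, "advcl": 1,
    "amod": 1, "parataxis": 1, "xcomp": 1,
}

total = sum(relation_counts.values())


def share(count: int, total: int) -> int:
    """Percentage of `total`, rounded to the nearest integer (assumed rule)."""
    return round(100 * count / total)


print(total)  # 464
for relation, count in relation_counts.items():
    print(f"{relation}: {count} ({share(count, total)}%)")
```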

Parents of NUM nodes belong to 9 different parts of speech: NOUN (215; 46% instances), VERB (103; 22% instances), PROPN (49; 11% instances), NUM (46; 10% instances), SYM (38; 8% instances), ADJ (6; 1% instances), no tag, i.e. the artificial root (4; 1% instances), ADV (2; 0% instances), PRON (1; 0% instances)

242 (52%) NUM nodes are leaves.

149 (32%) NUM nodes have one child.

40 (9%) NUM nodes have two children.

33 (7%) NUM nodes have three or more children.

The highest child degree of a NUM node is 8.
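The four child-degree counts above (242, 149, 40 and 33) add up to the 464 NUM tokens. As a rough sketch of how such a distribution could be derived, the code below counts the children of each node in one CoNLL-U sentence by tallying the HEAD column. The sample sentence is invented for illustration, and `child_degrees` is a hypothetical helper, not the script that produced this page.

```python
from collections import Counter

# Toy CoNLL-U fragment (invented for illustration, not taken from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "1\tabout\tabout\tADV\t_\t_\t2\tadvmod\t_\t_",
    "2\tthree\tthree\tNUM\t_\tNumForm=Word|NumType=Card\t3\tnummod\t_\t_",
    "3\tdogs\tdog\tNOUN\t_\t_\t4\tnsubj\t_\t_",
    "4\tbarked\tbark\tVERB\t_\t_\t0\troot\t_\t_",
])


def child_degrees(conllu_sentence: str, upos: str = "NUM") -> Counter:
    """Map number-of-children -> how many `upos` nodes have that many children."""
    rows = [line.split("\t") for line in conllu_sentence.splitlines()
            if line and not line.startswith("#")]
    # Keep ordinary word lines only: skip multiword ranges (1-2) and empty nodes (1.1).
    rows = [r for r in rows if r[0].isdigit()]
    children = Counter(r[6] for r in rows)  # HEAD id -> number of children
    return Counter(children.get(r[0], 0) for r in rows if r[3] == upos)


print(child_degrees(SAMPLE))  # Counter({1: 1})
```

In the toy sentence the only NUM node ("three") has a single child ("about"), so it would fall into the "one child" group.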

Children of NUM nodes are attached using 19 different relations: case (129; 36% instances), advmod (47; 13% instances), punct (40; 11% instances), nmod:unmarked (35; 10% instances), nmod (33; 9% instances), compound (19; 5% instances), cc (12; 3% instances), conj (8; 2% instances), cop (8; 2% instances), nsubj (7; 2% instances), det (6; 2% instances), amod (3; 1% instances), nummod (3; 1% instances), advcl (2; 1% instances), acl (1; 0% instances), acl:relcl (1; 0% instances), expl (1; 0% instances), orphan (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 13 different parts of speech: ADP (124; 35% instances), NUM (46; 13% instances), ADV (42; 12% instances), PUNCT (40; 11% instances), NOUN (37; 10% instances), PROPN (18; 5% instances), ADJ (13; 4% instances), CCONJ (12; 3% instances), AUX (8; 2% instances), DET (7; 2% instances), VERB (5; 1% instances), SYM (4; 1% instances), PRON (1; 0% instances)