
This page pertains to UD version 2.

Treebank Statistics: UD_English-PUD: POS Tags: NUM

There are 214 NUM lemmas (4%), 216 NUM types (4%) and 464 NUM tokens (2%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: one, two, three, million, 10, four, 1, six, 3, I

The 10 most frequent NUM types: one, two, three, million, 10, four, 1, six, 3, I

The most frequent ambiguous lemmas: one (NUM 39, NOUN 7, PRON 1), million (NUM 13, NOUN 1), I (PRON 53, NUM 6), billion (NUM 6, NOUN 2), five (NUM 4, ADJ 1), ten (NUM 4, NOUN 1), thousand (NOUN 1, NUM 1)

The most frequent ambiguous types: one (NUM 36, NOUN 4), I (PRON 48, NUM 6), five (NUM 3, ADJ 1)

Morphology

The form / lemma ratio of NUM is 1.009346 (the average of all parts of speech is 1.149901).
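The NUM figure can be reproduced from the counts reported above, assuming the ratio is the number of distinct (lemma, form) pairs divided by the number of distinct lemmas. That definition is an assumption, not something this page states, but 216 types over 214 lemmas gives the published value. A minimal sketch follows; the function name `form_lemma_ratio` and the toy data are hypothetical.

```python
from collections import defaultdict

def form_lemma_ratio(pairs):
    """Return (distinct lemma-form pairs) / (distinct lemmas).

    `pairs` is an iterable of (form, lemma) tuples. This definition of the
    ratio is an assumption made for illustration.
    """
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    n_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return n_forms / len(forms_by_lemma)

# Toy data echoing the examples on this page, not the full treebank.
toy = [
    ("3,000", "3000"), ("3000", "3000"),
    ("billion", "billion"), ("bn", "billion"),
    ("1", "1"), ("1", "1"),
]
print(form_lemma_ratio(toy))   # 5 forms over 3 lemmas

# With the counts reported for NUM: 216 types over 214 lemmas.
print(round(216 / 214, 6))     # 1.009346
```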

The lemma with the most forms is “3000” (2 forms): 3,000, 3000.

The second is “billion” (2 forms): billion, bn.

The third is “1” (1 form): 1.

NUM occurs with 3 features: NumForm (464; 100% instances), NumType (464; 100% instances), Abbr (4; 1% instances)

NUM occurs with 6 feature-value pairs: Abbr=Yes, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac

NUM occurs with 5 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (288 tokens). Examples: 10, 1, 3, 2014, 2015, 100, 1492, 20, 2010, 2012

Relations

NUM nodes are attached to their parents using 16 different relations: nummod (254; 55% instances), obl (79; 17% instances), compound (31; 7% instances), nmod (31; 7% instances), flat (14; 3% instances), conj (12; 3% instances), nmod:unmarked (10; 2% instances), nsubj (10; 2% instances), obj (7; 2% instances), appos (6; 1% instances), root (3; 1% instances), nsubj:pass (2; 0% instances), orphan (2; 0% instances), advcl (1; 0% instances), amod (1; 0% instances), xcomp (1; 0% instances)
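Counts like these can be gathered by reading the UPOS and DEPREL columns of a CoNLL-U file. The sketch below assumes the standard 10-column, tab-separated CoNLL-U layout and skips comment lines, multiword token ranges and empty nodes. The function name `relation_counts` and the sample sentence are hypothetical and are not taken from the treebank.

```python
from collections import Counter

def relation_counts(conllu_text, upos="NUM"):
    """Count the DEPREL values of all tokens tagged with the given UPOS."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:        # column 4 is UPOS
            counts[cols[7]] += 1   # column 8 is DEPREL
    return counts

# Hand-made sentence; fields are written with spaces and joined with tabs.
rows = [
    "1 I I PRON _ _ 2 nsubj _ _",
    "2 bought buy VERB _ _ 0 root _ _",
    "3 three three NUM _ NumForm=Word|NumType=Card 4 nummod _ _",
    "4 books book NOUN _ _ 2 obj _ _",
    "5 . . PUNCT _ _ 2 punct _ _",
]
sample = "\n".join("\t".join(row.split()) for row in rows)
print(relation_counts(sample))  # Counter({'nummod': 1})
```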

Parents of NUM nodes belong to 9 different categories (8 parts of speech plus the artificial root, which has no tag): NOUN (209; 45% instances), VERB (93; 20% instances), PROPN (76; 16% instances), SYM (38; 8% instances), NUM (35; 8% instances), ADJ (5; 1% instances), ADV (4; 1% instances), root, no tag (3; 1% instances), PRON (1; 0% instances)

259 (56%) NUM nodes are leaves.

148 (32%) NUM nodes have one child.

36 (8%) NUM nodes have two children.

21 (5%) NUM nodes have three or more children.

The highest child degree of a NUM node is 8.
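The four buckets above add up to 259 + 148 + 36 + 21 = 464, the total number of NUM tokens; the rounded percentages sum to 101. A distribution of this kind can be computed by counting, per sentence, how many tokens name each node as their HEAD. The sketch below assumes standard 10-column CoNLL-U input with sentences separated by a blank line (`\n\n`). The function name `child_degree_histogram` and the two sample sentences are hypothetical.

```python
from collections import Counter

def child_degree_histogram(conllu_text, upos="NUM"):
    """Map number of children -> number of nodes with the given UPOS."""
    histogram = Counter()
    for block in conllu_text.strip().split("\n\n"):
        pos_by_id = {}          # token ID -> UPOS within this sentence
        n_children = Counter()  # head ID -> number of dependents
        for line in block.splitlines():
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) != 10 or not cols[0].isdigit():
                continue  # skip multiword ranges and empty nodes
            pos_by_id[cols[0]] = cols[3]
            n_children[cols[6]] += 1  # column 7 is HEAD
        for node_id, pos in pos_by_id.items():
            if pos == upos:
                histogram[n_children[node_id]] += 1
    return histogram

# Two hand-made sentences; fields are written with spaces and joined with tabs.
sentences = [
    ["1 Over over ADV _ _ 2 advmod _ _",
     "2 100 100 NUM _ NumForm=Digit|NumType=Card 3 nummod _ _",
     "3 people people NOUN _ _ 4 nsubj _ _",
     "4 came come VERB _ _ 0 root _ _"],
    ["1 Two two NUM _ NumForm=Word|NumType=Card 2 nsubj _ _",
     "2 left leave VERB _ _ 0 root _ _"],
]
sample = "\n\n".join(
    "\n".join("\t".join(row.split()) for row in sent) for sent in sentences
)
print(sorted(child_degree_histogram(sample).items()))  # [(0, 1), (1, 1)]
```

Here "100" has one child ("Over") and "Two" is a leaf, so the histogram records one node of degree 1 and one of degree 0.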

Children of NUM nodes are attached using 19 different relations: case (116; 38% instances), advmod (46; 15% instances), nmod (31; 10% instances), punct (31; 10% instances), compound (18; 6% instances), cc (11; 4% instances), conj (8; 3% instances), cop (7; 2% instances), nsubj (7; 2% instances), det (6; 2% instances), nmod:unmarked (6; 2% instances), nummod (6; 2% instances), amod (3; 1% instances), acl (1; 0% instances), acl:relcl (1; 0% instances), advcl (1; 0% instances), flat (1; 0% instances), orphan (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: ADP (113; 37% instances), ADV (41; 14% instances), NUM (35; 12% instances), NOUN (34; 11% instances), PUNCT (31; 10% instances), ADJ (13; 4% instances), CCONJ (11; 4% instances), AUX (7; 2% instances), DET (7; 2% instances), PROPN (4; 1% instances), VERB (4; 1% instances), SYM (2; 1% instances)