
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-PUD: POS Tags: NUM

There are 264 NUM lemmas (5%), 264 NUM types (5%) and 873 NUM tokens (4%). Out of 15 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 9 in number of tokens.
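As a rough illustration of how the three counts differ, the sketch below tallies lemmas, types and tokens for one tag from CoNLL-U text. It assumes that a type is a distinct word form and that a lemma is a distinct value of the LEMMA column. The helper names `read_tokens` and `count_for_tag` and the four-token sample are hypothetical and are not taken from the treebank or from the script that generated this page.

```python
def read_tokens(conllu_text):
    """Yield (form, lemma, upos) for every regular word line in CoNLL-U text."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        yield cols[1], cols[2], cols[3]


def count_for_tag(conllu_text, upos):
    """Return (lemma count, type count, token count) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for form, lemma, tag in read_tokens(conllu_text):
        if tag == upos:
            lemmas.add(lemma)
            types.add(form)
            tokens += 1
    return len(lemmas), len(types), tokens


# Hypothetical sample rows (10 CoNLL-U columns each), not from UD_Chinese-PUD.
ROWS = [
    ("1", "三", "三", "NUM", "_", "NumType=Card", "2", "nummod", "_", "_"),
    ("2", "天", "天", "NOUN", "_", "_", "0", "root", "_", "_"),
    ("3", "三", "三", "NUM", "_", "NumType=Card", "4", "nummod", "_", "_"),
    ("4", "年", "年", "NOUN", "_", "_", "2", "conj", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(count_for_tag(SAMPLE, "NUM"))
```

In the sample, 三 occurs twice as NUM, so it contributes two tokens but only one type and one lemma.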

The 10 most frequent NUM lemmas: 一、 兩、 很多、 三、 許多、 六、 多、 20、 10、 十

The 10 most frequent NUM types: 一、 兩、 很多、 三、 許多、 六、 多、 20、 10、 十

The 10 most frequent ambiguous lemmas: 多 (NUM 14, ADJ 10, ADV 1), 天 (NOUN 10, NUM 1)

The 10 most frequent ambiguous types: 多 (NUM 14, ADJ 10, ADV 1), 天 (NOUN 10, NUM 1)
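A form or lemma counts as ambiguous here when it is observed with more than one tag. A minimal sketch of that grouping follows. The function name `ambiguous_forms` is hypothetical, and the input list is constructed to mirror the counts reported for 多 above rather than read from the treebank.

```python
from collections import Counter, defaultdict


def ambiguous_forms(tagged_tokens):
    """Map each form seen with more than one tag to its per-tag counts."""
    by_form = defaultdict(Counter)
    for form, upos in tagged_tokens:
        by_form[form][upos] += 1
    return {form: tags for form, tags in by_form.items() if len(tags) > 1}


# Constructed input mirroring the reported counts for 多;
# 一 is added as an unambiguous control (its count of 3 is made up).
tokens = (
    [("多", "NUM")] * 14
    + [("多", "ADJ")] * 10
    + [("多", "ADV")]
    + [("一", "NUM")] * 3
)

result = ambiguous_forms(tokens)
print(result["多"].most_common())
```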

Morphology

The form / lemma ratio of NUM is 1.000000 (the average of all parts of speech is 1.006233).

The 1st highest number of forms (1) was observed with the lemma “$15000”: $15000.

The 2nd highest number of forms (1) was observed with the lemma “$150萬”: $150萬.

The 3rd highest number of forms (1) was observed with the lemma “$25,000”: $25,000.
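The page does not state how the form / lemma ratio is computed. One plausible reading, consistent with 264 types over 264 lemmas giving 1.000000, is the number of distinct forms per lemma averaged over all lemmas. The sketch below works under that assumption; the function names and the English control example are hypothetical.

```python
from collections import defaultdict


def forms_per_lemma(pairs):
    """Group distinct forms under each lemma from (form, lemma) pairs."""
    forms = defaultdict(set)
    for form, lemma in pairs:
        forms[lemma].add(form)
    return forms


def form_lemma_ratio(pairs):
    """Distinct (lemma, form) combinations divided by distinct lemmas (assumed definition)."""
    forms = forms_per_lemma(pairs)
    return sum(len(s) for s in forms.values()) / len(forms)


# The three NUM lemmas listed above: each has exactly one form.
num_pairs = [("$15000", "$15000"), ("$150萬", "$150萬"), ("$25,000", "$25,000")]
print(f"{form_lemma_ratio(num_pairs):.6f}")

# Hypothetical inflecting control: "be" has two forms, "go" has one.
other_pairs = [("is", "be"), ("was", "be"), ("go", "go")]
print(f"{form_lemma_ratio(other_pairs):.6f}")
```

Because every NUM lemma in this treebank has a single form, the ranking of lemmas by number of forms is a tie at 1, and the three lemmas shown are simply the first three in that tie.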

NUM occurs with 1 feature: NumType (873; 100% instances)

NUM occurs with 1 feature-value pair: NumType=Card

NUM occurs with 1 feature combination. The most frequent feature combination is NumType=Card (873 tokens). Examples: 一、 兩、 很多、 三、 許多、 六、 多、 20、 10、 十
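Feature statistics of this kind are derived from the FEATS column of CoNLL-U, which holds `Name=Value` pairs separated by `|`, or `_` when a token has no features. A minimal parsing sketch follows; the function name `parse_feats` is hypothetical.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; "_" means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Every NUM token in this treebank carries the same single pair.
print(parse_feats("NumType=Card"))
```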

Relations

NUM nodes are attached to their parents using 11 different relations: nummod (809; 93% instances), obj (24; 3% instances), obl (8; 1% instances), conj (7; 1% instances), dep (6; 1% instances), nmod (6; 1% instances), root (5; 1% instances), nsubj (4; 0% instances), xcomp (2; 0% instances), ccomp (1; 0% instances), obl:tmod (1; 0% instances)
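The percentages above can be checked by simple arithmetic. The sketch below copies the relation counts from this page and assumes that each percentage is the count divided by the NUM token total, rounded to the nearest integer, which reproduces every figure shown (including the 0% entries for small counts).

```python
# Counts copied from the list of relations above.
NUM_RELATIONS = {
    "nummod": 809, "obj": 24, "obl": 8, "conj": 7, "dep": 6, "nmod": 6,
    "root": 5, "nsubj": 4, "xcomp": 2, "ccomp": 1, "obl:tmod": 1,
}

total = sum(NUM_RELATIONS.values())


def percent(count, total):
    """Share of `count` in `total` as a whole-number percentage (assumed rounding)."""
    return round(100 * count / total)


for relation, count in NUM_RELATIONS.items():
    print(f"{relation} ({count}; {percent(count, total)}% instances)")
```

The counts sum to 873, the number of NUM tokens, since every node has exactly one incoming relation.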

Parents of NUM nodes belong to 8 different parts of speech, counting the artificial root, which has no tag: NOUN (805; 92% instances), VERB (43; 5% instances), NUM (7; 1% instances), ADJ (6; 1% instances), [root, no tag] (5; 1% instances), X (3; 0% instances), PART (2; 0% instances), PROPN (2; 0% instances)

820 (94%) NUM nodes are leaves.

36 (4%) NUM nodes have one child.

3 (0%) NUM nodes have two children.

14 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
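The child degree of a node is the number of tokens whose HEAD points to it. The sketch below buckets nodes of one tag into leaves, one child, two children, and three or more, in the same way as the figures above. The function name and the five-token sentence are hypothetical; the reported bucket sizes are copied from this page only to check that they add up to the NUM token total.

```python
from collections import Counter


def child_degree_histogram(sentence, upos):
    """Count nodes with the given tag by number of children ("0", "1", "2", "3+").

    `sentence` is a list of (id, head, upos) triples; head 0 marks the root.
    """
    # Number of children per node id, derived from the HEAD values.
    n_children = Counter(head for _, head, _ in sentence if head != 0)
    histogram = Counter()
    for node_id, _, tag in sentence:
        if tag == upos:
            degree = n_children[node_id]
            histogram["3+" if degree >= 3 else str(degree)] += 1
    return dict(histogram)


# Hypothetical sentence: token 1 is a leaf NUM, token 4 is a NUM with one child.
sentence = [(1, 2, "NUM"), (2, 3, "NOUN"), (3, 0, "VERB"), (4, 3, "NUM"), (5, 4, "PUNCT")]
print(child_degree_histogram(sentence, "NUM"))

# Bucket sizes reported above; together they cover all NUM tokens.
REPORTED = {"0": 820, "1": 36, "2": 3, "3+": 14}
print(sum(REPORTED.values()))
```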

Children of NUM nodes are attached using 16 different relations: punct (17; 17% instances), nmod (16; 16% instances), cop (11; 11% instances), nsubj (11; 11% instances), case (9; 9% instances), advmod (7; 7% instances), cc (7; 7% instances), conj (7; 7% instances), appos (3; 3% instances), det (3; 3% instances), case:loc (2; 2% instances), compound (2; 2% instances), dep (2; 2% instances), flat:name (2; 2% instances), mark (1; 1% instances), obl:tmod (1; 1% instances)

Children of NUM nodes belong to 13 different parts of speech: NOUN (26; 26% instances), PUNCT (17; 17% instances), AUX (11; 11% instances), ADV (7; 7% instances), CCONJ (7; 7% instances), NUM (7; 7% instances), ADP (6; 6% instances), PART (6; 6% instances), PROPN (4; 4% instances), DET (3; 3% instances), VERB (3; 3% instances), PRON (2; 2% instances), X (2; 2% instances)