
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-Beginner: POS Tags: NUM

There are 32 NUM lemmas (2%), 32 NUM types (2%) and 640 NUM tokens (3%). Out of 15 observed tags, the rank of NUM is: 7 in number of lemmas, 7 in number of types and 9 in number of tokens.
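Counts of this kind can be reproduced from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: `read_tokens`, `tag_stats` and the `SAMPLE` fragment are hypothetical names and invented data, and the sketch assumes that a "type" is a distinct word form and that the token percentage is the tag's share of all tokens.

```python
# Toy CoNLL-U fragment, invented for illustration (not taken from the treebank).
SAMPLE = (
    "1\t我\t我\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\t有\t有\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t两\t两\tNUM\t_\tNumType=Card\t5\tnummod\t_\t_\n"
    "4\t本\t本\tNOUN\t_\t_\t3\tclf\t_\t_\n"
    "5\t书\t书\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
    "1\t他\t他\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\t买\t买\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t三\t三\tNUM\t_\tNumType=Card\t5\tnummod\t_\t_\n"
    "4\t个\t个\tNOUN\t_\t_\t3\tclf\t_\t_\n"
    "5\t苹果\t苹果\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)


def read_tokens(conllu_text):
    """Yield (form, lemma, upos) for every regular token line."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        yield cols[1], cols[2], cols[3]


def tag_stats(tokens, tag):
    """Count distinct lemmas, distinct forms and tokens carrying `tag`."""
    tokens = list(tokens)
    tagged = [(form, lemma) for form, lemma, upos in tokens if upos == tag]
    return {
        "lemmas": len({lemma for _, lemma in tagged}),
        "types": len({form for form, _ in tagged}),
        "tokens": len(tagged),
        # Assumption: the percentage is the share of all tokens.
        "token_share": len(tagged) / len(tokens) if tokens else 0.0,
    }


stats = tag_stats(read_tokens(SAMPLE), "NUM")
print(stats)
```

On the toy fragment, NUM covers 2 lemmas, 2 types and 2 of the 10 tokens.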

The 10 most frequent NUM lemmas: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四

The 10 most frequent NUM types: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四

The 10 most frequent ambiguous lemmas: 一 (NUM 170, ADV 2, DET 2), 几 (NUM 41, DET 1, PRON 1), 二 (NUM 23, NOUN 1), 半 (NOUN 28, NUM 11), 第 (NUM 9, NOUN 3), 零 (NUM 5, ADV 1), 多 (ADJ 86, ADV 40, NUM 4), 双 (NOUN 2, NUM 1)

The 10 most frequent ambiguous types: 一 (NUM 170, ADV 2, DET 2), 几 (NUM 41, DET 1, PRON 1), 二 (NUM 23, NOUN 1), 半 (NOUN 28, NUM 11), 第 (NUM 9, NOUN 3), 零 (NUM 5, ADV 1), 多 (ADJ 86, ADV 40, NUM 4), 双 (NOUN 2, NUM 1)
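An ambiguous lemma here is one that is observed with more than one tag. The sketch below shows one way to find such lemmas. It assumes the list is ordered by the count under the tag of interest, which is consistent with the numbers above but is not confirmed by the page; `ambiguous_lemmas` and `TOY` are hypothetical names with invented data.

```python
from collections import Counter, defaultdict

# Invented (lemma, upos) pairs for illustration only.
TOY = (
    [("一", "NUM")] * 3
    + [("一", "ADV")]
    + [("半", "NOUN")] * 2
    + [("半", "NUM")]
    + [("三", "NUM")]
)


def ambiguous_lemmas(pairs, tag):
    """Return lemmas seen with `tag` and at least one other tag."""
    by_lemma = defaultdict(Counter)
    for lemma, upos in pairs:
        by_lemma[lemma][upos] += 1
    hits = [(lemma, c) for lemma, c in by_lemma.items()
            if tag in c and len(c) > 1]
    # Assumed ordering: most frequent under `tag` first.
    hits.sort(key=lambda item: -item[1][tag])
    # Inside each entry, list the tags from most to least frequent.
    return [(lemma, c.most_common()) for lemma, c in hits]


result = ambiguous_lemmas(TOY, "NUM")
for lemma, counts in result:
    print(lemma, counts)
```

The lemma 三 is left out of the toy result because it only ever appears as NUM.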

Morphology

The form / lemma ratio of NUM is 1.000000 (the average of all parts of speech is 1.000000).

The 1st highest number of forms (1) was observed with the lemma “0”: 0.

The 2nd highest number of forms (1) was observed with the lemma “1”: 1.

The 3rd highest number of forms (1) was observed with the lemma “2”: 2.
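The ratio above appears to be the number of distinct forms divided by the number of distinct lemmas, which agrees with the 32 types and 32 lemmas reported for NUM. A minimal sketch under that assumption follows; the function names are hypothetical, and the toy data is contrived so that one lemma has two forms (an ASCII digit and its full-width variant).

```python
from collections import defaultdict

# Contrived (form, lemma, upos) triples, invented for illustration.
TOY = [
    ("一", "一", "NUM"),
    ("2", "2", "NUM"),
    ("２", "2", "NUM"),  # full-width digit mapped to the same lemma
    ("书", "书", "NOUN"),
]


def forms_per_lemma(tokens, tag):
    """Map each lemma of `tag` to the set of forms it was seen with."""
    forms = defaultdict(set)
    for form, lemma, upos in tokens:
        if upos == tag:
            forms[lemma].add(form)
    return forms


def form_lemma_ratio(tokens, tag):
    """Average number of distinct forms per lemma (assumed definition)."""
    forms = forms_per_lemma(tokens, tag)
    if not forms:
        return 0.0
    # Counts (lemma, form) pairs; equals the type count when no form
    # is shared between two lemmas.
    return sum(len(s) for s in forms.values()) / len(forms)


def richest_lemmas(tokens, tag, n=3):
    """Lemmas with the most forms; ties are broken by lemma string."""
    forms = forms_per_lemma(tokens, tag)
    ranked = sorted(forms.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(lemma, len(s)) for lemma, s in ranked[:n]]


ratio = form_lemma_ratio(TOY, "NUM")
top = richest_lemmas(TOY, "NUM")
print(ratio, top)
```

A ratio of exactly 1.0, as reported for this treebank, means every lemma was seen with a single form.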

NUM occurs with 1 feature: NumType (599; 94% instances)

NUM occurs with 2 feature-value pairs: NumType=Card, NumType=Ord

NUM occurs with 3 feature combinations. The most frequent feature combination is NumType=Card (592 tokens). Examples: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四
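The three levels counted above (features, feature-value pairs, whole combinations) can be tallied from the FEATS column. The sketch below is illustrative only: `feature_stats` and the toy column are hypothetical, and it assumes that an empty FEATS value (`_`) counts as a combination of its own. If the counts above are exhaustive, subtraction would leave 7 tokens with NumType=Ord and 41 tokens with no features, which would account for the third combination.

```python
from collections import Counter

# Invented FEATS values for five NUM tokens.
TOY_FEATS = ["NumType=Card"] * 3 + ["NumType=Ord", "_"]


def feature_stats(feats_column):
    """Tally features, feature-value pairs and full combinations."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_column:
        combos[feats] += 1  # the whole string, including "_" for none
        if feats == "_":
            continue
        for pair in feats.split("|"):  # CoNLL-U separates pairs with "|"
            name, _, _value = pair.partition("=")
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


features, pairs, combos = feature_stats(TOY_FEATS)
print(features, pairs, combos.most_common(1))
```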

Relations

NUM nodes are attached to their parents using 8 different relations: nummod (474; 74% instances), flat (106; 17% instances), dep (23; 4% instances), nmod (12; 2% instances), conj (10; 2% instances), obj (9; 1% instances), obl (3; 0% instances), root (3; 0% instances)

Parents of NUM nodes belong to 7 different parts of speech: NOUN (474; 74% instances), NUM (139; 22% instances), VERB (15; 2% instances), ADJ (6; 1% instances), [no parent: root] (3; 0% instances), PROPN (2; 0% instances), PART (1; 0% instances)
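Both lists above come from following each NUM token's HEAD pointer. A minimal sketch, assuming each sentence is given as (id, upos, head, deprel) tuples: a token whose head is 0 has no parent token, so its parent tag is recorded under a placeholder. `parent_stats`, the `"ROOT"` label and the toy sentences are hypothetical.

```python
from collections import Counter

# Two invented sentences as (id, upos, head, deprel) tuples.
TOY = [
    [
        (1, "PRON", 2, "nsubj"),
        (2, "VERB", 0, "root"),
        (3, "NUM", 5, "nummod"),
        (4, "NOUN", 3, "clf"),
        (5, "NOUN", 2, "obj"),
    ],
    [
        (1, "NUM", 0, "root"),  # a one-word sentence headed by a numeral
    ],
]


def parent_stats(sentences, tag):
    """Count incoming relations and parent tags for tokens tagged `tag`."""
    relations, parent_tags = Counter(), Counter()
    for sent in sentences:
        upos_by_id = {tok_id: upos for tok_id, upos, _, _ in sent}
        for _, upos, head, deprel in sent:
            if upos != tag:
                continue
            relations[deprel] += 1
            # Head 0 is the artificial root, which has no tag of its own.
            parent_tags[upos_by_id.get(head, "ROOT")] += 1
    return relations, parent_tags


relations, parent_tags = parent_stats(TOY, "NUM")
print(relations, parent_tags)
```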

322 (50%) NUM nodes are leaves.

243 (38%) NUM nodes have one child.

54 (8%) NUM nodes have two children.

21 (3%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
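The leaf, one-child, two-child and three-or-more figures above form a histogram of child degree, that is, of how many tokens name a given NUM token as their head. The sketch below illustrates the idea on invented data; `child_degrees` and `degree_histogram` are hypothetical names.

```python
from collections import Counter

# Two invented sentences as (id, upos, head) tuples.
TOY = [
    [(1, "PRON", 2), (2, "VERB", 0), (3, "NUM", 5), (4, "NOUN", 3), (5, "NOUN", 2)],
    [(1, "NUM", 0)],
]


def child_degrees(sentences, tag):
    """Return the number of children of every token tagged `tag`."""
    degrees = []
    for sent in sentences:
        # How often each id is used as a head within this sentence.
        n_children = Counter(head for _, _, head in sent)
        degrees.extend(n_children[tok_id]
                       for tok_id, upos, _ in sent if upos == tag)
    return degrees


def degree_histogram(degrees):
    """Bucket degrees as 0, 1, 2 and 3 (meaning three or more)."""
    return Counter(min(d, 3) for d in degrees)


degrees = child_degrees(TOY, "NUM")
hist = degree_histogram(degrees)
print(hist, max(degrees))
```

In the toy data, one NUM token has a single child (its classifier) and the other is a leaf.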

Children of NUM nodes are attached using 17 different relations: clf (222; 52% instances), flat (106; 25% instances), nmod (18; 4% instances), nummod (16; 4% instances), advmod (15; 4% instances), det (12; 3% instances), conj (10; 2% instances), punct (6; 1% instances), advcl (4; 1% instances), nsubj (4; 1% instances), cc (3; 1% instances), cop (3; 1% instances), obl (3; 1% instances), amod (2; 0% instances), case (1; 0% instances), discourse (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 11 different parts of speech: NOUN (235; 55% instances), NUM (139; 33% instances), ADV (15; 4% instances), DET (12; 3% instances), ADJ (7; 2% instances), PUNCT (6; 1% instances), PRON (4; 1% instances), AUX (3; 1% instances), CCONJ (3; 1% instances), PART (2; 0% instances), VERB (1; 0% instances)