This page pertains to UD version 2.

Treebank Statistics: UD_Chinese: POS Tags: NUM

There are 1257 NUM lemmas (6%), 1257 NUM types (6%) and 6659 NUM tokens (5%). Out of 15 observed tags, NUM ranks 4th in number of lemmas, 4th in number of types and 6th in number of tokens.
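As a rough illustration of how lemma, type and token counts of this kind could be derived, here is a minimal sketch that reads CoNLL-U text and counts distinct lemmas, distinct word forms (types) and tokens for one UPOS tag. The function name `upos_counts` and the toy sample are hypothetical and not part of the tool that generated this page; the sample sentences are made up and not taken from the treebank.

```python
def upos_counts(conllu_text, upos):
    """Count distinct lemmas, distinct forms (types) and tokens for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Keep only ordinary word lines: 10 columns and a plain integer ID.
        # This skips multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4 is UPOS
            types.add(cols[1])       # column 2 is FORM
            lemmas.add(cols[2])      # column 3 is LEMMA
            tokens += 1
    return len(lemmas), len(types), tokens


# Toy two-sentence sample (made up for illustration).
ROWS = [
    "1 三 三 NUM _ NumType=Card 2 nummod _ _",
    "2 本 本 NOUN _ _ 3 clf _ _",
    "3 書 書 NOUN _ _ 0 root _ _",
    "",
    "1 三 三 NUM _ NumType=Card 2 nummod _ _",
    "2 天 天 NOUN _ _ 0 root _ _",
]
# Real CoNLL-U is tab-separated, so rebuild the rows with tabs.
SAMPLE = "\n".join("\t".join(r.split()) for r in ROWS)

print(upos_counts(SAMPLE, "NUM"))  # (1, 1, 2)
```

In the toy sample the numeral 三 occurs twice, so it contributes one lemma, one type and two tokens.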

The 10 most frequent NUM lemmas: 一、 兩、 三、 1、 第一、 3、 12、 5、 2、 8

The 10 most frequent NUM types: 一、 兩、 三、 1、 第一、 3、 12、 5、 2、 8

The 10 most frequent ambiguous lemmas: 一 (NUM 1123, NOUN 1), 第一 (NUM 117, PROPN 2), 四 (NUM 84, X 1), 多 (NUM 83, ADV 28, ADJ 16, PART 3), 雙 (NUM 35, NOUN 1), 很多 (NUM 33, ADJ 4), 單 (NUM 26, PART 2), 半 (NUM 24, PART 6), 數 (NUM 22, PART 15), 九 (NUM 16, PROPN 2)

The 10 most frequent ambiguous types: 一 (NUM 1123, NOUN 1), 第一 (NUM 117, PROPN 2), 四 (NUM 84, X 1), 多 (NUM 83, ADV 28, ADJ 16, PART 3), 雙 (NUM 35, NOUN 1), 很多 (NUM 33, ADJ 4), 單 (NUM 26, PART 2), 半 (NUM 24, PART 6), 數 (NUM 22, PART 15), 九 (NUM 16, PROPN 2)

Morphology

The form / lemma ratio of NUM is 1.000000 (the average of all parts of speech is 1.000266).

The 1st highest number of forms (1) was observed with the lemma “,”: ,.

The 2nd highest number of forms (1) was observed with the lemma “-15”: -15.

The 3rd highest number of forms (1) was observed with the lemma “-154”: -154.
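A ratio of exactly 1.000000 means that every NUM lemma appears with a single form, so the ranking above consists entirely of ties at one form each. The sketch below shows one plausible way to compute such a ratio, assuming it is defined as the number of distinct (lemma, form) pairs divided by the number of distinct lemmas; that definition and the name `form_lemma_ratio` are assumptions, not taken from the generating tool.

```python
from collections import defaultdict


def form_lemma_ratio(pairs):
    """Return (ratio, forms_by_lemma) for an iterable of (form, lemma) pairs."""
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    if not forms_by_lemma:
        return 0.0, {}
    # Sum of distinct forms over all lemmas, divided by the number of lemmas.
    total_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return total_forms / len(forms_by_lemma), dict(forms_by_lemma)


# Toy data in the spirit of this page: each lemma has exactly one form.
pairs = [("一", "一"), ("兩", "兩"), ("一", "一"), ("-15", "-15")]
ratio, forms = form_lemma_ratio(pairs)
print(f"{ratio:.6f}")  # 1.000000
```

For a language with inflection the same function would return a value above 1, for example 1.5 for the pairs `("cats", "cat")`, `("cat", "cat")`, `("dog", "dog")`.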

NUM occurs with 1 feature: NumType (6659; 100% instances)

NUM occurs with 1 feature-value pair: NumType=Card

NUM occurs with 1 feature combination. The most frequent feature combination is NumType=Card (6659 tokens). Examples: 一、 兩、 三、 1、 第一、 3、 12、 5、 2、 8

Relations

NUM nodes are attached to their parents using 19 different relations: nummod (6198; 93% instances), root (77; 1% instances), obj (62; 1% instances), conj (53; 1% instances), nmod (51; 1% instances), advmod (50; 1% instances), det (40; 1% instances), nsubj (32; 0% instances), dep (26; 0% instances), acl (15; 0% instances), case:suff (10; 0% instances), nmod:tmod (10; 0% instances), appos (8; 0% instances), obl (8; 0% instances), amod (6; 0% instances), ccomp (5; 0% instances), xcomp (4; 0% instances), punct (3; 0% instances), nsubj:pass (1; 0% instances)
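The percentages in this list appear to be each count divided by the 6659 NUM tokens and rounded to the nearest whole percent, which is why relations with only a few dozen instances are shown as 0%. The following sketch reproduces a handful of the figures under that assumption; the helper name `share` is hypothetical.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to the nearest integer."""
    return round(100 * count / total)


NUM_TOKENS = 6659  # total number of NUM tokens reported on this page

# A few of the relation counts listed on this page.
relations = {"nummod": 6198, "root": 77, "det": 40, "nsubj": 32, "nsubj:pass": 1}

for rel, n in relations.items():
    print(f"{rel} ({n}; {share(n, NUM_TOKENS)}% instances)")
```

Under this rounding, 40 instances (about 0.6%) are displayed as 1% while 32 instances (about 0.5%) are displayed as 0%.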

Parents of NUM nodes belong to 9 different parts of speech, counting the artificial root (which has no tag) as one: NOUN (6213; 93% instances), VERB (152; 2% instances), PART (93; 1% instances), [root, no tag] (77; 1% instances), NUM (70; 1% instances), X (23; 0% instances), ADJ (15; 0% instances), PROPN (15; 0% instances), SYM (1; 0% instances)

6304 (95%) NUM nodes are leaves.

191 (3%) NUM nodes have one child.

56 (1%) NUM nodes have two children.

108 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 16.
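The four child-count groups above add up to the total number of NUM tokens (6304 + 191 + 56 + 108 = 6659). A minimal sketch of how such a breakdown could be computed from CoNLL-U text follows, counting for each node how many other nodes in the same sentence name it as HEAD. The function name `child_degrees` and the toy sample are hypothetical, and the sample sentences are made up rather than taken from the treebank.

```python
import re
from collections import Counter


def child_degrees(conllu_text, upos):
    """Return (buckets, max_degree) for nodes tagged `upos`.

    `buckets` maps "0", "1", "2" and "3+" to the number of such nodes.
    """
    buckets = Counter()
    max_degree = 0
    # Sentences are separated by blank lines; HEAD ids are only
    # meaningful within a single sentence.
    for block in re.split(r"\n\s*\n", conllu_text.strip()):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary word lines only (10 columns, integer ID).
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        dependents = Counter(r[6] for r in rows)  # HEAD id -> number of children
        for r in rows:
            if r[3] == upos:
                degree = dependents[r[0]]
                max_degree = max(max_degree, degree)
                buckets["3+" if degree >= 3 else str(degree)] += 1
    return dict(buckets), max_degree


# Toy sample (made up): one leaf numeral and one numeral with a single child.
ROWS = [
    "1 三 三 NUM _ NumType=Card 3 nummod _ _",
    "2 本 本 NOUN _ _ 3 clf _ _",
    "3 書 書 NOUN _ _ 0 root _ _",
    "",
    "1 第一 第一 NUM _ NumType=Card 0 root _ _",
    "2 、 、 PUNCT _ _ 1 punct _ _",
]
SAMPLE = "\n".join("\t".join(r.split()) for r in ROWS)

print(child_degrees(SAMPLE, "NUM"))  # ({'0': 1, '1': 1}, 1)
```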

Children of NUM nodes are attached using 24 different relations: punct (231; 26% instances), det (113; 13% instances), nsubj (97; 11% instances), cop (93; 10% instances), dep (68; 8% instances), conj (51; 6% instances), cc (44; 5% instances), case:dec (40; 4% instances), nmod (40; 4% instances), advmod (39; 4% instances), acl (25; 3% instances), nummod (11; 1% instances), appos (10; 1% instances), case (10; 1% instances), nmod:tmod (7; 1% instances), csubj (4; 0% instances), flat:foreign (4; 0% instances), mark (2; 0% instances), acl:relcl (1; 0% instances), amod (1; 0% instances), case:pref (1; 0% instances), ccomp (1; 0% instances), obj (1; 0% instances), xcomp (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: PUNCT (227; 25% instances), NOUN (226; 25% instances), AUX (93; 10% instances), PART (77; 9% instances), NUM (70; 8% instances), CCONJ (44; 5% instances), VERB (43; 5% instances), ADV (36; 4% instances), ADP (16; 2% instances), DET (15; 2% instances), PROPN (14; 2% instances), PRON (12; 1% instances), X (12; 1% instances), ADJ (6; 1% instances), SYM (4; 0% instances)