
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-HK: POS Tags: PUNCT

There are 16 PUNCT lemmas (3%), 15 PUNCT types (3%) and 352 PUNCT tokens (19%). Out of 17 observed tags, PUNCT ranks 10th in number of lemmas, 11th in number of types and 2nd in number of tokens.

The 10 most frequent PUNCT lemmas: _、 ,、 。、 !、 、、 ?、 ⋯⋯、 《、 》、 「

The 10 most frequent PUNCT types: 。、 ,、 !、 ?、 、、 ⋯⋯、 《、 》、 「、 」

The 10 most frequent ambiguous lemmas: _ (VERB 114, PUNCT 111, NOUN 69, ADV 63, PART 54, PRON 49, ADJ 21, NUM 19, AUX 18, ADP 10, PROPN 10, DET 8, INTJ 5, SCONJ 1, X 1)

The 10 most frequent ambiguous types:

Morphology

The form / lemma ratio of PUNCT is 0.937500 (the average of all parts of speech is 1.221258).
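The ratio for PUNCT can be reproduced from the counts reported on this page. The sketch below assumes that the form / lemma ratio is the number of distinct PUNCT types divided by the number of distinct PUNCT lemmas; the page does not spell out the formula, but this reading matches the reported value. The variable names are illustrative only.

```python
from fractions import Fraction

# Counts taken from the summary at the top of this page.
punct_types = 15   # distinct PUNCT word forms
punct_lemmas = 16  # distinct PUNCT lemmas

# Assumed definition: distinct forms divided by distinct lemmas.
ratio = Fraction(punct_types, punct_lemmas)
print(f"{float(ratio):.6f}")  # 0.937500
```

A ratio below 1 means there are fewer distinct forms than lemmas, which is unusual and here coincides with several different punctuation forms sharing the placeholder lemma “_”.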

The 1st highest number of forms (4) was observed with the lemma “_”: 。, !, ,, ?.

The 2nd highest number of forms (1) was observed with the lemma “(”: (.

The 3rd highest number of forms (1) was observed with the lemma “)”: ).

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 1 relation: punct (352; 100% of instances)

Parents of PUNCT nodes belong to 10 different parts of speech: VERB (227; 64% of instances), NOUN (57; 16% of instances), ADJ (31; 9% of instances), PROPN (15; 4% of instances), INTJ (7; 2% of instances), PRON (6; 2% of instances), NUM (3; 1% of instances), SYM (3; 1% of instances), AUX (2; 1% of instances), ADV (1; 0% of instances)
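The percentages in the parent distribution follow from the raw counts. The sketch below recomputes them, assuming each percentage is the count divided by the total number of PUNCT tokens and rounded to the nearest integer; the `percent` helper is hypothetical and not part of any UD tool.

```python
# Parent part-of-speech counts for PUNCT nodes, as listed above.
parent_counts = {
    "VERB": 227, "NOUN": 57, "ADJ": 31, "PROPN": 15, "INTJ": 7,
    "PRON": 6, "NUM": 3, "SYM": 3, "AUX": 2, "ADV": 1,
}

total = sum(parent_counts.values())  # should equal the 352 PUNCT tokens


def percent(count, total):
    """Integer percentage, rounded half up (assumed rounding rule)."""
    return (200 * count + total) // (2 * total)


for upos, count in parent_counts.items():
    print(f"{upos}: {count} ({percent(count, total)}%)")
```

The counts sum to 352, so every PUNCT token is accounted for. ADV shows as 0% only because 1 out of 352 rounds down.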

352 (100%) PUNCT nodes are leaves.

The highest child degree of a PUNCT node is 0.
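The leaf and child-degree figures could be checked against a CoNLL-U file along the following lines. This is a minimal sketch that assumes the standard ten-column, tab-separated CoNLL-U layout; the function name `punct_child_degrees` is hypothetical, and the sample sentence is an invented toy example, not taken from the treebank.

```python
from collections import Counter


def punct_child_degrees(conllu_text):
    """Return the number of children of every PUNCT node in the text."""
    degrees = []
    sentence = []  # (id, upos, head) for each word of the current sentence
    # The appended empty line flushes the last sentence.
    for line in conllu_text.splitlines() + [""]:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # sentence-level comment
        if not line.strip():
            if sentence:
                # Count how many words name each ID as their head.
                children = Counter(head for _, _, head in sentence)
                degrees.extend(
                    children[tok_id]
                    for tok_id, upos, _ in sentence
                    if upos == "PUNCT"
                )
                sentence = []
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges and empty nodes
        sentence.append((cols[0], cols[3], cols[6]))
    return degrees


# Invented toy sentence (not from UD_Chinese-HK), built column by column.
rows = [
    ["1", "你", "你", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "好", "好", "ADJ", "_", "_", "0", "root", "_", "_"],
    ["3", "!", "!", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

degrees = punct_child_degrees(sample)
print(max(degrees), degrees.count(0))
```

A PUNCT node is a leaf when its degree is 0, so a treebank where all PUNCT nodes are leaves has a maximum degree of 0.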