
This page pertains to UD version 2.

Treebank Statistics: UD_Cantonese-HK: POS Tags: DET

There are 21 DET lemmas (2%), 42 DET types (2%) and 330 DET tokens (2%). Out of 15 observed tags, the rank of DET is: 11 in number of lemmas, 8 in number of types and 9 in number of tokens.

The 10 most frequent DET lemmas: _、 呢、 嗰、 咩、 每、 幾多、 成、 乜、 呢啲、 嗰啲

The 10 most frequent DET types: 呢、 嗰、 依個、 任何、 其他、 呢個、 依、 咩、 嗰個、 每

The 10 most frequent ambiguous lemmas: _ (PUNCT 1377, VERB 1352, NOUN 1283, ADV 853, PART 764, PRON 662, AUX 335, DET 217, ADJ 209, ADP 140, NUM 124, SCONJ 101, CCONJ 93, INTJ 92, PROPN 52), 呢 (PART 71, DET 42), 咩 (DET 15, PRON 13, PART 10), 幾多 (DET 4, PRON 1), 乜 (DET 2, PRON 2), 呢啲 (PRON 6, DET 2), 嗰啲 (PRON 5, DET 2), 下 (ADV 19, DET 1, INTJ 1, PART 1), 係 (VERB 65, AUX 44, DET 1), 啲 (NOUN 56, ADV 26, DET 1, PART 1)

The 10 most frequent ambiguous types: 呢 (PART 328, DET 75, VERB 3, NOUN 1), 依個 (DET 25, PRON 9), 呢個 (PRON 20, DET 16, PART 2), 咩 (PRON 17, DET 15, PART 10), 嗰個 (DET 14, PRON 5), 呢啲 (DET 7, PRON 7), 下 (ADV 27, DET 6, INTJ 2, PART 2), 幾多 (DET 5, PRON 1), 依啲 (DET 3, PRON 1), 嗰啲 (PRON 7, DET 3)
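The ambiguity lists above can be reproduced in principle by counting, for each lemma or type, how often it occurs with each tag and keeping only the items seen with more than one tag. The following is a minimal sketch of that idea, not the script that generated this page. The function name `ambiguous_items` is hypothetical. The counts for 幾多 and 乜 are taken from the lemma list above, while the count for 每 is invented for illustration.

```python
from collections import Counter, defaultdict


def ambiguous_items(observations):
    """Map each item seen with more than one UPOS tag to its per-tag counts."""
    tag_counts = defaultdict(Counter)
    for item, upos in observations:
        tag_counts[item][upos] += 1
    return {item: counts for item, counts in tag_counts.items() if len(counts) > 1}


# (lemma, UPOS) observations. 幾多 and 乜 follow the counts reported above;
# the count for 每 is a made-up value for an unambiguous lemma.
observations = (
    [("幾多", "DET")] * 4 + [("幾多", "PRON")] * 1
    + [("乜", "DET")] * 2 + [("乜", "PRON")] * 2
    + [("每", "DET")] * 3
)

result = ambiguous_items(observations)
for lemma, counts in result.items():
    print(lemma, dict(counts))
```

The same function works for types if the word form is passed instead of the lemma.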

Morphology

The form / lemma ratio of DET is 2.000000 (the average of all parts of speech is 1.624294).
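As a quick check, the ratio appears to be the number of distinct types divided by the number of distinct lemmas; the DET figures reported earlier (42 types, 21 lemmas) agree with that reading. The sketch below assumes this definition, and the function name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Ratio of distinct word forms (types) to distinct lemmas."""
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_types / n_lemmas


# Figures reported on this page for DET: 42 types and 21 lemmas.
det_ratio = form_lemma_ratio(42, 21)
print(f"{det_ratio:.6f}")  # 2.000000
```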

The highest number of forms (36) was observed with the lemma “_” (the CoNLL-U placeholder for an unspecified value): 一個, 一切, 一啲, 上, 下, 今, 任何, 依, 依個, 依啲, 全, 其中, 其他, 初, 另, 各位, 呢, 呢個, 呢啲, 啲, 嗰, 嗰份, 嗰個, 嗰啲, 多, 好多, 幾, 幾多, 成, 所有, 整個, 本, 某啲, 每, 邊, 首.

The second-highest number of forms (1) was observed with the lemma “一部分”: 一部分.

The third-highest number of forms (1) was observed with the lemma “下”: 下.

DET does not occur with any features.

Relations

DET nodes are attached to their parents using 6 different relations: det (321; 97% of instances), reparandum (4; 1%), advcl (2; 1%), cop (1; <1%), nmod (1; <1%), obj (1; <1%)

Parents of DET nodes belong to 7 different parts of speech: NOUN (308; 93% of instances), VERB (11; 3%), PROPN (6; 2%), ADJ (2; 1%), DET (1; <1%), NUM (1; <1%), PART (1; <1%)
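Counts like the two lists above could be gathered from a CoNLL-U file by looking up the head of every DET token. The following is a minimal sketch under stated assumptions, not the tool that produced this page: the parser is a simplified stand-in that skips comments, multiword-token ranges and empty nodes, the names `parse_conllu` and `attachment_stats` are hypothetical, and the sample sentence is a made-up toy example rather than a sentence from the treebank.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format (not taken from the treebank).
SAMPLE = "\n".join([
    "# text = 呢本書好貴",
    "1\t呢\t_\tDET\t_\t_\t3\tdet\t_\t_",
    "2\t本\t_\tNOUN\t_\t_\t1\tclf\t_\t_",
    "3\t書\t_\tNOUN\t_\t_\t5\tnsubj\t_\t_",
    "4\t好\t_\tADV\t_\t_\t5\tadvmod\t_\t_",
    "5\t貴\t_\tADJ\t_\t_\t0\troot\t_\t_",
    "",
])


def parse_conllu(text):
    """Yield sentences as lists of token dicts (basic word lines only)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword ranges such as "1-2" and empty nodes such as "1.1".
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        sentence.append({
            "id": int(cols[0]), "form": cols[1], "upos": cols[3],
            "head": int(cols[6]), "deprel": cols[7],
        })
    if sentence:
        yield sentence


def attachment_stats(sentences, upos="DET"):
    """Count incoming relations and parent UPOS tags for tokens tagged `upos`."""
    relations, parent_pos = Counter(), Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            if tok["upos"] != upos:
                continue
            relations[tok["deprel"]] += 1
            parent = by_id.get(tok["head"])  # head 0 has no parent token
            parent_pos[parent["upos"] if parent else "ROOT"] += 1
    return relations, parent_pos


relations, parent_pos = attachment_stats(parse_conllu(SAMPLE))
print(relations.most_common())
print(parent_pos.most_common())
```

On the toy sentence, the single DET token is attached as det to a NOUN parent, which mirrors the dominant pattern reported above.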

209 (63%) DET nodes are leaves.

113 (34%) DET nodes have one child.

5 (2%) DET nodes have two children.

3 (1%) DET nodes have three or more children.

The highest child degree of a DET node is 5.

Children of DET nodes are attached using 10 different relations: clf (83; 62% of instances), case (30; 22%), punct (11; 8%), discourse (3; 2%), discourse:sp (2; 1%), acl (1; 1%), advmod (1; 1%), compound (1; 1%), nsubj (1; 1%), reparandum (1; 1%)

Children of DET nodes belong to 8 different parts of speech: NOUN (84; 63% of instances), PART (32; 24%), PUNCT (11; 8%), INTJ (3; 2%), ADJ (1; 1%), ADV (1; 1%), DET (1; 1%), VERB (1; 1%)