This page pertains to UD version 2.

Treebank Statistics: UD_Korean-GSD: POS Tags: PUNCT

There are 103 PUNCT lemmas (0%), 106 PUNCT types (0%) and 10362 PUNCT tokens (13%). Out of 16 observed tags, the rank of PUNCT is: 10 in number of lemmas, 10 in number of types and 4 in number of tokens.

The 10 most frequent PUNCT lemmas: ., ,, ‘, (, ), “, %, ?, !, •

The 10 most frequent PUNCT types: ., ,, ‘, (, ), “, %, ?, !, •

The 10 most frequent ambiguous lemmas: % (PUNCT 137, SYM 45), ~ (SYM 46, PUNCT 24), 이+다 (PUNCT 19, VERB 2, NOUN 1), ㎡ (PUNCT 12, SYM 5), ㎞ (SYM 7, PUNCT 5), ^ (PUNCT 3, SYM 1), ℓ (PUNCT 3, SYM 1), ㎢ (PUNCT 3, SYM 3), ㈜ (PUNCT 2, SYM 1), ㎝ (PUNCT 2, SYM 1)

The 10 most frequent ambiguous types: % (PUNCT 137, SYM 45), ~ (SYM 46, PUNCT 24), 이다 (PUNCT 14, VERB 2, NOUN 1), ㎡ (PUNCT 12, SYM 5), ㎞ (SYM 7, PUNCT 5), 다 (ADV 46, PUNCT 5, NOUN 3), ^ (PUNCT 3, SYM 1), ℓ (PUNCT 3, SYM 1), ㎢ (PUNCT 3, SYM 3), ㈜ (PUNCT 2, SYM 1)

Morphology

The form / lemma ratio of PUNCT is 1.029126 (the average of all parts of speech is 1.000681).
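The reported ratio is consistent with dividing the number of PUNCT types by the number of PUNCT lemmas given at the top of this page. A minimal check, assuming that is how the figure is derived (the variable names are illustrative only):

```python
# Counts reported on this page for PUNCT in UD_Korean-GSD.
punct_types = 106   # distinct word forms tagged PUNCT
punct_lemmas = 103  # distinct lemmas tagged PUNCT

# Assumption: form / lemma ratio = number of types / number of lemmas.
ratio = punct_types / punct_lemmas
print(f"{ratio:.6f}")  # prints 1.029126
```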

The 1st highest number of forms (2) was observed with the lemma “<”: <, <.

The 2nd highest number of forms (2) was observed with the lemma “이+다”: 다, 이다.

The 3rd highest number of forms (2) was observed with the lemma “이+었+다”: 였다, 이었다.

PUNCT occurs with 1 feature: NumType (16; 0% instances)

PUNCT occurs with 1 feature-value pair: NumType=Card

PUNCT occurs with 2 feature combinations. The most frequent feature combination is _ (10346 tokens). Examples: ., ,, ‘, (, ), “, %, ?, !, •

Relations

PUNCT nodes are attached to their parents using 9 different relations: punct (10299; 99% instances), cop (29; 0% instances), appos (25; 0% instances), flat (4; 0% instances), advcl (1; 0% instances), case (1; 0% instances), conj (1; 0% instances), dep (1; 0% instances), root (1; 0% instances)
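Counts like these can be reproduced from the treebank's CoNLL-U files by tallying the DEPREL column for every token whose UPOS is PUNCT. The following is a rough sketch that uses only the standard library; the function name and the one-sentence sample are invented for illustration and are not taken from UD_Korean-GSD or from the tool that generated this page.

```python
from collections import Counter

def count_punct_relations(conllu_text):
    """Tally the DEPREL of every PUNCT token in a CoNLL-U string (hypothetical helper)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, upos, deprel = cols[0], cols[3], cols[7]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in tok_id or "." in tok_id:
            continue
        if upos == "PUNCT":
            counts[deprel] += 1
    return counts

# Invented toy sentence, for illustration only.
SAMPLE = "\n".join([
    "# text = 간다 .",
    "1\t간다\t가+ㄴ다\tVERB\t_\t_\t0\troot\t_\t_",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

print(dict(count_punct_relations(SAMPLE)))  # prints {'punct': 1}
```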

Parents of PUNCT nodes belong to 16 different parts of speech: VERB (4863; 47% instances), NOUN (3035; 29% instances), ADJ (768; 7% instances), SYM (416; 4% instances), NUM (398; 4% instances), PROPN (349; 3% instances), PUNCT (275; 3% instances), ADV (171; 2% instances), AUX (32; 0% instances), ADP (26; 0% instances), DET (9; 0% instances), PRON (9; 0% instances), INTJ (5; 0% instances), CCONJ (4; 0% instances), PART (1; 0% instances), no tag, i.e. the artificial root node (1; 0% instances)

10059 (97%) PUNCT nodes are leaves.

269 (3%) PUNCT nodes have one child.

17 (0%) PUNCT nodes have two children.

17 (0%) PUNCT nodes have three or more children.

The highest child degree of a PUNCT node is 6.
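The child-degree figures above count, for each PUNCT node, how many other nodes name it as their head. A small sketch of that computation for a single sentence, assuming tokens are given as (id, upos, head) tuples; the helper name and the toy sentence are hypothetical:

```python
from collections import Counter

def punct_child_degrees(sentence_rows):
    """Map child count -> number of PUNCT nodes with that many children.

    sentence_rows: list of (id, upos, head) tuples for one sentence.
    """
    # How many tokens point at each head id.
    n_children = Counter(head for _, _, head in sentence_rows)
    # Look up that count for every PUNCT token (0 if nothing attaches to it).
    return Counter(
        n_children[tok_id]
        for tok_id, upos, _ in sentence_rows
        if upos == "PUNCT"
    )

# Invented toy sentence: token 2 is a PUNCT root with two children,
# token 3 is a PUNCT leaf.
rows = [(1, "NOUN", 2), (2, "PUNCT", 0), (3, "PUNCT", 2)]
degrees = punct_child_degrees(rows)
print(degrees[0], degrees[2])  # prints 1 1
```

Summing such per-sentence counters over a whole treebank would give a distribution comparable to the one reported here, whose four buckets (10059 + 269 + 17 + 17) add up to the 10362 PUNCT tokens.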

Children of PUNCT nodes are attached using 14 different relations: punct (288; 79% instances), appos (29; 8% instances), case (14; 4% instances), flat (13; 4% instances), conj (5; 1% instances), obj (4; 1% instances), nmod (3; 1% instances), acl:relcl (2; 1% instances), nsubj (2; 1% instances), amod (1; 0% instances), cc (1; 0% instances), cop (1; 0% instances), dep (1; 0% instances), det (1; 0% instances)

Children of PUNCT nodes belong to 11 different parts of speech: PUNCT (275; 75% instances), NOUN (39; 11% instances), ADP (19; 5% instances), SYM (18; 5% instances), NUM (5; 1% instances), PROPN (3; 1% instances), VERB (2; 1% instances), ADJ (1; 0% instances), ADV (1; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances)