This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

PUNCT: punctuation

This document is a placeholder for the language-specific documentation for PUNCT.


Treebank Statistics (UD_Indonesian)

There is 1 PUNCT lemma (6%), 39 PUNCT types (0%) and 18228 PUNCT tokens (15%). Out of 16 observed tags, PUNCT ranks 12th in number of lemmas, 12th in number of types and 3rd in number of tokens.

The 10 most frequent PUNCT lemmas: _

The 10 most frequent PUNCT types: ,, ., -, ), (, ?, “, :, ‘, ;

The 10 most frequent ambiguous lemmas: _ (NOUN 27313, PROPN 22844, PUNCT 18228, VERB 13257, ADP 12019, ADV 4760, ADJ 4574, PRON 4397, NUM 4386, DET 3963, CONJ 3659, SCONJ 1475, PART 590, SYM 418, X 39, AUX 1)
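
The token share and token rank reported for PUNCT can be re-derived from the per-tag counts listed for the lemma "_". The sketch below assumes that every token in the treebank carries the lemma "_", so that these counts are the full per-tag token totals; the function name share_and_rank is illustrative and not part of any UD tool.

```python
# Per-tag token counts copied from the lemma "_" breakdown above.
# Assumption: every token has the lemma "_", so these are full totals.
counts = {
    "NOUN": 27313, "PROPN": 22844, "PUNCT": 18228, "VERB": 13257,
    "ADP": 12019, "ADV": 4760, "ADJ": 4574, "PRON": 4397,
    "NUM": 4386, "DET": 3963, "CONJ": 3659, "SCONJ": 1475,
    "PART": 590, "SYM": 418, "X": 39, "AUX": 1,
}


def share_and_rank(counts, tag):
    """Return (fraction of all tokens, 1-based rank by token count) for `tag`."""
    total = sum(counts.values())
    ordered = sorted(counts, key=counts.get, reverse=True)
    return counts[tag] / total, ordered.index(tag) + 1


share, rank = share_and_rank(counts, "PUNCT")
print(f"{share:.0%} of tokens, rank {rank}")  # prints "15% of tokens, rank 3"
```

Under that assumption the counts sum to 121923 tokens, of which 18228 are PUNCT.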

The 10 most frequent ambiguous types: . (PUNCT 5637, PROPN 1), (PUNCT 161, PROPN 2, NOUN 1), (PUNCT 53, NUM 1), ’’ (PUNCT 14, PROPN 2, NOUN 1), (PUNCT 11, NOUN 1), ~ (PUNCT 2, ADJ 1), &nbsp (NOUN 1, PUNCT 1, X 1), banyak (ADV 86, DET 33, ADJ 2, PUNCT 1, NOUN 1), habis (ADJ 2, PUNCT 1, ADV 1), mengenai (ADP 30, VERB 6, PUNCT 1)

Morphology

The form / lemma ratio of PUNCT is 39 (39 forms for 1 lemma); the average over all parts of speech is 1437.3125.
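
As a rough illustration of how such a ratio could be computed, the sketch below counts distinct forms and distinct lemmas for one tag in CoNLL-U text. It is not the script that generated this page: the function name form_lemma_ratio is hypothetical, forms are compared case-sensitively, and the universal POS tag is assumed to be in the fourth column.

```python
def form_lemma_ratio(conllu_text, upos):
    """Return (distinct forms) / (distinct lemmas) for tokens tagged `upos`."""
    forms, lemmas = set(), set()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank separators and sentence-level comments
        cols = line.split("\t")
        # Keep only basic tokens; skip ranges such as "1-2" and short rows.
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        if cols[3] == upos:
            forms.add(cols[1])   # FORM column
            lemmas.add(cols[2])  # LEMMA column
    # Assumption: a tag that never occurs gets a ratio of 0.0.
    return len(forms) / len(lemmas) if lemmas else 0.0
```

When every token of a tag shares a single lemma, as with "_" here, the ratio is simply the number of distinct forms.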

The highest number of forms (39) was observed with the lemma “_”: !, “, &nbsp, ‘, ‘’, (, ), ,, -, –, ., …, :, ;, ?, [, \, ], _, , `, banyak, habis, mengenai, ul, |, ~, ·, ẹ, –, —, “, ”, …, ′, ‹, ›, 尹, (.

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 9 different relations: punct (18212; 100% instances), dep (6; 0% instances), compound (4; 0% instances), advcl (1; 0% instances), ccomp (1; 0% instances), det (1; 0% instances), dobj (1; 0% instances), mwe (1; 0% instances), root (1; 0% instances)

Parents of PUNCT nodes belong to 17 different parts of speech: VERB (7340; 40% instances), NOUN (4858; 27% instances), PROPN (4752; 26% instances), NUM (552; 3% instances), ADJ (288; 2% instances), ADV (136; 1% instances), PRON (76; 0% instances), SYM (57; 0% instances), DET (44; 0% instances), ADP (35; 0% instances), CONJ (30; 0% instances), SCONJ (29; 0% instances), X (11; 0% instances), PART (9; 0% instances), PUNCT (9; 0% instances), AUX (1; 0% instances), ROOT (1; 0% instances)
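
Counts like the two lists above can be gathered by walking each sentence and recording, for every PUNCT node, its relation label and the tag of its head. The following is a minimal sketch under the assumption of standard 10-column CoNLL-U input; parse_sentences and attachment_stats are illustrative names, and a head of 0 is reported as ROOT to match the listing above.

```python
from collections import Counter


def parse_sentences(conllu_text):
    """Yield each sentence as a list of column lists (basic tokens only)."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():          # a blank line ends the sentence
            if rows:
                yield rows
                rows = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) >= 8 and cols[0].isdigit():
            rows.append(cols)
    if rows:                          # last sentence without trailing blank
        yield rows


def attachment_stats(conllu_text, upos="PUNCT"):
    """Return (relation counts, parent tag counts) for nodes tagged `upos`."""
    relations, parent_tags = Counter(), Counter()
    for rows in parse_sentences(conllu_text):
        tag_of = {row[0]: row[3] for row in rows}
        tag_of["0"] = "ROOT"          # head 0 is the artificial root
        for row in rows:
            if row[3] == upos:
                relations[row[7]] += 1
                # "UNKNOWN" is a fallback for heads missing from the sentence.
                parent_tags[tag_of.get(row[6], "UNKNOWN")] += 1
    return relations, parent_tags
```

Both counters sum to the number of PUNCT tokens, which gives a quick consistency check: the relation counts and the parent counts listed above each add up to 18228.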

18204 (100%) PUNCT nodes are leaves.

19 (0%) PUNCT nodes have one child.

1 (0%) PUNCT node has two children.

4 (0%) PUNCT nodes have three or more children.

The highest child degree of a PUNCT node is 8.
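
The leaf and child-degree figures above amount to a histogram of how many dependents each PUNCT node has. One way to sketch that computation, again assuming standard 10-column CoNLL-U input and using the hypothetical name child_degrees:

```python
from collections import Counter


def child_degrees(conllu_text, upos="PUNCT"):
    """Map child count -> number of `upos` nodes with that many children."""
    histogram = Counter()
    sentence = []  # (id, tag, head) triples of the current sentence

    def flush():
        # Count how many tokens name each id as their head.
        n_children = Counter(head for _, _, head in sentence)
        for tok_id, tag, _ in sentence:
            if tag == upos:
                histogram[n_children[tok_id]] += 1
        sentence.clear()

    # The extra "" guarantees that the final sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            flush()
            continue
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 8 or not cols[0].isdigit():
            continue  # comments, multiword ranges, empty nodes, short rows
        sentence.append((cols[0], cols[3], cols[6]))
    return histogram
```

In the result, histogram[0] is the number of leaves and max(histogram) is the highest child degree.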

Children of PUNCT nodes are attached using 12 different relations: name (11; 28% instances), punct (9; 23% instances), nummod (5; 13% instances), dep (4; 10% instances), conj (2; 5% instances), mwe (2; 5% instances), advmod (1; 3% instances), compound (1; 3% instances), dobj (1; 3% instances), mark (1; 3% instances), nmod (1; 3% instances), nsubj (1; 3% instances)

Children of PUNCT nodes belong to 9 different parts of speech: PROPN (12; 31% instances), PUNCT (9; 23% instances), NOUN (7; 18% instances), NUM (5; 13% instances), X (2; 5% instances), ADJ (1; 3% instances), ADV (1; 3% instances), SCONJ (1; 3% instances), SYM (1; 3% instances)


PUNCT in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]