This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

PUNCT: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.

Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.

Examples

References


Treebank Statistics (UD_Turkish)

There are 13 PUNCT lemmas (0%), 13 PUNCT types (0%) and 10424 PUNCT tokens (18%). Out of 14 observed tags, PUNCT ranks 12th by number of lemmas, 13th by number of types and 3rd by number of tokens.
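Counts like these can be reproduced from a CoNLL-U file, whose token lines have 10 tab-separated columns (ID, FORM, LEMMA, UPOSTAG, XPOSTAG, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal sketch, not the script that generated this page. It assumes "types" means distinct FORM values and that the token share is taken over all basic tokens. The two-sentence sample is hypothetical, not taken from UD_Turkish.

```python
# Hypothetical CoNLL-U sample; sentences and tags are illustrative only.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "1\tGeldi\tgel\tVERB\t_\t_\t0\troot\t_\t_",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
    "# sent_id = 2",
    "1\tNe\tne\tPRON\t_\t_\t2\tdobj\t_\t_",
    "2\tdedin\tde\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t?\t?\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "4\t!\t!\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])


def read_tokens(text):
    """Yield (form, lemma, upos) for every basic token line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges such as "1-2"
        yield cols[1], cols[2], cols[3]


def tag_stats(text, tag):
    """Count distinct lemmas, distinct forms (types) and tokens for one UPOS tag."""
    rows = list(read_tokens(text))
    hits = [(form, lemma) for form, lemma, upos in rows if upos == tag]
    return {
        "lemmas": len({lemma for _, lemma in hits}),
        "types": len({form for form, _ in hits}),
        "tokens": len(hits),
        "token_share": len(hits) / len(rows),
    }


print(tag_stats(SAMPLE, "PUNCT"))
```

On the sample, three of the six tokens are PUNCT, so the token share is 0.5.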

The 10 most frequent PUNCT lemmas: ., ,, “, …, ?, :, -, ;, !, )

The 10 most frequent PUNCT types: ., ,, “, …, ?, :, -, ;, !, )

The 10 most frequent ambiguous lemmas:

The 10 most frequent ambiguous types: ? (PUNCT 231, PRON 3, PROPN 2)
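A type counts as ambiguous when the same form occurs with more than one UPOS tag. The minimal sketch below rebuilds the line above from (form, tag) pairs. The "?" counts are copied from that line. The "." tokens are hypothetical filler that shows an unambiguous form being filtered out.

```python
from collections import Counter, defaultdict


def ambiguous_types(pairs):
    """Map each form seen with two or more UPOS tags to a Counter of its tags."""
    by_form = defaultdict(Counter)
    for form, upos in pairs:
        by_form[form][upos] += 1
    return {form: tags for form, tags in by_form.items() if len(tags) > 1}


def describe(form, tags):
    # List tags from most to least frequent, as on the statistics line.
    return f"{form} (" + ", ".join(f"{t} {n}" for t, n in tags.most_common()) + ")"


# "?" counts from the statistics line; "." tokens are hypothetical filler.
PAIRS = ([("?", "PUNCT")] * 231 + [("?", "PRON")] * 3
         + [("?", "PROPN")] * 2 + [(".", "PUNCT")] * 5)

for form, tags in ambiguous_types(PAIRS).items():
    print(describe(form, tags))  # ? (PUNCT 231, PRON 3, PROPN 2)
```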

Morphology

The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 2.815350).
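The sketch below assumes the ratio is the number of distinct forms per lemma, averaged over lemmas. Under that assumption it equals distinct (lemma, form) pairs divided by lemmas. For PUNCT, 13 types over 13 lemmas gives the reported 1.0. The verb tokens are hypothetical and show how inflection raises the ratio.

```python
from collections import defaultdict


def form_lemma_ratio(pairs):
    """Average number of distinct forms per lemma.

    pairs: iterable of (lemma, form) tuples, one per token.
    """
    forms = defaultdict(set)
    for lemma, form in pairs:
        forms[lemma].add(form)
    return sum(len(f) for f in forms.values()) / len(forms)


# Punctuation: each lemma is always written with the same single form.
punct = [(".", "."), (",", ","), ("?", "?"), (".", ".")]
# Hypothetical Turkish verb tokens: one lemma, several inflected forms.
verbs = [("gel", "geldi"), ("gel", "gelir"), ("gel", "gelmiş"), ("git", "gitti")]

print(form_lemma_ratio(punct))  # 1.0
print(form_lemma_ratio(verbs))  # 2.0
```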

The highest number of forms (1) was observed with the lemma “!”: !.

The 2nd highest number of forms (1) was observed with the lemma “””: .

The 3rd highest number of forms (1) was observed with the lemma “’”: .

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 8 different relations: punct (10231; 98% of instances), conj (136; 1% of instances), root (39; 0% of instances), cc (12; 0% of instances), dobj (3; 0% of instances), advmod:emph (1; 0% of instances), nmod:poss (1; 0% of instances), nsubj (1; 0% of instances).
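Every node has exactly one incoming relation, so these 8 counts add up to the PUNCT token total of 10424. Percentages are shown as whole numbers, which is why small counts appear as 0%. The following is a minimal sketch that recomputes the line from the counts. It uses Python's `round()` (round-half-to-even), and the rounding rule of the original generator is an assumption.

```python
# Incoming-relation counts for PUNCT nodes, copied from the line above.
RELS = {"punct": 10231, "conj": 136, "root": 39, "cc": 12, "dobj": 3,
        "advmod:emph": 1, "nmod:poss": 1, "nsubj": 1}


def summarize(counts):
    """Return the total and one 'label (n; p% of instances)' string per label."""
    total = sum(counts.values())
    # sorted() is stable even with reverse=True, so ties keep their input order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    parts = [f"{label} ({n}; {round(100 * n / total)}% of instances)"
             for label, n in ranked]
    return total, parts


total, parts = summarize(RELS)
print(total)     # 10424
print(parts[0])  # punct (10231; 98% of instances)
```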

Parents of PUNCT nodes belong to 15 different parts of speech (counting ROOT, the artificial root node): VERB (8508; 82% of instances), NOUN (927; 9% of instances), ADJ (588; 6% of instances), ADV (118; 1% of instances), PRON (88; 1% of instances), PROPN (55; 1% of instances), ROOT (39; 0% of instances), PUNCT (26; 0% of instances), INTJ (19; 0% of instances), CONJ (17; 0% of instances), NUM (15; 0% of instances), ADP (11; 0% of instances), DET (9; 0% of instances), AUX (2; 0% of instances), X (2; 0% of instances).
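The parent's part of speech is found by following the HEAD column to the token with that ID. A HEAD of 0 points to the artificial root, which is why ROOT appears here with the same count (39) as the root relation. The following minimal sketch takes a sentence as a list of (id, upos, head) tuples. The function name and the sample sentences are hypothetical.

```python
from collections import Counter


def parent_upos(sentence, tag="PUNCT"):
    """Count the UPOS of the parent of every `tag` node in one sentence.

    sentence: list of (id, upos, head) with 1-based ids; head 0 is the root.
    """
    upos_by_id = {i: u for i, u, _ in sentence}
    counts = Counter()
    for _, upos, head in sentence:
        if upos == tag:
            counts["ROOT" if head == 0 else upos_by_id[head]] += 1
    return counts


# Hypothetical sentences: a comma under a noun and a full stop under the verb,
# then a sentence consisting of a single punctuation token.
print(parent_upos([(1, "NOUN", 3), (2, "PUNCT", 1), (3, "VERB", 0), (4, "PUNCT", 3)]))
print(parent_upos([(1, "PUNCT", 0)]))
```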

10344 (99%) PUNCT nodes are leaves.

45 (0%) PUNCT nodes have one child.

20 (0%) PUNCT nodes have two children.

15 (0%) PUNCT nodes have three or more children.

The highest child degree of a PUNCT node is 6.
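The degree of a node is the number of tokens whose HEAD is that node's ID. The following minimal sketch buckets PUNCT nodes into leaves, one child, two children and three or more, and tracks the maximum degree. The bucket labels and the sample sentence are hypothetical.

```python
from collections import Counter


def bucket(degree):
    return "leaf" if degree == 0 else str(degree) if degree <= 2 else "3+"


def degree_profile(sentence, tag="PUNCT"):
    """Return (bucket counts, max degree) for `tag` nodes.

    sentence: list of (id, upos, head) with 1-based ids.
    """
    children = Counter(head for _, _, head in sentence)  # children per head id
    buckets, max_deg = Counter(), 0
    for i, upos, _ in sentence:
        if upos == tag:
            buckets[bucket(children[i])] += 1
            max_deg = max(max_deg, children[i])
    return buckets, max_deg


# Hypothetical: a PUNCT head with three dependents, plus one leaf PUNCT.
print(degree_profile([(1, "NOUN", 2), (2, "PUNCT", 0), (3, "NOUN", 2), (4, "PUNCT", 2)]))
```

The four buckets partition all PUNCT nodes, so the counts above add up to the token total: 10344 + 45 + 20 + 15 = 10424.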

Children of PUNCT nodes are attached using 12 different relations: conj (49; 36% of instances), nsubj (19; 14% of instances), punct (18; 13% of instances), nmod (17; 12% of instances), dobj (9; 7% of instances), advmod (7; 5% of instances), amod (6; 4% of instances), cc (3; 2% of instances), discourse (3; 2% of instances), acl (2; 1% of instances), advmod:emph (2; 1% of instances), csubj (2; 1% of instances).

Children of PUNCT nodes belong to 11 different parts of speech: VERB (38; 28% of instances), NOUN (27; 20% of instances), PUNCT (26; 19% of instances), ADV (12; 9% of instances), ADJ (10; 7% of instances), PRON (9; 7% of instances), CONJ (6; 4% of instances), INTJ (3; 2% of instances), NUM (3; 2% of instances), PROPN (2; 1% of instances), DET (1; 1% of instances).
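Each child edge is counted once in the relation list and once in the part-of-speech list, so both lists should have the same total. That total must also fall between the minimum and maximum implied by the degree counts (45 nodes with one child, 20 with two, 15 with three or more, maximum degree 6). The following is a small arithmetic check using only numbers from this page. The helper name is hypothetical.

```python
# Counts copied from the statistics lines on this page.
CHILD_RELS = {"conj": 49, "nsubj": 19, "punct": 18, "nmod": 17, "dobj": 9,
              "advmod": 7, "amod": 6, "cc": 3, "discourse": 3, "acl": 2,
              "advmod:emph": 2, "csubj": 2}
CHILD_UPOS = {"VERB": 38, "NOUN": 27, "PUNCT": 26, "ADV": 12, "ADJ": 10,
              "PRON": 9, "CONJ": 6, "INTJ": 3, "NUM": 3, "PROPN": 2, "DET": 1}
ONE, TWO, THREE_PLUS, MAX_DEGREE = 45, 20, 15, 6


def child_bounds(one, two, three_plus, max_degree):
    """Smallest and largest possible number of child edges."""
    base = one + 2 * two
    return base + 3 * three_plus, base + max_degree * three_plus


total = sum(CHILD_RELS.values())
low, high = child_bounds(ONE, TWO, THREE_PLUS, MAX_DEGREE)
print(total, sum(CHILD_UPOS.values()))  # 137 137
print(low, high)                        # 130 175
```

The 26 PUNCT children also match the 26 PUNCT parents listed earlier, because both figures count the same PUNCT-to-PUNCT edges.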

