PUNCT

This page pertains to UD version 2.

`PUNCT`: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.

Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM. (Hint: if it corresponds to a word that you pronounce, such as dollar or percent, it is SYM and not PUNCT.)

Spoken corpora contain symbols representing pauses, laughter and other sounds; we treat them as punctuation, too. In these cases it is not even required that all characters of the token be non-alphabetical. A pause can be represented by a special character such as #, or by a more descriptive code such as [:pause].
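
The distinctions above can be sketched as a small heuristic. This is only an illustration, not part of the UD guidelines or of any UD tool: the function name `guess_tag` and the two lookup sets are hypothetical, and the sets contain only the symbols mentioned on this page. Note that Unicode character categories alone are not enough, since `%` and `§` are classified as punctuation by Unicode but are SYM in UD, so the logograms are checked first.

```python
import unicodedata
from typing import Optional

# Hypothetical, illustrative sets built from the symbols named on this page.
LOGOGRAMS = {"$", "%", "§"}        # pronounced as words, so SYM
PAUSE_CODES = {"#", "[:pause]"}    # corpus-specific codes for pauses, so PUNCT


def guess_tag(token: str) -> Optional[str]:
    """Return "SYM", "PUNCT", or None when the heuristic cannot decide."""
    if token in LOGOGRAMS:
        return "SYM"
    if token in PAUSE_CODES:
        # Spoken-corpus codes may contain letters and still count as PUNCT.
        return "PUNCT"
    # Unicode general categories starting with "P" are punctuation classes.
    if token and all(unicodedata.category(ch).startswith("P") for ch in token):
        return "PUNCT"
    return None


for tok in [".", ",", "()", "•", "$", "%", "[:pause]", "dollar"]:
    print(repr(tok), guess_tag(tok))
```

A real annotation pipeline would need a corpus-specific inventory of pause and laughter codes; the fixed sets here merely stand in for that.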

Examples

Period: .
Comma: ,
Parentheses: ()
Bullets in itemized lists: •, ‣

References

Wikipedia

PUNCT in other languages: [axm] [bej] [bg] [ca] [cs] [cy] [da] [el] [en] [es] [et] [fi] [fr] [ga] [grc] [hbo] [hy] [hyw] [it] [ja] [ka] [kk] [kpv] [ky] [myv] [naq] [no] [oge] [pal] [pt] [ru] [sl] [sv] [tr] [tt] [u] [uk] [urj] [xcl] [xmf] [yue] [zh]
