
This page pertains to UD version 2.

PUNCT: punctuation


Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text. Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.


Conversion from JOS

The characters occurring in the ssj500k treebank were manually divided into two groups, PUNCT and SYM. Some characters have properties of both categories: for example, the asterisk and dash-like characters can function either as mathematical operators (SYM) or as bullets in itemized lists (PUNCT). In such ambiguous cases, the character was assigned the tag of its more common function.
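The "more common function" rule can be illustrated with a minimal Python sketch. It is not the actual conversion script: the observations below are invented for illustration and are not taken from ssj500k, and the names `OBSERVATIONS`, `build_tag_table`, and `TAG_TABLE` are hypothetical. The sketch only shows how a per-character tag could be derived by majority vote over the functions a character is seen in.

```python
from collections import Counter

# Hypothetical hand annotations: (character, function observed in context).
# These observations are invented for illustration, not taken from ssj500k.
OBSERVATIONS = [
    ("*", "PUNCT"),  # bullet in an itemized list
    ("*", "PUNCT"),  # bullet in an itemized list
    ("*", "SYM"),    # mathematical operator
    (",", "PUNCT"),
    ("%", "SYM"),
]


def build_tag_table(observations):
    """Map each character to the tag of its most frequently observed function."""
    counts = {}
    for char, tag in observations:
        counts.setdefault(char, Counter())[tag] += 1
    # most_common(1) returns a one-element list holding the (tag, count) pair
    # with the highest count for that character.
    return {char: tally.most_common(1)[0][0] for char, tally in counts.items()}


TAG_TABLE = build_tag_table(OBSERVATIONS)
```

With these invented observations the asterisk ends up as PUNCT, because the bullet function outnumbers the operator function. Unambiguous characters such as the comma or the percent sign simply keep the only tag they were observed with.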

PUNCT in other languages: [bej] [bg] [ca] [cs] [cy] [da] [el] [en] [es] [et] [fi] [fr] [ga] [grc] [hy] [hyw] [it] [ja] [ka] [kk] [kpv] [ky] [myv] [no] [pt] [ru] [sl] [sv] [tr] [tt] [uk] [u] [urj] [xcl] [yue] [zh]