PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
Note that some punctuation marks (the exclamation, emphasis, and question marks) are written inside the word. Words containing such internal punctuation are treated as multiword tokens: for example, ինչպէ՞ս “how?” is split into two tokens, ինչպէս and ՞ (for more details, see the tokenization page).
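The split described above can be illustrated with a minimal Python sketch. It is not the treebank's actual tokenizer: the function name `split_internal_punct` and the set `INTERNAL_MARKS` are hypothetical, and the sketch assumes that the three internal marks are the only characters to be separated from the word.

```python
# Word-internal marks named in the text (assumed to be the complete set):
# U+055C exclamation mark, U+055B emphasis mark, U+055E question mark.
INTERNAL_MARKS = {"\u055C", "\u055B", "\u055E"}


def split_internal_punct(surface):
    """Split a surface form into (word, marks).

    `word` is the surface form with internal marks removed;
    `marks` lists the removed marks in their original order.
    """
    word = "".join(ch for ch in surface if ch not in INTERNAL_MARKS)
    marks = [ch for ch in surface if ch in INTERNAL_MARKS]
    return word, marks


# The example from the text: one surface form, two syntactic tokens.
word, marks = split_internal_punct("ինչպէ՞ս")
print(word, marks)
```

In this sketch the original surface form would stay available as the multiword token, while `word` and each entry of `marks` become separate tokens, the latter tagged PUNCT.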
Examples
- Period (Armenian full stop): ։
- Comma: ,
- Parentheses: ()
- Quotation marks: «»
- Exclamation mark: ՜
- Question mark: ՞
- Emphasis mark (acute accent): ՛
- Apostrophe: ՚
- Armenian hyphen (yentamna): ֊
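As a rough illustration of how the list above and the SYM rule from the definition could be applied, the following Python sketch tags a token by lookup. The names `PUNCT_CHARS`, `SYM_CHARS`, and `coarse_tag` are hypothetical, and the character sets contain only the marks mentioned on this page, so they should not be read as an exhaustive inventory.

```python
# Punctuation marks listed in the examples (not an exhaustive inventory).
PUNCT_CHARS = {
    "\u0589",            # Armenian full stop
    ",",                 # comma
    "(", ")",            # parentheses
    "\u00AB", "\u00BB",  # quotation marks « »
    "\u055C",            # Armenian exclamation mark
    "\u055E",            # Armenian question mark
    "\u055B",            # Armenian emphasis mark
    "\u055A",            # Armenian apostrophe
    "\u058A",            # Armenian hyphen (yentamna)
}

# Logograms named in the definition, which are tagged SYM rather than PUNCT.
SYM_CHARS = {"$", "%", "\u00A7"}


def coarse_tag(token):
    """Return "PUNCT", "SYM", or None for tokens this sketch does not cover."""
    if token and all(ch in PUNCT_CHARS for ch in token):
        return "PUNCT"
    if token in SYM_CHARS:
        return "SYM"
    return None


for tok in ["\u0589", "\u055E", "%", "ինչպէս"]:
    print(repr(tok), coarse_tag(tok))
```

A real annotation pipeline would decide tags in context; the lookup only makes the PUNCT versus SYM distinction concrete.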