PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as ☉ and ☽, which are instead tagged as SYM.
Note that some punctuation marks are infixed, that is, written inside the word (the exclamation, emphasis, and question marks). Such cases are treated as multiword tokens: ի՞նչ “what?” is split into two tokens, ինչ and ՞ (for more details see the tokenization page).
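The split described above can be sketched in a few lines of Python. This is only an illustration, not the tokenizer used for any treebank: the helper name `split_infixed` and the set `INFIXED_MARKS` are hypothetical, and the sketch assumes the three infixed marks are the Unicode characters U+055B, U+055C, and U+055E.

```python
# Infixed marks named in the text, given as Unicode escapes (assumed code points).
INFIXED_MARKS = {
    "\u055B",  # ՛ emphasis mark
    "\u055C",  # ՜ exclamation mark
    "\u055E",  # ՞ question mark
}

def split_infixed(surface):
    """Hypothetical helper: separate infixed marks from the word that carries them.

    Returns a pair (word, marks), where word is the surface form with the
    infixed marks removed and marks lists the removed marks in order.
    """
    word = "".join(ch for ch in surface if ch not in INFIXED_MARKS)
    marks = [ch for ch in surface if ch in INFIXED_MARKS]
    return word, marks

# The example from the text: the word form and the question mark come apart.
word, marks = split_infixed("\u056B\u055E\u0576\u0579")  # ի՞նչ
print(word, marks)
```

In this sketch the word comes first and the mark second, mirroring the order "ինչ and ՞" given above; the tokenization page remains the authority on the actual token order and annotation.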
Examples
- Period: ։
- Comma: ,
- Armenian comma: ՝
- Parentheses: ()
- Quotation marks: «»
- Exclamation mark: ՜
- Question mark: ՞
- Emphasis mark (acute accent): ՛
- Apostrophe: ’
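As a rough illustration of how the list above and the SYM exception in the definition might be applied, the following Python sketch looks tokens up in two small sets. The function name `guess_upos` is hypothetical, the sets contain only the characters mentioned on this page (so they are not exhaustive), and the code points are assumptions about how those characters are encoded.

```python
# Marks from the Examples list, written as escapes; illustrative, not exhaustive.
PUNCT_MARKS = {
    "\u0589",            # ։ period (Armenian full stop)
    ",",                 # comma
    "\u055D",            # ՝ Armenian comma
    "(", ")",            # parentheses
    "\u00AB", "\u00BB",  # « » quotation marks
    "\u055C",            # ՜ exclamation mark
    "\u055E",            # ՞ question mark
    "\u055B",            # ՛ emphasis mark
    "\u2019",            # ’ apostrophe (assumed to be U+2019)
}

# Logograms named in the definition, which are tagged SYM rather than PUNCT.
SYM_MARKS = {"\u2609", "\u263D"}  # ☉ ☽

def guess_upos(token):
    """Hypothetical helper: return "PUNCT", "SYM", or None for other tokens."""
    if token and all(ch in PUNCT_MARKS for ch in token):
        return "PUNCT"
    if token and all(ch in SYM_MARKS for ch in token):
        return "SYM"
    return None

print(guess_upos("\u0589"), guess_upos("\u2609"), guess_upos("\u056B\u0576\u0579"))
```

A lookup of this kind only covers tokens that have already been split off; infixed marks still need to be separated from their host word first.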