PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
Examples
- Period: .
- Comma: ,
- Parentheses: ()
Diffs
Prague Dependency Treebank
The PDT texts date from the early 1990s and contain no e-mail addresses.
Had any been present, the PDT tokenization rules would have split them at every dot and at sign.
The same holds for telephone numbers. For example,
tel.: (05) 4321 6014 is analyzed as eight tokens (NOUN PUNCT PUNCT PUNCT NUM PUNCT NUM NUM).
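The splitting behaviour described above can be approximated with a short Python sketch. It is not the PDT tokenizer: the regular expression, the helper names `split_tokens` and `toy_tag`, the one-entry lexicon, and the e-mail address are all assumptions made for illustration. The sketch only shows how separating every non-word character yields the eight tokens and the tag sequence given in the example.

```python
import re

# Hypothetical one-entry lexicon covering only this example;
# a real tagger would not assign NOUN this way.
TOY_LEXICON = {"tel": "NOUN"}

# A small, assumed set of punctuation characters (not an exhaustive list).
PUNCT_CHARS = set(".,:;()")


def split_tokens(text):
    """Split on whitespace and make every non-word character its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def toy_tag(token):
    """Assign a tag for the phone-number example; anything unknown gets X."""
    if token in PUNCT_CHARS:
        return "PUNCT"
    if token.isdigit():
        return "NUM"
    return TOY_LEXICON.get(token, "X")


tokens = split_tokens("tel.: (05) 4321 6014")
tags = [toy_tag(t) for t in tokens]
print(list(zip(tokens, tags)))

# A made-up e-mail address is split at every dot and at sign as well.
email_tokens = split_tokens("jan.novak@example.com")
print(email_tokens)
```

Under these assumptions the first call produces eight tokens tagged NOUN PUNCT PUNCT PUNCT NUM PUNCT NUM NUM, and the e-mail address falls apart into seven tokens. The sketch deliberately does not tag the at sign, since the text above does not say which tag it would receive.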
References
PUNCT in other languages: [bej] [bg] [ca] [cs] [cy] [da] [el] [en] [es] [et] [fi] [fr] [ga] [grc] [hy] [hyw] [it] [ja] [ka] [kk] [kpv] [ky] [myv] [no] [pt] [ru] [sl] [sv] [tr] [tt] [uk] [u] [urj] [xcl] [yue] [zh]