SYM: symbol
Definition
A symbol is a word-like entity that differs from ordinary words by form, function, or both.
Many symbols are or contain special non-alphanumeric characters, similar to punctuation. What distinguishes them from punctuation is that they can be replaced by ordinary words. This includes all currency symbols: for example, $ 75 is equivalent to seventy-five dollars.
Mathematical operators form another group of symbols.
Emoticons and emoji form yet another group of symbols.
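UD's SYM class overlaps with Unicode's symbol categories (Sc currency, Sm math, Sk modifier, So other) but does not coincide with them. The minimal Python sketch below prints the Unicode general category of some characters used as SYM examples on this page. Note that % and § are classified by Unicode as other punctuation (Po), so a tagger cannot rely on the Unicode category alone. The helper name `is_unicode_symbol` is illustrative and not part of any UD tool.

```python
import unicodedata


def is_unicode_symbol(ch: str) -> bool:
    """True if the character's Unicode general category is Sc, Sm, Sk or So."""
    return unicodedata.category(ch).startswith("S")


# Characters taken from the SYM examples on this page.
for ch in ["$", "%", "§", "©", "+", "−", "×", "=", "♥", "😝"]:
    print(ch, unicodedata.category(ch), is_unicode_symbol(ch))
```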
Strings that consist entirely of alphanumeric characters are not symbols, but they may be proper nouns: 130XE, DC10. Other strings may be tagged PROPN (rather than SYM) even if they contain special characters: DC-10.
Similarly, abbreviations for single words are not symbols; they are assigned the part of speech of the full form. For example, Mr. (mister), kg (kilogram), km (kilometer), and dr (doctor) should be tagged as nouns.
Acronyms for proper names, such as OSN (the Czech acronym for the United Nations) and NATO, should be tagged as proper nouns.
Characters used as bullets in itemized lists (•, ‣) are not symbols; they are punctuation.
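The exclusions above can be sketched as a small rule function. This is a minimal illustration under stated assumptions, not an official UD or PDT tool: the lexicon entries, the set of extra symbol characters, and the function name `sym_rule` are all hypothetical. The function returns the tag fixed by these rules, or None when the rules do not decide.

```python
import unicodedata
from typing import Optional

# Hypothetical mini-lexicon (illustrative entries only): abbreviations take the
# tag of their full form; acronyms and model names are proper nouns.
LEXICON = {
    "Mr.": "NOUN", "kg": "NOUN", "km": "NOUN", "dr": "NOUN",
    "OSN": "PROPN", "NATO": "PROPN", "DC-10": "PROPN",
}

BULLETS = {"•", "‣"}        # list bullets are PUNCT, not SYM
EXTRA_SYMBOLS = {"%", "§"}  # SYM in UD although Unicode classifies them as Po


def sym_rule(token: str) -> Optional[str]:
    """Return the UPOS fixed by the SYM guidelines, or None if undecided."""
    if token in LEXICON:
        return LEXICON[token]
    if token in BULLETS:
        return "PUNCT"
    if token.isalnum():
        # Purely alphanumeric strings are never SYM; the actual tag
        # (e.g. PROPN for 130XE) must come from other rules or context.
        return None
    if any(unicodedata.category(ch).startswith("S") or ch in EXTRA_SYMBOLS
           for ch in token):
        return "SYM"
    # URLs, e-mail addresses and emoticons such as :) consist of punctuation
    # characters and would need dedicated patterns, which this sketch omits.
    return None
```

The lexicon is consulted first so that listed items such as DC-10 receive their tag regardless of the character-based tests.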
Examples
- $, %, §, ©
- +, −, ×, ÷, =, <, >
- :), ♥‿♥, 😝
- john.doe@universal.org, http://universaldependencies.org/, 1-800-COMPANY
Diffs
Prague Dependency Treebank
The PDT texts date from the early 1990s and contain no e-mail addresses.
If they did, the PDT tokenization rules would split them at every dot and at sign (@).
The same holds for telephone numbers. For example,
tel.: (05) 4321 6014 is analyzed as eight tokens (NOUN PUNCT PUNCT PUNCT NUM PUNCT NUM NUM).
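To make the splitting behaviour concrete, the following Python sketch approximates "split on every punctuation character" with a single regular expression. This is an assumption for illustration, not the actual PDT tokenizer, and the function name `pdt_like_tokenize` is hypothetical. It reproduces the eight tokens of the telephone example and shows how an e-mail address would be broken apart.

```python
import re

# Approximation (not the real PDT tokenizer): a token is either a run of
# word characters or a single character that is neither a word character
# nor whitespace.
TOKEN = re.compile(r"\w+|[^\w\s]")


def pdt_like_tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


print(pdt_like_tokenize("tel.: (05) 4321 6014"))
# ['tel', '.', ':', '(', '05', ')', '4321', '6014']
print(pdt_like_tokenize("john.doe@universal.org"))
# ['john', '.', 'doe', '@', 'universal', '.', 'org']
```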
SYM in other languages: [cs] [cy] [da] [en] [es] [et] [fi] [fr] [ga] [grc] [hy] [it] [ja] [kk] [ky] [no] [pt] [ru] [sl] [sv] [tr] [tt] [uk] [u] [urj] [yue] [zh]