This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

SYM: symbol

Definition

A symbol is a word-like entity that differs from ordinary words by form, function, or both.

Many symbols are or contain special non-alphanumeric characters, as punctuation does. What distinguishes them from punctuation is that they can be replaced by ordinary words. This includes all currency symbols: for example, $ 75 is equivalent to seventy-five dollars.

Mathematical operators form another group of symbols.

Emoticons and emoji form another group of symbols.

Strings that consist entirely of alphanumeric characters are not symbols, but they may be proper nouns: 130XE, DC10. Other strings may be tagged PROPN (rather than SYM) even if they contain special characters: DC-10. Similarly, abbreviations for single words are not symbols; they are assigned the part of speech of the full form. For example, Mr. (mister), kg (kilogram), km (kilometer) and dr (doctor) should be tagged as nouns. Acronyms for proper names, such as OSN and NATO, should be tagged as proper nouns.
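
Only the first rule in the paragraph above can be checked mechanically. The rest (PROPN for DC-10, NOUN for Mr.) depends on lexical knowledge. The Python sketch below encodes just that mechanical part. `may_be_sym` is a hypothetical helper name, not part of any UD tool, and it can only rule SYM out, never confirm it.

```python
def may_be_sym(form: str) -> bool:
    """Hypothetical helper encoding one guideline rule: a string made
    only of letters and digits is never SYM. Passing the check is
    necessary, not sufficient: DC-10 and Mr. pass it, yet they are
    still tagged PROPN and NOUN respectively."""
    return not form.isalnum()


for form in ["130XE", "DC10", "DC-10", "kg", "Mr.", "%", "$", "№"]:
    verdict = "may be SYM" if may_be_sym(form) else "not SYM"
    print(f"{form}: {verdict}")
```

`str.isalnum` is Unicode-aware, so signs such as № and ° count as non-alphanumeric, while Cyrillic letters count as alphanumeric.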

Characters used as bullets in itemized lists (•, ‣) are not symbols; they are punctuation.

Examples


Treebank Statistics (UD_Russian)

There are 17 SYM lemmas (0%), 16 SYM types (0%) and 184 SYM tokens (0%). Out of 16 observed tags, the rank of SYM is: 14 in number of lemmas, 14 in number of types and 15 in number of tokens.

The 10 most frequent SYM lemmas: ПРОЦЕНТ-ЗНАК, /, %, \, +, *, °, =, ×, ^

The 10 most frequent SYM types: %, /, \, +, *, °, =, ×, ^, $

The 10 most frequent ambiguous lemmas: (PUNCT 4, SYM 2)

The 10 most frequent ambiguous types: (PUNCT 4, SYM 2)

Morphology

The form / lemma ratio of SYM is 0.941176 (the average of all parts of speech is 1.591757).
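
The reported ratio matches the number of distinct SYM forms (types) divided by the number of distinct SYM lemmas: 16 / 17 = 0.941176 here, and 8 / 8 = 1.000000 for UD_Russian-SynTagRus. The minimal Python sketch below assumes that definition. The input format (a list of `(form, lemma)` pairs) and the toy data are illustrative assumptions, not output of the tool that generated this page.

```python
def form_lemma_ratio(tokens):
    """tokens: iterable of (form, lemma) pairs for one UPOS tag.
    Returns distinct forms / distinct lemmas (assumed definition)."""
    tokens = list(tokens)  # materialize so generators can be read twice
    forms = {form for form, _ in tokens}
    lemmas = {lemma for _, lemma in tokens}
    return len(forms) / len(lemmas)


# Toy data, not taken from the treebank: two lemmas share the form "+",
# while номер-знак has two forms, so the two effects cancel out.
toy = [("%", "процент-знак"), ("№", "номер-знак"), ("№№", "номер-знак"),
       ("+", "плюс"), ("+", "+")]
print(form_lemma_ratio(toy))
print(f"{16 / 17:.6f}")  # UD_Russian: 16 types / 17 lemmas
```

The toy example shows why a ratio of exactly 1.0 does not mean every lemma has exactly one form.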

The 1st highest number of forms (1) was observed with the lemma “$”: $.

The 2nd highest number of forms (1) was observed with the lemma “%”: %.

The 3rd highest number of forms (1) was observed with the lemma “*”: *.

SYM does not occur with any features.

Relations

SYM nodes are attached to their parents using 13 different relations: punct (90; 49% of instances), nmod (37; 20% of instances), appos (10; 5% of instances), dobj (9; 5% of instances), conj (8; 4% of instances), goeswith (8; 4% of instances), nsubj (8; 4% of instances), advmod (4; 2% of instances), parataxis (3; 2% of instances), cc (2; 1% of instances), remnant (2; 1% of instances), root (2; 1% of instances), acl (1; 1% of instances)

Parents of SYM nodes belong to 10 different parts of speech: NOUN (67; 36% of instances), VERB (44; 24% of instances), SYM (21; 11% of instances), PROPN (14; 8% of instances), NUM (12; 7% of instances), ADV (11; 6% of instances), ADJ (8; 4% of instances), ADP (3; 2% of instances), CONJ (2; 1% of instances), ROOT (2; 1% of instances)
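
Both lists above can be derived from CoNLL-U files. Take the DEPREL column (8th) of each SYM token for the relation counts, and look up the UPOS of the token named in its HEAD column (7th) for the parent counts. HEAD 0 shows up as the pseudo-tag ROOT. The Python sketch below illustrates this under stated assumptions. It is not the script that produced these statistics. The helper names `tok` and `attachment_stats` are hypothetical, and the two sentences are constructed, not treebank data.

```python
from collections import Counter


def tok(i, form, lemma, upos, head, deprel):
    """Build one CoNLL-U line (10 tab-separated columns); unused columns are '_'."""
    return "\t".join([str(i), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


def attachment_stats(conllu_text, upos="SYM"):
    """Count the DEPREL of every `upos` node and the UPOS of its parent.
    HEAD 0 is reported as the pseudo-tag ROOT."""
    relations, parents = Counter(), Counter()
    for sentence in conllu_text.strip().split("\n\n"):
        rows = {}
        for line in sentence.splitlines():
            if not line or line.startswith("#"):
                continue  # skip sentence-level comments
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword-token ranges like "1-2"
                rows[cols[0]] = cols
        for cols in rows.values():
            if cols[3] == upos:
                relations[cols[7]] += 1
                head = cols[6]
                parents["ROOT" if head == "0" else rows[head][3]] += 1
    return relations, parents


# Two constructed sentences: "Рост составил 5 % ." and "10 $".
sample = "\n".join([
    tok(1, "Рост", "рост", "NOUN", 2, "nsubj"),
    tok(2, "составил", "составить", "VERB", 0, "root"),
    tok(3, "5", "5", "NUM", 4, "nummod"),
    tok(4, "%", "процент-знак", "SYM", 2, "dobj"),
    tok(5, ".", ".", "PUNCT", 2, "punct"),
    "",
    tok(1, "10", "10", "NUM", 2, "nummod"),
    tok(2, "$", "доллар-знак", "SYM", 0, "root"),
])
relations, parents = attachment_stats(sample)
total = sum(relations.values())
for rel, n in relations.most_common():
    print(f"{rel} ({n}; {100 * n / total:.0f}% of instances)")
print(dict(parents))
```

The percentages are rounded to whole numbers, which is why rare relations appear above as "0% of instances".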

96 (52%) SYM nodes are leaves.

24 (13%) SYM nodes have one child.

30 (16%) SYM nodes have two children.

34 (18%) SYM nodes have three or more children.

The highest child degree of a SYM node is 29.
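
The leaf / one / two / three-or-more breakdown and the highest child degree count the basic-tree dependents of each SYM node (all relations, including punct). The Python sketch below shows one way to compute them. It assumes UD v1-style CoNLL-U input. The helper names and the constructed sentence are hypothetical.

```python
from collections import Counter


def tok(i, form, lemma, upos, head, deprel):
    """Build one CoNLL-U line (10 tab-separated columns); unused columns are '_'."""
    return "\t".join([str(i), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


def child_degrees(conllu_text, upos="SYM"):
    """Bucket `upos` nodes by number of dependents (leaf/one/two/three+)
    and return the buckets together with the highest degree seen."""
    buckets, highest = Counter(), 0
    for sentence in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in sentence.splitlines()
                if line and not line.startswith("#")]
        rows = [cols for cols in rows if cols[0].isdigit()]  # drop "1-2" ranges
        n_children = Counter(cols[6] for cols in rows)  # HEAD id -> dependent count
        for cols in rows:
            if cols[3] != upos:
                continue
            degree = n_children[cols[0]]
            highest = max(highest, degree)
            label = ("leaf", "one", "two")[degree] if degree < 3 else "three+"
            buckets[label] += 1
    return buckets, highest


# Constructed sentence "5 % , 10 $": % heads three dependents, $ heads one.
sample = "\n".join([
    tok(1, "5", "5", "NUM", 2, "nummod"),
    tok(2, "%", "процент-знак", "SYM", 0, "root"),
    tok(3, ",", ",", "PUNCT", 2, "punct"),
    tok(4, "10", "10", "NUM", 5, "nummod"),
    tok(5, "$", "доллар-знак", "SYM", 2, "conj"),
])
buckets, highest = child_degrees(sample)
print(dict(buckets), highest)
```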

Children of SYM nodes are attached using 15 different relations: punct (57; 24% of instances), nmod (45; 19% of instances), nummod:gov (45; 19% of instances), nummod (35; 15% of instances), case (27; 11% of instances), conj (8; 3% of instances), nsubj (8; 3% of instances), appos (4; 2% of instances), remnant (3; 1% of instances), amod (2; 1% of instances), discourse (2; 1% of instances), list (2; 1% of instances), cc (1; 0% of instances), dobj (1; 0% of instances), parataxis (1; 0% of instances)

Children of SYM nodes belong to 11 different parts of speech: NUM (83; 34% of instances), NOUN (48; 20% of instances), PUNCT (46; 19% of instances), ADP (28; 12% of instances), SYM (21; 9% of instances), ADV (4; 2% of instances), PRON (4; 2% of instances), PART (2; 1% of instances), PROPN (2; 1% of instances), VERB (2; 1% of instances), CONJ (1; 0% of instances)


Treebank Statistics (UD_Russian-SynTagRus)

There are 8 SYM lemmas (0%), 8 SYM types (0%) and 942 SYM tokens (0%). Out of 17 observed tags, the rank of SYM is: 16 in number of lemmas, 17 in number of types and 15 in number of tokens.

The 10 most frequent SYM lemmas: процент-знак, доллар-знак, номер-знак, градус-знак, евро-знак, плюс, равно-знак, +

The 10 most frequent SYM types: %, $, №, °, €, +, =, №№

The 10 most frequent ambiguous lemmas: номер-знак (SYM 67, PROPN 1), плюс (ADP 34, NOUN 23, ADV 5, SYM 5, PART 1)

The 10 most frequent ambiguous types:

Morphology

The form / lemma ratio of SYM is 1.000000 (the average of all parts of speech is 2.665758).

The 1st highest number of forms (2) was observed with the lemma “номер-знак”: №, №№.

The 2nd highest number of forms (1) was observed with the lemma “+”: +.

The 3rd highest number of forms (1) was observed with the lemma “градус-знак”: °.

SYM does not occur with any features.

Relations

SYM nodes are attached to their parents using 15 different relations: nmod (571; 61% of instances), nsubj (129; 14% of instances), parataxis (71; 8% of instances), conj (60; 6% of instances), nummod:entity (40; 4% of instances), iobj (17; 2% of instances), root (17; 2% of instances), nsubjpass (13; 1% of instances), appos (8; 1% of instances), dobj (6; 1% of instances), case (5; 1% of instances), advcl (2; 0% of instances), advmod (1; 0% of instances), mwe (1; 0% of instances), nmod:agent (1; 0% of instances)

Parents of SYM nodes belong to 9 different parts of speech: VERB (521; 55% of instances), NOUN (257; 27% of instances), SYM (43; 5% of instances), ADJ (34; 4% of instances), ADV (30; 3% of instances), PROPN (22; 2% of instances), NUM (17; 2% of instances), ROOT (17; 2% of instances), CONJ (1; 0% of instances)

21 (2%) SYM nodes are leaves.

155 (16%) SYM nodes have one child.

451 (48%) SYM nodes have two children.

315 (33%) SYM nodes have three or more children.

The highest child degree of a SYM node is 8.

Children of SYM nodes are attached using 18 different relations: nummod (832; 39% of instances), nmod (400; 19% of instances), case (370; 17% of instances), punct (166; 8% of instances), nummod:gov (118; 6% of instances), advmod (81; 4% of instances), conj (54; 3% of instances), cc (36; 2% of instances), parataxis (27; 1% of instances), nsubj (25; 1% of instances), amod (12; 1% of instances), cop (6; 0% of instances), det (5; 0% of instances), acl:relcl (4; 0% of instances), appos (2; 0% of instances), mark (2; 0% of instances), neg (2; 0% of instances), foreign (1; 0% of instances)

Children of SYM nodes belong to 16 different parts of speech: NUM (833; 39% of instances), NOUN (463; 22% of instances), ADP (370; 17% of instances), PUNCT (166; 8% of instances), ADV (86; 4% of instances), PROPN (43; 2% of instances), SYM (43; 2% of instances), ADJ (40; 2% of instances), CONJ (35; 2% of instances), VERB (26; 1% of instances), PART (18; 1% of instances), SCONJ (6; 0% of instances), AUX (5; 0% of instances), DET (5; 0% of instances), PRON (3; 0% of instances), X (1; 0% of instances)


SYM in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]