PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text, including bullets in itemized lists.
Examples
- . ? ! …
- , : ;
- ( ) { [ ]
- ” » « ‘ “ ” ’ ‘
- / _
- • * —
Conversion from JOS
The list of characters in the ssj500k treebank has been manually divided into the subgroups PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as the asterisk or dash-like characters, which can function either as mathematical operators (SYM) or as bullets in itemized lists (PUNCT). In such ambiguous cases, the more common function was chosen.
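The conversion rule above amounts to a context-independent lookup: each character receives exactly one tag, even if it can serve both functions. The following is a minimal sketch of that idea, not the actual ssj500k conversion table. The PUNCT entries are a subset of the examples listed on this page; the SYM entries are an assumption based on typical symbols in the universal guidelines, and the names `PUNCT_CHARS`, `SYM_CHARS` and `upos_for_char` are hypothetical.

```python
# Illustrative tables only; the real ssj500k division is not reproduced here.
# PUNCT entries are taken from the examples on this page.
PUNCT_CHARS = set(".?!…,:;()[]/_•*—-")
# SYM entries are an assumption (typical mathematical and currency symbols).
SYM_CHARS = set("+=<>%§$")


def upos_for_char(ch: str) -> str:
    """Return the single UPOS tag assigned to a non-alphabetic character.

    Ambiguous characters such as the asterisk appear in only one table,
    mirroring the "more common function wins" decision.
    """
    if ch in PUNCT_CHARS:
        return "PUNCT"
    if ch in SYM_CHARS:
        return "SYM"
    raise KeyError(f"character {ch!r} is not in either table")


print(upos_for_char("*"))  # asterisk is tagged by its more common function
```

Because the lookup ignores context, an asterisk used as a multiplication sign still receives the tag of its more frequent use.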
Treebank Statistics (UD_Slovenian)
There are 21 PUNCT lemmas (0%), 21 PUNCT types (0%) and 18555 PUNCT tokens (13%).
Out of 16 observed tags, the rank of PUNCT is: 14 in number of lemmas, 15 in number of types and 2 in number of tokens.
The 10 most frequent PUNCT lemmas: ,, ., “, -, (, ), ?, », :, «
The 10 most frequent PUNCT types: ,, ., “, -, (, ), ?, », :, «
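Counts of this kind can be derived from the UPOS, FORM and LEMMA columns of a CoNLL-U file. The sketch below shows one way to do so; it is not the script that produced the figures on this page. The function name `upos_counts` and the two-word sample sentence are hypothetical and made up for illustration.

```python
from collections import Counter

# Hand-made CoNLL-U fragment (illustrative, not taken from the treebank).
SAMPLE = (
    "# text = Da, da.\n"
    "1\tDa\tda\tPART\t_\t_\t0\troot\t_\t_\n"
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "3\tda\tda\tPART\t_\t_\t1\tconj\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)


def upos_counts(conllu: str, upos: str = "PUNCT") -> dict:
    """Count distinct lemmas, distinct forms (types) and tokens for one tag."""
    lemmas, types = Counter(), Counter()
    total = 0
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        total += 1
        if cols[3] == upos:
            types[cols[1]] += 1
            lemmas[cols[2]] += 1
    tokens = sum(types.values())
    return {
        "lemmas": len(lemmas),
        "types": len(types),
        "tokens": tokens,
        "token_share": tokens / total if total else 0.0,
        "top_lemmas": [lemma for lemma, _ in lemmas.most_common(10)],
    }


stats = upos_counts(SAMPLE)
print(stats["lemmas"], stats["types"], stats["tokens"], stats["token_share"])
```

Ranking PUNCT among the other tags would then be a matter of running the same function for every observed tag and sorting by each count.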
The 10 most frequent ambiguous lemmas:
The 10 most frequent ambiguous types:
Morphology
The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 1.894262).
The 1st highest number of forms (1) was observed with the lemma “!”: !.
The 2nd highest number of forms (1) was observed with the lemma “””: ”.
The 3rd highest number of forms (1) was observed with the lemma “’”: ’.
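A ratio of exactly 1 means that every punctuation lemma has a single surface form. The sketch below assumes the ratio is the number of distinct (form, lemma) pairs divided by the number of distinct lemmas, which is consistent with 21 types over 21 lemmas here; the exact definition used by the statistics generator is not stated on this page, and `form_lemma_ratio` is a hypothetical name.

```python
from collections import defaultdict


def form_lemma_ratio(pairs):
    """Average number of distinct forms per lemma.

    `pairs` is an iterable of (form, lemma) tuples for one part of speech.
    """
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    if not forms_by_lemma:
        raise ValueError("no tokens given")
    n_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return n_forms / len(forms_by_lemma)


# Punctuation: each lemma has one form, so repeated tokens do not matter.
print(form_lemma_ratio([(",", ","), (".", "."), (",", ",")]))
```

For an inflecting part of speech the same function yields a value above 1, since several forms map to one lemma.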
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using 2 different relations: sl-dep/punct (18553; 100% instances), sl-dep/root (2; 0% instances)
Parents of PUNCT nodes belong to 14 different parts of speech: VERB (12871; 69% instances), ADJ (2356; 13% instances), NOUN (2347; 13% instances), PROPN (381; 2% instances), NUM (178; 1% instances), ADV (120; 1% instances), X (114; 1% instances), PRON (84; 0% instances), PART (71; 0% instances), INTJ (20; 0% instances), CONJ (7; 0% instances), ADP (3; 0% instances), ROOT (2; 0% instances), PUNCT (1; 0% instances)
18551 (100%) PUNCT nodes are leaves.
4 (0%) PUNCT nodes have one child.
The highest child degree of a PUNCT node is 1.
Children of PUNCT nodes are attached using 3 different relations: sl-dep/case (2; 50% instances), sl-dep/nmod (1; 25% instances), sl-dep/punct (1; 25% instances)
Children of PUNCT nodes belong to 3 different parts of speech: ADP (2; 50% instances), PUNCT (1; 25% instances), X (1; 25% instances)
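The relation, parent and child figures above all follow from the HEAD and DEPREL columns of each sentence. As a rough sketch of how they could be computed for one sentence (the function name `punct_tree_stats` and the tuple layout are assumptions, and the three-node example is made up):

```python
from collections import Counter


def punct_tree_stats(rows):
    """Summarize how PUNCT nodes attach within one sentence.

    `rows` is a list of (id, upos, head, deprel) tuples; head 0 is the root.
    """
    upos_of = {node_id: upos for node_id, upos, _, _ in rows}
    upos_of[0] = "ROOT"
    # Number of children per node, counted from the HEAD column.
    child_count = Counter(head for _, _, head, _ in rows)
    punct = [(i, head, rel) for i, upos, head, rel in rows if upos == "PUNCT"]
    return {
        "relations": Counter(rel for _, _, rel in punct),
        "parent_upos": Counter(upos_of[head] for _, head, _ in punct),
        "leaves": sum(1 for i, _, _ in punct if child_count[i] == 0),
        "max_child_degree": max(
            (child_count[i] for i, _, _ in punct), default=0
        ),
    }


rows = [
    (1, "PRON", 2, "nsubj"),
    (2, "VERB", 0, "root"),
    (3, "PUNCT", 2, "punct"),
]
print(punct_tree_stats(rows))
```

Summing these per-sentence counters over a whole treebank would give totals comparable to those listed above.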
Treebank Statistics (UD_Slovenian-SST)
There are 3 PUNCT lemmas (0%), 3 PUNCT types (0%) and 542 PUNCT tokens (2%).
Out of 16 observed tags, the rank of PUNCT is: 15 in number of lemmas, 16 in number of types and 15 in number of tokens.
The 10 most frequent PUNCT lemmas: ?, …, !
The 10 most frequent PUNCT types: ?, …, !
The 10 most frequent ambiguous lemmas:
The 10 most frequent ambiguous types:
Morphology
The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 1.575031).
The 1st highest number of forms (1) was observed with the lemma “!”: !.
The 2nd highest number of forms (1) was observed with the lemma “?”: ?.
The 3rd highest number of forms (1) was observed with the lemma “…”: ….
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using 1 relation: sl-dep/punct (542; 100% instances)
Parents of PUNCT nodes belong to 11 different parts of speech: VERB (289; 53% instances), NOUN (64; 12% instances), ADV (48; 9% instances), PRON (45; 8% instances), INTJ (27; 5% instances), PART (26; 5% instances), ADJ (24; 4% instances), PROPN (10; 2% instances), X (6; 1% instances), NUM (2; 0% instances), SCONJ (1; 0% instances)
542 (100%) PUNCT nodes are leaves.
The highest child degree of a PUNCT node is 0.
PUNCT in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]