X: other
Definition
The X tag is used for words that for some reason cannot be assigned a real part-of-speech category.
In the Slovenian UD Treebank, this tag is mostly used for cases of code-switching where it was not meaningful to analyze the intervening language, such as Europe of knowledge, La connaissance de soi, or Bundesvereinigung der Deutschen Arbeitgeberverbände. Where a foreign-language sequence includes both foreign words and loan words, only the foreign words are assigned the X tag, as in The Life of Brian, where Life and Brian are marked as NOUN and PROPN, respectively.
Other subcategories marked with X include abbreviations with dots (dr.), URL addresses (www.radenska.si), news author abbreviations (sta) and tokens with alpha-numerical combinations (6230i).
Conversion from JOS
All tokens with the tag Residual are converted to X. All abbreviations are converted to X as well.
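The conversion rule above can be sketched as a small lookup. This is a minimal illustration only: the category labels `"Residual"` and `"Abbreviation"`, the function name `upos_for` and its `default` parameter are assumptions made for the example, not the actual identifiers used by the JOS conversion script.

```python
# Hypothetical JOS category labels that map to the UD tag X.
# The source only states that Residual tokens and abbreviations become X.
X_CATEGORIES = frozenset({"Residual", "Abbreviation"})


def upos_for(jos_category: str, default: str) -> str:
    """Return "X" for Residual and Abbreviation tokens, otherwise `default`.

    `default` stands in for whatever tag the rest of the conversion
    would assign; that part is outside the scope of this sketch.
    """
    return "X" if jos_category in X_CATEGORIES else default


print(upos_for("Abbreviation", "NOUN"))
print(upos_for("Noun", "NOUN"))
```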
Treebank Statistics (UD_Slovenian)
There are 164 X lemmas (1%), 165 X types (1%) and 339 X tokens (0%).
Out of 16 observed tags, the rank of X is: 7 in number of lemmas, 9 in number of types and 15 in number of tokens.
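Counts and ranks of this kind can be derived from a treebank in CoNLL-U format. The following is a rough sketch, not the script that produced the figures above; the function names `tag_statistics` and `rank` and the toy `SAMPLE` data are invented for illustration.

```python
from collections import defaultdict


def tag_statistics(conllu_text: str) -> dict:
    """Map each UPOS tag to its distinct lemmas, distinct types and token count."""
    lemmas, types, tokens = defaultdict(set), defaultdict(set), defaultdict(int)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only ordinary word lines: 10 columns and an integer ID.
        # This skips comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        types[upos].add(form)
        tokens[upos] += 1
    return {
        tag: {"lemmas": len(lemmas[tag]), "types": len(types[tag]), "tokens": tokens[tag]}
        for tag in tokens
    }


def rank(stats: dict, tag: str, key: str) -> int:
    """1-based rank of `tag` when tags are sorted by `key`, largest first."""
    ordered = sorted(stats, key=lambda t: stats[t][key], reverse=True)
    return ordered.index(tag) + 1


# Toy data (not from the treebank): two X tokens sharing one lemma, one PROPN.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "dr.", "dr.", "X", "_", "_", "2", "nmod", "_", "_"],
    ["2", "Novak", "Novak", "PROPN", "_", "_", "0", "root", "_", "_"],
    ["3", "Dr.", "dr.", "X", "_", "_", "2", "nmod", "_", "_"],
])

stats = tag_statistics(SAMPLE)
print(stats["X"], rank(stats, "X", "tokens"))
```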
The 10 most frequent X lemmas: dr., t., d., sv., P., i., of, oz., the, M.
The 10 most frequent X types: dr., t., d., sv., P., i., of, oz., the, M.
The 10 most frequent ambiguous lemmas: V. (X 3, NUM 1), da (SCONJ 1772, PART 8, X 2), les (NOUN 9, X 2), a (CONJ 96, ADV 2, X 1), do (ADP 353, X 1), in (CONJ 3242, ADV 5, X 1), life (NOUN 1, X 1), on (PRON 1561, X 1), pa (CONJ 957, X 1)
The 10 most frequent ambiguous types: de (X 5, VERB 1), sta (AUX 165, VERB 36, X 4), V. (X 3, NUM 1), Les (PROPN 2, X 2), da (SCONJ 1726, VERB 9, X 2, PART 1), mu (PRON 158, X 2), A (CONJ 31, NOUN 7, ADV 1, X 1), Art (X 1, PROPN 1), Life (NOUN 1, X 1), National (PROPN 2, X 1)
Example sentences for the ambiguous type National:
- PROPN 2: Ljubiteljev konjeniškega športa je namreč v Angliji še vedno veliko in za neposredni prenos dirke Grand National jih niso želeli prikrajšati .
- X 1: Volilna taktika Pauline Howard utegne njegovo koalicijo , sestavljeno iz torijcev ( Liberal Party ) in kmetovalcev ( National Party ) veljati precej marginalnih sedežev na podeželju .
Morphology
The form / lemma ratio of X is 1.006098 (the average of all parts of speech is 1.894262).
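The reported ratio is consistent with dividing the number of X types by the number of X lemmas given in the treebank statistics. Treating "forms" as distinct types is an assumption of this short check, not something the page states explicitly.

```python
x_types = 165   # distinct X word forms, from the treebank statistics
x_lemmas = 164  # distinct X lemmas, from the treebank statistics

# Assumed definition: form / lemma ratio = distinct forms / distinct lemmas.
ratio = x_types / x_lemmas
print(f"{ratio:.6f}")  # 1.006098
```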
The 1st highest number of forms (2) was observed with the lemma “european”: EUROPEAN, European.
The 2nd highest number of forms (1) was observed with the lemma “18f”: 18F.
The 3rd highest number of forms (1) was observed with the lemma “A.”: A..
X occurs with 1 feature: Foreign (110; 32% instances)
X occurs with 1 feature-value pair: Foreign=Foreign
X occurs with 2 feature combinations.
The most frequent feature combination is _ (229 tokens).
Examples: dr., t., d., sv., P., i., oz., M., j., o.
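Feature combinations such as these can be tallied from the FEATS column of a CoNLL-U file. The sketch below is illustrative only; `feature_combinations` and the toy `SAMPLE` rows are made up for the example.

```python
from collections import Counter


def feature_combinations(conllu_text: str, upos: str = "X") -> Counter:
    """Count the FEATS strings of all tokens tagged `upos` ("_" means no features)."""
    combos = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Ordinary word lines have 10 columns and an integer ID.
        if len(cols) == 10 and cols[0].isdigit() and cols[3] == upos:
            combos[cols[5]] += 1
    return combos


# Toy data (not from the treebank).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "dr.", "dr.", "X", "_", "_", "2", "nmod", "_", "_"],
    ["2", "Novak", "Novak", "PROPN", "_", "_", "0", "root", "_", "_"],
    ["3", "the", "the", "X", "_", "Foreign=Foreign", "2", "foreign", "_", "_"],
    ["4", "of", "of", "X", "_", "Foreign=Foreign", "3", "foreign", "_", "_"],
])

combos = feature_combinations(SAMPLE)
print(combos.most_common())
```

With the figures reported above, the two combinations account for all X tokens: 110 with Foreign=Foreign plus 229 without features gives 339, and 110 / 339 is roughly 32%.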
Relations
X nodes are attached to their parents using 13 different relations: nmod (142; 42% instances), foreign (57; 17% instances), root (57; 17% instances), mwe (19; 6% instances), name (11; 3% instances), nsubj (10; 3% instances), appos (9; 3% instances), advmod (7; 2% instances), dobj (7; 2% instances), amod (6; 2% instances), aux (6; 2% instances), cc (4; 1% instances), conj (4; 1% instances)
Parents of X nodes belong to 11 different parts of speech: X (109; 32% instances), NOUN (95; 28% instances), ROOT (57; 17% instances), PROPN (42; 12% instances), VERB (27; 8% instances), ADJ (4; 1% instances), ADV (1; 0% instances), NUM (1; 0% instances), PRON (1; 0% instances), PUNCT (1; 0% instances), SCONJ (1; 0% instances)
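A sketch of how the attachment statistics above could be collected from CoNLL-U data follows. It assumes sentences are separated by blank lines and treats a HEAD of 0 as the artificial ROOT parent; the function name `attachment_stats` and the sample sentences are invented for illustration.

```python
from collections import Counter


def attachment_stats(conllu_text: str, upos: str = "X"):
    """Count DEPREL labels and parent UPOS tags for all nodes tagged `upos`."""
    relations, parents = Counter(), Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()]
        # Keep ordinary word lines only (10 columns, integer ID).
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        upos_by_id = {r[0]: r[3] for r in rows}
        for r in rows:
            if r[3] == upos:
                relations[r[7]] += 1
                # HEAD "0" has no entry in the sentence, so it counts as ROOT.
                parents[upos_by_id.get(r[6], "ROOT")] += 1
    return relations, parents


def sentence(rows):
    """Join toy rows into one tab-separated CoNLL-U sentence."""
    return "\n".join("\t".join(r) for r in rows)


# Toy data (not from the treebank): two short sentences.
SAMPLE = "\n\n".join([
    sentence([
        ["1", "dr.", "dr.", "X", "_", "_", "2", "nmod", "_", "_"],
        ["2", "Novak", "Novak", "PROPN", "_", "_", "0", "root", "_", "_"],
    ]),
    sentence([
        ["1", "The", "the", "X", "_", "_", "0", "root", "_", "_"],
        ["2", "of", "of", "X", "_", "_", "1", "foreign", "_", "_"],
    ]),
])

relations, parents = attachment_stats(SAMPLE)
print(relations, parents)
```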
216 (64%) X nodes are leaves.
42 (12%) X nodes have one child.
54 (16%) X nodes have two children.
27 (8%) X nodes have three or more children.
The highest child degree of an X node is 8.
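The child-degree breakdown above can be reproduced by counting, for each node, how many other nodes in the same sentence name it as HEAD. This is a minimal sketch under the assumption of blank-line-separated CoNLL-U sentences; `child_degrees` and the toy data are hypothetical.

```python
from collections import Counter


def child_degrees(conllu_text: str, upos: str = "X") -> Counter:
    """Histogram: number of children -> number of `upos` nodes with that many."""
    histogram = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()]
        # Keep ordinary word lines only (10 columns, integer ID).
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        # How often each node ID appears in the HEAD column of this sentence.
        n_children = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == upos:
                histogram[n_children[r[0]]] += 1
    return histogram


def sentence(rows):
    """Join toy rows into one tab-separated CoNLL-U sentence."""
    return "\n".join("\t".join(r) for r in rows)


# Toy data (not from the treebank).
SAMPLE = "\n\n".join([
    sentence([
        ["1", "dr.", "dr.", "X", "_", "_", "2", "nmod", "_", "_"],
        ["2", "Novak", "Novak", "PROPN", "_", "_", "0", "root", "_", "_"],
    ]),
    sentence([
        ["1", "The", "the", "X", "_", "_", "0", "root", "_", "_"],
        ["2", "of", "of", "X", "_", "_", "1", "foreign", "_", "_"],
    ]),
])

histogram = child_degrees(SAMPLE)
print(histogram)
```

As a sanity check on the reported figures, the four groups add up to the total number of X tokens: 216 + 42 + 54 + 27 = 339.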
Children of X nodes are attached using 12 different relations: punct (114; 44% instances), foreign (57; 22% instances), nmod (36; 14% instances), mwe (18; 7% instances), amod (9; 3% instances), case (8; 3% instances), name (5; 2% instances), conj (4; 2% instances), appos (3; 1% instances), cc (3; 1% instances), acl (2; 1% instances), nummod (1; 0% instances)
Children of X nodes belong to 11 different parts of speech: PUNCT (114; 44% instances), X (109; 42% instances), ADJ (9; 3% instances), PROPN (7; 3% instances), ADP (6; 2% instances), NOUN (6; 2% instances), CONJ (3; 1% instances), SCONJ (2; 1% instances), VERB (2; 1% instances), NUM (1; 0% instances), PRON (1; 0% instances)
X in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]