X: other
Definition
The tag X is used for words that, for some reason, cannot be assigned a real part-of-speech category.
In Estonian UD v1.3, foreign words are tagged as X. However, “fresh” loanwords that are part of an Estonian sentence are not tagged as X; they are tagged according to their word class.
Examples
In the following sentence, the word forms you, know, what, I and mean are tagged as X:
Viimane pole küll mingi eriline näitaja, kuid … you know what I mean.
Treebank Statistics (UD_Estonian)
There are 61 X lemmas (0%), 62 X types (0%) and 90 X tokens (0%).
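As a rough illustration of how lemma, type and token counts like these could be reproduced, the following sketch counts them for one tag in CoNLL-U text. The function name `upos_stats` and the toy input are hypothetical: only the ID, FORM, LEMMA and UPOSTAG columns are filled in, the lemmas are assumed to equal the forms, and types are compared case-sensitively. It is not the script that generated this page.

```python
def upos_stats(conllu_text, upos="X"):
    """Return (lemma count, type count, token count) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines and multiword ranges such as "1-2"
        if cols[3] == upos:      # column 4 holds the universal POS tag
            tokens += 1
            types.add(cols[1])   # FORM
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


# Toy input (assumption): lemma = form, all other columns left as "_".
WORDS = [("you", "X"), ("know", "X"), ("what", "X"),
         ("I", "X"), ("mean", "X"), (".", "PUNCT")]
SAMPLE = "\n".join(
    "\t".join([str(i), form, form, tag] + ["_"] * 6)
    for i, (form, tag) in enumerate(WORDS, start=1)
)

print(upos_stats(SAMPLE))  # (5, 5, 5)
```

Run over the whole treebank file instead of the toy input, the same function would be expected to yield counts comparable to the ones reported here, provided the generator counts types and lemmas in the same way.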
Out of 15 observed tags, the rank of X is: 10 in number of lemmas, 12 in number of types and 15 in number of tokens.
The 10 most frequent X lemmas: I, of, in, tõ, versus, Proopusk, is, jesh, mõ, ne
The 10 most frequent X types: I, of, in, tõ, versus, Proopusk, jesh, mõ, ne, sol
The 10 most frequent ambiguous lemmas: sol (X 2, NOUN 1), ? (PUNCT 902, X 1), KGB (NOUN 9, X 1), Mr (NOUN 1, X 1), cm (NOUN 15, X 1), km/h (ADV 13, X 1), moi (X 1, NOUN 1), ruutu (X 1, NOUN 1)
The 10 most frequent ambiguous types: ? (PUNCT 902, X 1), KGB (NOUN 9, X 1), Mr (X 1, NOUN 1), cm (NOUN 15, X 1), km/h (ADV 13, X 1), ruutu (NOUN 4, X 1)
Morphology
The form / lemma ratio of X is 1.016393 (the average of all parts of speech is 1.839644).
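The reported ratio is consistent with dividing the number of X types by the number of X lemmas given under Treebank Statistics (62 and 61). A minimal check, assuming that this is how the ratio is defined (the helper name `form_lemma_ratio` is hypothetical):

```python
def form_lemma_ratio(n_forms, n_lemmas):
    """Average number of distinct word forms per lemma."""
    return n_forms / n_lemmas


# 62 X types and 61 X lemmas, as reported under Treebank Statistics.
print(round(form_lemma_ratio(62, 61), 6))  # 1.016393
```

A value this close to 1 reflects that almost every X lemma occurs in a single form only.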
The 1st highest number of forms (2) was observed with the lemma “is”: ‘is, is.
The 2nd highest number of forms (1) was observed with the lemma “?”: ?.
The 3rd highest number of forms (1) was observed with the lemma “ATM”: ATMi.
X occurs with 6 features: Foreign (39; 43% instances), Abbr (37; 41% instances), Case (11; 12% instances), Number (11; 12% instances), NumForm (8; 9% instances), NumType (8; 9% instances)
X occurs with 9 feature-value pairs: Abbr=Yes, Case=Ade, Case=Gen, Case=Nom, Foreign=Yes, NumForm=Roman, NumType=Ord, Number=Plur, Number=Sing
X occurs with 11 feature combinations.
The most frequent feature combination is Foreign=Yes (39 tokens).
Examples: tõ, Proopusk, jesh, stupai, ?, Ili, Opjat, Pjaanitsa, Prosniiss, Spatt
Relations
X nodes are attached to their parents using 10 different relations: nmod (32; 36% instances), foreign (25; 28% instances), root (11; 12% instances), amod (7; 8% instances), list (5; 6% instances), cc (3; 3% instances), advcl (2; 2% instances), name (2; 2% instances), nsubj (2; 2% instances), parataxis (1; 1% instances)
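The percentages in this list can be recomputed from the raw counts, which also confirms that the ten relations account for all 90 X tokens. The sketch below copies the counts from the list above; the helper name `percentages` is hypothetical, and rounding to whole percent is an assumption about how the page was generated.

```python
# Counts of relations attaching X nodes to their parents (from the list above).
PARENT_RELATIONS = {
    "nmod": 32, "foreign": 25, "root": 11, "amod": 7, "list": 5,
    "cc": 3, "advcl": 2, "name": 2, "nsubj": 2, "parataxis": 1,
}


def percentages(counts):
    """Map each label to its share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


print(sum(PARENT_RELATIONS.values()))          # 90
print(percentages(PARENT_RELATIONS)["nmod"])   # 36
```

Because each share is rounded independently, the displayed percentages need not sum to exactly 100.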
Parents of X nodes belong to 9 different parts of speech: X (25; 28% instances), NOUN (18; 20% instances), PROPN (17; 19% instances), VERB (14; 16% instances), ROOT (11; 12% instances), ADJ (2; 2% instances), AUX (1; 1% instances), INTJ (1; 1% instances), NUM (1; 1% instances)
49 (54%) X nodes are leaves.
19 (21%) X nodes have one child.
11 (12%) X nodes have two children.
11 (12%) X nodes have three or more children.
The highest child degree of an X node is 9.
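The leaf and child-degree figures above can be derived from the HEAD column of a dependency tree. The following sketch groups nodes of one tag into the same four buckets (no children, one, two, three or more). The function name `child_degree_buckets` and the toy tree are hypothetical; the head indices are invented for illustration and are not taken from the treebank.

```python
from collections import Counter


def child_degree_buckets(nodes, upos="X"):
    """nodes: (id, head, upos) triples for one sentence; head 0 is the root.

    Returns a Counter keyed by 0, 1, 2 or 3, where 3 means "three or more".
    """
    n_children = Counter(head for _, head, _ in nodes)
    buckets = Counter()
    for node_id, _, tag in nodes:
        if tag == upos:
            buckets[min(n_children[node_id], 3)] += 1
    return buckets


# Invented toy tree: node 1 is the root and governs nodes 2, 3, 4 and 6;
# node 4 governs node 5.
TREE = [(1, 0, "X"), (2, 1, "X"), (3, 1, "X"),
        (4, 1, "X"), (5, 4, "X"), (6, 1, "PUNCT")]

buckets = child_degree_buckets(TREE)
print(buckets[0], buckets[1], buckets[2], buckets[3])  # 3 1 0 1
```

Summing such counters over all sentences of a treebank would give totals of the kind listed above.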
Children of X nodes are attached using 12 different relations: punct (31; 36% instances), foreign (25; 29% instances), nmod (14; 16% instances), list (5; 6% instances), nummod (3; 3% instances), appos (2; 2% instances), discourse (2; 2% instances), amod (1; 1% instances), cc (1; 1% instances), conj (1; 1% instances), cop (1; 1% instances), nsubj:cop (1; 1% instances)
Children of X nodes belong to 11 different parts of speech: PUNCT (31; 36% instances), X (25; 29% instances), PROPN (11; 13% instances), NOUN (8; 9% instances), NUM (3; 3% instances), ADV (2; 2% instances), INTJ (2; 2% instances), VERB (2; 2% instances), ADJ (1; 1% instances), CONJ (1; 1% instances), PRON (1; 1% instances)
X in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]