This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

X: other

Definition

The tag X is used for words that for some reason cannot be assigned a real part-of-speech category.
In Estonian UD v1.3, foreign words are tagged as X. However, “fresh” loanwords that are part of an Estonian sentence are not tagged as X; they are tagged according to their word class.

Examples
In the following sentence, the word forms you, know, what, I and mean are tagged as X:
Viimane pole küll mingi eriline näitaja, kuid … you know what I mean.
(Roughly: “The latter is not much of an indicator, but … you know what I mean.”)


Treebank Statistics (UD_Estonian)

There are 61 X lemmas (0%), 62 X types (0%) and 90 X tokens (0%). Out of 15 observed tags, the rank of X is: 10 in number of lemmas, 12 in number of types and 15 in number of tokens.
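
Counts of this kind can be reproduced from a treebank file in CoNLL-U format. The following is a minimal sketch, not the script that generated this page: the names `upos_counts` and `row` and the toy sample are hypothetical, and it assumes that types are distinct word forms compared case-sensitively.

```python
def upos_counts(conllu_text, upos="X"):
    """Return (lemmas, types, tokens) counts for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and a plain integer ID
        # (this skips multiword ranges such as "1-2").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, tag = cols[1], cols[2], cols[3]
        if tag == upos:
            tokens += 1
            types.add(form)
            lemmas.add(lemma)
    return len(lemmas), len(types), tokens


def row(idx, form, lemma, tag):
    # Hypothetical helper: fills the six columns not needed here with "_".
    return "\t".join([str(idx), form, lemma, tag] + ["_"] * 6)


# Toy data only: the lemmas and the second sentence are made up for illustration.
SAMPLE = "\n".join([
    "# sent 1",
    row(1, "kuid", "kuid", "CONJ"),
    row(2, "you", "you", "X"),
    row(3, "know", "know", "X"),
    row(4, "what", "what", "X"),
    row(5, "I", "I", "X"),
    row(6, "mean", "mean", "X"),
    row(7, ".", ".", "PUNCT"),
    "",
    "# sent 2",
    row(1, "I", "I", "X"),
    row(2, "?", "?", "PUNCT"),
])

print(upos_counts(SAMPLE))  # (5, 5, 6)
```

In the toy sample the form I occurs twice, so there are six X tokens but only five X types and five X lemmas.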

The 10 most frequent X lemmas: I, of, in, tõ, versus, Proopusk, is, jesh, mõ, ne

The 10 most frequent X types: I, of, in, tõ, versus, Proopusk, jesh, mõ, ne, sol

The 10 most frequent ambiguous lemmas: sol (X 2, NOUN 1), ? (PUNCT 902, X 1), KGB (NOUN 9, X 1), Mr (NOUN 1, X 1), cm (NOUN 15, X 1), km/h (ADV 13, X 1), moi (X 1, NOUN 1), ruutu (X 1, NOUN 1)

The 10 most frequent ambiguous types: ? (PUNCT 902, X 1), KGB (NOUN 9, X 1), Mr (X 1, NOUN 1), cm (NOUN 15, X 1), km/h (ADV 13, X 1), ruutu (NOUN 4, X 1)

Morphology

The form / lemma ratio of X is 1.016393 (the average of all parts of speech is 1.839644).
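
The reported ratio is numerically consistent with dividing the number of X types (62) by the number of X lemmas (61) given in the treebank statistics. This is an observation about the figures on this page, not a documented definition; the function name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_forms, n_lemmas):
    """Average number of distinct forms per lemma."""
    return n_forms / n_lemmas


# 62 X types and 61 X lemmas, as reported in the statistics on this page.
print(f"{form_lemma_ratio(62, 61):.6f}")  # 1.016393
```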

The highest number of forms (2) was observed with the lemma “is”: ‘is, is.

The 2nd highest number of forms (1) was observed with the lemma “?”: ?.

The 3rd highest number of forms (1) was observed with the lemma “ATM”: ATMi.

X occurs with 6 features: Foreign (39; 43% instances), Abbr (37; 41% instances), Case (11; 12% instances), Number (11; 12% instances), NumForm (8; 9% instances), NumType (8; 9% instances)

X occurs with 9 feature-value pairs: Abbr=Yes, Case=Ade, Case=Gen, Case=Nom, Foreign=Yes, NumForm=Roman, NumType=Ord, Number=Plur, Number=Sing

X occurs with 11 feature combinations. The most frequent feature combination is Foreign=Yes (39 tokens). Examples: tõ, Proopusk, jesh, stupai, ?, Ili, Opjat, Pjaanitsa, Prosniiss, Spatt

Relations

X nodes are attached to their parents using 10 different relations: nmod (32; 36% instances), foreign (25; 28% instances), root (11; 12% instances), amod (7; 8% instances), list (5; 6% instances), cc (3; 3% instances), advcl (2; 2% instances), name (2; 2% instances), nsubj (2; 2% instances), parataxis (1; 1% instances)
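
As a sanity check, the percentages above can be recomputed from the raw counts. The sketch below assumes that each percentage is the count divided by the total number of X tokens, rounded to the nearest integer; the names `X_PARENT_RELATIONS` and `percentages` are hypothetical.

```python
# Relation counts copied from the list above.
X_PARENT_RELATIONS = {
    "nmod": 32, "foreign": 25, "root": 11, "amod": 7, "list": 5,
    "cc": 3, "advcl": 2, "name": 2, "nsubj": 2, "parataxis": 1,
}


def percentages(counts):
    """Map each key to its share of the total, as a rounded percentage."""
    total = sum(counts.values())
    return {key: round(100 * n / total) for key, n in counts.items()}


print(sum(X_PARENT_RELATIONS.values()))        # 90
print(percentages(X_PARENT_RELATIONS)["nmod"])  # 36
```

The counts sum to 90, which matches the number of X tokens reported in the treebank statistics.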

Parents of X nodes belong to 9 different parts of speech: X (25; 28% instances), NOUN (18; 20% instances), PROPN (17; 19% instances), VERB (14; 16% instances), ROOT (11; 12% instances), ADJ (2; 2% instances), AUX (1; 1% instances), INTJ (1; 1% instances), NUM (1; 1% instances)

49 (54%) X nodes are leaves.

19 (21%) X nodes have one child.

11 (12%) X nodes have two children.

11 (12%) X nodes have three or more children.

The highest child degree of an X node is 9.
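
The child-degree breakdown above (leaves, one child, two children, three or more) can be derived from the ID and HEAD columns of a treebank. The following is a minimal sketch under stated assumptions: each sentence is represented as a list of hypothetical `(id, upos, head)` tuples, and the function name `child_degree_buckets` and the toy trees are invented for illustration.

```python
from collections import Counter


def child_degree_buckets(sentences, upos="X"):
    """Count nodes with the given tag by their number of children."""
    buckets = {"0": 0, "1": 0, "2": 0, "3+": 0}
    for sent in sentences:
        # Number of children per node ID = how often that ID appears as a head.
        n_children = Counter(head for _, _, head in sent)
        for tok_id, tag, _ in sent:
            if tag == upos:
                degree = n_children[tok_id]
                buckets["3+" if degree >= 3 else str(degree)] += 1
    return buckets


# Toy trees only; head 0 marks the root of a sentence.
TOY = [
    [(1, "X", 0), (2, "X", 1), (3, "X", 1), (4, "PUNCT", 1)],
    [(1, "NOUN", 0), (2, "X", 1), (3, "PUNCT", 2)],
]

print(child_degree_buckets(TOY))  # {'0': 2, '1': 1, '2': 0, '3+': 1}
```

In the first toy tree the root X node has three children and the other two X nodes are leaves; in the second, the X node has one child.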

Children of X nodes are attached using 12 different relations: punct (31; 36% instances), foreign (25; 29% instances), nmod (14; 16% instances), list (5; 6% instances), nummod (3; 3% instances), appos (2; 2% instances), discourse (2; 2% instances), amod (1; 1% instances), cc (1; 1% instances), conj (1; 1% instances), cop (1; 1% instances), nsubj:cop (1; 1% instances)

Children of X nodes belong to 11 different parts of speech: PUNCT (31; 36% instances), X (25; 29% instances), PROPN (11; 13% instances), NOUN (8; 9% instances), NUM (3; 3% instances), ADV (2; 2% instances), INTJ (2; 2% instances), VERB (2; 2% instances), ADJ (1; 1% instances), CONJ (1; 1% instances), PRON (1; 1% instances)


X in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]