This page pertains to UD version 2.

Treebank Statistics: UD_English-EWT: POS Tags: X

There are 151 X lemmas (1%), 262 X types (1%) and 500 X tokens (0%). Out of 17 observed tags, the rank of X is: 7 in number of lemmas, 7 in number of types and 17 in number of tokens.
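As a rough illustration of how such counts could be derived, the sketch below reads CoNLL-U text and counts distinct lemmas, distinct word forms (types) and tokens for one UPOS tag. It assumes that types are case-sensitive word forms and that multiword-token ranges and empty nodes are skipped; the function name `upos_counts` and the toy sentence are hypothetical and are not taken from the treebank or from the tool that generated this page.

```python
def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:  # column 4 of CoNLL-U is UPOS
            tokens += 1
            types.add(cols[1])   # FORM, kept case-sensitive
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


# Toy data for illustration only (hypothetical, not from UD_English-EWT).
SAMPLE = "\n".join([
    "# toy sentence",
    "1\tENRON\tenron\tX\t_\t_\t0\troot\t_\t_",
    "2\tEnron\tenron\tX\t_\t_\t1\tflat\t_\t_",
    "3\tal\tal.\tX\t_\t_\t1\tflat\t_\t_",
    "4\treport\treport\tNOUN\t_\t_\t1\tflat\t_\t_",
    "",
])

print(upos_counts(SAMPLE, "X"))  # (2, 3, 3)
```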

The 10 most frequent X lemmas: _, .doc, s, (, ), alberta, -, Analysis_0712, MEH-risk, Oct

The 10 most frequent X types: .doc, s, -, (, ), Alberta, Access, Analysis_0712, COMMUNICATIONS, MEH-risk

The 10 most frequent ambiguous lemmas: _ (X 171, PUNCT 5), s (X 10, NOUN 2, PROPN 1), ( (PUNCT 1030, X 7), ) (PUNCT 1067, X 7), - (PUNCT 1648, SYM 119, X 6), access (NOUN 32, VERB 6, X 6), and (CCONJ 6111, X 6), pricing (NOUN 13, X 6), transmission (X 6, NOUN 5), enron (PROPN 7, X 5)

The 10 most frequent ambiguous types: s (AUX 104, PART 99, X 11, PRON 7, VERB 5, NOUN 2, PROPN 1), - (PUNCT 1627, SYM 119, X 8), ( (PUNCT 1030, X 7), ) (PUNCT 1067, X 7), Alberta (X 7, PROPN 1), Oct (PROPN 8, X 6), Pricing (X 6, VERB 1), Transmission (X 6, PROPN 2, NOUN 1), a (DET 4542, ADP 7, NUM 6, NOUN 4, ADV 2, X 2, ADJ 1, AUX 1, CCONJ 1, PART 1), and (CCONJ 5915, X 6, DET 5, ADP 2)

Morphology

The form / lemma ratio of X is 1.735099 (the average of all parts of speech is 1.237686).
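The reported ratio is consistent with dividing the number of X types by the number of X lemmas given earlier on this page (262 and 151). The short check below assumes that this is how the figure is computed; the helper name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_forms, n_lemmas):
    """Average number of distinct word forms per lemma."""
    return n_forms / n_lemmas


# 262 X types and 151 X lemmas, as reported on this page.
print(f"{form_lemma_ratio(262, 151):.6f}")  # 1.735099
```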

The 1st highest number of forms (115) was observed with the lemma “_”: -, 3-5290, @, A, Abramo@ENRON, Akin@ECT, Alatorre@ENRON, Bertone@ENRON_DEVELOPMENT, Blaine@ENRON_DEVELOPMENT, Bryngelson@AZURIX, C, COMMUNICATIONS, Castagnola@ENRON_DEVELOPMENT, Castano@EES, Delainey@ECT, Diebner@ECT, Do@ENRON_DEVELOPMENT, Dorsey@ENRON_DEVELOPMENT, E, ECT, Edison@ENRON, Forster@ENRON, Garcia@ENRON, Griffith@ENRON, Hansen@ENRON, Hopkinson@ENRON_DEVELOPMENT, Horton@ENRON, Huble@ENRON, J, Jacoby@ECT, Johnson@ENRON, Kaminski@ECT, Kaufman@ECT, Khan@TRANSREDES, Kindall@ENRON, Lamb@ENRON, Leibman@ENRON, Leigh, Luan, Mann@ENRON, Martinez@ENRON, McConnell@ECT, Montgomery@ENRON, Olsen@ENRON, P, Palmer@ENRON, Patel@ENRON, Perry@ENRON_DEVELOPMENT, Rance@ENRON, Rice@ENRON, Salinardo@ENRON, Schwartzenburg@ENRON_DEVELOPMENT, Shackleton@ECT, Stephens@ENRON, Sullivan@ENRON, W, Ward, Warner@ENRON, Williams@ENRON_DEVELOPMENT, back, cent, charged, cooked, d, day, deed, donald, dramatic, educated, ever, expose, fall, for, full, get, going, h, hill, ible, in, informed, ive, line, mail, mentioned, morning, night, notebook.url, o, one, oone, order, out, paid, perform, pixel, plenty, power, priced, r, respect, s, self, ship, side, standing, structure, t, time, to, together, u, way, were, where.

The 2nd highest number of forms (2) was observed with the lemma “al.”: al, al..

The 3rd highest number of forms (2) was observed with the lemma “enron”: ENRON, Enron.

X occurs with 2 features: Foreign (51; 10% instances), Typo (1; 0% instances)

X occurs with 2 feature-value pairs: Foreign=Yes, Typo=Yes

X occurs with 3 feature combinations. The most frequent feature combination is _ (448 tokens). Examples: .doc, s, -, (, ), Alberta, Access, Analysis_0712, COMMUNICATIONS, MEH-risk

Relations

X nodes are attached to their parents using 19 different relations: flat (206; 41% instances), goeswith (171; 34% instances), compound (39; 8% instances), root (28; 6% instances), amod (14; 3% instances), appos (14; 3% instances), case (5; 1% instances), parataxis (5; 1% instances), conj (4; 1% instances), cc (2; 0% instances), list (2; 0% instances), nmod (2; 0% instances), obl (2; 0% instances), dep (1; 0% instances), discourse (1; 0% instances), nmod:unmarked (1; 0% instances), obj (1; 0% instances), obl:unmarked (1; 0% instances), reparandum (1; 0% instances)
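A distribution like the one above could be tallied along the following lines. This is a minimal sketch that assumes the input has already been reduced to (UPOS, DEPREL) pairs, one per token; the function name `relation_distribution` and the toy rows are hypothetical, and the rounding of percentages may differ from the tool that produced this page.

```python
from collections import Counter


def relation_distribution(rows, upos):
    """rows: iterable of (upos, deprel) pairs, one per token.

    Returns (relation, count, rounded percentage) triples for nodes
    with the given tag, most frequent relation first.
    """
    counts = Counter(deprel for tag, deprel in rows if tag == upos)
    total = sum(counts.values())
    return [(rel, n, round(100 * n / total)) for rel, n in counts.most_common()]


# Toy rows for illustration only (hypothetical data).
rows = [("X", "flat"), ("X", "flat"), ("X", "goeswith"),
        ("NOUN", "obj"), ("X", "flat")]

for rel, n, pct in relation_distribution(rows, "X"):
    print(f"{rel} ({n}; {pct}% instances)")
```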

Parents of X nodes belong to 11 different parts of speech: X (221; 44% instances), PROPN (87; 17% instances), NOUN (82; 16% instances), no parent, i.e. the X node is the root (28; 6% instances), ADJ (24; 5% instances), VERB (22; 4% instances), ADV (17; 3% instances), PRON (10; 2% instances), ADP (6; 1% instances), AUX (2; 0% instances), SCONJ (1; 0% instances)

414 (83%) X nodes are leaves.

19 (4%) X nodes have one child.

17 (3%) X nodes have two children.

50 (10%) X nodes have three or more children.

The highest child degree of an X node is 12.
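The leaf, one-child, two-children and three-or-more counts above amount to a histogram over the number of dependents of each X node. The sketch below shows one way to compute it for a single sentence, assuming each token is given as an (ID, UPOS, HEAD) triple and that every dependent counts toward the degree, punctuation included. The name `degree_buckets` and the toy sentence are hypothetical.

```python
from collections import Counter


def degree_buckets(sentence, upos):
    """sentence: list of (id, upos, head) triples for one sentence.

    Returns a dict mapping "0", "1", "2" or "3+" to the number of nodes
    with the given tag that have that many children, plus the highest
    child degree seen among those nodes.
    """
    # Number of children per node id, taken from the HEAD column.
    n_children = Counter(head for _, _, head in sentence)
    buckets = Counter()
    max_degree = 0
    for tok_id, tag, _ in sentence:
        if tag != upos:
            continue
        degree = n_children[tok_id]
        max_degree = max(max_degree, degree)
        buckets["3+" if degree >= 3 else str(degree)] += 1
    return dict(buckets), max_degree


# Toy sentence for illustration only: node 1 heads the other four nodes.
sentence = [(1, "X", 0), (2, "X", 1), (3, "X", 1),
            (4, "PUNCT", 1), (5, "NOUN", 1)]

print(degree_buckets(sentence, "X"))
```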

Children of X nodes are attached using 13 different relations: flat (201; 60% instances), punct (97; 29% instances), compound (12; 4% instances), conj (6; 2% instances), case (4; 1% instances), list (4; 1% instances), nmod (3; 1% instances), nummod (3; 1% instances), cc (2; 1% instances), nmod:unmarked (2; 1% instances), cop (1; 0% instances), nsubj (1; 0% instances), parataxis (1; 0% instances)

Children of X nodes belong to 9 different parts of speech: X (221; 66% instances), PUNCT (97; 29% instances), NOUN (9; 3% instances), ADP (3; 1% instances), NUM (2; 1% instances), VERB (2; 1% instances), ADJ (1; 0% instances), AUX (1; 0% instances), PRON (1; 0% instances)