
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: POS Tags: X

There are 3270 X lemmas (6%), 3326 X types (2%) and 5823 X tokens (0%). Out of 17 observed tags, the rank of X is: 5 in number of lemmas, 5 in number of types and 15 in number of tokens.
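The three counts above can be reproduced from a CoNLL-U file in a few lines. The following is a minimal sketch, not the script that generated this page: the function name `count_tag` and the toy `SAMPLE` sentence are hypothetical, and it assumes that types are case-sensitive word forms and that multiword-token ranges and empty nodes are skipped.

```python
def count_tag(conllu_text, upos="X"):
    """Return (lemmas, types, tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and a plain integer ID.
        # This skips comments, blank lines, ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4 is UPOS
            tokens += 1
            types.add(cols[1])       # column 2 is FORM (assumed case-sensitive)
            lemmas.add(cols[2])      # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical toy input, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = RT mademoiselle rt .",
    "1\tRT\trt\tX\t_\tForeign=Yes\t0\troot\t_\t_",
    "2\tmademoiselle\tmademoiselle\tX\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "3\trt\trt\tX\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])

print(count_tag(SAMPLE))
```

In the toy input, "RT" and "rt" count as two types but share one lemma, which is why the number of types can exceed the number of lemmas.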

The 10 most frequent X lemmas: _, mademoiselle, а, о, maman, m-lle, у, с, mesdames, rt

The 10 most frequent X types: то, mademoiselle, а, о, maman, m-lle, у, с, RT, mesdames

The 10 most frequent ambiguous lemmas: _ (X 261, PUNCT 1), а (CCONJ 10443, X 83, INTJ 74, NOUN 44, PART 31, ADP 1, SCONJ 1), о (ADP 3988, INTJ 379, X 72, NOUN 26), у (ADP 5641, X 38, NOUN 8, INTJ 2), с (ADP 14257, X 31, PART 27, NOUN 6), з (X 29, NOUN 6), и (CCONJ 47408, PART 5895, X 27, NOUN 26), е (NOUN 34, X 19), л (X 22, NOUN 2), к (ADP 7652, X 18, NOUN 17)

The 10 most frequent ambiguous types: то (PRON 1530, SCONJ 973, PART 727, CCONJ 547, DET 421, X 130, ADV 1), а (CCONJ 6039, X 83, PART 29, NOUN 26, INTJ 25, ADP 6), о (ADP 3763, X 64, INTJ 53, NOUN 22, PART 1), у (ADP 4597, X 38, NOUN 11), с (ADP 13301, X 32, PART 27, NOUN 5, ADV 1), з (X 29, NOUN 6, NUM 1), и (CCONJ 42913, PART 5881, X 27, NOUN 19, ADP 2), е (NOUN 32, X 19, VERB 2, PART 1), бы (AUX 2124, PART 641, X 22), л (X 22, NOUN 5)

Morphology

The form / lemma ratio of X is 1.017125 (the average of all parts of speech is 2.706111).
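The ratio for X is consistent with dividing the number of X types by the number of X lemmas reported above. A short check, assuming that this is how the figure is derived:

```python
# Figures copied from the summary at the top of this page.
x_lemmas = 3270
x_types = 3326

# Assumption: form/lemma ratio = distinct forms (types) / distinct lemmas.
ratio = x_types / x_lemmas
print(f"{ratio:.6f}")
```

A value this close to 1 means that almost every X lemma appears in a single form, which is expected for a class dominated by uninflected foreign material.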

The 1st highest number of forms (66) was observed with the lemma “_”: ntynoginsk, Виллу, а, ааа, б, больше, бы, высока, где, давно, до, долго, еле, же, живание, за, зачем, их, йца, как, какого, капельным, когда, кому, либо, м, мало, место, ми, мимо, много, моему, на, небудь, нибудь, нибуть, ник, но, ном, ось, охота, очти, плохо, под, позна, помнить, прежнему, приятно, прочь, ранее, редактировать, розки, российские, руководители, с, счет, счёт, та, таки, то, тойный, чего, чем, четом, что, яземски.

The 2nd highest number of forms (4) was observed with the lemma “goeswith”: дру, лов, орасов, шийся.

The 3rd highest number of forms (2) was observed with the lemma “A”: A, A..

X occurs with 3 features: Foreign (3930; 67% instances), Abbr (38; 1% instances), ExtPos (5; 0% instances)

X occurs with 5 feature-value pairs: Abbr=Yes, ExtPos=ADP, ExtPos=NOUN, ExtPos=VERB, Foreign=Yes

X occurs with 7 feature combinations. The most frequent feature combination is Foreign=Yes (3902 tokens). Examples: mademoiselle, maman, m-lle, RT, mesdames, a, b, picture, i, la

Relations

X nodes are attached to their parents using 31 different relations: flat:foreign (1260; 22% instances), conj (1021; 18% instances), appos (811; 14% instances), parataxis (582; 10% instances), root (549; 9% instances), nsubj (359; 6% instances), goeswith (261; 4% instances), nmod (205; 4% instances), obl (205; 4% instances), list (169; 3% instances), vocative (121; 2% instances), obj (86; 1% instances), xcomp (37; 1% instances), compound (30; 1% instances), flat (28; 0% instances), orphan (26; 0% instances), iobj (16; 0% instances), dep (8; 0% instances), flat:name (8; 0% instances), amod (7; 0% instances), nsubj:pass (6; 0% instances), fixed (5; 0% instances), case (4; 0% instances), advcl (3; 0% instances), cc (3; 0% instances), discourse (3; 0% instances), flat:goeswith (3; 0% instances), reparandum (3; 0% instances), parataxis:discourse (2; 0% instances), acl (1; 0% instances), acl:relcl (1; 0% instances)

Parents of X nodes belong to 16 different parts of speech: X (2506; 43% instances), NOUN (1399; 24% instances), VERB (840; 14% instances), [root, no parent] (549; 9% instances), ADJ (143; 2% instances), ADV (110; 2% instances), PRON (80; 1% instances), DET (60; 1% instances), PROPN (41; 1% instances), NUM (28; 0% instances), SCONJ (22; 0% instances), PART (14; 0% instances), ADP (12; 0% instances), SYM (9; 0% instances), CCONJ (5; 0% instances), INTJ (5; 0% instances)
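The 549 parentless X nodes in the list above match the 549 root attachments in the relation list, since a node whose HEAD is 0 has no parent word and therefore no parent tag. A minimal sketch of such a tally, assuming well-formed CoNLL-U input; the function name `parent_upos_counts`, the label "(root)" and the toy `SAMPLE` are hypothetical:

```python
from collections import Counter


def parent_upos_counts(conllu_text, upos="X"):
    """Count the UPOS tags of the parents of all nodes tagged `upos`."""
    counts = Counter()
    # Sentences in CoNLL-U are separated by blank lines.
    for block in conllu_text.strip().split("\n\n"):
        rows = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Regular word lines only: 10 columns, integer ID.
            if len(cols) == 10 and cols[0].isdigit():
                rows[cols[0]] = cols
        for cols in rows.values():
            if cols[3] != upos:
                continue
            head = cols[6]                       # column 7 is HEAD
            if head == "0":
                counts["(root)"] += 1            # no parent word, so no tag
            else:
                counts[rows[head][3]] += 1       # parent's UPOS
    return counts


# Hypothetical toy input with two sentences, not taken from the treebank.
SAMPLE = "\n".join([
    "1\tсказала\tсказать\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tmaman\tmaman\tX\t_\tForeign=Yes\t1\tnsubj\t_\t_",
    "3\tmademoiselle\tmademoiselle\tX\t_\tForeign=Yes\t2\tflat:foreign\t_\t_",
    "",
    "1\tRT\trt\tX\t_\tForeign=Yes\t0\troot\t_\t_",
])

print(parent_upos_counts(SAMPLE))
```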

2439 (42%) X nodes are leaves.

869 (15%) X nodes have one child.

883 (15%) X nodes have two children.

1632 (28%) X nodes have three or more children.

The highest child degree of an X node is 42.
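The four child-degree groups above partition the X tokens, so their counts should add up to the 5823 X tokens reported at the top of the page. A quick check, assuming the percentages are rounded to the nearest whole number:

```python
# Counts copied from the child-degree lines above.
degree_counts = {
    "leaves": 2439,
    "one child": 869,
    "two children": 883,
    "three or more children": 1632,
}

total = sum(degree_counts.values())

# Assumption: each percentage is count / total, rounded to an integer.
percentages = [round(100 * n / total) for n in degree_counts.values()]

print(total)
print(percentages)
```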

Children of X nodes are attached using 34 different relations: punct (4975; 49% instances), flat:foreign (1255; 12% instances), conj (1023; 10% instances), list (601; 6% instances), parataxis (518; 5% instances), case (396; 4% instances), amod (371; 4% instances), appos (279; 3% instances), cc (267; 3% instances), nmod (108; 1% instances), nsubj (70; 1% instances), advmod (54; 1% instances), compound (47; 0% instances), flat (38; 0% instances), acl (24; 0% instances), acl:relcl (23; 0% instances), orphan (22; 0% instances), det (19; 0% instances), parataxis:discourse (18; 0% instances), discourse (15; 0% instances), flat:name (11; 0% instances), obl (11; 0% instances), nummod:gov (10; 0% instances), mark (8; 0% instances), vocative (8; 0% instances), expl (5; 0% instances), fixed (5; 0% instances), nummod (5; 0% instances), cop (4; 0% instances), advcl (2; 0% instances), ccomp (1; 0% instances), dep (1; 0% instances), iobj (1; 0% instances), obj (1; 0% instances)

Children of X nodes belong to 17 different parts of speech: PUNCT (4975; 49% instances), X (2506; 25% instances), NOUN (577; 6% instances), ADJ (472; 5% instances), ADP (354; 3% instances), VERB (331; 3% instances), CCONJ (260; 3% instances), PROPN (204; 2% instances), SYM (136; 1% instances), NUM (109; 1% instances), ADV (87; 1% instances), PRON (59; 1% instances), PART (40; 0% instances), SCONJ (39; 0% instances), DET (32; 0% instances), INTJ (11; 0% instances), AUX (4; 0% instances)