
This page pertains to UD version 2.

Treebank Statistics: UD_Kazakh-KTB: POS Tags: NOUN

There are 1106 NOUN lemmas (42%), 2068 NOUN types (45%) and 3100 NOUN tokens (29%). Out of 17 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
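As a rough illustration of how such counts can be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens for one tag from CoNLL-U text. The sample rows are a hypothetical toy fragment, not an excerpt from the treebank, and the function name `pos_counts` is made up for this example. It assumes that types are compared case-sensitively and that multiword-token ranges and empty nodes are skipped.

```python
# Toy CoNLL-U rows (hypothetical sample, not an excerpt from UD_Kazakh-KTB).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\tел\tел\tNOUN\t_\tCase=Nom\t0\troot\t_\t_",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
    "# sent_id = toy-2",
    "1\tелде\tел\tNOUN\t_\tCase=Loc\t2\tobl\t_\t_",
    "2\tбала\tбала\tNOUN\t_\tCase=Nom\t0\troot\t_\t_",
    "3\tбала\tбала\tNOUN\t_\tCase=Nom\t2\tconj\t_\t_",
    "",
])


def pos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM, compared case-sensitively
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


n_lemmas, n_types, n_tokens = pos_counts(SAMPLE, "NOUN")
print(n_lemmas, n_types, n_tokens)
```

Ranking NOUN against the other tags would amount to running the same count for each tag and sorting by the chosen column.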

The 10 most frequent NOUN lemmas: ел, жыл, мемлекет, бала, жер, ж., ғасыр, орыс, қала, адам

The 10 most frequent NOUN types: ж., мемлекет, ел, орыс, басшысы, бала, елде, жылдың, жылы, қазақ

The 10 most frequent ambiguous lemmas: жыл (NOUN 62, X 2), бала (NOUN 35, X 1), қала (NOUN 30, VERB 3), адам (NOUN 29, X 1), қазақ (NOUN 25, ADJ 2), жұмыс (NOUN 23, X 6), орын (NOUN 18, X 1), бас (NOUN 17, VERB 10, ADJ 1, X 1), мал (NOUN 17, VERB 1, X 1), қыз (NOUN 15, VERB 1)

The 10 most frequent ambiguous types: бала (NOUN 13, X 1), жылы (NOUN 15, ADJ 1), қазақ (NOUN 12, ADJ 2), адам (NOUN 10, X 1), мал (NOUN 8, X 1), орын (NOUN 9, X 1), жұмыс (NOUN 7, X 6), жыл (NOUN 6, X 1), бас (NOUN 3, X 1), ағылшын (NOUN 4, ADJ 1)

Morphology

The form / lemma ratio of NOUN is 1.869801 (the average of all parts of speech is 1.747153).
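The reported ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas given at the top of the page. A quick check, assuming that is how the figure is derived:

```python
noun_types = 2068   # distinct NOUN word forms reported on this page
noun_lemmas = 1106  # distinct NOUN lemmas reported on this page

ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 1.869801
```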

The highest number of forms (25) was observed with the lemma “ел”: Еліміздегі, Еліне, ел, елге, елде, елдегі, елден, елдер, елдерден, елдерді, елдері, елдерімен, елдеріміз, елдерінде, елдеріңіз, елді, елдің, елі, еліміз, елімізге, елімізде, еліміздің, елінің, еліңе, еліңіз.

The 2nd highest number of forms (17) was observed with the lemma “бала”: Балаларды, Балалардың, бала, балалар, балалардан, балалармен, балалары, балаларына, балаларынан, балама, баламды, баласы, баласын, баласына, балаға, балаң, балаңа.

The 3rd highest number of forms (12) was observed with the lemma “жыл”: жыл, жылда, жылдан, жылдар, жылдардағы, жылдардың, жылдары, жылдарына, жылдың, жылмен, жылы, жылға.
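A ranking like the one above can be sketched by grouping distinct forms under their lemma and sorting by group size. The rows below reuse a handful of form and lemma pairs from the lists above; the function name `forms_by_lemma` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def forms_by_lemma(rows, upos):
    """Group distinct forms by lemma for one tag, largest group first.

    `rows` is an iterable of (form, lemma, upos) tuples.
    Ties are broken alphabetically by lemma to keep the order stable.
    """
    forms = defaultdict(set)
    for form, lemma, tag in rows:
        if tag == upos:
            forms[lemma].add(form)
    return sorted(forms.items(), key=lambda item: (-len(item[1]), item[0]))


# A small subset of the (form, lemma) pairs listed above.
rows = [
    ("ел", "ел", "NOUN"),
    ("елде", "ел", "NOUN"),
    ("елдер", "ел", "NOUN"),
    ("елде", "ел", "NOUN"),      # a repeated form is counted once
    ("бала", "бала", "NOUN"),
    ("балалар", "бала", "NOUN"),
    ("жыл", "жыл", "NOUN"),
]

for lemma, forms in forms_by_lemma(rows, "NOUN"):
    print(lemma, len(forms), ", ".join(sorted(forms)))
```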

NOUN occurs with 5 features: Case (2999; 97% instances), Number[psor] (1132; 37% instances), Person[psor] (1132; 37% instances), Number (398; 13% instances), Polite (11; 0% instances)

NOUN occurs with 15 feature-value pairs: Case=Abl, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Number=Plur, Number[psor]=Plur, Number[psor]=Plur,Sing, Number[psor]=Sing, Person[psor]=1, Person[psor]=2, Person[psor]=3, Polite=Form

NOUN occurs with 55 feature combinations. The most frequent feature combination is Case=Nom (964 tokens). Examples: мемлекет, ел, орыс, қазақ, Президент, адам, бала, кісі, мал, орын
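The feature statistics above can be approximated by splitting the FEATS column on `|` and `=` and counting feature names per token. This is a minimal sketch: the FEATS strings are hypothetical combinations assembled from the feature-value pairs listed above, and `parse_feats` and `feature_usage` are names invented for the example. A multi-valued entry such as `Number[psor]=Plur,Sing` is kept as a single value string.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Number=Plur' into a dict."""
    if feats == "_":
        return {}  # underscore means the token has no features
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def feature_usage(feats_column):
    """Count how many tokens carry each feature name."""
    counts = Counter()
    for feats in feats_column:
        counts.update(parse_feats(feats).keys())
    return counts


# Hypothetical FEATS values for four NOUN tokens.
noun_feats = [
    "Case=Nom",
    "Case=Gen|Number=Plur",
    "Case=Nom|Number[psor]=Plur,Sing|Person[psor]=3",
    "_",
]

usage = feature_usage(noun_feats)
for name, n in usage.most_common():
    print(f"{name} ({n}; {100 * n / len(noun_feats):.0f}% instances)")
```

Counting whole FEATS strings instead of feature names would give the feature combinations.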

Relations

NOUN nodes are attached to their parents using 25 different relations: nsubj (679; 22% instances), obl (652; 21% instances), nmod:poss (548; 18% instances), obj (432; 14% instances), conj (188; 6% instances), nmod (165; 5% instances), root (144; 5% instances), compound (73; 2% instances), amod (52; 2% instances), appos (35; 1% instances), advcl (24; 1% instances), flat:name (17; 1% instances), parataxis (17; 1% instances), ccomp (12; 0% instances), nummod (11; 0% instances), xcomp (10; 0% instances), compound:lvc (7; 0% instances), orphan (7; 0% instances), acl (5; 0% instances), acl:relcl (5; 0% instances), iobj (5; 0% instances), vocative (4; 0% instances), clf (3; 0% instances), csubj (3; 0% instances), obl:own (2; 0% instances)

Parents of NOUN nodes belong to 9 different categories (8 parts of speech, plus the root for nodes that have no parent): VERB (1644; 53% instances), NOUN (1023; 33% instances), ADJ (177; 6% instances), root (144; 5% instances), PROPN (47; 2% instances), NUM (35; 1% instances), PRON (19; 1% instances), ADV (7; 0% instances), AUX (4; 0% instances)
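The 144 NOUN nodes without a parent part of speech match the 144 root attachments in the relation list: a node whose HEAD is 0 has no parent token. The sketch below shows one way to count parent tags under that reading. The tuple layout and the name `parent_upos_counts` are assumptions for this example, and the trees are toy data.

```python
from collections import Counter


def parent_upos_counts(sentences, upos):
    """Count the UPOS of the parent of every node tagged `upos`.

    Each sentence is a list of (id, upos, head) tuples. A head of 0 means
    the node is the root, which has no parent token and hence no UPOS;
    such nodes are counted under the empty string.
    """
    counts = Counter()
    for sent in sentences:
        tag_of = {node_id: tag for node_id, tag, _ in sent}
        for _, tag, head in sent:
            if tag == upos:
                counts[tag_of.get(head, "")] += 1
    return counts


# Two toy trees (hypothetical, not taken from the treebank).
toy = [
    [(1, "NOUN", 2), (2, "VERB", 0)],  # NOUN attached to a VERB
    [(1, "NOUN", 2), (2, "NOUN", 0)],  # NOUN attached to a root NOUN
]

parents = parent_upos_counts(toy, "NOUN")
print(dict(parents))
```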

982 (32%) NOUN nodes are leaves.

1336 (43%) NOUN nodes have one child.

477 (15%) NOUN nodes have two children.

305 (10%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 9.
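The four child-count figures above add up to the 3100 NOUN tokens (982 + 1336 + 477 + 305). A minimal sketch of how such a distribution could be computed from HEAD values follows; the tuple layout and the name `child_degree_stats` are assumptions for this example, and the tree is toy data.

```python
from collections import Counter


def child_degree_stats(sentences, upos):
    """Bucket nodes tagged `upos` by their number of children.

    Returns (buckets, highest), where buckets maps 0, 1, 2 and 3 to node
    counts (3 stands for "three or more") and highest is the largest
    child count seen.
    """
    buckets = Counter()
    highest = 0
    for sent in sentences:
        # How many nodes name each id as their head.
        n_children = Counter(head for _, _, head in sent)
        for node_id, tag, _ in sent:
            if tag == upos:
                degree = n_children[node_id]
                buckets[min(degree, 3)] += 1
                highest = max(highest, degree)
    return buckets, highest


# One toy tree (hypothetical): node 3 is the root with three children.
toy_tree = [[
    (1, "ADJ", 2),
    (2, "NOUN", 3),
    (3, "NOUN", 0),
    (4, "NOUN", 3),
    (5, "PUNCT", 3),
]]

buckets, highest = child_degree_stats(toy_tree, "NOUN")
print(dict(buckets), highest)
```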

Children of NOUN nodes are attached using 28 different relations: nmod:poss (773; 23% instances), amod (672; 20% instances), punct (436; 13% instances), det (211; 6% instances), conj (180; 5% instances), nsubj (129; 4% instances), case (113; 3% instances), nummod (111; 3% instances), acl (101; 3% instances), cop (88; 3% instances), acl:relcl (87; 3% instances), cc (86; 3% instances), compound (84; 2% instances), appos (67; 2% instances), nmod (64; 2% instances), obl (56; 2% instances), advmod (47; 1% instances), flat:name (30; 1% instances), advcl (24; 1% instances), parataxis (18; 1% instances), dep (16; 0% instances), csubj (12; 0% instances), orphan (10; 0% instances), aux (9; 0% instances), discourse (4; 0% instances), iobj (1; 0% instances), obj (1; 0% instances), vocative (1; 0% instances)

Children of NOUN nodes belong to 17 different parts of speech: NOUN (1023; 30% instances), ADJ (506; 15% instances), PUNCT (436; 13% instances), NUM (280; 8% instances), PROPN (266; 8% instances), VERB (218; 6% instances), DET (212; 6% instances), PRON (113; 3% instances), ADP (112; 3% instances), AUX (97; 3% instances), CCONJ (83; 2% instances), ADV (62; 2% instances), X (16; 0% instances), SCONJ (3; 0% instances), INTJ (2; 0% instances), PART (1; 0% instances), SYM (1; 0% instances)
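The two child distributions above describe the same set of dependents, so their totals should agree. The check below sums the counts exactly as listed on this page; the variable names are chosen for this example.

```python
# Counts of children of NOUN nodes by relation, in the order listed above.
by_relation = [773, 672, 436, 211, 180, 129, 113, 111, 101, 88, 87, 86, 84,
               67, 64, 56, 47, 30, 24, 18, 16, 12, 10, 9, 4, 1, 1, 1]

# Counts of children of NOUN nodes by part of speech, in the order listed above.
by_upos = [1023, 506, 436, 280, 266, 218, 212, 113, 112, 97, 83, 62, 16,
           3, 2, 1, 1]

total_by_relation = sum(by_relation)
total_by_upos = sum(by_upos)
print(total_by_relation, total_by_upos)  # 3431 3431
```

The figure of 1023 NOUN children of NOUN nodes also equals the 1023 NOUN nodes whose parent is a NOUN, as expected, since both count the same NOUN-to-NOUN edges.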