
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: NOUN

There are 9001 NOUN lemmas (33%), 18229 NOUN types (34%) and 83173 NOUN tokens (25%). Among the 17 observed tags, NOUN ranks 1st in number of lemmas, 1st in number of types and 1st in number of tokens.
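The lemma, type and token counts above can be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: the function name `pos_counts` and the sample sentences are hypothetical, and it assumes that a "type" is a distinct, case-sensitive word form.

```python
def pos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types = set(), set()
    tokens = 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM (case-sensitive: assumption)
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical two-sentence sample: ID, FORM, LEMMA, UPOS; other columns are "_".
rows = [
    ["1", "Vláda", "vláda", "NOUN"],
    ["2", "schválila", "schválit", "VERB"],
    ["3", "zákon", "zákon", "NOUN"],
    [],  # blank line between sentences
    ["1", "Zákon", "zákon", "NOUN"],
    ["2", "roku", "rok", "NOUN"],
]
sample = "\n".join("\t".join(r + ["_"] * 6) if r else "" for r in rows)

print(pos_counts(sample, "NOUN"))  # (3, 4, 4)
```

In the sample, "zákon" and "Zákon" count as two types but one lemma, which is why the type count exceeds the lemma count.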

The 10 most frequent NOUN lemmas: rok, strana, léta, cena, firma, doba, vláda, zákon, společnost, země

The 10 most frequent NOUN types: p, let, roku, korun, roce, Kč, r, strany, firmy, případě

The 10 most frequent ambiguous lemmas: bod (NOUN 338, PROPN 1), stát (VERB 344, NOUN 329), den (NOUN 272, X 1), místo (NOUN 222, ADP 45, ADV 6), a (CCONJ 7162, NOUN 17, X 6), teplo (NOUN 91, ADV 1), pravda (NOUN 69, PART 2), s (ADP 2504, NOUN 27, X 10, PART 6), růst (NOUN 60, VERB 26), x (NOUN 32, SYM 19)

The 10 most frequent ambiguous types: p (NOUN 163, ADJ 2), s (ADP 1960, NOUN 72, X 10, PART 6), a (CCONJ 6945, ADJ 32, NOUN 17, X 6), září (NOUN 102, VERB 2), j (NOUN 9, ADJ 1), bod (NOUN 87, PROPN 1), stát (NOUN 75, VERB 50), den (NOUN 70, X 1), místo (NOUN 69, ADP 34, ADV 4), x (NOUN 32, SYM 19)

Morphology

The form / lemma ratio of NOUN is 2.025219 (the average of all parts of speech is 1.964432).
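The ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas reported above. A quick check, with variable names chosen here for illustration:

```python
noun_types = 18229   # distinct NOUN word forms reported on this page
noun_lemmas = 9001   # distinct NOUN lemmas reported on this page

ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 2.025219
```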

The highest number of forms (11) was observed with the lemma “strana”: s, str, stran, strana, stranami, stranou, stranu, strany, stranách, stranám, straně.

The 2nd highest number of forms (10) was observed with the lemma “hodina”: Hodina, h, hod, hodin, hodinami, hodinou, hodinu, hodiny, hodinách, hodině.

The 3rd highest number of forms (10) was observed with the lemma “ministr”: ministr, ministra, ministrem, ministrovi, ministru, ministry, ministrů, ministrům, ministře, ministři.

NOUN occurs with 10 features: Polarity (81649; 98% instances), Gender (79711; 96% instances), Case (78979; 95% instances), Number (78979; 95% instances), Animacy (34831; 42% instances), VerbForm (5750; 7% instances), Abbr (4056; 5% instances), Style (80; 0% instances), Typo (18; 0% instances), Foreign (1; 0% instances)

NOUN occurs with 25 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Dual, Number=Plur, Number=Sing, Polarity=Neg, Polarity=Pos, Style=Coll, Style=Expr, Style=Slng, Style=Vrnc, Typo=Yes, VerbForm=Vnoun

NOUN occurs with 151 feature combinations. The most frequent feature combination is Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos (6407 tokens). Examples: strany, práce, vlády, společnosti, firmy, republiky, rady, přímky, doby, obrany
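The three feature statistics above (features, feature-value pairs, and full combinations) can all be derived from the FEATS column. The sketch below assumes the FEATS strings of NOUN tokens have already been collected into a list; `feature_stats` and the sample values are hypothetical, and a multi-valued feature such as `Gender=Fem,Neut` would be counted here as a single pair.

```python
from collections import Counter


def feature_stats(feats_values):
    """Count features, feature=value pairs and whole combinations."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_values:
        combos[feats] += 1           # the full FEATS string is one combination
        if feats == "_":             # "_" means the token has no features
            continue
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1      # each feature counted once per token
            pairs[pair] += 1
    return features, pairs, combos


# Hypothetical FEATS values of four NOUN tokens.
sample_feats = [
    "Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos",
    "Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos",
    "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing|Polarity=Pos",
    "Abbr=Yes",
]
features, pairs, combos = feature_stats(sample_feats)

print(len(features), len(pairs), len(combos))  # 6 8 3
print(combos.most_common(1)[0])
```

The last line prints the most frequent combination together with its token count, mirroring the "most frequent feature combination" figure above.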

Relations

NOUN nodes are attached to their parents using 27 different relations: nmod (27374; 33% instances), obl (14186; 17% instances), nsubj (13009; 16% instances), obj (9158; 11% instances), conj (5717; 7% instances), obl:arg (5178; 6% instances), root (2748; 3% instances), nsubj:pass (1407; 2% instances), appos (1011; 1% instances), dep (909; 1% instances), fixed (493; 1% instances), xcomp (469; 1% instances), orphan (419; 1% instances), advcl (341; 0% instances), ccomp (202; 0% instances), acl:relcl (152; 0% instances), case (139; 0% instances), acl (87; 0% instances), iobj (58; 0% instances), csubj (32; 0% instances), flat (29; 0% instances), parataxis (27; 0% instances), vocative (18; 0% instances), csubj:pass (5; 0% instances), advmod (3; 0% instances), amod (1; 0% instances), discourse (1; 0% instances)

Parents of NOUN nodes fall into 17 categories (16 parts of speech plus the artificial root, which carries no tag): VERB (35702; 43% instances), NOUN (33091; 40% instances), ADJ (6295; 8% instances), root (2748; 3% instances), PROPN (1457; 2% instances), ADV (936; 1% instances), NUM (901; 1% instances), DET (866; 1% instances), ADP (496; 1% instances), PRON (208; 0% instances), AUX (155; 0% instances), SYM (128; 0% instances), X (116; 0% instances), PART (62; 0% instances), CCONJ (9; 0% instances), INTJ (2; 0% instances), SCONJ (1; 0% instances)
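Both distributions above, incoming relations and parent parts of speech, come from the HEAD and DEPREL columns. The following is an illustrative sketch only: `parent_stats`, the tuple layout and the sample trees are assumptions, and NOUN nodes attached directly to the artificial root are counted under a hypothetical `ROOT` label because the root has no part-of-speech tag.

```python
from collections import Counter


def parent_stats(sentences, upos="NOUN"):
    """sentences: list of trees; each tree is a list of (id, upos, head, deprel)."""
    relations, parent_pos = Counter(), Counter()
    for sentence in sentences:
        pos_by_id = {tok_id: tok_upos for tok_id, tok_upos, _, _ in sentence}
        for _tok_id, tok_upos, head, deprel in sentence:
            if tok_upos != upos:
                continue
            relations[deprel] += 1
            # head == 0 is the artificial root, which has no UPOS tag.
            parent_pos[pos_by_id.get(head, "ROOT")] += 1
    return relations, parent_pos


# Hypothetical trees for "Vláda schválila zákon" and "Zákon roku".
trees = [
    [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "NOUN", 2, "obj")],
    [(1, "NOUN", 0, "root"), (2, "NOUN", 1, "nmod")],
]
relations, parent_pos = parent_stats(trees)

print(sorted(relations.items()))
print(sorted(parent_pos.items()))
```

With real data, the count under `ROOT` should equal the count of the `root` relation, as it does on this page (2748 in both lists).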

13660 (16%) NOUN nodes are leaves.

28945 (35%) NOUN nodes have one child.

24380 (29%) NOUN nodes have two children.

16188 (19%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 17.
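The four child-count buckets above add up to the 83173 NOUN tokens, so every NOUN node falls into exactly one of them. A minimal sketch of how such buckets could be computed from the HEAD column follows; `child_degree_buckets`, the tuple layout and the sample tree are hypothetical.

```python
from collections import Counter


def child_degree_buckets(sentence, upos="NOUN"):
    """sentence: list of (id, upos, head) for one tree; head 0 marks the root."""
    # Number of children per node id: how often each id occurs as a head.
    n_children = Counter(head for _, _, head in sentence)
    labels = ("leaf", "one child", "two children")
    buckets = Counter()
    max_degree = 0
    for tok_id, tok_upos, _ in sentence:
        if tok_upos != upos:
            continue
        degree = n_children[tok_id]  # a Counter yields 0 for ids never used as head
        max_degree = max(max_degree, degree)
        buckets[labels[degree] if degree < 3 else "three or more"] += 1
    return buckets, max_degree


# Hypothetical tree for "Vláda schválila nový zákon o cenách".
tree = [
    (1, "NOUN", 2),  # Vláda
    (2, "VERB", 0),  # schválila
    (3, "ADJ", 4),   # nový
    (4, "NOUN", 2),  # zákon
    (5, "ADP", 6),   # o
    (6, "NOUN", 4),  # cenách
]
buckets, max_degree = child_degree_buckets(tree)
print(sorted(buckets.items()), max_degree)

# Sanity check on the figures reported on this page.
page_buckets = [13660, 28945, 24380, 16188]
print(sum(page_buckets))  # 83173
```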

Children of NOUN nodes are attached using 36 different relations: amod (33138; 24% instances), nmod (29597; 21% instances), case (24959; 18% instances), punct (9867; 7% instances), det (6631; 5% instances), conj (5523; 4% instances), cc (4472; 3% instances), nummod (3702; 3% instances), advmod:emph (3050; 2% instances), acl:relcl (2878; 2% instances), flat (2288; 2% instances), cop (2283; 2% instances), nsubj (1798; 1% instances), mark (1190; 1% instances), nummod:gov (1109; 1% instances), appos (1027; 1% instances), acl (982; 1% instances), dep (953; 1% instances), advmod (589; 0% instances), obl (495; 0% instances), orphan (344; 0% instances), xcomp (344; 0% instances), det:numgov (205; 0% instances), csubj (183; 0% instances), det:nummod (116; 0% instances), advcl (98; 0% instances), aux (82; 0% instances), parataxis (77; 0% instances), obl:arg (27; 0% instances), discourse (15; 0% instances), fixed (13; 0% instances), ccomp (10; 0% instances), obj (8; 0% instances), flat:foreign (6; 0% instances), vocative (2; 0% instances), expl:pv (1; 0% instances)

Children of NOUN nodes belong to 17 different parts of speech: ADJ (33664; 24% instances), NOUN (33091; 24% instances), ADP (24737; 18% instances), PUNCT (9867; 7% instances), DET (7672; 6% instances), PROPN (6875; 5% instances), CCONJ (5196; 4% instances), NUM (5166; 4% instances), VERB (3972; 3% instances), AUX (2419; 2% instances), ADV (2274; 2% instances), SCONJ (1206; 1% instances), PART (979; 1% instances), X (508; 0% instances), PRON (373; 0% instances), SYM (61; 0% instances), INTJ (2; 0% instances)