Treebank Statistics: UD_Czech-PDTC: POS Tags: NOUN
There are 23391 NOUN lemmas (26%), 54040 NOUN types (28%) and 782703 NOUN tokens (23%).
Out of 17 observed tags, the rank of NOUN is: 1 in number of lemmas, 2 in number of types and 1 in number of tokens.
The 10 most frequent NOUN lemmas: společnost, rok, dolar, akcie, léta, firma, cena, trh, doba, den
The 10 most frequent NOUN types: společnosti, společnost, dolarů, roce, roku, let, akcií, trhu, firmy, rok
The 10 most frequent ambiguous lemmas: den (NOUN 3410, X 4), stát (NOUN 3056, VERB 3038), místo (NOUN 2037, ADP 370, ADV 12), program (NOUN 1475, X 1), bod (NOUN 1405, PROPN 3), index (NOUN 1012, X 7), syn (NOUN 997, X 6), růst (NOUN 856, VERB 381), a (CCONJ 67954, NOUN 51, X 33), obrat (NOUN 671, VERB 2)
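Ambiguity counts of this kind can be reproduced by grouping tokens by lemma and tallying the tags each lemma receives. The following is a minimal sketch, not the script that generated this page; the function name `ambiguous_noun_lemmas` and the hand-written token list are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, UPOS) pairs, hand-written for illustration only.
TOKENS = [
    ("stát", "NOUN"), ("stát", "VERB"), ("stát", "NOUN"),
    ("místo", "NOUN"), ("místo", "ADP"),
    ("rok", "NOUN"), ("rok", "NOUN"),
]

def ambiguous_noun_lemmas(tokens):
    """Return {lemma: Counter of UPOS} for lemmas tagged NOUN and at least one other tag."""
    tags_by_lemma = defaultdict(Counter)
    for lemma, upos in tokens:
        tags_by_lemma[lemma][upos] += 1
    return {
        lemma: tags
        for lemma, tags in tags_by_lemma.items()
        if "NOUN" in tags and len(tags) > 1
    }

result = ambiguous_noun_lemmas(TOKENS)
for lemma, tags in result.items():
    print(lemma, dict(tags))
```

The same grouping applied to word forms instead of lemmas would give the ambiguous types.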
The 10 most frequent ambiguous types: den (NOUN 1203, X 4), září (NOUN 1032, VERB 4), s (ADP 20009, X 607, NOUN 368, ADJ 7), a (CCONJ 65840, ADJ 181, NOUN 50, X 32), p (NOUN 237, ADJ 4), prodej (NOUN 682, VERB 1), r (NOUN 448, PROPN 1), vedení (NOUN 641, ADJ 4), j (NOUN 22, ADJ 3), místo (NOUN 693, ADP 258, ADV 8)
Morphology
The form / lemma ratio of NOUN is 2.310290 (the average of all parts of speech is 2.169184).
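The page does not state how the ratio is computed, but the figure is consistent with dividing the number of NOUN types by the number of NOUN lemmas reported above. A short check under that assumption:

```python
# Counts copied from the summary at the top of this page.
noun_lemmas = 23391
noun_types = 54040

# Assumption: form / lemma ratio = distinct word types / distinct lemmas.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")
```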
The highest number of forms (13) was observed with the lemma “bratr”: bratr, bratra, bratrech, bratrem, bratrovi, bratru, bratry, bratrů, bratrům, bratře, bratři, bratří, bratřích.
The 2nd highest number of forms (12) was observed with the lemma “předek”: předci, předcích, předek, předka, předkem, předkovi, předkové, předku, předky, předků, předkům, předkův.
The 3rd highest number of forms (11) was observed with the lemma “doktor”: doktor, doktora, doktore, doktorech, doktorem, doktorovi, doktoru, doktory, doktorů, doktorům, doktoři.
NOUN occurs with 10 features: Gender (760311; 97% instances), Case (756305; 97% instances), Number (756304; 97% instances), Animacy (337858; 43% instances), VerbForm (49615; 6% instances), Abbr (25235; 3% instances), Style (1275; 0% instances), ExtPos (201; 0% instances), Typo (127; 0% instances), Foreign (1; 0% instances)
NOUN occurs with 26 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, ExtPos=ADP, ExtPos=ADV, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Dual, Number=Plur, Number=Sing, Style=Coll, Style=Expr, Style=Slng, Style=Vrnc, Style=Vulg, Typo=Yes, VerbForm=Vnoun
NOUN occurs with 267 feature combinations.
The most frequent feature combination is Case=Gen|Gender=Fem|Number=Sing (58927 tokens).
Examples: společnosti, firmy, strany, práce, vlády, doby, školy, skupiny, banky, rady
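A feature combination is the full value of the FEATS column of a token. As a rough sketch of how such combinations could be tallied from a CoNLL-U file, the code below counts FEATS strings for NOUN tokens. The sample sentence and its annotation are hand-written for illustration and are not taken from the treebank; the helper names `row` and `noun_feature_combinations` are likewise assumptions.

```python
from collections import Counter

def row(*cols):
    """Build one tab-separated CoNLL-U token line from its 10 columns."""
    return "\t".join(cols)

# Hypothetical sample: "cena akcií společnosti" (price of the company's shares).
SAMPLE = "\n".join([
    "# text = cena akcií společnosti",
    row("1", "cena", "cena", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Sing", "0", "root", "_", "_"),
    row("2", "akcií", "akcie", "NOUN", "_", "Case=Gen|Gender=Fem|Number=Plur", "1", "nmod", "_", "_"),
    row("3", "společnosti", "společnost", "NOUN", "_", "Case=Gen|Gender=Fem|Number=Sing", "2", "nmod", "_", "_"),
    "",
])

def noun_feature_combinations(conllu_text):
    """Count the FEATS string (column 6) of every token whose UPOS (column 4) is NOUN."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == "NOUN":
            counts[cols[5]] += 1
    return counts

counts = noun_feature_combinations(SAMPLE)
for combination, n in counts.most_common():
    print(n, combination)
```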
Relations
NOUN nodes are attached to their parents using 29 different relations: nmod (239392; 31% instances), obl (129446; 17% instances), nsubj (126395; 16% instances), obj (93351; 12% instances), obl:arg (53404; 7% instances), conj (51898; 7% instances), root (28903; 4% instances), appos (16621; 2% instances), nsubj:pass (13117; 2% instances), fixed (5875; 1% instances), dep (5278; 1% instances), advcl:pred (3524; 0% instances), advcl (3259; 0% instances), ccomp (2694; 0% instances), parataxis (2500; 0% instances), orphan (2259; 0% instances), acl:relcl (1533; 0% instances), xcomp (898; 0% instances), iobj (675; 0% instances), acl (636; 0% instances), vocative (279; 0% instances), csubj (234; 0% instances), flat (221; 0% instances), case (215; 0% instances), csubj:pass (79; 0% instances), advmod (12; 0% instances), amod (2; 0% instances), discourse (2; 0% instances), compound (1; 0% instances)
Parents of NOUN nodes belong to 17 different parts of speech: VERB (348897; 45% instances), NOUN (288913; 37% instances), ADJ (56291; 7% instances), no parent, i.e. the artificial root (28903; 4% instances), PROPN (14163; 2% instances), ADV (13866; 2% instances), NUM (11591; 1% instances), ADP (5876; 1% instances), DET (3155; 0% instances), X (2933; 0% instances), PRON (2907; 0% instances), AUX (2193; 0% instances), SYM (1317; 0% instances), PART (1254; 0% instances), CCONJ (341; 0% instances), INTJ (101; 0% instances), SCONJ (2; 0% instances)
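Both the relation counts and the parent part-of-speech counts come from the HEAD and DEPREL columns of each NOUN token. The sketch below shows one way to tally them from CoNLL-U text; it is an illustration rather than the tool behind this page. The sample sentences are hand-written, the function names are assumptions, and a head of 0 is reported under the label "ROOT", a name chosen for this sketch.

```python
from collections import Counter

def row(*cols):
    """Build one tab-separated CoNLL-U token line from its 10 columns."""
    return "\t".join(cols)

# Hypothetical two-sentence sample, annotated by hand for illustration.
SAMPLE = "\n".join([
    "# text = Cena akcií roste.",
    row("1", "Cena", "cena", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Sing", "3", "nsubj", "_", "_"),
    row("2", "akcií", "akcie", "NOUN", "_", "Case=Gen|Gender=Fem|Number=Plur", "1", "nmod", "_", "_"),
    row("3", "roste", "růst", "VERB", "_", "_", "0", "root", "_", "_"),
    row("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    "",
    "# text = Akcie",
    row("1", "Akcie", "akcie", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Plur", "0", "root", "_", "_"),
    "",
])

def read_sentences(conllu_text):
    """Yield each sentence as a list of column lists (basic tokens only)."""
    rows = []
    for line in conllu_text.splitlines():
        if line.startswith("#"):
            continue
        if not line.strip():
            if rows:
                yield rows
                rows = []
            continue
        cols = line.split("\t")
        if cols[0].isdigit():  # ignore multiword ranges and empty nodes
            rows.append(cols)
    if rows:
        yield rows

def noun_attachment_counts(conllu_text):
    """Return (relation counts, parent UPOS counts) for all NOUN tokens."""
    relations, parent_pos = Counter(), Counter()
    for rows in read_sentences(conllu_text):
        upos_by_id = {cols[0]: cols[3] for cols in rows}
        for cols in rows:
            if cols[3] != "NOUN":
                continue
            relations[cols[7]] += 1
            # HEAD "0" has no entry in the sentence, so it falls back to "ROOT".
            parent_pos[upos_by_id.get(cols[6], "ROOT")] += 1
    return relations, parent_pos

relations, parent_pos = noun_attachment_counts(SAMPLE)
print(dict(relations))
print(dict(parent_pos))
```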
118786 (15%) NOUN nodes are leaves.
278349 (36%) NOUN nodes have one child.
227447 (29%) NOUN nodes have two children.
158121 (20%) NOUN nodes have three or more children.
The highest child degree of a NOUN node is 23.
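The child degree of a node is the number of tokens whose HEAD points to it. The four bucket counts above add up to 782703, the total number of NOUN tokens, so every NOUN node falls into exactly one bucket. A minimal sketch of the bucketing follows, using a hand-written sentence reduced to (id, UPOS, head) triples; the function name and the sample are assumptions.

```python
from collections import Counter

# Hypothetical sentence "Nová cena akcií firmy roste ." as (id, UPOS, head) triples.
SENTENCE = [
    (1, "ADJ", 2),    # Nová   -> cena
    (2, "NOUN", 5),   # cena   -> roste
    (3, "NOUN", 2),   # akcií  -> cena
    (4, "NOUN", 3),   # firmy  -> akcií
    (5, "VERB", 0),   # roste  -> root
    (6, "PUNCT", 5),  # .      -> roste
]

def noun_child_degree_buckets(sentences):
    """Return (Counter over degrees 0, 1, 2, 3 where 3 means 'three or more', highest degree)."""
    buckets = Counter()
    highest = 0
    for sentence in sentences:
        # Number of children per node id: count how often each id appears as a head.
        n_children = Counter(head for _, _, head in sentence)
        for token_id, upos, _ in sentence:
            if upos != "NOUN":
                continue
            degree = n_children[token_id]  # Counter gives 0 for leaves
            highest = max(highest, degree)
            buckets[min(degree, 3)] += 1
    return buckets, highest

buckets, highest = noun_child_degree_buckets([SENTENCE])
print(dict(buckets), highest)

# Consistency check on the figures reported on this page.
assert 118786 + 278349 + 227447 + 158121 == 782703
```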
Children of NOUN nodes are attached using 40 different relations: amod (288672; 22% instances), nmod (274282; 21% instances), case (248042; 19% instances), punct (105649; 8% instances), det (71189; 5% instances), conj (50862; 4% instances), cc (37450; 3% instances), nummod (35300; 3% instances), nummod:gov (31615; 2% instances), cop (29903; 2% instances), acl:relcl (29719; 2% instances), nsubj (23695; 2% instances), advmod:emph (21805; 2% instances), flat (17520; 1% instances), appos (12892; 1% instances), mark (12500; 1% instances), acl (8384; 1% instances), advmod (6038; 0% instances), obl (4472; 0% instances), parataxis (3581; 0% instances), dep (3070; 0% instances), det:numgov (2836; 0% instances), orphan (1600; 0% instances), advcl (1525; 0% instances), csubj (1500; 0% instances), aux (1491; 0% instances), det:nummod (1225; 0% instances), obj (301; 0% instances), fixed (239; 0% instances), advcl:pred (189; 0% instances), discourse (189; 0% instances), obl:arg (177; 0% instances), expl:pv (86; 0% instances), compound (46; 0% instances), ccomp (45; 0% instances), vocative (36; 0% instances), xcomp (10; 0% instances), expl:pass (6; 0% instances), nsubj:pass (1; 0% instances), reparandum (1; 0% instances)
Children of NOUN nodes belong to 17 different parts of speech: ADJ (295024; 22% instances), NOUN (288913; 22% instances), ADP (247610; 19% instances), PUNCT (105649; 8% instances), DET (88657; 7% instances), NUM (69824; 5% instances), PROPN (57404; 4% instances), VERB (38713; 3% instances), CCONJ (38674; 3% instances), AUX (32147; 2% instances), PART (17057; 1% instances), X (15748; 1% instances), ADV (13340; 1% instances), SCONJ (12278; 1% instances), PRON (5772; 0% instances), SYM (1298; 0% instances), INTJ (35; 0% instances)