
This page pertains to UD version 2.

Treebank Statistics: UD_Croatian: POS Tags: NOUN

There are 6434 NOUN lemmas (32%), 12759 NOUN types (35%) and 48059 NOUN tokens (24%). Out of 17 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
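As a rough illustration of how such counts can be derived, the following sketch tallies lemmas, types and tokens for one tag from CoNLL-U text. The sample sentences and the names read_tokens and pos_counts are hypothetical, and the sketch assumes that a "type" is a distinct word form compared as written; the normalization used to produce the figures on this page is not stated here.

```python
# Tiny hand-written CoNLL-U sample (hypothetical, not taken from UD_Croatian).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = (
    "1\tVlada\tvlada\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tzemlje\tzemlja\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
    "3\tradi\traditi\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tRade\traditi\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tzemlje\tzemlja\tNOUN\t_\t_\t1\tnsubj\t_\t_\n"
)


def read_tokens(conllu_text):
    """Yield (form, lemma, upos) for every regular token line."""
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Comments and blank lines do not have 10 columns; multiword-token
        # ranges (1-2) and empty nodes (1.1) do not have a purely numeric ID.
        if len(cols) == 10 and cols[0].isdigit():
            yield cols[1], cols[2], cols[3]


def pos_counts(conllu_text, upos):
    """Count distinct lemmas, distinct forms (types) and tokens for one tag."""
    tokens = [
        (form, lemma)
        for form, lemma, tag in read_tokens(conllu_text)
        if tag == upos
    ]
    return {
        "lemmas": len({lemma for _form, lemma in tokens}),
        "types": len({form for form, _lemma in tokens}),
        "tokens": len(tokens),
    }


print(pos_counts(SAMPLE, "NOUN"))  # {'lemmas': 2, 'types': 2, 'tokens': 3}
```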

The 10 most frequent NOUN lemmas: godina, zemlja, posao, čovjek, predsjednik, vlada, država, milijun, stranka, dan

The 10 most frequent NOUN types: godine, milijuna, godina, eura, zemlje, ljudi, kuna, dana, predsjednik, zemalja

The 10 most frequent ambiguous lemmas: dan (NOUN 195, ADV 2, ADJ 1), kuna (NOUN 155, PROPN 1), banka (NOUN 142, AUX 1), put (NOUN 130, ADV 61, ADP 2), tjedan (NOUN 122, ADJ 1), strana (NOUN 109, ADJ 1), kraj (NOUN 96, ADP 1, ADV 1), pomoć (NOUN 90, ADP 1), posto (ADV 126, NOUN 90), film (NOUN 87, PROPN 1)

The 10 most frequent ambiguous types: kuna (NOUN 146, PROPN 1), dana (NOUN 134, ADJ 1), posto (ADV 126, NOUN 90), vlada (NOUN 77, VERB 6), tjedna (NOUN 79, ADJ 1), strane (NOUN 79, ADJ 10), prava (NOUN 75, ADJ 5), put (NOUN 49, ADV 28), dan (NOUN 39, ADJ 2, ADV 2), niz (NOUN 38, ADP 2)
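The two ambiguity lists above can be approximated with a small tally of tags per lemma (or per form). This is a minimal sketch on hypothetical (lemma, tag) pairs; the function name ambiguous_lemmas is made up, and the ordering by NOUN frequency is an assumption inferred from the lemma list above.

```python
from collections import Counter, defaultdict


def ambiguous_lemmas(pairs, upos="NOUN"):
    """Return lemmas seen with `upos` and at least one other tag."""
    tags_by_lemma = defaultdict(Counter)
    for lemma, tag in pairs:
        tags_by_lemma[lemma][tag] += 1
    result = [
        (lemma, tags.most_common())  # tags ordered by descending count
        for lemma, tags in tags_by_lemma.items()
        if upos in tags and len(tags) > 1
    ]
    # Assumed ordering: most frequent as `upos` first (stable for ties).
    result.sort(key=lambda item: -dict(item[1])[upos])
    return result


# Hypothetical observations, not the real treebank counts.
PAIRS = [
    ("put", "NOUN"), ("put", "NOUN"), ("put", "ADV"),
    ("dan", "NOUN"),
    ("kraj", "NOUN"), ("kraj", "ADP"),
    ("posto", "ADV"), ("posto", "ADV"), ("posto", "NOUN"),
]

for lemma, tags in ambiguous_lemmas(PAIRS):
    print(lemma, tags)
```

Passing (form, tag) pairs instead of (lemma, tag) pairs gives the corresponding list of ambiguous types.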

Morphology

The form / lemma ratio of NOUN is 1.983059 (the average of all parts of speech is 1.827681).
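The NOUN figure is consistent with dividing the number of NOUN types by the number of NOUN lemmas reported at the top of this page. The short check below assumes that this is how the ratio is defined; the variable names are illustrative.

```python
# Counts reported on this page for NOUN in UD_Croatian.
noun_types = 12759
noun_lemmas = 6434

# Assumed definition: distinct forms (types) per distinct lemma.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 1.983059
```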

The highest number of forms (9) was observed with the lemma “put”: put, puta, putem, puteve, putevi, putevima, putova, putovima, putu.

The 2nd highest number of forms (8) was observed with the lemma “automobil”: automobil, automobila, automobile, automobili, automobilima, automobilom, automobilu, automoblili.

The 3rd highest number of forms (8) was observed with the lemma “centar”: centar, centara, centra, centre, centri, centrima, centrom, centru.

NOUN occurs with 5 features: Case (47808; 99% instances), Gender (47808; 99% instances), Number (47808; 99% instances), Animacy (3108; 6% instances), Polarity (13; 0% instances)

NOUN occurs with 15 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing, Polarity=Neg

NOUN occurs with 51 feature combinations. The most frequent feature combination is Case=Gen|Gender=Fem|Number=Sing (4518 tokens). Examples: godine, zemlje, vlade, strane, stranke, banke, sigurnosti, države, republike, krize
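The three feature statistics above (features, feature-value pairs, and full combinations) can be sketched as three counters over the FEATS column. The sample and the name feature_stats are hypothetical, and the sketch assumes that tokens with an empty FEATS value ("_") are not counted as a combination, which may differ from how this page was generated.

```python
from collections import Counter

# Hand-written CoNLL-U sample (hypothetical, not taken from UD_Croatian).
SAMPLE = (
    "1\tVlada\tvlada\tNOUN\t_\tCase=Nom|Gender=Fem|Number=Sing\t3\tnsubj\t_\t_\n"
    "2\tzemlje\tzemlja\tNOUN\t_\tCase=Gen|Gender=Fem|Number=Sing\t1\tnmod\t_\t_\n"
    "3\tradi\traditi\tVERB\t_\tNumber=Sing|Person=3\t0\troot\t_\t_\n"
    "\n"
    "1\tKraj\tkraj\tNOUN\t_\tCase=Nom|Gender=Masc|Number=Sing\t0\troot\t_\t_\n"
    "2\tgodine\tgodina\tNOUN\t_\tCase=Gen|Gender=Fem|Number=Sing\t1\tnmod\t_\t_\n"
)


def feature_stats(conllu_text, upos):
    """Count features, feature-value pairs and full combinations for a tag."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit() or cols[3] != upos:
            continue
        feats = cols[5]
        if feats == "_":  # assumption: empty FEATS is not a combination
            continue
        combos[feats] += 1
        for pair in feats.split("|"):
            name, _value = pair.split("=", 1)
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


features, pairs, combos = feature_stats(SAMPLE, "NOUN")
print(len(features), len(pairs), len(combos))  # 3 5 3
print(combos.most_common(1))  # [('Case=Gen|Gender=Fem|Number=Sing', 2)]
```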

Relations

NOUN nodes are attached to their parents using 33 different relations: nmod (14850; 31% instances), obl (9948; 21% instances), nsubj (7326; 15% instances), obj (6356; 13% instances), conj (4132; 9% instances), root (964; 2% instances), nsubj:pass (899; 2% instances), appos (846; 2% instances), parataxis (557; 1% instances), compound (408; 1% instances), iobj (346; 1% instances), xcomp (260; 1% instances), ccomp (172; 0% instances), acl (168; 0% instances), amod (153; 0% instances), nummod (118; 0% instances), advcl (114; 0% instances), flat (100; 0% instances), fixed (78; 0% instances), orphan (71; 0% instances), advmod (53; 0% instances), discourse (49; 0% instances), case (21; 0% instances), csubj (18; 0% instances), vocative (14; 0% instances), list (9; 0% instances), mark (8; 0% instances), csubj:pass (5; 0% instances), cc (4; 0% instances), det (4; 0% instances), punct (4; 0% instances), goeswith (3; 0% instances), expl (1; 0% instances)

Parents of NOUN nodes belong to 16 different parts of speech (counting the untagged artificial root as one): NOUN (20600; 43% instances), VERB (20232; 42% instances), ADJ (4041; 8% instances), root, no tag (964; 2% instances), PROPN (764; 2% instances), ADV (580; 1% instances), AUX (429; 1% instances), ADP (115; 0% instances), DET (112; 0% instances), PRON (111; 0% instances), NUM (65; 0% instances), SYM (26; 0% instances), X (12; 0% instances), PART (4; 0% instances), SCONJ (3; 0% instances), CCONJ (1; 0% instances)
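A minimal sketch of how the incoming relations and parent tags of NOUN nodes could be tallied from CoNLL-U text. The sample and the names sentences and attachment_stats are hypothetical. Nodes whose HEAD is 0 hang from the artificial root, which carries no tag, so the sketch records an empty string as their parent tag; this would explain why the count of untagged parents equals the count of the root relation.

```python
from collections import Counter

# Hand-written CoNLL-U sample (hypothetical, not taken from UD_Croatian).
SAMPLE = (
    "1\tVlada\tvlada\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tzemlje\tzemlja\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
    "3\tradi\traditi\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tKraj\tkraj\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tgodine\tgodina\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
)


def sentences(conllu_text):
    """Yield each sentence as a list of 10-column token rows."""
    block = []
    for line in conllu_text.splitlines() + [""]:
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            block.append(cols)
        elif not line.strip() and block:
            yield block
            block = []


def attachment_stats(conllu_text, upos):
    """Count DEPREL values and parent UPOS tags for nodes tagged `upos`."""
    relations, parent_tags = Counter(), Counter()
    for sent in sentences(conllu_text):
        tag_of = {cols[0]: cols[3] for cols in sent}
        for cols in sent:
            if cols[3] != upos:
                continue
            relations[cols[7]] += 1
            # HEAD 0 is the artificial root; it has no tag, so use "".
            parent_tags[tag_of.get(cols[6], "")] += 1
    return relations, parent_tags


relations, parent_tags = attachment_stats(SAMPLE, "NOUN")
print(relations.most_common())
print(parent_tags.most_common())
```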

7437 (15%) NOUN nodes are leaves.

16579 (34%) NOUN nodes have one child.

14039 (29%) NOUN nodes have two children.

10004 (21%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 13.
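The child-degree figures above can be reproduced in outline by counting, for each NOUN node, how many tokens in the same sentence name it as HEAD. The sample and the names sentences and child_degree_histogram are hypothetical; the buckets mirror the four groups reported above.

```python
from collections import Counter

# Hand-written CoNLL-U sample (hypothetical, not taken from UD_Croatian).
SAMPLE = (
    "1\tVlada\tvlada\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tzemlje\tzemlja\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
    "3\tradi\traditi\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tKraj\tkraj\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tgodine\tgodina\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)


def sentences(conllu_text):
    """Yield each sentence as a list of 10-column token rows."""
    block = []
    for line in conllu_text.splitlines() + [""]:
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            block.append(cols)
        elif not line.strip() and block:
            yield block
            block = []


def child_degree_histogram(conllu_text, upos):
    """Bucket nodes tagged `upos` by number of children: 0, 1, 2 or 3+."""
    histogram = Counter()
    max_degree = 0
    for sent in sentences(conllu_text):
        # Number of tokens pointing at each ID through the HEAD column.
        children = Counter(cols[6] for cols in sent)
        for cols in sent:
            if cols[3] != upos:
                continue
            degree = children[cols[0]]
            max_degree = max(max_degree, degree)
            histogram["3+" if degree >= 3 else str(degree)] += 1
    return histogram, max_degree


histogram, max_degree = child_degree_histogram(SAMPLE, "NOUN")
print(dict(histogram), max_degree)
```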

Children of NOUN nodes are attached using 36 different relations: nmod (17766; 22% instances), amod (17481; 21% instances), case (14639; 18% instances), punct (6125; 7% instances), conj (4060; 5% instances), det (3344; 4% instances), acl (3202; 4% instances), cc (3129; 4% instances), nummod (2452; 3% instances), appos (2450; 3% instances), cop (1474; 2% instances), nsubj (1339; 2% instances), compound (1337; 2% instances), advmod (1305; 2% instances), discourse (666; 1% instances), parataxis (645; 1% instances), aux (340; 0% instances), mark (317; 0% instances), advcl (127; 0% instances), flat (80; 0% instances), obj (73; 0% instances), orphan (73; 0% instances), csubj (52; 0% instances), xcomp (51; 0% instances), fixed (22; 0% instances), ccomp (17; 0% instances), iobj (17; 0% instances), list (11; 0% instances), flat:foreign (9; 0% instances), goeswith (5; 0% instances), vocative (5; 0% instances), expl (3; 0% instances), expl:pv (3; 0% instances), aux:pass (2; 0% instances), dep (2; 0% instances), nsubj:pass (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: NOUN (20600; 25% instances), ADJ (18837; 23% instances), ADP (14567; 18% instances), PUNCT (6117; 7% instances), PROPN (5174; 6% instances), DET (3859; 5% instances), CCONJ (3296; 4% instances), VERB (2737; 3% instances), NUM (2381; 3% instances), AUX (1885; 2% instances), ADV (1468; 2% instances), SCONJ (742; 1% instances), PRON (485; 1% instances), PART (342; 0% instances), X (85; 0% instances), SYM (49; 0% instances)