
This page pertains to UD version 2.

Treebank Statistics: UD_Hebrew-IAHLTwiki: POS Tags: NOUN

There are 3822 NOUN lemmas (35%), 5544 NOUN types (35%) and 34627 NOUN tokens (25%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
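The three counts above can be reproduced from a CoNLL-U file by collecting distinct lemmas, distinct word forms (types) and running tokens for one UPOS tag. The following is a minimal sketch, not the script that generated this page: the function name `pos_counts` and the toy fragment are invented for illustration, and it assumes forms are compared exactly as written, with no case folding or normalization.

```python
# Invented toy CoNLL-U fragment (not taken from the treebank); columns are tab-separated.
TOY_CONLLU = "\n".join([
    "# text = laws law state Dana",
    "1\tlaws\tlaw\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_",
    "2\tlaw\tlaw\tNOUN\t_\tNumber=Sing\t1\tconj\t_\t_",
    "3\tstate\tstate\tNOUN\t_\tNumber=Sing\t1\tnmod\t_\t_",
    "4\tDana\tDana\tPROPN\t_\t_\t1\tnmod\t_\t_",
    "",
])


def pos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM column
            lemmas.add(cols[2])  # LEMMA column
    return len(lemmas), len(types), tokens


print(pos_counts(TOY_CONLLU, "NOUN"))  # (2, 3, 3)
```

In the toy fragment, "laws" and "law" share one lemma, so NOUN has 2 lemmas, 3 types and 3 tokens.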

The 10 most frequent NOUN lemmas: שנה, בית, שיר, יד, משפט, אלבום, חוק, מדינה, ועדה, חלק

The 10 most frequent NOUN types: בית, שנת, משפט, ידי, חוק, אלבום, חלק, שימוש, סוכרת, שנים

The 10 most frequent ambiguous lemmas: שנה (NOUN 686, PROPN 3), בית (NOUN 413, PROPN 39), שיר (NOUN 337, PROPN 26, VERB 2), יד (NOUN 303, PROPN 2, ADP 1), משפט (NOUN 293, PROPN 5), חוק (NOUN 216, PROPN 4), מדינה (NOUN 211, PROPN 28), ועדה (NOUN 190, PROPN 33), חלק (NOUN 183, PROPN 4, VERB 3, ADJ 1), בנק (NOUN 180, PROPN 73)

The 10 most frequent ambiguous types: בית (NOUN 336, PROPN 38), משפט (NOUN 274, PROPN 5), ידי (NOUN 271, ADP 1), חוק (NOUN 186, PROPN 4), חלק (NOUN 164, PROPN 4, ADJ 1, VERB 1), סוכרת (NOUN 150, PROPN 48), שנים (NOUN 148, NUM 1, PROPN 1), שיר (NOUN 144, PROPN 15), דם (NOUN 140, PRON 1), פי (NOUN 140, ADP 5, ADV 3)
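An item (lemma or type) counts as ambiguous here when it is observed with NOUN and with at least one other tag, and the lists above appear to be ordered by the NOUN frequency. The sketch below illustrates that reading on invented data. The name `ambiguous_items` and the sample observations are hypothetical, and the ordering rule is inferred from the numbers on this page rather than confirmed.

```python
from collections import Counter, defaultdict

# Invented (item, upos) observations, for illustration only.
OBSERVATIONS = [
    ("bank", "NOUN"), ("bank", "NOUN"), ("bank", "PROPN"),
    ("part", "NOUN"), ("part", "VERB"), ("part", "NOUN"), ("part", "NOUN"),
    ("year", "NOUN"),
]


def ambiguous_items(observations, upos):
    """List items seen with `upos` and at least one other tag, most frequent first."""
    by_item = defaultdict(Counter)
    for item, tag in observations:
        by_item[item][tag] += 1
    ambiguous = {
        item: tags
        for item, tags in by_item.items()
        if upos in tags and len(tags) > 1
    }
    # Rank by frequency under the tag of interest (assumed ordering).
    return sorted(ambiguous.items(), key=lambda kv: -kv[1][upos])


for item, tags in ambiguous_items(OBSERVATIONS, "NOUN"):
    print(item, dict(tags.most_common()))
```

"year" is left out because it occurs with a single tag only.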

Morphology

The form / lemma ratio of NOUN is 1.450549 (the average of all parts of speech is 1.479807).
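The NOUN ratio is consistent with the counts reported at the top of this page: 5544 types divided by 3822 lemmas. A quick check, assuming the ratio is simply distinct forms per distinct lemma (the generating script may define it slightly differently):

```python
# Counts reported on this page for NOUN.
noun_types = 5544   # distinct word forms
noun_lemmas = 3822  # distinct lemmas

ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 1.450549
```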

The highest number of forms (10) was observed with the lemma “איש”: אֲנָשָׁי, איש, אישה, אישי, אישים, אנשי, אנשים, אשה, אשת, נשים.

The 2nd highest number of forms (7) was observed with the lemma “בן”: בְּנֵי, בְנֵי, בן, בנ, בני, בנים, בת.

The 3rd highest number of forms (6) was observed with the lemma “ועדה”: וועדה, וועדות, וועדת, ועדה, ועדות, ועדת.

NOUN occurs with 5 features: Number (34546; 100% instances), Gender (34543; 100% instances), Definite (9397; 27% instances), Abbr (269; 1% instances), Typo (49; 0% instances)

NOUN occurs with 10 feature-value pairs: Abbr=Yes, Definite=Cons, Gender=Fem, Gender=Fem,Masc, Gender=Masc, Number=Dual, Number=Plur, Number=Plur,Sing, Number=Sing, Typo=Yes

NOUN occurs with 36 feature combinations. The most frequent feature combination is Gender=Masc|Number=Sing (10852 tokens). Examples: משפט, חלק, אלבום, שימוש, דם, שיר, חוק, טיפול, דין, אינסולין
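The three feature statistics above (features, feature-value pairs, feature combinations) can all be derived from the FEATS column of a CoNLL-U file. The sketch below uses invented FEATS strings and a hypothetical helper, `feature_stats`. It assumes a multi-valued entry such as `Gender=Fem,Masc` is counted as one pair of its own, which matches how the pairs are listed on this page.

```python
from collections import Counter

# Invented FEATS values for five NOUN tokens; "_" means no features.
FEATS = [
    "Gender=Masc|Number=Sing",
    "Gender=Masc|Number=Sing",
    "Definite=Cons|Gender=Fem|Number=Plur",
    "Gender=Fem,Masc|Number=Plur",
    "_",
]


def feature_stats(feats_column):
    """Count feature names, feature=value pairs and whole combinations."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_column:
        combos[feats] += 1
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            # Multi-valued entries (e.g. "Gender=Fem,Masc") stay as one pair.
            pairs[pair] += 1
    return features, pairs, combos


features, pairs, combos = feature_stats(FEATS)
print(features["Gender"], len(pairs), combos.most_common(1))
# 4 6 [('Gender=Masc|Number=Sing', 2)]
```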

Relations

NOUN nodes are attached to their parents using 36 different relations: obl (8351; 24% instances), compound (7157; 21% instances), nmod (4680; 14% instances), nsubj (4090; 12% instances), obj (2591; 7% instances), conj (2407; 7% instances), nmod:poss (1472; 4% instances), root (801; 2% instances), nsubj:pass (749; 2% instances), fixed (708; 2% instances), appos (521; 2% instances), acl:relcl (254; 1% instances), parataxis (192; 1% instances), obl:unmarked (124; 0% instances), xcomp (84; 0% instances), nsubj:outer (75; 0% instances), amod (72; 0% instances), orphan (53; 0% instances), advcl (45; 0% instances), acl (40; 0% instances), ccomp (37; 0% instances), dep (36; 0% instances), nmod:unmarked (20; 0% instances), flat (19; 0% instances), case (10; 0% instances), dislocated (9; 0% instances), csubj (7; 0% instances), compound:affix (5; 0% instances), nummod (4; 0% instances), advmod (3; 0% instances), csubj:pass (3; 0% instances), det (3; 0% instances), list (2; 0% instances), csubj:outer (1; 0% instances), reparandum (1; 0% instances), vocative (1; 0% instances)

Parents of NOUN nodes belong to 15 different parts of speech (counting the parentless root case as one): NOUN (16307; 47% instances), VERB (14726; 43% instances), ADJ (1117; 3% instances), [no parent: root] (801; 2% instances), ADP (676; 2% instances), PROPN (573; 2% instances), NUM (145; 0% instances), PRON (121; 0% instances), ADV (76; 0% instances), SYM (45; 0% instances), X (15; 0% instances), DET (10; 0% instances), AUX (6; 0% instances), SCONJ (6; 0% instances), CCONJ (3; 0% instances)
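Both parent-side statistics come from following each NOUN token's HEAD pointer: the DEPREL gives the relation, and the head's UPOS gives the parent part of speech. The 801 NOUN nodes without a parent part of speech match the 801 root attachments, since a root has no parent token. A minimal sketch on one invented sentence (the row layout and the name `parent_stats` are assumptions made for illustration):

```python
from collections import Counter

# One invented sentence as (id, upos, head, deprel) rows; head 0 marks the root.
SENTENCE = [
    (1, "NOUN", 0, "root"),
    (2, "ADP", 3, "case"),
    (3, "NOUN", 1, "nmod"),
    (4, "VERB", 3, "acl:relcl"),
    (5, "NOUN", 4, "obj"),
]


def parent_stats(sentence, upos):
    """Count incoming relations and parent tags for tokens tagged `upos`."""
    upos_by_id = {tok_id: tag for tok_id, tag, _, _ in sentence}
    relations, parent_tags = Counter(), Counter()
    for _, tag, head, deprel in sentence:
        if tag != upos:
            continue
        relations[deprel] += 1
        # A root node has no parent token, so it is recorded with an empty tag.
        parent_tags[upos_by_id.get(head, "")] += 1
    return relations, parent_tags


relations, parent_tags = parent_stats(SENTENCE, "NOUN")
print(dict(relations))    # {'root': 1, 'nmod': 1, 'obj': 1}
print(dict(parent_tags))  # {'': 1, 'NOUN': 1, 'VERB': 1}
```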

2938 (8%) NOUN nodes are leaves.

10041 (29%) NOUN nodes have one child.

11716 (34%) NOUN nodes have two children.

9932 (29%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 13.
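The four child-count buckets above partition the NOUN tokens: they sum to 34627, and the rounded shares match the percentages shown. The sketch below checks that arithmetic and then shows one way to compute such buckets from HEAD pointers. The toy sentence and the name `child_degree_buckets` are invented for illustration and are not taken from the script behind this page.

```python
from collections import Counter

# Bucket counts reported on this page (leaves, one, two, three or more children).
REPORTED = [2938, 10041, 11716, 9932]
total = sum(REPORTED)
shares = [round(100 * n / total) for n in REPORTED]
print(total, shares)  # 34627 [8, 29, 34, 29]

# Invented toy sentence as (id, upos, head) rows; head 0 marks the root.
TOY = [
    (1, "NOUN", 0),
    (2, "ADP", 3),
    (3, "NOUN", 1),
    (4, "ADJ", 3),
    (5, "NOUN", 3),
    (6, "NOUN", 1),
]


def child_degree_buckets(rows, upos):
    """Bucket tokens tagged `upos` by number of children: 0, 1, 2, or 3 (= three or more)."""
    children = Counter(head for _, _, head in rows)
    buckets = Counter()
    for tok_id, tag, _ in rows:
        if tag == upos:
            buckets[min(children[tok_id], 3)] += 1
    return buckets


print(dict(child_degree_buckets(TOY, "NOUN")))
```

In the toy sentence, token 1 has two children, token 3 has three, and tokens 5 and 6 are leaves.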

Children of NOUN nodes are attached using 36 different relations: case (16841; 24% instances), det (9732; 14% instances), compound (8928; 13% instances), amod (6823; 10% instances), nmod (5367; 8% instances), punct (4549; 7% instances), nmod:poss (4320; 6% instances), acl:relcl (2687; 4% instances), conj (2375; 3% instances), cc (1853; 3% instances), appos (1382; 2% instances), advmod (965; 1% instances), nummod (934; 1% instances), nsubj (611; 1% instances), cop (490; 1% instances), mark (343; 0% instances), acl (319; 0% instances), parataxis (209; 0% instances), flat (204; 0% instances), obl (144; 0% instances), dep (133; 0% instances), compound:affix (75; 0% instances), orphan (75; 0% instances), nmod:unmarked (69; 0% instances), xcomp (62; 0% instances), advcl (57; 0% instances), csubj (17; 0% instances), aux (15; 0% instances), fixed (9; 0% instances), reparandum (8; 0% instances), obj (7; 0% instances), nsubj:outer (5; 0% instances), obl:unmarked (4; 0% instances), dislocated (2; 0% instances), ccomp (1; 0% instances), list (1; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: ADP (16797; 24% instances), NOUN (16307; 23% instances), DET (9066; 13% instances), ADJ (7085; 10% instances), PUNCT (4549; 7% instances), PROPN (3839; 6% instances), PRON (3585; 5% instances), VERB (2864; 4% instances), CCONJ (1855; 3% instances), NUM (1853; 3% instances), ADV (1090; 2% instances), SCONJ (352; 1% instances), AUX (270; 0% instances), X (80; 0% instances), SYM (24; 0% instances)