
This page pertains to UD version 2.

Treebank Statistics: UD_Japanese-GSD: POS Tags: NOUN

There are 11908 NOUN lemmas (56%), 12402 NOUN types (53%) and 58184 NOUN tokens (30%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: 年, 事, 月, 日, 御, 人, 者, 後, 為, 物

The 10 most frequent NOUN types: 年, こと, 月, 日, 人, 者, お, 後, ため, もの

The 10 most frequent ambiguous lemmas: 日 (NOUN 514, ADV 1), 後 (NOUN 320, ADV 11), 為 (NOUN 304, SCONJ 63), 物 (NOUN 287, SCONJ 38), 中 (NOUN 248, ADV 1, PROPN 1), 様 (AUX 257, NOUN 172), 所 (NOUN 140, CCONJ 10, SCONJ 10, ADV 1), 前 (NOUN 115, ADV 3), 上 (NOUN 105, SCONJ 16), 共 (NOUN 98, SCONJ 12)

The 10 most frequent ambiguous types: 日 (NOUN 513, ADV 1), 後 (NOUN 308, ADV 1), ため (NOUN 280, SCONJ 61), もの (NOUN 256, SCONJ 38), 中 (NOUN 239, ADV 1, PROPN 1), さん (NOUN 173, NUM 1), よう (AUX 256, NOUN 130), 前 (NOUN 113, ADV 3), 上 (NOUN 102, SCONJ 13), 関係 (NOUN 90, VERB 5)

Morphology

The form / lemma ratio of NOUN is 1.041485 (the average of all parts of speech is 1.115220).
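The reported NOUN ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas given earlier on this page. The short sketch below reproduces the figure under that assumption; the function name `form_lemma_ratio` is illustrative only and is not taken from the tool that generated these statistics.

```python
# Counts copied from the statistics on this page.
noun_types = 12402   # distinct NOUN word forms (types)
noun_lemmas = 11908  # distinct NOUN lemmas


def form_lemma_ratio(n_forms: int, n_lemmas: int) -> float:
    """Average number of distinct forms per lemma (assumed definition)."""
    return n_forms / n_lemmas


ratio = form_lemma_ratio(noun_types, noun_lemmas)
print(f"{ratio:.6f}")  # prints 1.041485
```

A value close to 1 means that most NOUN lemmas appear in a single written form, which fits the short lists of variant forms reported for the lemmas with the most forms.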

The 1st highest number of forms (5) was observed with the lemma “御”: お, ご, オ, ミ, 御.

The 2nd highest number of forms (5) was observed with the lemma “所”: とこ, ところ, どころ, 処, 所.

The 3rd highest number of forms (5) was observed with the lemma “真”: まこと, まっ, 真, 真っ, 誠.

NOUN occurs with 1 feature: Polarity (128; 0% instances)

NOUN occurs with 1 feature-value pair: Polarity=Neg

NOUN occurs with 2 feature combinations. The most frequent feature combination is _, i.e. no features (58056 tokens). Examples: 年, こと, 月, 日, 人, 者, お, 後, ため, もの

Relations

NOUN nodes are attached to their parents using 13 different relations: compound (19487; 33% instances), obl (11534; 20% instances), nmod (11085; 19% instances), nsubj (6948; 12% instances), obj (4978; 9% instances), root (2328; 4% instances), advcl (870; 1% instances), acl (653; 1% instances), dislocated (238; 0% instances), ccomp (39; 0% instances), case (19; 0% instances), csubj (4; 0% instances), fixed (1; 0% instances)
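The percentages in this list appear to be each relation's count divided by the total number of NOUN tokens, rounded to the nearest whole percent, which is why rare relations show as 0%. The following sketch checks that reading against the counts listed above; the variable names are illustrative.

```python
# Relation counts for NOUN nodes, copied from the list above.
relation_counts = {
    "compound": 19487, "obl": 11534, "nmod": 11085, "nsubj": 6948,
    "obj": 4978, "root": 2328, "advcl": 870, "acl": 653,
    "dislocated": 238, "ccomp": 39, "case": 19, "csubj": 4, "fixed": 1,
}

# The counts should add up to the number of NOUN tokens (58184).
total = sum(relation_counts.values())

# Share of each relation, rounded to a whole percent (assumed rounding rule).
shares = {rel: round(100 * n / total) for rel, n in relation_counts.items()}

for rel, pct in shares.items():
    print(f"{rel}: {relation_counts[rel]} ({pct}%)")
```

Because of the rounding, the displayed percentages do not have to sum to exactly 100.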

Parents of NOUN nodes belong to 14 different categories (13 parts of speech, plus root nodes that have no parent): NOUN (31318; 54% instances), VERB (21218; 36% instances), no parent (2328; 4% instances), ADJ (1912; 3% instances), PROPN (916; 2% instances), NUM (276; 0% instances), ADV (158; 0% instances), PRON (36; 0% instances), AUX (10; 0% instances), SCONJ (5; 0% instances), SYM (3; 0% instances), INTJ (2; 0% instances), ADP (1; 0% instances), DET (1; 0% instances)

19423 (33%) NOUN nodes are leaves.

8622 (15%) NOUN nodes have one child.

14774 (25%) NOUN nodes have two children.

15365 (26%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 25.
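Child-degree figures like the ones above can be derived from a CoNLL-U file by counting, for every NOUN token, how many other tokens name it as their head. The sketch below is one way to do that, not the script that produced this page. The function `noun_child_degrees` and the two toy sentences are hypothetical and are not taken from UD_Japanese-GSD.

```python
from collections import Counter


def noun_child_degrees(conllu_text: str) -> Counter:
    """Bucket NOUN nodes by number of children: 0, 1, 2 or '3+'."""
    buckets = Counter()
    for block in conllu_text.strip().split("\n\n"):
        upos = {}             # token id -> UPOS tag
        children = Counter()  # head id -> number of dependents
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword-token ranges and empty nodes
            upos[cols[0]] = cols[3]   # column 4 is UPOS
            children[cols[6]] += 1    # column 7 is HEAD
        for tok_id, tag in upos.items():
            if tag == "NOUN":
                n = children[tok_id]
                buckets["3+" if n >= 3 else n] += 1
    return buckets


# Hypothetical toy sentences (10 CoNLL-U columns per token).
s1 = [
    ("1", "猫", "猫", "NOUN", "_", "_", "5", "nsubj", "_", "_"),
    ("2", "が", "が", "ADP", "_", "_", "1", "case", "_", "_"),
    ("3", "魚", "魚", "NOUN", "_", "_", "5", "obj", "_", "_"),
    ("4", "を", "を", "ADP", "_", "_", "3", "case", "_", "_"),
    ("5", "食べる", "食べる", "VERB", "_", "_", "0", "root", "_", "_"),
]
s2 = [("1", "雨", "雨", "NOUN", "_", "_", "0", "root", "_", "_")]
sample = "\n\n".join("\n".join("\t".join(r) for r in s) for s in (s1, s2))

buckets = noun_child_degrees(sample)
print(buckets[0], buckets[1], buckets[2], buckets["3+"])  # prints 1 2 0 0
```

In the toy data each of the two nouns in the first sentence has one child (its case marker), and the single noun in the second sentence is a leaf.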

Children of NOUN nodes are attached using 23 different relations: case (34830; 35% instances), compound (24629; 25% instances), nmod (12073; 12% instances), punct (8398; 9% instances), acl (6723; 7% instances), nummod (2789; 3% instances), cop (2365; 2% instances), nsubj (1299; 1% instances), mark (972; 1% instances), det (966; 1% instances), aux (737; 1% instances), obl (706; 1% instances), fixed (456; 0% instances), amod (420; 0% instances), advmod (408; 0% instances), obj (344; 0% instances), cc (207; 0% instances), csubj (123; 0% instances), advcl (107; 0% instances), dislocated (47; 0% instances), dep (41; 0% instances), discourse (6; 0% instances), ccomp (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: ADP (35083; 36% instances), NOUN (31318; 32% instances), PUNCT (8398; 9% instances), VERB (5499; 6% instances), NUM (4892; 5% instances), PROPN (4621; 5% instances), AUX (3194; 3% instances), ADJ (1834; 2% instances), DET (966; 1% instances), SYM (810; 1% instances), PART (525; 1% instances), SCONJ (448; 0% instances), PRON (426; 0% instances), ADV (421; 0% instances), CCONJ (207; 0% instances), INTJ (5; 0% instances)