
This page pertains to UD version 2.

Treebank Statistics: UD_Japanese-GSD: POS Tags: NOUN

There are 11908 NOUN lemmas (56%), 12402 NOUN types (53%) and 58184 NOUN tokens (30%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: 年, 事, 月, 日, 御, 人, 者, 後, 為, 物

The 10 most frequent NOUN types: 年, こと, 月, 日, 人, 者, お, 後, ため, もの

The 10 most frequent ambiguous lemmas: 日 (NOUN 514, ADV 1), 後 (NOUN 320, ADV 11), 為 (NOUN 304, SCONJ 63), 物 (NOUN 287, SCONJ 38), 中 (NOUN 248, ADV 1, PROPN 1), 様 (AUX 257, NOUN 172), 所 (NOUN 140, CCONJ 10, SCONJ 10, ADV 1), 前 (NOUN 115, ADV 3), 上 (NOUN 105, SCONJ 16), 共 (NOUN 98, SCONJ 12)

The 10 most frequent ambiguous types: 日 (NOUN 513, ADV 1), 後 (NOUN 308, ADV 1), ため (NOUN 280, SCONJ 61), もの (NOUN 256, SCONJ 38), 中 (NOUN 239, ADV 1, PROPN 1), さん (NOUN 173, NUM 1), よう (AUX 256, NOUN 130), 前 (NOUN 113, ADV 3), 上 (NOUN 102, SCONJ 13), 関係 (NOUN 90, VERB 5)
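The two lists above can be read as follows: a lemma (or type) counts as ambiguous when it occurs with more than one part-of-speech tag, and the entries appear to be ranked by their NOUN count rather than by total frequency (様 has 429 occurrences overall but ranks after 中). A minimal Python sketch of that reading, using made-up toy observations rather than treebank data:

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, UPOS) observations; NOT taken from UD_Japanese-GSD.
observations = (
    [("日", "NOUN")] * 3 + [("日", "ADV")]
    + [("様", "AUX")] * 3 + [("様", "NOUN")] * 2
    + [("年", "NOUN")] * 4
)

# Count how often each lemma occurs with each tag.
tags_by_lemma = defaultdict(Counter)
for lemma, upos in observations:
    tags_by_lemma[lemma][upos] += 1

# Assumption: "ambiguous" = seen as NOUN and with at least one other tag.
ambiguous = {
    lemma: tags
    for lemma, tags in tags_by_lemma.items()
    if "NOUN" in tags and len(tags) > 1
}

# Assumption: ranking is by NOUN count, descending (inferred from the lists above).
ranked = sorted(ambiguous.items(), key=lambda item: -item[1]["NOUN"])

for lemma, tags in ranked:
    detail = ", ".join(f"{tag} {n}" for tag, n in tags.most_common())
    print(f"{lemma} ({detail})")
```

The same loop works for types by feeding it (form, UPOS) pairs instead of (lemma, UPOS) pairs.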

Morphology

The form / lemma ratio of NOUN is 1.041485 (the average of all parts of speech is 1.115220).
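The NOUN ratio above is consistent with dividing the number of NOUN types by the number of NOUN lemmas reported at the top of this page. A quick check in Python, assuming that interpretation of "form / lemma ratio":

```python
# Counts copied from the summary at the top of this page.
noun_types = 12402
noun_lemmas = 11908

# Assumption: the ratio is distinct forms (types) per distinct lemma.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 1.041485
```

A value this close to 1 means that most NOUN lemmas are written in only one way in this treebank.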

The 1st highest number of forms (5) was observed with the lemma “御”: お, ご, オ, ミ, 御.

The 2nd highest number of forms (5) was observed with the lemma “所”: とこ, ところ, どころ, 処, 所.

The 3rd highest number of forms (5) was observed with the lemma “真”: まこと, まっ, 真, 真っ, 誠.

NOUN occurs with 1 feature: Polarity (128; 0% instances)

NOUN occurs with 1 feature-value pair: Polarity=Neg

NOUN occurs with 2 feature combinations. The most frequent feature combination is _, i.e. no features (58056 tokens). Examples: 年, こと, 月, 日, 人, 者, お, 後, ため, もの
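The figures in this section fit together arithmetically: the tokens without features and the tokens carrying Polarity=Neg add up to the NOUN token total, and the "0% instances" for Polarity is a small share rounded to a whole percent. A short check, assuming percentages on this page are rounded to the nearest integer:

```python
# Counts copied from this page.
noun_tokens = 58184
polarity_neg = 128

# Tokens with the empty feature combination "_".
no_features = noun_tokens - polarity_neg
print(no_features)  # 58056

# Share of NOUN tokens that carry Polarity=Neg, before and after rounding.
share = 100 * polarity_neg / noun_tokens
print(f"{share:.2f}%")  # 0.22%
print(f"{round(share)}%")  # 0%
```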

Relations

NOUN nodes are attached to their parents using 14 different relations: compound (19596; 34% instances), obl (11532; 20% instances), nmod (10985; 19% instances), nsubj (6779; 12% instances), obj (4978; 9% instances), root (2330; 4% instances), advcl (866; 1% instances), acl (655; 1% instances), nsubj:outer (400; 1% instances), ccomp (39; 0% instances), case (19; 0% instances), csubj (3; 0% instances), csubj:outer (1; 0% instances), fixed (1; 0% instances)
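Counts of this kind can be derived from a treebank's CoNLL-U files, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal sketch, not the script that generated this page; the function name `noun_relation_counts` and the sample sentence are made up for illustration.

```python
from collections import Counter

# Toy CoNLL-U fragment (hypothetical, not taken from UD_Japanese-GSD).
SAMPLE = "\n".join([
    "# text = 猫が魚を食べる",
    "1\t猫\t猫\tNOUN\t_\t_\t5\tnsubj\t_\t_",
    "2\tが\tが\tADP\t_\t_\t1\tcase\t_\t_",
    "3\t魚\t魚\tNOUN\t_\t_\t5\tobj\t_\t_",
    "4\tを\tを\tADP\t_\t_\t3\tcase\t_\t_",
    "5\t食べる\t食べる\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])


def noun_relation_counts(conllu_text):
    """Count the DEPREL values of all tokens whose UPOS is NOUN."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == "NOUN":
            counts[cols[7]] += 1
    return counts


counts = noun_relation_counts(SAMPLE)
total = sum(counts.values())
for rel, n in counts.most_common():
    print(f"{rel} ({n}; {round(100 * n / total)}% instances)")
```

On the real treebank the per-relation counts should sum to the NOUN token total; the 14 counts listed above do add up to 58184.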

Parents of NOUN nodes belong to 14 different parts of speech (counting the root, which has no parent, as one category): NOUN (31328; 54% instances), VERB (21201; 36% instances), no parent, i.e. root (2330; 4% instances), ADJ (1917; 3% instances), PROPN (916; 2% instances), NUM (276; 0% instances), ADV (158; 0% instances), PRON (39; 0% instances), AUX (10; 0% instances), SYM (3; 0% instances), INTJ (2; 0% instances), SCONJ (2; 0% instances), ADP (1; 0% instances), DET (1; 0% instances)

19502 (34%) NOUN nodes are leaves.

8542 (15%) NOUN nodes have one child.

14754 (25%) NOUN nodes have two children.

15386 (26%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 25.
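The four child-count buckets above partition the NOUN tokens, so their counts should sum to the token total and the percentages should follow from it. A small check in Python, assuming percentages are rounded to the nearest whole number:

```python
# Counts copied from the four statements above.
buckets = {
    "leaves": 19502,
    "one child": 8542,
    "two children": 14754,
    "three or more children": 15386,
}

total = sum(buckets.values())
print(total)  # 58184, the number of NOUN tokens

# Recompute each bucket's share of all NOUN tokens.
percentages = {label: round(100 * n / total) for label, n in buckets.items()}
for label, pct in percentages.items():
    print(f"{label}: {buckets[label]} ({pct}%)")
```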

Children of NOUN nodes are attached using 24 different relations: case (34830; 35% instances), compound (24735; 25% instances), nmod (11971; 12% instances), punct (8402; 9% instances), acl (6717; 7% instances), nummod (2789; 3% instances), cop (2366; 2% instances), nsubj (1281; 1% instances), mark (973; 1% instances), det (966; 1% instances), aux (742; 1% instances), obl (709; 1% instances), fixed (457; 0% instances), amod (420; 0% instances), advmod (411; 0% instances), obj (346; 0% instances), cc (207; 0% instances), csubj (120; 0% instances), advcl (106; 0% instances), nsubj:outer (61; 0% instances), dep (41; 0% instances), discourse (6; 0% instances), csubj:outer (4; 0% instances), ccomp (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: ADP (35083; 36% instances), NOUN (31328; 32% instances), PUNCT (8402; 9% instances), VERB (5488; 6% instances), NUM (4892; 5% instances), PROPN (4623; 5% instances), AUX (3200; 3% instances), ADJ (1835; 2% instances), DET (966; 1% instances), SYM (810; 1% instances), PART (525; 1% instances), SCONJ (449; 0% instances), ADV (424; 0% instances), PRON (424; 0% instances), CCONJ (207; 0% instances), INTJ (5; 0% instances)