
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-GSDSimp: POS Tags: NOUN

There are 8127 NOUN lemmas (36%), 8128 NOUN types (36%) and 34046 NOUN tokens (28%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
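As a rough illustration of how counts like these could be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens for one tag from CoNLL-U text. The function name `pos_counts` and the three-token sample are made up for illustration; this is not the script that generated this page, and the real statistics may normalize forms differently.

```python
def pos_counts(conllu_text, upos="NOUN"):
    """Return (n_lemmas, n_types, n_tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4 is UPOS
            tokens += 1
            types.add(cols[1])       # column 2 is FORM
            lemmas.add(cols[2])      # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical sample sentence, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = 人们 看 人",
    "1\t人们\t人\tNOUN\tNN\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\t看\t看\tVERB\tVV\t_\t0\troot\t_\t_",
    "3\t人\t人\tNOUN\tNN\t_\t2\tobj\t_\t_",
])

print(pos_counts(SAMPLE))  # (1, 2, 2)
```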

The 10 most frequent NOUN lemmas: 年、 个、 月、 人、 日、 等、 种、 次、 人口、 名

The 10 most frequent NOUN types: 年、 个、 月、 日、 人、 等、 种、 次、 人口、 名

The 10 most frequent ambiguous lemmas: 年 (NOUN 1558, PART 6), 月 (NOUN 604, PART 1), 人 (NOUN 385, PART 240, VERB 1), 日 (NOUN 382, PROPN 53, PART 7, NUM 2), 等 (NOUN 231, VERB 4, PART 1), 种 (NOUN 187, PART 5, VERB 1), 次 (NOUN 149, VERB 4, PART 3, NUM 1), 名 (NOUN 128, PART 6, VERB 3), 大学 (NOUN 120, PROPN 1), 世界 (NOUN 107, PROPN 1)

The 10 most frequent ambiguous types: 年 (NOUN 1558, PART 6), 月 (NOUN 604, PART 1), 日 (NOUN 382, PROPN 53, PART 7, NUM 2), 人 (NOUN 365, PART 240, VERB 1), 等 (NOUN 231, VERB 3, PART 1), 种 (NOUN 187, PART 5, VERB 1), 次 (NOUN 149, VERB 4, PART 3, NUM 1), 名 (NOUN 128, PART 6, VERB 3), 大学 (NOUN 120, PROPN 1), 世界 (NOUN 107, PROPN 1)
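One plausible way to produce lists like the two above is to count, for each lemma (or type), how often it occurs under each tag, and keep those seen with NOUN plus at least one other tag. The sketch below assumes the data is already available as (lemma, tag) pairs; `ambiguous_lemmas` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def ambiguous_lemmas(pairs, upos="NOUN"):
    """Return lemmas seen with `upos` and at least one other tag.

    `pairs` is an iterable of (lemma, tag). The result is sorted by the
    lemma's frequency under `upos`, highest first.
    """
    by_lemma = defaultdict(Counter)
    for lemma, tag in pairs:
        by_lemma[lemma][tag] += 1
    hits = [(lemma, tags) for lemma, tags in by_lemma.items()
            if upos in tags and len(tags) > 1]
    hits.sort(key=lambda item: -item[1][upos])
    return [(lemma, tags.most_common()) for lemma, tags in hits]


# Hypothetical toy data, not the treebank counts.
PAIRS = [("年", "NOUN")] * 3 + [("年", "PART")] + [("人口", "NOUN")] * 2

print(ambiguous_lemmas(PAIRS))  # [('年', [('NOUN', 3), ('PART', 1)])]
```

Passing (form, tag) pairs instead of (lemma, tag) pairs would give the corresponding list for types.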

Morphology

The form / lemma ratio of NOUN is 1.000123 (the average of all parts of speech is 1.004660).
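The ratio above is consistent with dividing the number of distinct NOUN forms by the number of distinct NOUN lemmas reported earlier on this page (8128 types, 8127 lemmas). The check below assumes that definition; the function name `form_lemma_ratio` is illustrative.

```python
def form_lemma_ratio(n_forms, n_lemmas):
    """Average number of distinct forms per lemma."""
    return n_forms / n_lemmas


# Figures taken from this page: 8128 NOUN types, 8127 NOUN lemmas.
print(f"{form_lemma_ratio(8128, 8127):.6f}")  # 1.000123
```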

The 1st highest number of forms (2) was observed with the lemma “人”: 人, 人们.

The 2nd highest number of forms (1) was observed with the lemma “m”: m.

The 3rd highest number of forms (1) was observed with the lemma “n=1”: n=1.

NOUN occurs with 1 feature: Number (20; 0% instances)

NOUN occurs with 1 feature-value pair: Number=Plur

NOUN occurs with 2 feature combinations. The most frequent feature combination is _, i.e. no features (34026 tokens). Examples: 年、 个、 月、 日、 人、 等、 种、 次、 人口、 名

Relations

NOUN nodes are attached to their parents using 25 different relations: nmod (9826; 29% instances), obj (5677; 17% instances), nsubj (5571; 16% instances), obl (2581; 8% instances), clf (2247; 7% instances), compound (1955; 6% instances), conj (1659; 5% instances), nmod:tmod (1554; 5% instances), acl (571; 2% instances), root (570; 2% instances), appos (514; 2% instances), parataxis (399; 1% instances), advcl (224; 1% instances), ccomp (193; 1% instances), nsubj:pass (157; 0% instances), obl:patient (141; 0% instances), xcomp (79; 0% instances), iobj (48; 0% instances), csubj (35; 0% instances), acl:relcl (15; 0% instances), amod (10; 0% instances), dislocated (10; 0% instances), nummod (6; 0% instances), case (2; 0% instances), orphan (2; 0% instances)

Parents of NOUN nodes belong to 14 different parts of speech: VERB (14738; 43% instances), NOUN (11434; 34% instances), PART (3706; 11% instances), NUM (2234; 7% instances), ADJ (764; 2% instances), [artificial root, no tag] (570; 2% instances), PROPN (453; 1% instances), X (62; 0% instances), ADP (36; 0% instances), PRON (18; 0% instances), ADV (14; 0% instances), DET (13; 0% instances), SYM (3; 0% instances), AUX (1; 0% instances)

14624 (43%) NOUN nodes are leaves.

8333 (24%) NOUN nodes have one child.

5398 (16%) NOUN nodes have two children.

5691 (17%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 14.
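The four child-degree figures above should partition all 34046 NOUN tokens. The short check below, using only numbers from this page, confirms that they sum to the token count and reproduces the rounded percentages; the dictionary keys and the helper name `bucket_percentages` are illustrative.

```python
NOUN_TOKENS = 34046  # NOUN token count reported on this page

# Child-degree buckets as reported on this page.
BUCKETS = {
    "leaf": 14624,
    "one child": 8333,
    "two children": 5398,
    "three or more": 5691,
}


def bucket_percentages(buckets, total):
    """Return each bucket's share of `total`, rounded to a whole percent."""
    return {name: round(100 * count / total) for name, count in buckets.items()}


# The buckets cover every NOUN token exactly once.
assert sum(BUCKETS.values()) == NOUN_TOKENS

print(bucket_percentages(BUCKETS, NOUN_TOKENS))
# {'leaf': 43, 'one child': 24, 'two children': 16, 'three or more': 17}
```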

Children of NOUN nodes are attached using 32 different relations: nmod (12872; 31% instances), nummod (6104; 15% instances), case (5953; 14% instances), punct (4647; 11% instances), amod (1791; 4% instances), conj (1648; 4% instances), acl:relcl (1520; 4% instances), det (1369; 3% instances), cop (1164; 3% instances), nsubj (1111; 3% instances), cc (995; 2% instances), appos (864; 2% instances), acl (497; 1% instances), parataxis (360; 1% instances), clf (211; 1% instances), advmod (207; 0% instances), mark (81; 0% instances), advcl (69; 0% instances), obl (68; 0% instances), csubj (61; 0% instances), nmod:tmod (53; 0% instances), dislocated (36; 0% instances), compound (33; 0% instances), ccomp (20; 0% instances), xcomp (16; 0% instances), mark:rel (15; 0% instances), obj (12; 0% instances), aux (10; 0% instances), discourse (8; 0% instances), orphan (2; 0% instances), mark:adv (1; 0% instances), obl:patient (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: NOUN (11434; 27% instances), NUM (6201; 15% instances), PUNCT (4647; 11% instances), PART (4379; 10% instances), PROPN (3455; 8% instances), ADP (3407; 8% instances), VERB (2108; 5% instances), ADJ (1725; 4% instances), AUX (1175; 3% instances), DET (1144; 3% instances), CCONJ (992; 2% instances), PRON (590; 1% instances), X (247; 1% instances), ADV (206; 0% instances), SCONJ (85; 0% instances), SYM (4; 0% instances)