Treebank Statistics: UD_Persian-Seraji: POS Tags: NOUN
There are 7261 NOUN lemmas (62% of all lemmas), 9823 NOUN types (61% of all types) and 57579 NOUN tokens (38% of all tokens).
Of the 15 observed tags, NOUN ranks first in number of lemmas, types and tokens.
The 10 most frequent NOUN lemmas: کشور، سال، ایران، مردم، کار، روز، قرار، دست، برنامه، حال
The 10 most frequent NOUN types: ایران، سال، مردم، کشور، روز، کار، قرار، دست، انقلاب، تهران
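This page does not say how these figures were produced. The following is a minimal sketch, assuming the treebank is read as CoNLL-U text (10 tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). It also assumes that "types" means distinct word forms and that each percentage is relative to the number of distinct lemmas, distinct forms or tokens in the whole treebank. The helpers `read_conllu`, `tag_share`, `token_rank` and `most_frequent`, and the toy `SAMPLE`, are hypothetical. The sample sentences are invented placeholders, not treebank data.

```python
from collections import Counter

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def tag_share(sentences, tag):
    """Distinct lemmas, distinct forms (types) and tokens of `tag`, each paired with
    its percentage of the treebank-wide total (assumed denominators)."""
    words = [w for s in sentences for w in s]
    tagged = [w for w in words if w["upos"] == tag]
    stats = {}
    for name, key in (("lemmas", "lemma"), ("types", "form")):
        part = {w[key] for w in tagged}
        whole = {w[key] for w in words}
        stats[name] = (len(part), 100 * len(part) / len(whole))
    stats["tokens"] = (len(tagged), 100 * len(tagged) / len(words))
    return stats

def token_rank(sentences, tag):
    """Rank of `tag` by token count (1 = most frequent) and the number of observed tags."""
    counts = Counter(w["upos"] for s in sentences for w in s)
    order = [t for t, _ in counts.most_common()]
    return order.index(tag) + 1, len(counts)

def most_frequent(sentences, tag, key="lemma", n=10):
    """The n most frequent lemmas (key="lemma") or types (key="form") of `tag`."""
    counts = Counter(w[key] for s in sentences for w in s if w["upos"] == tag)
    return [item for item, _ in counts.most_common(n)]

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    sents = read_conllu(SAMPLE)
    for name, (count, pct) in tag_share(sents, "NOUN").items():
        print(f"{count} NOUN {name} ({pct:.0f}%)")
    rank, n_tags = token_rank(sents, "NOUN")
    print(f"Of {n_tags} observed tags, NOUN ranks {rank} by tokens")
    print("most frequent lemmas:", most_frequent(sents, "NOUN"))
    print("most frequent types:", most_frequent(sents, "NOUN", key="form"))
```

On real data, one would pass the text of a treebank `.conllu` file to `read_conllu` instead of the toy sample.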
The 10 most frequent ambiguous lemmas, i.e. NOUN lemmas that also occur with other tags: سال (NOUN 429, ADJ 2), کار (NOUN 362, ADJ 1), روز (NOUN 350, ADJ 2), برنامه (NOUN 232, ADJ 2), حال (NOUN 228, ADV 13), انقلاب (NOUN 202, ADJ 1), مورد (NOUN 197, ADP 128), معاویه (NOUN 180, X 1), گروه (NOUN 169, ADJ 4), امام (NOUN 163, ADJ 1, X 1)
The 10 most frequent ambiguous types, i.e. NOUN word forms that also occur with other tags: روز (NOUN 297, ADJ 2), معاویه (NOUN 180, X 1), امام (NOUN 161, X 1), مورد (NOUN 150, ADP 128), حال (NOUN 141, ADV 13), سر (NOUN 134, ADP 19), هند (NOUN 118, ADJ 1), میان (NOUN 94, ADP 32), علی (NOUN 87, X 5), همراه (NOUN 80, ADP 23, ADJ 1)
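Judging from the counts, both ambiguity lists are ordered by the NOUN count (429, 362, 350 and so on for lemmas), not by total frequency: حال has 241 tokens in total but is listed after برنامه with 234. The sketch below reproduces that ordering, assuming CoNLL-U input. `read_conllu`, `ambiguous` and the toy `SAMPLE` are hypothetical names, and the sample is invented, not taken from the treebank.

```python
from collections import Counter, defaultdict

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def ambiguous(sentences, tag, key="lemma", n=10):
    """Items (lemmas or forms) seen with `tag` and at least one other UPOS,
    ranked by their count with `tag`, each with its per-tag counts."""
    tags_of = defaultdict(Counter)
    for s in sentences:
        for w in s:
            tags_of[w[key]][w["upos"]] += 1
    hits = [(item, tags) for item, tags in tags_of.items()
            if tag in tags and len(tags) > 1]
    hits.sort(key=lambda pair: pair[1][tag], reverse=True)
    return [(item, tags.most_common()) for item, tags in hits[:n]]

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    for key in ("lemma", "form"):
        for item, counts in ambiguous(read_conllu(SAMPLE), "NOUN", key=key):
            print(key, item, "(" + ", ".join(f"{t} {c}" for t, c in counts) + ")")
```

In the toy data, the lemma "round" is ambiguous (NOUN and ADJ), but no single form is. This mirrors how a lemma such as سال can be ambiguous while none of its forms reach the top of the ambiguous-types list.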
Morphology
The form/lemma ratio of NOUN is 1.352844 (the average over all parts of speech is 1.372220).
The highest number of forms (8) was observed with the lemma “جناح”: جناح, جناحها, جناحهای, جناحهایی, جناحِ, جناحی, جناحها, جناحهای.
The lemma “حرف” also has 8 forms: حرف, حرفها, حرفهای, حرفهایی, حرفی, حرفهای, حرفهای, حروف.
The lemma “نامه” also has 8 forms: نامه, نامها, نامهای, نامهٔ, نامه, نامهای, نامهها, نامههای.
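The reported ratio matches the type and lemma counts at the top of this page: 9823 / 7261 ≈ 1.352844. The sketch below assumes the ratio is exactly distinct forms divided by distinct lemmas, and it lists the lemmas with the most distinct forms. Forms are compared as exact strings, so two spellings that differ only in an invisible character count as separate forms. `read_conllu`, `forms_by_lemma`, `form_lemma_ratio`, `richest_lemmas` and the toy `SAMPLE` are hypothetical, and the sample is invented.

```python
from collections import defaultdict

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def forms_by_lemma(sentences, tag):
    """Map each lemma of `tag` to the set of distinct word forms it occurs with."""
    forms = defaultdict(set)
    for s in sentences:
        for w in s:
            if w["upos"] == tag:
                forms[w["lemma"]].add(w["form"])
    return forms

def form_lemma_ratio(sentences, tag):
    """Distinct forms (types) of `tag` divided by its distinct lemmas."""
    forms = forms_by_lemma(sentences, tag)
    if not forms:
        return 0.0
    types = set().union(*forms.values())
    return len(types) / len(forms)

def richest_lemmas(sentences, tag, n=3):
    """The n lemmas with the most distinct forms, as (lemma, sorted forms) pairs."""
    forms = forms_by_lemma(sentences, tag)
    ranked = sorted(forms.items(), key=lambda item: len(item[1]), reverse=True)
    return [(lemma, sorted(fs)) for lemma, fs in ranked[:n]]

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    sents = read_conllu(SAMPLE)
    print(f"form/lemma ratio: {form_lemma_ratio(sents, 'NOUN'):.6f}")
    for lemma, forms in richest_lemmas(sents, "NOUN"):
        print(f"{lemma}: {len(forms)} forms: {', '.join(forms)}")
```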
NOUN occurs with 2 features (count; share of NOUN tokens): Number (57575; 100%), Case (4; <1%)
NOUN occurs with 3 feature-value pairs: Case=Voc, Number=Plur, Number=Sing
NOUN occurs with 3 feature combinations.
The most frequent feature combination is Number=Sing (48927 tokens).
Examples: ایران، سال، مردم، کشور، روز، کار، قرار، دست، انقلاب، تهران
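Feature statistics like the ones above can be gathered from the FEATS column. In CoNLL-U this column lists `Name=Value` pairs separated by `|` and sorted alphabetically by name, so the whole string can serve as a key for a feature combination. This is a minimal sketch, assuming CoNLL-U input. `read_conllu`, `feature_stats` and the toy `SAMPLE` are hypothetical, and the sample is invented.

```python
from collections import Counter

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def feature_stats(sentences, tag):
    """Count feature names, Name=Value pairs and whole FEATS strings of `tag` tokens."""
    names, pairs, combos = Counter(), Counter(), Counter()
    for s in sentences:
        for w in s:
            if w["upos"] != tag or w["feats"] == "_":
                continue  # other tags, or tokens without any feature
            combos[w["feats"]] += 1  # sorted FEATS string = combination key
            for pair in w["feats"].split("|"):
                names[pair.split("=", 1)[0]] += 1
                pairs[pair] += 1
    return names, pairs, combos

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    names, pairs, combos = feature_stats(read_conllu(SAMPLE), "NOUN")
    print("features:", dict(names))
    print("feature-value pairs:", sorted(pairs))
    print("most frequent combination:", combos.most_common(1))
```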
Relations
NOUN nodes are attached to their parents using 27 different relations (count; share of NOUN nodes): nmod:poss (14668; 25%), obl (7881; 14%), nmod (7508; 13%), nsubj (7348; 13%), conj (4801; 8%), compound:lvc (4766; 8%), obj (3422; 6%), flat (3320; 6%), root (1013; 2%), ccomp (618; 1%), appos (543; 1%), xcomp (310; 1%), dep (283; <1%), fixed (257; <1%), acl:relcl (239; <1%), nsubj:pass (144; <1%), compound (109; <1%), parataxis (104; <1%), advcl (92; <1%), vocative (65; <1%), dislocated (44; <1%), nummod (25; <1%), case (8; <1%), amod (5; <1%), compound:prt (3; <1%), mark (2; <1%), flat:foreign (1; <1%)
Parents of NOUN nodes belong to 11 different parts of speech (count; share of NOUN nodes): NOUN (29534; 51%), VERB (21703; 38%), ADJ (3890; 7%), ADV (452; 1%), PRON (406; 1%), ADP (262; <1%), NUM (246; <1%), X (48; <1%), DET (22; <1%), SCONJ (2; <1%), INTJ (1; <1%). The remaining 1013 NOUN nodes (2%) have no parent because they are the root of their sentence.
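Both distributions come from the same pass over NOUN nodes: the DEPREL of each node and the UPOS of the word its HEAD points to. HEAD 0 marks the sentence root, which is why the 1013 root attachments in the relation list match the 1013 NOUN nodes without a parent. The sketch below assumes CoNLL-U input. It also assumes shares are rounded half up, with non-zero shares under 0.5% shown as "<1%"; this is consistent with, for example, obl at 7881 of 57579 (13.7%) appearing as 14%. `read_conllu`, `attachment_stats`, `percent` and the toy `SAMPLE` are hypothetical, and the sample is invented.

```python
import math
from collections import Counter

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def attachment_stats(sentences, tag):
    """Count the DEPREL of each `tag` node and the UPOS of its parent ("(root)" if HEAD is 0)."""
    relations, parents = Counter(), Counter()
    for s in sentences:
        upos_by_id = {w["id"]: w["upos"] for w in s}
        for w in s:
            if w["upos"] == tag:
                relations[w["deprel"]] += 1
                parents[upos_by_id.get(w["head"], "(root)")] += 1
    return relations, parents

def percent(count, total):
    """Whole-number percentage rounded half up; non-zero shares below 0.5% become "<1%"."""
    share = 100 * count / total
    if 0 < share < 0.5:
        return "<1%"
    return f"{math.floor(share + 0.5)}%"

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    relations, parents = attachment_stats(read_conllu(SAMPLE), "NOUN")
    total = sum(relations.values())
    for label, counter in (("relations", relations), ("parents", parents)):
        print(label + ":", ", ".join(f"{k} ({v}; {percent(v, total)})"
                                     for k, v in counter.most_common()))
```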
13998 (24%) NOUN nodes are leaves.
19234 (33%) NOUN nodes have one child.
16399 (28%) NOUN nodes have two children.
7948 (14%) NOUN nodes have three or more children.
The highest child degree of a NOUN node is 13.
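The four degree classes above add up to all 57579 NOUN tokens (13998 + 19234 + 16399 + 7948). A node's child count is the number of words whose HEAD equals its ID. The sketch below assumes CoNLL-U input and groups degrees of three or more into one bucket, as the page does. `read_conllu`, `child_degrees` and the toy `SAMPLE` are hypothetical, and the sample is invented.

```python
from collections import Counter

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def child_degrees(sentences, tag):
    """Histogram of child counts of `tag` nodes (key 3 means "three or more") and the maximum."""
    buckets, max_degree = Counter(), 0
    for s in sentences:
        n_children = Counter(w["head"] for w in s)  # children per head ID in this sentence
        for w in s:
            if w["upos"] == tag:
                degree = n_children[w["id"]]
                max_degree = max(max_degree, degree)
                buckets[min(degree, 3)] += 1
    return buckets, max_degree

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    buckets, top = child_degrees(read_conllu(SAMPLE), "NOUN")
    labels = {0: "are leaves", 1: "have one child", 2: "have two children",
              3: "have three or more children"}
    for k in range(4):
        print(f"{buckets[k]} NOUN nodes {labels[k]}")
    print("highest child degree:", top)
```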
Children of NOUN nodes are attached using 35 different relations (count; share of all 81196 children of NOUN nodes): case (17350; 21%), nmod:poss (16546; 20%), amod (9190; 11%), punct (5330; 7%), nmod (5256; 6%), conj (4887; 6%), cc (4063; 5%), det (3697; 5%), flat (3282; 4%), nummod (2413; 3%), nsubj (1456; 2%), acl:relcl (1256; 2%), compound (1221; 2%), cop (1090; 1%), mark (807; 1%), advmod (568; 1%), appos (528; 1%), ccomp (493; 1%), compound:lvc (358; <1%), aux (321; <1%), dep (267; <1%), parataxis (174; <1%), advcl (162; <1%), fixed (145; <1%), obj (80; <1%), obl (55; <1%), det:predet (52; <1%), xcomp (45; <1%), aux:pass (31; <1%), nsubj:pass (23; <1%), vocative (16; <1%), dislocated (13; <1%), nsubj:nc (9; <1%), cc:preconj (8; <1%), compound:prt (4; <1%)
Children of NOUN nodes belong to 15 different parts of speech (count; share of all children of NOUN nodes): NOUN (29534; 36%), ADP (14999; 18%), ADJ (9900; 12%), PUNCT (5330; 7%), CCONJ (4080; 5%), DET (3629; 4%), PRON (2911; 4%), NUM (2707; 3%), VERB (2621; 3%), PART (2208; 3%), AUX (1443; 2%), ADV (971; 1%), SCONJ (787; 1%), X (39; <1%), INTJ (37; <1%)
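The child view is the mirror image of the parent view: every word whose HEAD is a NOUN is counted once. A NOUN-to-NOUN edge therefore shows up on both sides, which is why 29534 appears both among the parents of NOUN nodes and among their children. Both child distributions also sum to the same total, 81196. This is a minimal sketch, assuming CoNLL-U input. `read_conllu`, `children_stats` and the toy `SAMPLE` are hypothetical, and the sample is invented.

```python
from collections import Counter

def read_conllu(text):
    """Parse CoNLL-U text into sentences, each a list of word dicts.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences, words = [], []
    for line in text.splitlines():
        if not line.strip():
            if words:
                sentences.append(words)
                words = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                          "upos": cols[3], "feats": cols[5],
                          "head": int(cols[6]), "deprel": cols[7]})
    if words:
        sentences.append(words)
    return sentences

def children_stats(sentences, tag):
    """Count the DEPREL and UPOS of every word whose parent is tagged `tag`."""
    relations, child_upos = Counter(), Counter()
    for s in sentences:
        upos_by_id = {w["id"]: w["upos"] for w in s}
        for w in s:
            if upos_by_id.get(w["head"]) == tag:  # HEAD 0 (root) maps to None
                relations[w["deprel"]] += 1
                child_upos[w["upos"]] += 1
    return relations, child_upos

# Invented toy sentences in CoNLL-U format (NOT data from UD_Persian-Seraji).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\ttable\ttable\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "2\tround\tround\tADJ\t_\t_\t1\tamod\t_\t_\n"
    "3\tstood\tstand\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\trounds\tround\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
    "2\tAli\tAli\tNOUN\t_\tCase=Voc\t1\tvocative\t_\t_\n"
    "\n"
    "# sent_id = toy-3\n"
    "1\ttables\ttable\tNOUN\t_\tNumber=Plur\t2\tobj\t_\t_\n"
    "2\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    relations, child_upos = children_stats(read_conllu(SAMPLE), "NOUN")
    print("relations:", relations.most_common())
    print("parts of speech:", child_upos.most_common())
```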