
This page pertains to UD version 2.

Treebank Statistics: UD_Pashto-Sikaram: POS Tags: NOUN

There are 485 NOUN lemmas (44%), 607 NOUN types (42%) and 1156 NOUN tokens (21%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: ژبه, کتاب, ژباړه, _, خبره, خلک, خوا, ارزښت, برخه, ستونزه

The 10 most frequent NOUN types: ژبه, ژبې, خوا, ژباړې, کار, ژباړه, ژبو, کتابونه, توګه, خبرې

The 10 most frequent ambiguous lemmas: کتاب (NOUN 25, PROPN 3), _ (NOUN 21, ADJ 14, VERB 9, X 8, PROPN 3, ADP 2, NUM 2, PART 1, PRON 1, SYM 1), باران (NOUN 5, X 1), اصل (NOUN 2, ADJ 1), آر (ADJ 1, NOUN 1), اوازه (NOUN 1, X 1), دور (NOUN 1, X 1), قدوري (NOUN 1, PROPN 1), قسمت (NOUN 1, X 1), معادل (ADJ 1, NOUN 1)

The 10 most frequent ambiguous types: باران (NOUN 5, X 1), کتاب (NOUN 4, PROPN 3), اوازه (NOUN 1, X 1), تورو (ADJ 1, NOUN 1), جګړه (ADJ 1, NOUN 1), دور (NOUN 1, X 1), شي (AUX 33, VERB 14, NOUN 1), قدوري (NOUN 1, PROPN 1), معادل (ADJ 1, NOUN 1), نور (ADJ 7, NOUN 1)

Morphology

The form / lemma ratio of NOUN is 1.251546 (the average of all parts of speech is 1.318390).
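The NOUN ratio can be reproduced from the counts in the summary at the top of the page. This is a minimal sketch that assumes the ratio is the number of distinct types divided by the number of distinct lemmas; that assumption matches the reported figure to six decimal places. The function name `form_lemma_ratio` is illustrative only.

```python
# Counts quoted in the summary for NOUN in UD_Pashto-Sikaram.
noun_lemmas = 485
noun_types = 607


def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma (assumed definition)."""
    return n_types / n_lemmas


ratio = form_lemma_ratio(noun_types, noun_lemmas)
print(f"{ratio:.6f}")  # 1.251546
```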

The highest number of forms (19) was observed with the lemma “_”, the CoNLL-U placeholder for an unspecified lemma, so these forms do not belong to a single lexeme: ارتباط, تاړاک, تدريس, توجه, حالت, حماسه, خبريال, دښمن, سړی, طب, عمل, قلم, مابین, مانا, مثال, ملا, ځنډ, کس, کورسونو.

The 2nd highest number of forms (4) was observed with the lemma “دود”: دود, دوده, دودونه, دودونو.

The 3rd highest number of forms (4) was observed with the lemma “لاس”: لاس, لاسه, لاسونه, لاسونو.

NOUN occurs with 7 features: Case (1156; 100% instances), Gender (1156; 100% instances), Number (1156; 100% instances), VerbForm (21; 2% instances), Typo (7; 1% instances), ExtPos (2; 0% instances), Variant (1; 0% instances)

NOUN occurs with 14 feature-value pairs: Case=Abl, Case=Acc, Case=Loc, Case=Nom, ExtPos=ADV, Gender=Fem, Gender=Masc, Number=Coll, Number=Plur, Number=Ptan, Number=Sing, Typo=Yes, Variant=Short, VerbForm=Vnoun

NOUN occurs with 34 feature combinations. The most frequent feature combination is Case=Nom|Gender=Masc|Number=Sing (237 tokens). Examples: ډول, دود, چاپ, کار, اثر, ارزښت, تدريس, توپیر, خدای, لوست
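A feature combination such as the one above is stored in the FEATS column of a CoNLL-U file as `Name=Value` pairs joined by `|`, with `_` standing for an empty set. The following sketch splits such a string into a dictionary; `parse_feats` is a hypothetical helper, not part of any UD tool.

```python
def parse_feats(feats: str) -> dict:
    """Split a CoNLL-U FEATS string into a {feature: value} dictionary."""
    if feats == "_":  # "_" marks a token without any features
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


most_frequent = parse_feats("Case=Nom|Gender=Masc|Number=Sing")
print(most_frequent["Case"], most_frequent["Gender"], most_frequent["Number"])
```

Counting the distinct FEATS strings seen on NOUN tokens would give the number of feature combinations, and counting the distinct key-value pairs would give the number of feature-value pairs.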

Relations

NOUN nodes are attached to their parents using 23 different relations: obl (252; 22% instances), nmod (213; 18% instances), nsubj (212; 18% instances), obj (190; 16% instances), conj (129; 11% instances), root (39; 3% instances), nsubj:pass (34; 3% instances), compound:lvc (24; 2% instances), appos (13; 1% instances), compound (12; 1% instances), xcomp (9; 1% instances), acl:relcl (6; 1% instances), acl (3; 0% instances), flat:name (3; 0% instances), obl:arg (3; 0% instances), orphan:nsubjobj (3; 0% instances), advcl (2; 0% instances), ccomp (2; 0% instances), fixed (2; 0% instances), parataxis (2; 0% instances), dep (1; 0% instances), discourse (1; 0% instances), dislocated (1; 0% instances)
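As a consistency check, the relation counts above can be summed and converted back to percentages. This sketch assumes each percentage is the count divided by the total number of NOUN tokens, rounded to the nearest integer; under that assumption the 23 counts add up to the 1156 NOUN tokens and the rounded shares match the listed values.

```python
# Relation counts for NOUN nodes, copied from the list above.
relation_counts = {
    "obl": 252, "nmod": 213, "nsubj": 212, "obj": 190, "conj": 129,
    "root": 39, "nsubj:pass": 34, "compound:lvc": 24, "appos": 13,
    "compound": 12, "xcomp": 9, "acl:relcl": 6, "acl": 3, "flat:name": 3,
    "obl:arg": 3, "orphan:nsubjobj": 3, "advcl": 2, "ccomp": 2, "fixed": 2,
    "parataxis": 2, "dep": 1, "discourse": 1, "dislocated": 1,
}

total = sum(relation_counts.values())

# Share of each relation, rounded to a whole percent (assumed rounding rule).
percentages = {rel: round(100 * n / total) for rel, n in relation_counts.items()}

print(total)               # 1156
print(percentages["obl"])  # 22
```

The rounding explains why relations with only a few instances are reported as 0%.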

Parents of NOUN nodes belong to 9 different parts of speech: VERB (653; 56% instances), NOUN (373; 32% instances), ADJ (68; 6% instances), no tag, i.e. the artificial root node (39; 3% instances), PRON (10; 1% instances), PROPN (9; 1% instances), X (2; 0% instances), ADV (1; 0% instances), PART (1; 0% instances)

168 (15%) NOUN nodes are leaves.

372 (32%) NOUN nodes have one child.

322 (28%) NOUN nodes have two children.

294 (25%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 10.
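The child-degree figures above could be derived from a CoNLL-U file roughly as follows. This is a sketch under stated assumptions, not the script that generated this page: it relies only on the standard column order (ID in column 1, UPOS in column 4, HEAD in column 7), and both `SAMPLE` and `child_degree_bins` are hypothetical names with a made-up toy sentence.

```python
from collections import Counter

# Toy sentence in CoNLL-U layout (10 tab-separated columns); not real treebank data.
SAMPLE = "\n".join([
    "# text = (toy example)",
    "1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tB\tb\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tC\tc\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tD\td\tNOUN\t_\t_\t3\tobj\t_\t_",
    "",
])


def child_degree_bins(conllu_text: str, upos: str = "NOUN") -> Counter:
    """Count nodes of one UPOS tag by number of children: 0, 1, 2, or 3 (= three or more)."""
    bins = Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep basic tokens only: skip multiword ranges ("1-2") and empty nodes ("1.1").
        rows = [r for r in rows if r[0].isdigit()]
        children_of = Counter(r[6] for r in rows)  # HEAD id -> number of children
        for r in rows:
            if r[3] == upos:
                bins[min(children_of[r[0]], 3)] += 1
    return bins


bins = child_degree_bins(SAMPLE)
print(bins[0], bins[1], bins[2], bins[3])  # 1 1 0 0
```

In the toy sentence, token 4 is a leaf and token 2 has one child. Applied to the full treebank, the four bins would be expected to correspond to the leaf, one-child, two-children and three-or-more counts reported above.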

Children of NOUN nodes are attached using 31 different relations: case (581; 27% instances), nmod (338; 16% instances), amod (331; 15% instances), det (188; 9% instances), punct (143; 7% instances), conj (130; 6% instances), cc (102; 5% instances), cop (54; 2% instances), nummod (53; 2% instances), acl:relcl (48; 2% instances), nsubj (45; 2% instances), advmod (37; 2% instances), compound (19; 1% instances), mark (18; 1% instances), obl (15; 1% instances), acl (14; 1% instances), appos (14; 1% instances), advcl (7; 0% instances), parataxis (6; 0% instances), fixed (4; 0% instances), flat:name (4; 0% instances), goeswith (3; 0% instances), csubj (2; 0% instances), aux:fut (1; 0% instances), compound:lvc (1; 0% instances), dep (1; 0% instances), discourse (1; 0% instances), dislocated (1; 0% instances), orphan:nsubjobj (1; 0% instances), orphan:objobl (1; 0% instances), xcomp (1; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: ADP (583; 27% instances), NOUN (373; 17% instances), ADJ (364; 17% instances), DET (188; 9% instances), PUNCT (143; 7% instances), CCONJ (102; 5% instances), PROPN (87; 4% instances), PRON (80; 4% instances), VERB (80; 4% instances), AUX (55; 3% instances), NUM (53; 2% instances), ADV (26; 1% instances), SCONJ (18; 1% instances), PART (9; 0% instances), X (3; 0% instances)