
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: NOUN

There are 4858 NOUN lemmas (31%), 10582 NOUN types (38%) and 93705 NOUN tokens (33%). Out of 16 observed tags, the rank of NOUN is: 2 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: يَوم، رَئِيس، دَولَة، وَزِير، شَرِكَة، مِصر، عَام، دُولَار، حُكُومَة، مِنطَقَة

The 10 most frequent NOUN types: مصر، اليوم، رئيس، دولار، الحكومة، العراق، وزير، كل، الرئيس، غير

The 10 most frequent ambiguous lemmas: مَشرُوع (NOUN 453, ADJ 7), مَسؤُول (NOUN 318, ADJ 28), حَدّ (NOUN 210, VERB 5), أَحَد (NOUN 199, NUM 1), صَادِر (NOUN 180, ADJ 47), طَلَب (NOUN 174, VERB 86), فِلَسطِينِيّ (ADJ 412, NOUN 172), حَقّ (NOUN 165, VERB 15), هَدَف (NOUN 153, VERB 52), مُنتَج (NOUN 150, ADJ 5)

The 10 most frequent ambiguous types: مصر (NOUN 768, X 18), اليوم (NOUN 534, X 3), دولار (NOUN 481, X 1), وزير (NOUN 433, X 1), كل (NOUN 415, X 1), الرئيس (NOUN 406, X 1), غير (NOUN 350, ADP 1, VERB 1, X 1), عام (NOUN 310, ADJ 33, X 2), عدد (NOUN 290, VERB 2, X 1), العام (NOUN 243, ADJ 158, X 1)

Morphology

The form / lemma ratio of NOUN is 2.178263 (the average of all parts of speech is 1.761701).
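The reported ratio appears to be the number of NOUN types divided by the number of NOUN lemmas given near the top of this page. That definition is an assumption, but it reproduces the figure, as this minimal check shows:

```python
# Counts copied from the NOUN statistics on this page.
# Assumption: "form / lemma ratio" means types divided by lemmas.
noun_types = 10582
noun_lemmas = 4858

form_lemma_ratio = noun_types / noun_lemmas
print(f"{form_lemma_ratio:.6f}")  # prints 2.178263
```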

The highest number of forms (24) was observed with the lemma “مَسؤُول”: المسؤول, المسؤولون, المسؤولين, المسئول, المسئولان, المسئولون, المسئولين, المســــؤولين, مسؤول, مسؤولا, مسؤولان, مسؤولاً, مسؤولو, مسؤولون, مسؤولى, مسؤولي, مسؤولين, مسئول, مسئولان, مسئولو, مسئولون, مسئولى, مسئولي, مسئولين.

The 2nd highest number of forms (15) was observed with the lemma “أَرض”: أراض, أراضى, أراضي, أراضيا, أرض, أرضاً, اراض, اراضي, ارض, الأراضي, الأرض, الاراضى, الاراضي, الارض, لاراضيه.

The 3rd highest number of forms (14) was observed with the lemma “أَمر”: أمر, أمرا, أمراً, أمرين, أمور, أموراً, أوامر, الأمر, الأمور, الأوامر, الامر, الامور, امر, امور.

NOUN occurs with 5 features: Case (93686; 100% instances), Definite (93680; 100% instances), Number (93664; 100% instances), Gender (27; 0% instances), Polarity (19; 0% instances)

NOUN occurs with 12 feature-value pairs: Case=Acc, Case=Gen, Case=Nom, Definite=Cons, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Number=Dual, Number=Plur, Number=Sing, Polarity=Neg

NOUN occurs with 49 feature combinations. The most frequent feature combination is Case=Gen|Definite=Def|Number=Sing (21074 tokens). Examples: العراق، الحكومة، الرئيس، السوق، المنطقة، النفط، التجارة، العام، الخارجية، التعاون
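Feature counts like the ones above can be recomputed from a treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page. It relies only on the standard ten-column CoNLL-U layout (UPOS in column 4, FEATS in column 6). The function name `noun_feature_stats` and the sample tokens are hypothetical.

```python
from collections import Counter

def noun_feature_stats(conllu_lines):
    """Count feature names and full feature combinations for NOUN tokens."""
    features = Counter()
    combinations = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        if upos != "NOUN":
            continue
        combinations[feats] += 1
        if feats != "_":
            for pair in feats.split("|"):
                name, _, _value = pair.partition("=")
                features[name] += 1
    return features, combinations

# Hypothetical two-token sample for illustration; not taken from the treebank.
sample = [
    "# text = illustrative only",
    "\t".join(["1", "w1", "l1", "NOUN", "_",
               "Case=Gen|Definite=Def|Number=Sing", "0", "root", "_", "_"]),
    "\t".join(["2", "w2", "l2", "ADJ", "_",
               "Case=Gen|Definite=Def|Gender=Masc|Number=Sing", "1", "amod", "_", "_"]),
]
features, combinations = noun_feature_stats(sample)
print(features)
print(combinations.most_common(1))
```

Run over every sentence of the treebank, `len(features)` would give the number of features and `len(combinations)` the number of feature combinations, with `combinations.most_common(1)` giving the most frequent one.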

Relations

NOUN nodes are attached to their parents using 29 different relations: nmod (45408; 48% instances), obl (11600; 12% instances), nsubj (11220; 12% instances), obl:arg (7315; 8% instances), conj (6654; 7% instances), obj (6119; 7% instances), fixed (1068; 1% instances), root (828; 1% instances), appos (683; 1% instances), case (532; 1% instances), nsubj:pass (415; 0% instances), xcomp (393; 0% instances), dep (358; 0% instances), iobj (206; 0% instances), cc (159; 0% instances), parataxis (127; 0% instances), advmod:emph (123; 0% instances), orphan (122; 0% instances), cop (114; 0% instances), ccomp (72; 0% instances), advcl (54; 0% instances), mark (53; 0% instances), acl (41; 0% instances), aux (26; 0% instances), csubj (8; 0% instances), advmod (4; 0% instances), amod (1; 0% instances), nummod (1; 0% instances), punct (1; 0% instances)

Parents of NOUN nodes belong to 17 different categories (16 parts of speech, plus the artificial root for nodes that have no parent): NOUN (49621; 53% instances), VERB (28946; 31% instances), ADJ (4756; 5% instances), NUM (3691; 4% instances), X (3235; 3% instances), ADP (1134; 1% instances), root (828; 1% instances), PRON (368; 0% instances), CCONJ (321; 0% instances), PART (289; 0% instances), DET (254; 0% instances), ADV (187; 0% instances), AUX (34; 0% instances), PROPN (23; 0% instances), PUNCT (12; 0% instances), INTJ (5; 0% instances), SYM (1; 0% instances)

17630 (19%) NOUN nodes are leaves.

32019 (34%) NOUN nodes have one child.

28960 (31%) NOUN nodes have two children.

15096 (16%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 29.
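The child-degree figures above can be cross-checked against the total of 93705 NOUN tokens, and the same distribution can be derived from the HEAD column of a parsed sentence. The sketch below does both. The helper `noun_child_degrees` and the four-token sentence are hypothetical, used only to illustrate the counting.

```python
from collections import Counter

def noun_child_degrees(tokens):
    """tokens: (id, head, upos) triples of one sentence.
    Returns the number of children of each NOUN node, in sentence order."""
    children = Counter(head for _tid, head, _upos in tokens)
    return [children[tid] for tid, _head, upos in tokens if upos == "NOUN"]

# Hypothetical sentence: token 2 is a NOUN root with three dependents.
sample = [(1, 2, "ADP"), (2, 0, "NOUN"), (3, 2, "ADJ"), (4, 2, "NOUN")]
print(noun_child_degrees(sample))  # prints [3, 0]

# Cross-check of the counts reported on this page.
noun_tokens = 93705
degree_counts = {"0": 17630, "1": 32019, "2": 28960, "3+": 15096}
assert sum(degree_counts.values()) == noun_tokens
shares = {k: round(100 * v / noun_tokens) for k, v in degree_counts.items()}
print(shares)  # prints {'0': 19, '1': 34, '2': 31, '3+': 16}
```

The four buckets sum exactly to the NOUN token count, so every NOUN node is covered by one of the four statements.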

Children of NOUN nodes are attached using 28 different relations: nmod (49181; 34% instances), case (32554; 23% instances), amod (23033; 16% instances), cc (7492; 5% instances), conj (6587; 5% instances), punct (5978; 4% instances), acl (4155; 3% instances), obl (2290; 2% instances), det (2000; 1% instances), nummod (1883; 1% instances), obl:arg (1702; 1% instances), obj (1076; 1% instances), nsubj (1062; 1% instances), mark (567; 0% instances), dep (548; 0% instances), appos (506; 0% instances), advmod:emph (412; 0% instances), cop (391; 0% instances), fixed (330; 0% instances), advmod (301; 0% instances), parataxis (290; 0% instances), advcl (190; 0% instances), xcomp (148; 0% instances), orphan (146; 0% instances), ccomp (141; 0% instances), aux (67; 0% instances), csubj (57; 0% instances), nsubj:pass (24; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: NOUN (49621; 35% instances), ADP (32455; 23% instances), ADJ (23731; 17% instances), CCONJ (7733; 5% instances), X (6817; 5% instances), PRON (6203; 4% instances), PUNCT (5979; 4% instances), VERB (4970; 3% instances), NUM (2394; 2% instances), DET (2283; 2% instances), ADV (319; 0% instances), AUX (294; 0% instances), PART (198; 0% instances), PROPN (103; 0% instances), SYM (11; 0% instances)