
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic: POS Tags: NOUN

There are 4662 NOUN lemmas (27%), 10306 NOUN types (36%) and 92051 NOUN tokens (33%). Out of 16 observed tags, the rank of NOUN is: 2 in number of lemmas, 1 in number of types and 1 in number of tokens.
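The three counts above can be reproduced from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page. It assumes that a type is a distinct word form and that multiword-token ranges and empty nodes are skipped. The function name `count_by_upos` and the toy sentence are hypothetical.

```python
from collections import defaultdict


def count_by_upos(conllu_text):
    """Return {upos: (n_lemmas, n_types, n_tokens)} for a CoNLL-U string."""
    lemmas = defaultdict(set)
    types = defaultdict(set)
    tokens = defaultdict(int)
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        types[upos].add(form)
        tokens[upos] += 1
    return {tag: (len(lemmas[tag]), len(types[tag]), tokens[tag]) for tag in tokens}


# Toy data (hypothetical, transliterated): two forms of one NOUN lemma plus one ADJ.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\tkitab\tkitab\tNOUN\t_\tCase=Nom|Definite=Ind|Number=Sing\t0\troot\t_\t_",
    "2\tkutub\tkitab\tNOUN\t_\tCase=Nom|Definite=Ind|Number=Plur\t1\tconj\t_\t_",
    "3\tkabir\tkabir\tADJ\t_\tCase=Nom|Definite=Ind|Number=Sing\t2\tamod\t_\t_",
])

counts = count_by_upos(SAMPLE)
print(counts["NOUN"])  # (lemmas, types, tokens)
```

Ranking the tags by each of the three numbers then gives the rank statement in the paragraph above.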

The 10 most frequent NOUN lemmas: يَوم، رَئِيس، دَولَة، شَرِكَة، وَزِير، مِصر، عَام، دُولَار، حُكُومَة، مِنطَقَة

The 10 most frequent NOUN types: مصر، اليوم، رئيس، دولار، الحكومة، العراق، وزير، كل، الرئيس، غير

The 10 most frequent ambiguous lemmas: مَشرُوع (NOUN 450, ADJ 7), مَسؤُول (NOUN 318, ADJ 28), حَدّ (NOUN 209, VERB 5), أَحَد (NOUN 194, NUM 1), صَادِر (NOUN 174, ADJ 47), طَلَب (NOUN 174, VERB 84), فِلَسطِينِيّ (ADJ 411, NOUN 172), حَقّ (NOUN 161, VERB 15), مُنتَج (NOUN 150, ADJ 5), مَال (NOUN 149, VERB 3)

The 10 most frequent ambiguous types: مصر (NOUN 768, X 18), اليوم (NOUN 532, X 3), دولار (NOUN 481, X 1), وزير (NOUN 431, X 1), كل (NOUN 414, X 1), الرئيس (NOUN 406, X 1), غير (NOUN 348, ADP 1, VERB 1, X 1), عام (NOUN 306, ADJ 33, X 2), عدد (NOUN 290, VERB 2, X 1), العام (NOUN 243, ADJ 158, X 1)

Morphology

The form / lemma ratio of NOUN is 2.210639 (the average of all parts of speech is 1.685281).
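The ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas reported at the top of this page. The short check below assumes that definition; the helper name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma."""
    return n_types / n_lemmas


# Counts taken from this page: 10306 NOUN types, 4662 NOUN lemmas.
ratio = form_lemma_ratio(10306, 4662)
print(f"{ratio:.6f}")  # 2.210639
```

A value above the all-tag average of 1.685281 indicates that nouns in this treebank show more surface variation per lemma than the typical part of speech.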

The 1st highest number of forms (24) was observed with the lemma “مَسؤُول”: المسؤول, المسؤولون, المسؤولين, المسئول, المسئولان, المسئولون, المسئولين, المســــؤولين, مسؤول, مسؤولا, مسؤولان, مسؤولاً, مسؤولو, مسؤولون, مسؤولى, مسؤولي, مسؤولين, مسئول, مسئولان, مسئولو, مسئولون, مسئولى, مسئولي, مسئولين.

The 2nd highest number of forms (15) was observed with the lemma “أَرض”: أراض, أراضى, أراضي, أراضيا, أرض, أرضاً, اراض, اراضي, ارض, الأراضي, الأرض, الاراضى, الاراضي, الارض, لاراضيه.

The 3rd highest number of forms (14) was observed with the lemma “أَمر”: أمر, أمرا, أمراً, أمرين, أمور, أموراً, أوامر, الأمر, الأمور, الأوامر, الامر, الامور, امر, امور.

NOUN occurs with 4 features: Case (92051; 100% instances), Number (92051; 100% instances), Definite (92032; 100% instances), Polarity (19; 0% instances)

NOUN occurs with 10 feature-value pairs: Case=Acc, Case=Gen, Case=Nom, Definite=Cons, Definite=Def, Definite=Ind, Number=Dual, Number=Plur, Number=Sing, Polarity=Neg

NOUN occurs with 32 feature combinations. The most frequent feature combination is Case=Gen|Definite=Def|Number=Sing (21073 tokens). Examples: العراق، الحكومة، الرئيس، السوق، المنطقة، النفط، التجارة، العام، الخارجية، التعاون
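The three feature statistics above (features, feature-value pairs, and feature combinations) can all be derived from the FEATS column. The sketch below is illustrative only. It assumes that FEATS strings use the standard `Name=Value|Name=Value` layout and that an empty value (`_`) counts as its own combination. The function name `feature_stats` and the toy input are hypothetical.

```python
from collections import Counter


def feature_stats(feats_values):
    """Count feature names, feature=value pairs and whole FEATS combinations."""
    features = Counter()
    pairs = Counter()
    combos = Counter()
    for feats in feats_values:
        combos[feats] += 1
        if feats == "_":
            continue  # no features on this token
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos


# Toy FEATS values for three NOUN tokens (hypothetical data).
toy_feats = [
    "Case=Gen|Definite=Def|Number=Sing",
    "Case=Gen|Definite=Def|Number=Sing",
    "Case=Nom|Definite=Ind|Number=Plur",
]

features, pairs, combos = feature_stats(toy_feats)
print(len(features), len(pairs), len(combos))
print(combos.most_common(1))
```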

Relations

NOUN nodes are attached to their parents using 29 different relations: nmod (44958; 49% instances), obl (11435; 12% instances), nsubj (11078; 12% instances), obl:arg (7147; 8% instances), conj (6071; 7% instances), obj (6027; 7% instances), cc (1630; 2% instances), root (825; 1% instances), appos (678; 1% instances), nsubj:pass (415; 0% instances), xcomp (382; 0% instances), dep (351; 0% instances), iobj (203; 0% instances), parataxis (122; 0% instances), advmod:emph (120; 0% instances), orphan (118; 0% instances), cop (110; 0% instances), case (100; 0% instances), ccomp (72; 0% instances), advcl (53; 0% instances), acl (41; 0% instances), mark (38; 0% instances), fixed (37; 0% instances), aux (25; 0% instances), csubj (8; 0% instances), advmod (4; 0% instances), amod (1; 0% instances), nummod (1; 0% instances), punct (1; 0% instances)

Parents of NOUN nodes belong to 17 different parts of speech: NOUN (48968; 53% instances), VERB (28445; 31% instances), ADJ (4664; 5% instances), X (3862; 4% instances), NUM (3700; 4% instances), root (no tag) (825; 1% instances), PRON (380; 0% instances), CCONJ (290; 0% instances), PART (277; 0% instances), DET (273; 0% instances), ADV (184; 0% instances), ADP (111; 0% instances), AUX (35; 0% instances), PROPN (19; 0% instances), PUNCT (12; 0% instances), INTJ (5; 0% instances), SYM (1; 0% instances)
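The relation and parent-tag distributions above can be tallied in one pass over each sentence. The sketch below is a simplified illustration, not the generator of this page. Each sentence is represented as a list of `(id, upos, head, deprel)` tuples, a hypothetical reduction of the CoNLL-U columns. A head of 0 is the artificial root and has no tag; the sketch labels it `ROOT`, which is also a hypothetical label.

```python
from collections import Counter


def parent_stats(sentences, target="NOUN"):
    """Count incoming relations and parent UPOS tags for nodes tagged `target`."""
    relations = Counter()
    parent_pos = Counter()
    for sentence in sentences:
        upos_of = {node_id: upos for node_id, upos, _, _ in sentence}
        for _, upos, head, deprel in sentence:
            if upos != target:
                continue
            relations[deprel] += 1
            # Head 0 is the artificial root, which carries no UPOS tag.
            parent_pos["ROOT" if head == 0 else upos_of[head]] += 1
    return relations, parent_pos


def percentages(counter):
    """Convert counts to whole-number percentages of the total."""
    total = sum(counter.values())
    return {key: round(100 * n / total) for key, n in counter.items()}


# Two toy sentences (hypothetical data).
toy = [
    [
        (1, "VERB", 0, "root"),
        (2, "NOUN", 1, "nsubj"),
        (3, "NOUN", 2, "nmod"),
        (4, "ADP", 5, "case"),
        (5, "NOUN", 1, "obl"),
    ],
    [
        (1, "NOUN", 0, "root"),
        (2, "ADJ", 1, "amod"),
    ],
]

relations, parent_pos = parent_stats(toy)
print(relations)
print(percentages(parent_pos))
```

In the toy data one NOUN is the sentence root, so it contributes both to the `root` relation count and to the untagged parent count, mirroring the matching figure of 825 in the two lists above.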

17778 (19%) NOUN nodes are leaves.

31095 (34%) NOUN nodes have one child.

27766 (30%) NOUN nodes have two children.

15412 (17%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 29.
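The four degree counts above sum to 92051, the total number of NOUN tokens, so every NOUN node falls into exactly one bucket. A minimal sketch of that bucketing follows. It assumes sentences are given as lists of `(id, upos, head)` tuples, a hypothetical simplification of CoNLL-U; the function name `child_degree_buckets` is also hypothetical.

```python
from collections import Counter


def child_degree_buckets(sentences, target="NOUN"):
    """Bucket `target` nodes by number of children; also return the maximum degree."""
    buckets = {"0": 0, "1": 0, "2": 0, "3+": 0}
    max_degree = 0
    for sentence in sentences:
        # Number of children per node id, counted from the head column.
        n_children = Counter(head for _, _, head in sentence)
        for node_id, upos, _ in sentence:
            if upos != target:
                continue
            degree = n_children[node_id]
            max_degree = max(max_degree, degree)
            buckets["3+" if degree >= 3 else str(degree)] += 1
    return buckets, max_degree


# One toy sentence (hypothetical data): node 2 has two children, node 5 has three.
toy = [[
    (1, "VERB", 0),
    (2, "NOUN", 1),
    (3, "ADJ", 2),
    (4, "ADP", 5),
    (5, "NOUN", 1),
    (6, "NOUN", 5),
    (7, "ADJ", 5),
    (8, "NOUN", 2),
]]

buckets, max_degree = child_degree_buckets(toy)
print(buckets, max_degree)

# Sanity check on the figures reported on this page.
assert 17778 + 31095 + 27766 + 15412 == 92051
```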

Children of NOUN nodes are attached using 28 different relations: nmod (47957; 34% instances), case (32106; 23% instances), amod (22844; 16% instances), cc (8294; 6% instances), conj (6492; 5% instances), punct (5912; 4% instances), acl (4098; 3% instances), obl (2179; 2% instances), det (1989; 1% instances), nummod (1861; 1% instances), obl:arg (1658; 1% instances), nsubj (1043; 1% instances), obj (1033; 1% instances), mark (560; 0% instances), dep (539; 0% instances), appos (502; 0% instances), advmod:emph (405; 0% instances), cop (384; 0% instances), advmod (327; 0% instances), parataxis (290; 0% instances), advcl (183; 0% instances), orphan (144; 0% instances), xcomp (141; 0% instances), ccomp (134; 0% instances), aux (66; 0% instances), csubj (50; 0% instances), nsubj:pass (24; 0% instances), fixed (13; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: NOUN (48968; 35% instances), ADP (32128; 23% instances), ADJ (23510; 17% instances), X (7603; 5% instances), CCONJ (7063; 5% instances), PUNCT (5913; 4% instances), PRON (5615; 4% instances), VERB (4882; 3% instances), NUM (2358; 2% instances), DET (2296; 2% instances), ADV (310; 0% instances), AUX (290; 0% instances), PART (201; 0% instances), PROPN (80; 0% instances), SYM (11; 0% instances)