
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: NUM

There are 993 NUM lemmas (6%), 1083 NUM types (4%) and 7756 NUM tokens (3%). Out of 16 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 9 in number of tokens.

The 10 most frequent NUM lemmas: مِليُون، أَلف، 15، 3، ثَلَاثَة، مِليَار، 6، 2، 8، 7

The 10 most frequent NUM types: مليون، 15، 3، 6، 2، 8، 7، مليار، ألف، 4

The most frequent ambiguous lemmas (only 3 are listed): اِثنَان (NOUN 58, NUM 44), وَاحِد (ADJ 100, NUM 31), أَحَد (NOUN 194, NUM 1)

The 10 most frequent ambiguous types: مليون (NUM 485, X 48), مليار (NUM 153, X 28), ألف (NUM 143, X 2, VERB 1), بليون (NUM 71, X 1), الف (NUM 62, X 4), عشرة (NUM 49, X 2), عشرين (NUM 31, X 1), اثنين (NUM 29, NOUN 1), الاف (NUM 26, X 4), خمس (NUM 24, X 1)

Morphology

The form / lemma ratio of NUM is 1.090634 (the average of all parts of speech is 1.685281).
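As a quick arithmetic check, the reported ratio is consistent with dividing the number of NUM types (1083) by the number of NUM lemmas (993) given above. This is an inference from the figures on this page, not a documented definition, and the variable names are only illustrative.

```python
# Counts copied from the summary line of this page.
num_lemmas = 993   # distinct NUM lemmas
num_types = 1083   # distinct NUM word forms (types)

# Assumption: the form / lemma ratio is types divided by lemmas.
form_lemma_ratio = num_types / num_lemmas

print(f"{form_lemma_ratio:.6f}")  # 1.090634
```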

The 1st highest number of forms (16) was observed with the lemma “أَلف”: آلاف, آلافا, ألف, ألفا, ألفاً, ألفي, ألفين, الآلاف, الألف, الاف, الالاف, الف, الفا, الفى, الفي, الفين.

The 2nd highest number of forms (9) was observed with the lemma “أَربَعَة”: أربع, أربعاً, أربعة, اربع, اربعة, الأربع, الأربعة, الاربع, الاربعة.

The 3rd highest number of forms (9) was observed with the lemma “مِليَار”: المليار, المليارات, مليار, مليارا, مليارات, ملياراً, مليارى, ملياري, مليارين.

NUM occurs with 7 features: NumForm (7756; 100% instances), Case (2206; 28% instances), Definite (2205; 28% instances), Number (1442; 19% instances), Gender (700; 9% instances), NumValue (580; 7% instances), Polarity (1; 0% instances)

NUM occurs with 18 feature-value pairs: Case=Acc, Case=Gen, Case=Nom, Definite=Com, Definite=Cons, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, NumForm=Digit, NumForm=Word, NumValue=1, NumValue=2, NumValue=3, Number=Dual, Number=Plur, Number=Sing, Polarity=Neg

NUM occurs with 77 feature combinations. The most frequent feature combination is NumForm=Digit (5521 tokens). Examples: 15، 3، 6، 2، 8، 7، 4، 11، 10، 12
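Counts like the ones above could be reproduced from the treebank's CoNLL-U files roughly as follows. This is a minimal sketch, not the script that generated this page: the function name `feature_stats` is hypothetical, and the sample rows are toy data invented for illustration, not taken from UD_Arabic-PADT. It relies only on the standard CoNLL-U layout of 10 tab-separated columns, with UPOS in column 4 and FEATS in column 6.

```python
from collections import Counter


def feature_stats(conllu_text, upos="NUM"):
    """Count feature names, feature-value pairs and full FEATS
    combinations for all tokens with the given UPOS tag."""
    names, pairs, combos = Counter(), Counter(), Counter()
    total = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        # Keep only regular tokens: skip multiword ranges such as "1-2"
        # and empty nodes such as "1.1".
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] != upos:
            continue
        total += 1
        feats = cols[5]
        combos[feats] += 1
        if feats == "_":
            continue  # token carries no features
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            names[name] += 1
            pairs[pair] += 1
    return total, names, pairs, combos


# Toy rows (invented): ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
rows = [
    ("1", "three", "three", "NUM", "_", "NumForm=Word", "2", "nummod", "_", "_"),
    ("2", "books", "book", "NOUN", "_", "Number=Plur", "0", "root", "_", "_"),
    ("3", "and", "and", "CCONJ", "_", "_", "4", "cc", "_", "_"),
    ("4", "15", "15", "NUM", "_", "NumForm=Digit", "2", "conj", "_", "_"),
    ("5", "two", "two", "NUM", "_", "Case=Nom|NumForm=Word", "4", "conj", "_", "_"),
]
sample = "\n".join("\t".join(row) for row in rows)

total, names, pairs, combos = feature_stats(sample)
for name, count in names.most_common():
    print(f"{name} ({count}; {round(100 * count / total)}% instances)")
```

On the real treebank the same loop would be run over every `.conllu` file, and the number of keys in `combos` would correspond to the number of feature combinations.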

Relations

NUM nodes are attached to their parents using 22 different relations: nummod (3706; 48% instances), conj (1013; 13% instances), obl (672; 9% instances), dep (639; 8% instances), obj (553; 7% instances), obl:arg (492; 6% instances), nsubj (288; 4% instances), appos (126; 2% instances), root (117; 2% instances), nsubj:pass (55; 1% instances), cop (26; 0% instances), orphan (26; 0% instances), parataxis (11; 0% instances), nmod (8; 0% instances), cc (7; 0% instances), iobj (6; 0% instances), acl (2; 0% instances), advcl (2; 0% instances), case (2; 0% instances), ccomp (2; 0% instances), xcomp (2; 0% instances), aux (1; 0% instances)
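The relation counts above can be checked against the token total. The sketch below copies the counts from this page and assumes the percentages are the counts divided by the 7756 NUM tokens and rounded to the nearest integer, which is consistent with the listed values but is not stated on the page.

```python
# Counts of incoming relations for NUM nodes, copied from the list above.
relation_counts = {
    "nummod": 3706, "conj": 1013, "obl": 672, "dep": 639, "obj": 553,
    "obl:arg": 492, "nsubj": 288, "appos": 126, "root": 117,
    "nsubj:pass": 55, "cop": 26, "orphan": 26, "parataxis": 11,
    "nmod": 8, "cc": 7, "iobj": 6, "acl": 2, "advcl": 2, "case": 2,
    "ccomp": 2, "xcomp": 2, "aux": 1,
}

total_tokens = sum(relation_counts.values())

# Assumption: each percentage is count / total, rounded to an integer.
percentages = {
    rel: round(100 * count / total_tokens)
    for rel, count in relation_counts.items()
}

print(len(relation_counts), total_tokens)     # 22 7756
print(percentages["nummod"], percentages["cop"])  # 48 0
```

Every NUM token has exactly one incoming relation, so the counts sum to the number of NUM tokens. Relations shown with 0% are rare but still attested.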

Parents of NUM nodes belong to 14 different parts of speech (counting the untagged artificial root as one): NUM (2535; 33% instances), NOUN (2356; 30% instances), VERB (1716; 22% instances), X (547; 7% instances), ADJ (299; 4% instances), root / no tag (117; 2% instances), PRON (80; 1% instances), DET (51; 1% instances), CCONJ (20; 0% instances), ADV (13; 0% instances), ADP (8; 0% instances), PUNCT (8; 0% instances), PART (4; 0% instances), PROPN (2; 0% instances)

1017 (13%) NUM nodes are leaves.

3511 (45%) NUM nodes have one child.

2084 (27%) NUM nodes have two children.

1144 (15%) NUM nodes have three or more children.

The highest child degree of a NUM node is 15.
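The child-degree figures above (leaves, one child, two children, three or more) amount to a histogram over the number of dependents of each NUM node. A minimal sketch of that computation follows. The function name `child_degree_histogram` and the tuple representation of a sentence are assumptions made for illustration, and the example sentence is toy data, not drawn from the treebank.

```python
from collections import Counter


def child_degree_histogram(sentences, upos="NUM"):
    """Map child degree -> number of nodes with the given UPOS tag.

    Each sentence is a list of (token_id, upos, head_id) tuples,
    where head_id 0 denotes the artificial root.
    """
    hist = Counter()
    for sent in sentences:
        # Number of children of each node = how often its ID occurs as a head.
        n_children = Counter(head for _tok_id, _tok_upos, head in sent)
        for tok_id, tok_upos, _head in sent:
            if tok_upos == upos:
                hist[n_children[tok_id]] += 1
    return hist


# Toy sentence (invented): token 1 is a NUM leaf, token 3 is a NUM with two children.
toy = [[(1, "NUM", 2), (2, "NOUN", 0), (3, "NUM", 2), (4, "NOUN", 3), (5, "ADP", 3)]]

hist = child_degree_histogram(toy)
leaves = hist[0]
three_or_more = sum(count for degree, count in hist.items() if degree >= 3)
highest_degree = max(hist)

print(dict(hist), leaves, three_or_more, highest_degree)
```

The four buckets reported above (1017 + 3511 + 2084 + 1144) add up to the 7756 NUM tokens, as expected for a partition of the nodes by child degree.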

Children of NUM nodes are attached using 24 different relations: nmod (4036; 35% instances), case (1818; 16% instances), punct (1558; 13% instances), nummod (1493; 13% instances), conj (1019; 9% instances), cc (622; 5% instances), amod (336; 3% instances), nsubj (121; 1% instances), appos (112; 1% instances), obl (111; 1% instances), acl (101; 1% instances), dep (64; 1% instances), advmod:emph (52; 0% instances), parataxis (49; 0% instances), cop (25; 0% instances), obl:arg (20; 0% instances), mark (17; 0% instances), advmod (12; 0% instances), xcomp (9; 0% instances), aux (7; 0% instances), det (7; 0% instances), advcl (5; 0% instances), obj (4; 0% instances), orphan (3; 0% instances)

Children of NUM nodes belong to 14 different parts of speech: NOUN (3682; 32% instances), NUM (2535; 22% instances), ADP (1805; 16% instances), PUNCT (1558; 13% instances), CCONJ (529; 5% instances), X (383; 3% instances), ADJ (375; 3% instances), SYM (346; 3% instances), VERB (160; 1% instances), PRON (104; 1% instances), PART (49; 0% instances), ADV (40; 0% instances), DET (20; 0% instances), AUX (15; 0% instances)