
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: NUM

There are 993 NUM lemmas (6%), 1083 NUM types (4%) and 7758 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 9 in number of tokens.

The 10 most frequent NUM lemmas: مِليُون، أَلف، 15، 3، ثَلَاثَة، مِليَار، 6، 2، 8، 7

The 10 most frequent NUM types: مليون، 15، 3، 6، 2، 8، 7، مليار، ألف، 4

The 10 most frequent ambiguous lemmas: اِثنَان (NOUN 58, NUM 44), وَاحِد (ADJ 100, NUM 31), أَحَد (NOUN 199, NUM 1)

The 10 most frequent ambiguous types: مليون (NUM 485, X 48), مليار (NUM 153, X 28), ألف (NUM 143, X 2, VERB 1), بليون (NUM 71, X 1), الف (NUM 62, X 4), عشرة (NUM 49, X 2), عشرين (NUM 31, X 1), اثنين (NUM 29, NOUN 1), الاف (NUM 26, X 4), خمس (NUM 24, X 1)

Morphology

The form / lemma ratio of NUM is 1.090634 (the average of all parts of speech is 1.761966).
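
The reported ratio is consistent with dividing the 1083 NUM types by the 993 NUM lemmas. A minimal Python sketch of that computation follows, assuming tokens are available as (form, lemma, tag) triples. The function name `form_lemma_ratio` and the toy sample are hypothetical illustrations, not part of the UD tooling.

```python
def form_lemma_ratio(tokens, upos):
    """Return distinct forms / distinct lemmas for one POS tag.

    tokens: iterable of (form, lemma, tag) triples (assumed input shape).
    """
    forms = set()
    lemmas = set()
    for form, lemma, tag in tokens:
        if tag == upos:
            forms.add(form)
            lemmas.add(lemma)
    if not lemmas:
        raise ValueError(f"no tokens tagged {upos}")
    return len(forms) / len(lemmas)


# Toy sample (hypothetical): two forms of one lemma plus a digit.
sample = [
    ("ألف", "أَلف", "NUM"),
    ("آلاف", "أَلف", "NUM"),
    ("3", "3", "NUM"),
    ("كتاب", "كِتَاب", "NOUN"),
]
toy_ratio = form_lemma_ratio(sample, "NUM")  # 3 forms / 2 lemmas

# Counts quoted on this page: 1083 NUM types, 993 NUM lemmas.
page_ratio = round(1083 / 993, 6)
```

Here `page_ratio` reproduces the figure quoted above from the type and lemma counts alone.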

The 1st highest number of forms (16) was observed with the lemma “أَلف”: آلاف, آلافا, ألف, ألفا, ألفاً, ألفي, ألفين, الآلاف, الألف, الاف, الالاف, الف, الفا, الفى, الفي, الفين.

The 2nd highest number of forms (9) was observed with the lemma “أَربَعَة”: أربع, أربعاً, أربعة, اربع, اربعة, الأربع, الأربعة, الاربع, الاربعة.

The 3rd highest number of forms (9) was observed with the lemma “مِليَار”: المليار, المليارات, مليار, مليارا, مليارات, ملياراً, مليارى, ملياري, مليارين.

NUM occurs with 7 features: NumForm (7758; 100% instances), Case (2208; 28% instances), Definite (2207; 28% instances), Number (1442; 19% instances), Gender (702; 9% instances), NumValue (582; 8% instances), Polarity (1; 0% instances)

NUM occurs with 18 feature-value pairs: Case=Acc, Case=Gen, Case=Nom, Definite=Com, Definite=Cons, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, NumForm=Digit, NumForm=Word, NumValue=1, NumValue=2, NumValue=3, Number=Dual, Number=Plur, Number=Sing, Polarity=Neg

NUM occurs with 77 feature combinations. The most frequent feature combination is NumForm=Digit (5521 tokens). Examples: 15، 3، 6، 2، 8، 7، 4، 11، 10، 12

Relations

NUM nodes are attached to their parents using 19 different relations: nummod (3743; 48% instances), conj (1014; 13% instances), obl (672; 9% instances), dep (639; 8% instances), obj (517; 7% instances), obl:arg (502; 6% instances), nsubj (287; 4% instances), root (117; 2% instances), appos (116; 1% instances), nsubj:pass (56; 1% instances), nmod (37; 0% instances), orphan (26; 0% instances), dislocated (10; 0% instances), parataxis (10; 0% instances), iobj (3; 0% instances), xcomp (3; 0% instances), acl (2; 0% instances), advcl (2; 0% instances), ccomp (2; 0% instances)
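
The percentages in this list appear to be each count divided by the 7758 NUM tokens, rounded to the nearest whole percent. A short Python sketch under that assumption, using the counts quoted above (the variable names are hypothetical):

```python
# Relation counts for NUM nodes, copied from the list above.
relation_counts = {
    "nummod": 3743, "conj": 1014, "obl": 672, "dep": 639, "obj": 517,
    "obl:arg": 502, "nsubj": 287, "root": 117, "appos": 116,
    "nsubj:pass": 56, "nmod": 37, "orphan": 26, "dislocated": 10,
    "parataxis": 10, "iobj": 3, "xcomp": 3, "acl": 2, "advcl": 2,
    "ccomp": 2,
}

# The counts should add up to the number of NUM tokens.
total = sum(relation_counts.values())

# Share of each relation, rounded to a whole percent (assumed rounding rule).
relation_percent = {
    rel: round(100 * count / total) for rel, count in relation_counts.items()
}
```

Under this rule, `root` (117) rounds up to 2% while `appos` (116) rounds down to 1%, which matches the listing.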

Parents of NUM nodes belong to 13 different parts of speech: NUM (2536; 33% instances), NOUN (2410; 31% instances), VERB (1728; 22% instances), X (508; 7% instances), ADJ (300; 4% instances), [root node, no tag] (117; 2% instances), PRON (78; 1% instances), DET (36; 0% instances), CCONJ (18; 0% instances), ADV (13; 0% instances), ADP (8; 0% instances), PART (4; 0% instances), PROPN (2; 0% instances)

1016 (13%) NUM nodes are leaves.

3510 (45%) NUM nodes have one child.

2084 (27%) NUM nodes have two children.

1148 (15%) NUM nodes have three or more children.

The highest child degree of a NUM node is 15.
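
The child-degree figures above (leaves, one child, two children, three or more) can be derived from the head index of each node. The following Python sketch assumes a sentence is given as (id, head, tag) triples in the style of the CoNLL-U ID, HEAD and UPOS columns. The function name `child_degree_buckets` and the toy sentence are hypothetical.

```python
from collections import Counter


def child_degree_buckets(sentence, upos):
    """Bucket nodes of one POS tag by number of children: 0, 1, 2, 3 (= 3+).

    sentence: list of (node_id, head_id, tag) triples; head_id 0 is the root.
    """
    # Each occurrence of a head id is one child of that node.
    children = Counter(head for _, head, _ in sentence)
    buckets = Counter()
    for node_id, _, tag in sentence:
        if tag == upos:
            buckets[min(children[node_id], 3)] += 1
    return buckets


# Toy sentence (hypothetical): node 1 has three children, node 4 has one.
toy_sentence = [
    (1, 0, "NUM"),
    (2, 1, "NOUN"),
    (3, 4, "ADP"),
    (4, 1, "NUM"),
    (5, 1, "PUNCT"),
]
toy_buckets = child_degree_buckets(toy_sentence, "NUM")

# Sanity check on the page's figures: the four buckets cover all NUM tokens.
page_total = 1016 + 3510 + 2084 + 1148
```

The four bucket counts reported on this page sum to 7758, the total number of NUM tokens.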

Children of NUM nodes are attached using 25 different relations: nmod (4041; 35% instances), case (1785; 15% instances), punct (1564; 13% instances), nummod (1502; 13% instances), conj (1019; 9% instances), cc (606; 5% instances), amod (336; 3% instances), obl (187; 2% instances), nsubj (121; 1% instances), appos (111; 1% instances), acl (95; 1% instances), dep (64; 1% instances), parataxis (50; 0% instances), advmod:emph (42; 0% instances), obl:arg (20; 0% instances), mark (17; 0% instances), cop (14; 0% instances), acl:relcl (9; 0% instances), xcomp (9; 0% instances), det (7; 0% instances), advcl (5; 0% instances), advmod (4; 0% instances), orphan (3; 0% instances), aux (2; 0% instances), dislocated (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: NOUN (3688; 32% instances), NUM (2536; 22% instances), ADP (1810; 16% instances), PUNCT (1564; 13% instances), CCONJ (517; 4% instances), ADJ (378; 3% instances), X (364; 3% instances), SYM (346; 3% instances), VERB (165; 1% instances), PRON (108; 1% instances), PART (46; 0% instances), ADV (40; 0% instances), DET (19; 0% instances), AUX (16; 0% instances), SCONJ (15; 0% instances), PROPN (2; 0% instances)