
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PUD: Features: Gender

This feature is universal. It occurs with 2 different values: Fem, Masc.

12008 tokens (58%) have a non-empty value of Gender. 6146 types (90%) occur at least once with a non-empty value of Gender. 4241 lemmas (89%) occur at least once with a non-empty value of Gender. The feature is used with 7 part-of-speech tags: NOUN (5493; 26% of instances), ADJ (1940; 9% of instances), VERB (1676; 8% of instances), PROPN (1495; 7% of instances), PRON (1145; 6% of instances), AUX (181; 1% of instances), NUM (78; 0% of instances).
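Per-tag counts like the ones above can be derived from a CoNLL-U file by checking the FEATS column of each token. The following is a minimal sketch, not the script that generated this page: the function names and the four-token sample are hypothetical, and the sample annotation is illustrative rather than taken from UD_Arabic-PUD.

```python
from collections import Counter

# Hypothetical sample rows in CoNLL-U column order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE_ROWS = [
    ["1", "الرئيس", "رَئِيس", "NOUN", "_",
     "Case=Nom|Definite=Def|Gender=Masc|Number=Sing", "0", "root", "_", "_"],
    ["2", "الأول", "أَوَّل", "ADJ", "_",
     "Gender=Masc|Number=Sing", "1", "amod", "_", "_"],
    ["3", "في", "فِي", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "بكين", "بِكِين", "PROPN", "_",
     "Gender=Fem|Number=Sing", "1", "nmod", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def parse_feats(feats):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_counts_by_upos(conllu_text):
    """Return (tokens with Gender per UPOS, all tokens per UPOS)."""
    with_gender = Counter()
    total = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos = cols[3]
        total[upos] += 1
        if "Gender" in parse_feats(cols[5]):
            with_gender[upos] += 1
    return with_gender, total


with_gender, total = gender_counts_by_upos(SAMPLE)
for upos in sorted(total):
    share = 100 * with_gender[upos] / total[upos]
    print(f"{upos}: {with_gender[upos]} of {total[upos]} tokens ({share:.0f}%)")
```

Skipping IDs that are not plain integers keeps multiword-token ranges from being counted twice, which matters for Arabic, where clitics are split into separate syntactic words.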

NOUN

5493 NOUN tokens (98% of all NOUN tokens) have a non-empty value of Gender.

The most frequent other feature values with which NOUN and Gender co-occurred: Definite=Def (4233; 77%), Number=Sing (3967; 72%), Case=Gen (3795; 69%).

NOUN tokens may have the following values of Gender:

Paradigm رَئِيس (Gender values: Masc, Fem)
Case=Acc|Definite=Ind|Number=Sing: رئيساً
Case=Gen|Definite=Def|Number=Sing: الرئيس, رئيس
Case=Gen|Definite=Def|Number=Dual: الرئيسين
Case=Gen|Definite=Ind|Number=Sing: Masc رئيسٍ, رئيس; Fem رئيسةٍ
Case=Nom|Definite=Def|Number=Sing: رئيس, الرئيس
Case=Nom|Definite=Ind|Number=Plur: رؤساء

Gender seems to be a lexical feature of NOUN. 98% of lemmas (2054) occur with only one value of Gender.
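The "lexical feature" test can be sketched as follows: collect the set of Gender values seen with each lemma and count the lemmas whose set has exactly one member. This is an illustrative sketch under assumptions; the function name and the sample pairs are hypothetical, and reading the parenthesised figure as the number of single-valued lemmas is an interpretation of the statement above, not something the page spells out.

```python
from collections import defaultdict


def single_gender_lemmas(pairs):
    """pairs: iterable of (lemma, gender) tuples; gender may be None.

    Returns (lemmas seen with exactly one Gender value,
             lemmas seen with any Gender value).
    """
    values = defaultdict(set)
    for lemma, gender in pairs:
        if gender:  # ignore tokens without a Gender value
            values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)


# Hypothetical (lemma, Gender) observations, for illustration only.
observations = [
    ("رَئِيس", "Masc"),
    ("رَئِيس", "Fem"),
    ("مَدِينَة", "Fem"),
    ("مَدِينَة", "Fem"),
    ("كِتَاب", "Masc"),
    ("فِي", None),
]
single, total = single_gender_lemmas(observations)
print(f"{single} of {total} lemmas ({100 * single / total:.0f}%) have one Gender value")
```

A high share suggests that Gender is a property of the lemma rather than an inflectional choice, which is the sense in which the statistics call it lexical.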

ADJ

1940 ADJ tokens (96% of all ADJ tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (1865; 96%), Definite=Def (1215; 63%), Case=Gen (1183; 61%).

ADJ tokens may have the following values of Gender:

Paradigm أَوَّل (Gender values: Masc, Fem)
Case=Acc|Definite=Def|Number=Sing: Masc الأول; Fem الأولى
Case=Acc|Definite=Ind|Number=Sing: Masc أول; Fem أولى
Case=Gen|Definite=Def|Number=Sing: Masc الأول; Fem الأولى
Case=Gen|Definite=Def|Number=Plur: أوائل
Case=Gen|Definite=Ind|Number=Sing: Masc أول; Fem أولى
Case=Nom|Definite=Def|Number=Sing: Masc الأول, أول; Fem أولى, الأولى
Case=Nom|Definite=Ind|Number=Plur: أولى

VERB

1676 VERB tokens (96% of all VERB tokens) have a non-empty value of Gender.

The most frequent other feature values with which VERB and Gender co-occurred: Person=3 (1656; 99%), Number=Sing (1565; 93%), Voice=Act (1487; 89%), Tense=Past (862; 51%), Aspect=Imp (849; 51%).

VERB tokens may have the following values of Gender:

Paradigm كَان (Gender values: Masc, Fem)
Aspect=Imp|Mood=Ind|Number=Sing|Tense=Fut: يكون
Aspect=Imp|Mood=Ind|Number=Sing|Tense=Pres: Masc يكون, يكن; Fem تكن
Aspect=Imp|Mood=Jus|Number=Sing|Tense=Past: يكن
Aspect=Imp|Mood=Sub|Number=Sing|Tense=Fut: يكون
Aspect=Imp|Mood=Sub|Number=Sing|Tense=Pres: يكون
Aspect=Perf|Number=Sing|Tense=Past: Masc كان; Fem كانت
Aspect=Perf|Number=Dual|Tense=Past: كانتا
Aspect=Perf|Number=Plur|Tense=Past: كانوا

PROPN

1495 PROPN tokens (87% of all PROPN tokens) have a non-empty value of Gender.

The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (1432; 96%), Definite=EMPTY (1176; 79%), Case=EMPTY (1007; 67%).

PROPN tokens may have the following values of Gender:

Paradigm بِكِين (Gender values: Masc, Fem)
Masc بكين; Fem بكين

Gender seems to be a lexical feature of PROPN. 97% of lemmas (891) occur with only one value of Gender.

PRON

1145 PRON tokens (88% of all PRON tokens) have a non-empty value of Gender.

The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (1012; 88%), Case=Gen (765; 67%), Person=3 (736; 64%).

PRON tokens may have the following values of Gender:

Paradigm هُوَ (Gender values: Masc, Fem)
Case=Acc|Number=Sing|Person=2: ك
Case=Acc|Number=Sing|Person=3: Masc ه; Fem ها
Case=Acc|Number=Plur|Person=3: هم
Case=Gen|Number=Sing|Person=2: ك
Case=Gen|Number=Sing|Person=3: Masc ه; Fem ها
Case=Gen|Number=Plur|Person=3: Masc هم; Fem هن, هم
Case=Nom|Number=Sing|Person=3: Masc هو; Fem هي
Case=Nom|Number=Plur|Person=3: هم
Number=Sing|Person=3: Masc هو, ه; Fem هي

AUX

181 AUX tokens (97% of all AUX tokens) have a non-empty value of Gender.

The most frequent other feature values with which AUX and Gender co-occurred: Voice=Act (179; 99%), Person=3 (177; 98%), Number=Sing (168; 93%), Tense=Past (156; 86%), Mood=EMPTY (150; 83%), Aspect=Perf (147; 81%).

AUX tokens may have the following values of Gender:

Paradigm كَان (Gender values: Masc, Fem)
Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Fut|Voice=Act: يكون
Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|Voice=Act: Masc يكون; Fem تكون
Aspect=Imp|Mood=Jus|Number=Sing|Person=3|Tense=Past|Voice=Act: Masc يكن; Fem تكن
Aspect=Imp|Mood=Sub|Number=Sing|Person=3|Tense=Pres|Voice=Act: Masc يكون; Fem تكون
Aspect=Perf|Number=Sing|Person=2|Tense=Past|Voice=Act: كنت
Aspect=Perf|Number=Sing|Person=3|Tense=Past|Voice=Act: Masc كان; Fem كانت
Aspect=Perf|Number=Plur|Person=3|Tense=Past|Voice=Act: كانوا
Case=Gen|Definite=Def: كون

NUM

78 NUM tokens (21% of all NUM tokens) have a non-empty value of Gender.

The most frequent other feature values with which NUM and Gender co-occurred: Number=Plur (73; 94%), Case=Gen (44; 56%).

NUM tokens may have the following values of Gender:

Paradigm أربعة (Gender values: Masc, Fem)
Case=Acc: أربعة
Case=Nom: Masc أربعة; Fem أربعة

Gender seems to be a lexical feature of NUM. 97% of lemmas (28) occur with only one value of Gender.

Relations with Agreement in Gender

The 10 most frequent relations where parent and child node agree in Gender: NOUN –[amod]–> ADJ (1109; 81%), NOUN –[nmod]–> NOUN (1034; 54%), VERB –[nsubj]–> NOUN (539; 85%), PROPN –[amod]–> ADJ (235; 98%), VERB –[obj]–> NOUN (218; 51%), VERB –[nsubj]–> PRON (177; 98%), NOUN –[conj]–> NOUN (173; 66%), VERB –[nsubj]–> PROPN (173; 90%), NOUN –[acl:relcl]–> VERB (158; 71%), PROPN –[flat]–> PROPN (138; 75%).
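Agreement counts of this kind can be approximated by walking each dependency edge and comparing the Gender value of the child with that of its parent. The sketch below is illustrative only: the function name, the tuple layout, and the sample sentence are hypothetical, and taking the percentage over edges where both nodes carry a Gender value is an assumption, since the page does not state the denominator.

```python
from collections import Counter


def gender_agreement(sentence):
    """sentence: list of (id, upos, gender_or_None, head_id, deprel) tuples.

    Returns two Counters keyed by (parent UPOS, deprel, child UPOS):
    edges where both Gender values match, and edges where both nodes
    have any Gender value.
    """
    by_id = {tok[0]: tok for tok in sentence}
    agree = Counter()
    both = Counter()
    for tok_id, upos, gender, head, deprel in sentence:
        parent = by_id.get(head)  # None for the root, whose head is 0
        if parent is None or gender is None or parent[2] is None:
            continue
        key = (parent[1], deprel, upos)
        both[key] += 1
        if gender == parent[2]:
            agree[key] += 1
    return agree, both


# Hypothetical four-token sentence, for illustration only.
sentence = [
    (1, "VERB", "Masc", 0, "root"),
    (2, "NOUN", "Masc", 1, "nsubj"),
    (3, "ADJ", "Masc", 2, "amod"),
    (4, "NOUN", "Fem", 1, "obj"),
]
agree, both = gender_agreement(sentence)
for parent_upos, deprel, child_upos in sorted(both):
    key = (parent_upos, deprel, child_upos)
    print(f"{parent_upos} -[{deprel}]-> {child_upos}: {agree[key]} of {both[key]}")
```

On a full treebank the two Counters would be accumulated over all sentences before ranking relations by the number of agreeing edges.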