This page pertains to UD version 2.

Treebank Statistics: UD_Russian-Taiga: Features: Gender

This feature is universal. It occurs with 3 different values: Fem, Masc, Neut.

706305 tokens (40%) have a non-empty value of Gender. 117458 types (78%) occur at least once with a non-empty value of Gender. 42993 lemmas (78%) occur at least once with a non-empty value of Gender. The feature is used with 8 part-of-speech tags: NOUN (373634; 21% of all tokens), ADJ (106254; 6%), VERB (76144; 4%), PROPN (54575; 3%), PRON (47130; 3%), DET (38369; 2%), AUX (6055; less than 1%), NUM (4144; less than 1%).
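Counts of this kind can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page. It assumes the standard 10-column CoNLL-U layout (UPOS in column 4, FEATS in column 6); the names `parse_feats` and `gender_counts` and the short `sample` sentence are hypothetical.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Gender=Fem' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def gender_counts(conllu_text):
    """Return (number of words, Counter of Gender-bearing words per UPOS)."""
    total = 0
    per_upos = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        total += 1
        if "Gender" in parse_feats(cols[5]):
            per_upos[cols[3]] += 1
    return total, per_upos

# Hypothetical sample sentence, not taken from the treebank.
sample = "\n".join([
    "# text = Русская зима пришла.",
    "1\tРусская\tрусский\tADJ\t_\tCase=Nom|Degree=Pos|Gender=Fem|Number=Sing\t2\tamod\t_\t_",
    "2\tзима\tзима\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Fem|Number=Sing\t3\tnsubj\t_\t_",
    "3\tпришла\tприйти\tVERB\t_\tAspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

total, per_upos = gender_counts(sample)
for upos, n in per_upos.most_common():
    print(f"{upos} ({n}; {round(100 * n / total)}% of all tokens)")
```

Whether the page counts multiword-token ranges or empty nodes is not stated here, so skipping them is an assumption of the sketch.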

NOUN

373634 NOUN tokens (99% of all NOUN tokens) have a non-empty value of Gender.

The most frequent other feature values with which NOUN and Gender co-occurred: Animacy=Inan (310893; 83%), Number=Sing (272831; 73%).
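A co-occurrence list like the one above could be tallied as follows. This is a sketch under assumptions: tokens are taken to be already parsed into `(upos, feats)` pairs, the function name `cooccurring_values` and the toy data are hypothetical, and the `EMPTY` entries that appear elsewhere on this page (which seem to mark an absent feature) are not modelled.

```python
from collections import Counter

def cooccurring_values(tokens, upos, feature="Gender"):
    """Count other feature=value pairs on tokens of `upos` that carry `feature`.

    Returns (number of such tokens, Counter of 'Name=Value' strings).
    """
    base = 0
    counts = Counter()
    for tok_upos, feats in tokens:
        if tok_upos != upos or feature not in feats:
            continue
        base += 1
        for name, value in feats.items():
            if name != feature:
                counts[f"{name}={value}"] += 1
    return base, counts

# Hypothetical toy data, not taken from the treebank.
tokens = [
    ("NOUN", {"Animacy": "Inan", "Case": "Nom", "Gender": "Fem", "Number": "Sing"}),
    ("NOUN", {"Animacy": "Inan", "Case": "Gen", "Gender": "Neut", "Number": "Plur"}),
    ("NOUN", {"Animacy": "Anim", "Case": "Nom", "Gender": "Masc", "Number": "Sing"}),
    ("ADJ", {"Case": "Nom", "Gender": "Fem", "Number": "Sing"}),
]

base, counts = cooccurring_values(tokens, "NOUN")
for pair, n in counts.most_common(3):
    print(f"{pair} ({n}; {round(100 * n / base)}%)")
```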

NOUN tokens may have the following values of Gender:

Paradigm время (Gender values: Fem, Neut)
Case=Acc|ExtPos=ADV|Number=Sing: время
Case=Acc|Number=Sing: время
Case=Acc|Number=Sing|Typo=Yes: всемя
Case=Acc|Number=Plur: времена
Case=Dat|Number=Sing: времени
Case=Dat|Number=Plur: временам
Case=Gen|Number=Sing: времни; времени
Case=Gen|Number=Plur: времен, времён, времени
Case=Ins|Number=Sing: временем
Case=Ins|Number=Plur: временами
Case=Loc|Number=Sing: времени
Case=Loc|Number=Plur: временах
Case=Nom|Number=Sing: время
Case=Nom|Number=Plur: времена

Gender seems to be a lexical feature of NOUN. 99% of lemmas (19090) occur with only one value of Gender.
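The check behind a statement like this can be sketched as the share of lemmas that are attested with exactly one Gender value. The function name `single_value_share` and the toy `(lemma, gender)` pairs are hypothetical; how the page itself decides that a feature "seems lexical" is not stated here.

```python
from collections import defaultdict

def single_value_share(pairs):
    """Return (lemmas seen with exactly one Gender value, all lemmas seen)."""
    values = defaultdict(set)
    for lemma, gender in pairs:
        values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)

# Hypothetical toy data: one (lemma, Gender) pair per NOUN token.
pairs = [
    ("зима", "Fem"), ("зима", "Fem"),
    ("стол", "Masc"),
    ("время", "Neut"), ("время", "Fem"),
    ("окно", "Neut"),
]

single, total = single_value_share(pairs)
print(f"{round(100 * single / total)}% of lemmas ({single}) occur with only one value")
```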

ADJ

106254 ADJ tokens (69% of all ADJ tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (106245; 100%), Degree=Pos (102004; 96%).

ADJ tokens may have the following values of Gender:

Paradigm русский (Gender values: Masc, Fem, Neut)
Animacy=Anim|Case=Acc: русского
Animacy=Inan|Case=Acc: русский, русская
Case=Acc: русскую; русское
Case=Acc|Typo=Yes: Рускую
Case=Dat: русскому; русской; русскому
Case=Gen: русского; русской, Русския; русского
Case=Gen|Typo=Yes: Рускыя
Case=Ins: русским; русской; русским
Case=Loc: русском; русской; русском
Case=Nom: русский; русская; русское

VERB

76144 VERB tokens (36% of all VERB tokens) have a non-empty value of Gender.

The most frequent other feature values with which VERB and Gender co-occurred: Person=EMPTY (76144; 100%), Number=Sing (76143; 100%), Tense=Past (72619; 95%), Mood=Ind (61568; 81%), VerbForm=Fin (61567; 81%), Voice=Act (52449; 69%), Aspect=Perf (50360; 66%).

VERB tokens may have the following values of Gender:

Paradigm мочь (Gender values: Masc, Fem, Neut)
Case=Acc|Tense=Pres|VerbForm=Part: могущее
Case=Gen|Tense=Pres|VerbForm=Part: могущего; могущего
Case=Nom|Tense=Pres|VerbForm=Part: могущий; могущая
Mood=Ind|Tense=Past|VerbForm=Fin: мог; могла; могло

PROPN

54575 PROPN tokens (81% of all PROPN tokens) have a non-empty value of Gender.

The most frequent other feature values with which PROPN and Gender co-occurred: Abbr=EMPTY (54564; 100%), Number=Sing (54034; 99%), Animacy=Anim (45261; 83%), Case=Nom (29185; 53%).

PROPN tokens may have the following values of Gender:

Paradigm Чили (Gender values: Masc, Fem, Neut)
Case=Gen: Чили; Чили; Чили
Case=Loc: Чили; Чили

Gender seems to be a lexical feature of PROPN. 98% of lemmas (7876) occur with only one value of Gender.

PRON

47130 PRON tokens (53% of all PRON tokens) have a non-empty value of Gender.

The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (47125; 100%), Case=Nom (23868; 51%), Animacy=EMPTY (23722; 50%), Person=3 (23722; 50%), PronType=Prs (23722; 50%).

PRON tokens may have the following values of Gender:

Paradigm он (Gender values: Masc, Neut)
Case=Acc: его, него, Эго; его
Case=Dat: ему, нему
Case=Gen: него, его
Case=Ins: ним, им; ним
Case=Loc: нем, нём, нëм
Case=Nom: он
Case=Nom|Typo=Yes: она, от

DET

38369 DET tokens (60% of all DET tokens) have a non-empty value of Gender.

The most frequent other feature values with which DET and Gender co-occurred: Number=Sing (38365; 100%), Animacy=EMPTY (34944; 91%), Poss=EMPTY (29930; 78%).

DET tokens may have the following values of Gender:

Paradigm этот (Gender values: Masc, Fem, Neut)
Animacy=Anim|Case=Acc: этого
Animacy=Inan|Case=Acc|ExtPos=DET: этот
Animacy=Inan|Case=Acc: этот
Animacy=Inan|Case=Gen: этого
Animacy=Inan|Case=Nom: этот
Case=Acc|ExtPos=DET: эту; это
Case=Acc: эту; это
Case=Acc|Typo=Yes: это
Case=Dat|ExtPos=DET: этому; этой; этому
Case=Dat: этому; этой; этому
Case=Gen|ExtPos=DET: этого; этой; этого
Case=Gen: этого; этой; этого
Case=Gen|Typo=Yes: этово
Case=Ins|ExtPos=DET: этим; этой; этим
Case=Ins: этим; этой, этою; этим
Case=Loc|ExtPos=DET: этом; этой
Case=Loc: этом; этой; этом
Case=Nom|ExtPos=DET: этот; эта; Это
Case=Nom: этот, Это; эта; это
Case=Nom|Typo=Yes: это

AUX

6055 AUX tokens (47% of all AUX tokens) have a non-empty value of Gender.

The most frequent other feature values with which AUX and Gender co-occurred: Number=Sing (6055; 100%), Person=EMPTY (6055; 100%), Tense=Past (6055; 100%), Voice=Act (6055; 100%), Mood=Ind (6051; 100%), VerbForm=Fin (6051; 100%).

AUX tokens may have the following values of Gender:

Paradigm быть (Gender values: Masc, Fem, Neut)
Animacy=Anim|Aspect=Imp|Case=Acc|VerbForm=Part: бывшего
Aspect=Imp|Case=Nom|VerbForm=Part: бывший; бывшая
Aspect=Imp|Mood=Ind|VerbForm=Fin: был; была; было
Mood=Ind|VerbForm=Fin: был, бывший; была, бывшая; было

NUM

4144 NUM tokens (32% of all NUM tokens) have a non-empty value of Gender.

The most frequent other feature values with which NUM and Gender co-occurred: NumForm=Word (4142; 100%), NumType=Card (3703; 89%), Number=EMPTY (2243; 54%).

NUM tokens may have the following values of Gender:

Paradigm один (Gender values: Masc, Fem, Neut)
Animacy=Anim|Case=Acc: одного
Animacy=Inan|Case=Acc|ExtPos=PRON: один
Animacy=Inan|Case=Acc: один
Animacy=Inan|Case=Acc|Typo=Yes: оден
Case=Acc: один; одну; одно
Case=Dat: одному; одной; одному
Case=Gen: одного; одной; одного
Case=Ins|ExtPos=NUM: одним
Case=Ins: одним; одной; одним
Case=Loc: одном; одной; одном
Case=Nom|ExtPos=NUM: Одна
Case=Nom: один; одна; одно

Relations with Agreement in Gender

The 10 most frequent relations where the parent and child nodes agree in Gender: NOUN –[amod]–> ADJ (74269; 69%), NOUN –[det]–> DET (25471; 57%), VERB –[nsubj]–> PROPN (10379; 80%), VERB –[conj]–> VERB (9871; 64%), ADJ –[conj]–> ADJ (7117; 94%), NOUN –[acl]–> VERB (5868; 51%), NOUN –[appos]–> PROPN (4353; 73%), ADJ –[nsubj]–> NOUN (3831; 62%), PROPN –[amod]–> ADJ (2337; 88%), NOUN –[nummod]–> NUM (2242; 69%).
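Agreement counts of this kind could be gathered roughly as follows. This is a hedged sketch, not the generator of this page: it assumes standard CoNLL-U input with integer HEAD values, treats a relation as agreeing when parent and child both carry Gender with the same value, and takes the percentage to be the share of all occurrences of that parent, relation, and child combination, which is an assumption about the denominator. The names `feats_of`, `gender_agreement`, and `sample` are hypothetical.

```python
from collections import Counter

def feats_of(field):
    """Turn a FEATS string such as 'Case=Nom|Gender=Fem' into a dict."""
    return {} if field == "_" else dict(p.split("=", 1) for p in field.split("|"))

def gender_agreement(conllu_text):
    """Return (agree, seen) Counters keyed by (parent UPOS, deprel, child UPOS)."""
    agree, seen = Counter(), Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if line.startswith("#") or not cols[0].isdigit():
                continue  # comments, multiword-token ranges, empty nodes
            words[int(cols[0])] = (cols[3], feats_of(cols[5]), int(cols[6]), cols[7])
        for upos, feats, head, deprel in words.values():
            if head == 0:
                continue  # the root has no parent word
            parent_upos, parent_feats, _, _ = words[head]
            key = (parent_upos, deprel, upos)
            seen[key] += 1
            gender = feats.get("Gender")
            if gender is not None and gender == parent_feats.get("Gender"):
                agree[key] += 1
    return agree, seen

# Hypothetical sample sentence, not taken from the treebank.
sample = "\n".join([
    "# text = Русская зима пришла.",
    "1\tРусская\tрусский\tADJ\t_\tCase=Nom|Degree=Pos|Gender=Fem|Number=Sing\t2\tamod\t_\t_",
    "2\tзима\tзима\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Fem|Number=Sing\t3\tnsubj\t_\t_",
    "3\tпришла\tприйти\tVERB\t_\tAspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

agree, seen = gender_agreement(sample)
for (parent, rel, child), n in agree.most_common(10):
    share = round(100 * n / seen[(parent, rel, child)])
    print(f"{parent} -[{rel}]-> {child} ({n}; {share}%)")
```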