Statistics of Gender in UD

This page pertains to UD version 2.

Treebank Statistics: UD_Italian-TWITTIRO: Features: `Gender`

This feature is universal. It occurs with 2 different values: Fem, Masc.

9154 tokens (31%) have a non-empty value of Gender. 2946 types (48%) occur at least once with a non-empty value of Gender. 2329 lemmas (48%) occur at least once with a non-empty value of Gender. The feature is used with 10 part-of-speech tags: NOUN (4195; 14% of all tokens), DET (3006; 10% of all tokens), ADJ (981; 3% of all tokens), VERB (505; 2% of all tokens), PRON (443; 1% of all tokens), AUX (19; 0% of all tokens), ADV (2; 0% of all tokens), PROPN (1; 0% of all tokens), SYM (1; 0% of all tokens), X (1; 0% of all tokens).
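Counts like these can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: `read_tokens`, `gender_coverage`, and the toy sentence in `SAMPLE` are illustrative names and data introduced here, and the sketch assumes the standard 10-column CoNLL-U layout with features in the sixth column.

```python
from collections import Counter

# Toy CoNLL-U sentence, invented for illustration (not taken from the treebank).
SAMPLE = "\n".join([
    "# text = la scuola è nuova",
    "1\tla\til\tDET\tRD\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_",
    "2\tscuola\tscuola\tNOUN\tS\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_",
    "3\tè\tessere\tAUX\tV\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_",
    "4\tnuova\tnuovo\tADJ\tA\tGender=Fem|Number=Sing\t0\troot\t_\t_",
    "",
])

def read_tokens(conllu_text):
    """Yield (form, lemma, upos, feats_dict) for each syntactic word."""
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep regular words only: skip comments, blank lines,
        # multiword ranges such as "1-2", and empty nodes such as "1.1".
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        yield cols[1], cols[2], cols[3], feats

def gender_coverage(conllu_text):
    """Return (tokens with Gender, all tokens, Gender-bearing tokens per UPOS)."""
    total = with_gender = 0
    per_upos = Counter()
    for _form, _lemma, upos, feats in read_tokens(conllu_text):
        total += 1
        if "Gender" in feats:
            with_gender += 1
            per_upos[upos] += 1
    return with_gender, total, per_upos

with_gender, total, per_upos = gender_coverage(SAMPLE)
print(f"{with_gender} of {total} tokens ({with_gender / total:.0%}) have Gender")
for upos, count in per_upos.most_common():
    print(f"{upos}: {count}")
```

To run it on real data, read a treebank file into a string and pass that string to `gender_coverage` in place of `SAMPLE`.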

`NOUN`

4195 NOUN tokens (94% of all NOUN tokens) have a non-empty value of Gender.

The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (2873; 68%).

NOUN tokens may have the following values of Gender:

Fem (1848; 44% of non-empty Gender): scuola, riforma, cosa, casa, crisi, vita, foto, volta, cit., fine
Masc (2347; 56% of non-empty Gender): governo, anni, lavoro, anno, italiani, mesi, mondo, tagli, merito, ministro
EMPTY (260): RT, docenti, grazie, spread, inglese, insegnanti, rain, tweet, prof, hashtag
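The percentages in lists like the one above follow from the raw counts: each value's share is taken over tokens with a non-empty Gender, while the coverage figure is taken over all tokens of the tag. A small sketch of that arithmetic, using the NOUN counts from this page (the function name `value_shares` is introduced here for illustration):

```python
def value_shares(counts):
    """counts maps a Gender value to its token count; the key "EMPTY" means no Gender.

    Returns (rounded percent per non-empty value,
             rounded percent of tokens that carry any Gender value).
    """
    non_empty = {value: n for value, n in counts.items() if value != "EMPTY"}
    marked = sum(non_empty.values())
    shares = {value: round(100 * n / marked) for value, n in non_empty.items()}
    coverage = round(100 * marked / sum(counts.values()))
    return shares, coverage

# NOUN counts as reported on this page.
noun_counts = {"Fem": 1848, "Masc": 2347, "EMPTY": 260}
shares, coverage = value_shares(noun_counts)
print(shares, coverage)
```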

Paradigm ministro	`Masc`	`Fem`
`Number=Sing`	ministro	ministra
`Number=Plur`	ministri

Gender seems to be a lexical feature of NOUN. 99% of lemmas (1681) occur with only one value of Gender.
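The "lexical feature" observation rests on counting how many lemmas are seen with a single Gender value. A minimal sketch of that count, assuming the input is a list of (lemma, Gender) pairs already extracted for one part-of-speech tag; the function name and the tiny example list are illustrative, with the `ministro` rows taken from the paradigm above:

```python
from collections import defaultdict

def single_valued_lemmas(observations):
    """observations: iterable of (lemma, gender_value) pairs for one UPOS tag.

    Returns (lemmas seen with exactly one Gender value, all lemmas seen with Gender).
    """
    values_by_lemma = defaultdict(set)
    for lemma, gender in observations:
        values_by_lemma[lemma].add(gender)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)

# "ministro" occurs with both values (ministro/ministri vs. ministra); the others do not.
pairs = [
    ("ministro", "Masc"), ("ministro", "Fem"), ("ministro", "Masc"),
    ("scuola", "Fem"), ("governo", "Masc"),
]
single, total = single_valued_lemmas(pairs)
print(f"{single} of {total} lemmas occur with only one value of Gender")
```

A high share suggests that Gender is fixed per lemma rather than inflectional for that tag, which is what the page reports for NOUN.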

`DET`

3006 DET tokens (87% of all DET tokens) have a non-empty value of Gender.

The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (2808; 93%), Definite=Def (2394; 80%), Number=Sing (2287; 76%).

DET tokens may have the following values of Gender:

Fem (1223; 41% of non-empty Gender): la, le, una, un’, questa, sua, mia, tutte, quella, tua
Masc (1783; 59% of non-empty Gender): il, i, un, gli, lo, suo, tutti, mio, questo, uno
EMPTY (439): l’, che, l’, ogni, qualche, loro, tutto, quale, sto, tutti

Paradigm il	`Masc`	`Fem`
`Number=Sing`	il, lo, er	la, ka
`Number=Plur`	i, gli	le

`ADJ`

981 ADJ tokens (79% of all ADJ tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (748; 76%).

ADJ tokens may have the following values of Gender:

Fem (458; 47% of non-empty Gender): buona, bella, italiana, pubblica, prima, unica, igienica, nuova, prime, nuove
Masc (523; 53% of non-empty Gender): nuovo, primo, buon, italiano, bel, caro, giusto, italiani, unico, bello
EMPTY (256): grande, ex, acid, possibile, elementari, miglior, facile, civili, fiscale, forte

Paradigm buono	`Masc`	`Fem`
`Number=Sing`	buon, buono	buona
`Number=Plur`	buoni	buone

`VERB`

505 VERB tokens (18% of all VERB tokens) have a non-empty value of Gender.

The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (505; 100%), Person=EMPTY (505; 100%), Tense=Past (504; 100%), VerbForm=Part (504; 100%), Number=Sing (433; 86%).

VERB tokens may have the following values of Gender:

Fem (85; 17% of non-empty Gender): fatta, letta, interrogata, varata, iniziata, ritrovata, scritta, trovata, @user, Basta
Masc (420; 83% of non-empty Gender): fatto, detto, morto, messo, avuto, dato, letto, arrivato, capito, lasciato
EMPTY (2341): continua, fare, è, fa, ha, dire, dice, va, far, parla

Paradigm fare	`Masc`	`Fem`
	fatto	fatta

Gender seems to be a lexical feature of VERB. 91% of lemmas (243) occur with only one value of Gender.

`PRON`

443 PRON tokens (31% of all PRON tokens) have a non-empty value of Gender.

The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (321; 72%), Clitic=EMPTY (255; 58%), Person=EMPTY (247; 56%).

PRON tokens may have the following values of Gender:

Fem (101; 23% of non-empty Gender): la, quella, questa, le, lei, quelle, una, altra, mia, tante
Masc (342; 77% of non-empty Gender): lo, tutti, tutto, li, gli, quello, questo, altro, nessuno, qualcuno
EMPTY (973): si, che, ci, mi, chi, c’, ti, ne, noi, io

Paradigm lo	`Masc`	`Fem`
`Number=Sing`	lo, l', qual	la
`Number=Plur`	li

`AUX`

19 AUX tokens (2% of all AUX tokens) have a non-empty value of Gender.

The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (19; 100%), Person=EMPTY (19; 100%), Tense=Past (19; 100%), VerbForm=Part (19; 100%), Number=Sing (18; 95%).

AUX tokens may have the following values of Gender:

Fem (8; 42% of non-empty Gender): stata
Masc (11; 58% of non-empty Gender): stato, potuto, stati
EMPTY (1070): è, ha, sono, era, e’, siamo, hanno, ho, essere, sarà

Paradigm essere	`Masc`	`Fem`
`Number=Sing`	stato	stata
`Number=Plur`	stati

`ADV`

2 ADV tokens (0% of all ADV tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADV and Gender co-occurred: PronType=Ind (2; 100%).

ADV tokens may have the following values of Gender:

Masc (2; 100% of non-empty Gender): tutti
EMPTY (1408): non, anche, più, ora, solo, poi, ancora, così, bene, già

`PROPN`

1 PROPN token (0% of all PROPN tokens) has a non-empty value of Gender.

PROPN tokens may have the following values of Gender:

Masc (1; 100% of non-empty Gender): Folletto
EMPTY (2013): monti, mario, renzi, italia, pd, Berlusconi, Roma, Salvini, Papa, giannini

`SYM`

1 SYM token (0% of all SYM tokens) has a non-empty value of Gender.

SYM tokens may have the following values of Gender:

Masc (1; 100% of non-empty Gender): #cambiaverso
EMPTY (2145): @user, #labuonascuola, #monti, @user1, @user2, #renzi, #scuola, @user3, http://t.co/oDPUtx2DvV, #Grillo

`X`

1 X token (1% of all X tokens) has a non-empty value of Gender.

X tokens may have the following values of Gender:

Masc (1; 100% of non-empty Gender): mal
EMPTY (109): e, i, o, partes, super, zan, #labuonascuola, #tassadopotassa, 10cent, 13.mo

Relations with Agreement in `Gender`

The 10 most frequent relations where parent and child node agree in Gender: NOUN –[det]–> DET (2261; 85%), NOUN –[amod]–> ADJ (637; 79%), NOUN –[det:poss]–> DET (85; 88%), VERB –[nsubj:pass]–> NOUN (48; 72%), NOUN –[compound]–> NOUN (32; 63%), ADJ –[nsubj]–> NOUN (30; 64%), PRON –[det]–> DET (19; 66%), NOUN –[nsubj]–> PRON (18; 53%), NOUN –[parataxis]–> ADJ (16; 67%), ADJ –[conj]–> ADJ (15; 65%).
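Agreement counts of this kind can be approximated by walking every dependency edge and comparing the Gender of parent and child. The sketch below is an illustration under stated assumptions, not the page's own generator: `read_sentences`, `gender_agreement`, and `SAMPLE` are names and data introduced here, and the denominator is assumed to be all edges of a given parent/relation/child type, which may differ from how the percentages on this page were computed.

```python
from collections import Counter

# Toy CoNLL-U sentence, invented for illustration (not taken from the treebank).
SAMPLE = "\n".join([
    "# text = la scuola è nuova",
    "1\tla\til\tDET\tRD\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_",
    "2\tscuola\tscuola\tNOUN\tS\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_",
    "3\tè\tessere\tAUX\tV\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_",
    "4\tnuova\tnuovo\tADJ\tA\tGender=Fem|Number=Sing\t0\troot\t_\t_",
    "",
])

def read_sentences(conllu_text):
    """Yield sentences as dicts: word id -> (upos, gender or None, head id, deprel)."""
    sentence = {}
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit() and cols[6].isdigit():
            gender = None
            for feat in cols[5].split("|"):
                if feat.startswith("Gender="):
                    gender = feat.split("=", 1)[1]
            sentence[int(cols[0])] = (cols[3], gender, int(cols[6]), cols[7])
        elif not line.strip() and sentence:
            # A blank line closes the current sentence.
            yield sentence
            sentence = {}
    if sentence:
        yield sentence

def gender_agreement(conllu_text):
    """Count edges per (parent UPOS, deprel, child UPOS): agreeing ones and all of them."""
    agree, total = Counter(), Counter()
    for sentence in read_sentences(conllu_text):
        for upos, gender, head, deprel in sentence.values():
            if head == 0 or head not in sentence:
                continue  # the root has no parent to agree with
            parent_upos, parent_gender = sentence[head][0], sentence[head][1]
            key = (parent_upos, deprel, upos)
            total[key] += 1
            if gender is not None and gender == parent_gender:
                agree[key] += 1
    return agree, total

agree, total = gender_agreement(SAMPLE)
for (parent, rel, child), n in agree.most_common(10):
    share = n / total[(parent, rel, child)]
    print(f"{parent} -[{rel}]-> {child} ({n}; {share:.0%})")
```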
