Treebank Statistics: UD_Italian-TWITTIRO: Features: Gender
This feature is universal.
It occurs with 2 different values: Fem, Masc.
9154 tokens (31%) have a non-empty value of Gender.
2946 types (48%) occur at least once with a non-empty value of Gender.
2329 lemmas (48%) occur at least once with a non-empty value of Gender.
The feature is used with 10 part-of-speech tags: NOUN (4195; 14% of all tokens), DET (3006; 10%), ADJ (981; 3%), VERB (505; 2%), PRON (443; 1%), AUX (19; 0%), ADV (2; 0%), PROPN (1; 0%), SYM (1; 0%), X (1; 0%).
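Per-tag counts like the ones above can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: the `SAMPLE` sentence is a hypothetical fragment rather than treebank data, and the helper names `parse_feats`, `iter_tokens` and `gender_by_upos` are introduced here for illustration only.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment (not taken from the treebank); columns are tab-separated.
SAMPLE = (
    "# text = la scuola è nuova\n"
    "1\tla\til\tDET\tRD\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tscuola\tscuola\tNOUN\tS\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_\n"
    "3\tè\tessere\tAUX\tVA\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_\n"
    "4\tnuova\tnuovo\tADJ\tA\tGender=Fem|Number=Sing\t0\troot\t_\t_\n"
    "\n"
)


def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def iter_tokens(conllu_text):
    """Yield (form, lemma, upos, feats_dict) for each regular token line."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[2], cols[3], parse_feats(cols[5])


def gender_by_upos(conllu_text):
    """Count tokens with a non-empty Gender value and all tokens, per UPOS tag."""
    with_gender, total = Counter(), Counter()
    for _form, _lemma, upos, feats in iter_tokens(conllu_text):
        total[upos] += 1
        if "Gender" in feats:
            with_gender[upos] += 1
    return with_gender, total
```

Calling `gender_by_upos(SAMPLE)` returns two counters; dividing a tag's entry in the first by the sum of the second gives that tag's share of all tokens, and dividing by the tag's own entry in the second gives the per-tag share reported in the sections below.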
NOUN
4195 NOUN tokens (94% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (2873; 68%).
NOUN tokens may have the following values of Gender:
Fem (1848; 44% of non-empty Gender): scuola, riforma, cosa, casa, crisi, vita, foto, volta, cit., fine
Masc (2347; 56% of non-empty Gender): governo, anni, lavoro, anno, italiani, mesi, mondo, tagli, merito, ministro
EMPTY (260): RT, docenti, grazie, spread, inglese, insegnanti, rain, tweet, prof, hashtag
| Paradigm ministro | Masc | Fem |
|---|---|---|
| Number=Sing | ministro | ministra |
| Number=Plur | ministri | |
Gender seems to be a lexical feature of NOUN: 99% of lemmas (1681) occur with only one value of Gender.
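The "lexical feature" test above amounts to asking how many lemmas of a given tag are attested with exactly one Gender value. A minimal sketch of that check follows; it assumes the share is computed over lemmas that carry Gender at least once, which the page does not state explicitly. The `OBSERVATIONS` list is hypothetical and `single_value_share` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical (lemma, UPOS, Gender) observations, not drawn from the treebank.
OBSERVATIONS = [
    ("scuola", "NOUN", "Fem"),
    ("scuola", "NOUN", "Fem"),
    ("governo", "NOUN", "Masc"),
    ("ministro", "NOUN", "Masc"),
    ("ministro", "NOUN", "Fem"),  # e.g. the form "ministra"
    ("anno", "NOUN", "Masc"),
    ("fare", "VERB", "Masc"),     # different tag, ignored when upos="NOUN"
]


def single_value_share(observations, upos):
    """Return (lemmas with exactly one Gender value, lemmas with any Gender value)."""
    values_by_lemma = defaultdict(set)
    for lemma, tag, gender in observations:
        if tag == upos and gender:
            values_by_lemma[lemma].add(gender)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)


single, total = single_value_share(OBSERVATIONS, "NOUN")
print(f"{single} of {total} NOUN lemmas have a single Gender value")
```

In this toy input only `ministro` occurs with both values, mirroring the paradigm table above.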
DET
3006 DET tokens (87% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (2808; 93%), Definite=Def (2394; 80%), Number=Sing (2287; 76%).
DET tokens may have the following values of Gender:
Fem (1223; 41% of non-empty Gender): la, le, una, un’, questa, sua, mia, tutte, quella, tua
Masc (1783; 59% of non-empty Gender): il, i, un, gli, lo, suo, tutti, mio, questo, uno
EMPTY (439): l’, che, l’, ogni, qualche, loro, tutto, quale, sto, tutti
| Paradigm il | Masc | Fem |
|---|---|---|
| Number=Sing | il, lo, er | la, ka |
| Number=Plur | i, gli | le |
ADJ
981 ADJ tokens (79% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (748; 76%).
ADJ tokens may have the following values of Gender:
Fem (458; 47% of non-empty Gender): buona, bella, italiana, pubblica, prima, unica, igienica, nuova, prime, nuove
Masc (523; 53% of non-empty Gender): nuovo, primo, buon, italiano, bel, caro, giusto, italiani, unico, bello
EMPTY (256): grande, ex, acid, possibile, elementari, miglior, facile, civili, fiscale, forte
| Paradigm buono | Masc | Fem |
|---|---|---|
| Number=Sing | buon, buono | buona |
| Number=Plur | buoni | buone |
VERB
505 VERB tokens (18% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (505; 100%), Person=EMPTY (505; 100%), Tense=Past (504; 100%), VerbForm=Part (504; 100%), Number=Sing (433; 86%).
VERB tokens may have the following values of Gender:
Fem (85; 17% of non-empty Gender): fatta, letta, interrogata, varata, iniziata, ritrovata, scritta, trovata, @user, Basta
Masc (420; 83% of non-empty Gender): fatto, detto, morto, messo, avuto, dato, letto, arrivato, capito, lasciato
EMPTY (2341): continua, fare, è, fa, ha, dire, dice, va, far, parla
| Paradigm fare | Masc | Fem |
|---|---|---|
| | fatto | fatta |
Gender seems to be a lexical feature of VERB: 91% of lemmas (243) occur with only one value of Gender.
PRON
443 PRON tokens (31% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (321; 72%), Clitic=EMPTY (255; 58%), Person=EMPTY (247; 56%).
PRON tokens may have the following values of Gender:
Fem (101; 23% of non-empty Gender): la, quella, questa, le, lei, quelle, una, altra, mia, tante
Masc (342; 77% of non-empty Gender): lo, tutti, tutto, li, gli, quello, questo, altro, nessuno, qualcuno
EMPTY (973): si, che, ci, mi, chi, c’, ti, ne, noi, io
| Paradigm lo | Masc | Fem |
|---|---|---|
| Number=Sing | lo, l', qual | la |
| Number=Plur | li | |
AUX
19 AUX tokens (2% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (19; 100%), Person=EMPTY (19; 100%), Tense=Past (19; 100%), VerbForm=Part (19; 100%), Number=Sing (18; 95%).
AUX tokens may have the following values of Gender:
Fem (8; 42% of non-empty Gender): stata
Masc (11; 58% of non-empty Gender): stato, potuto, stati
EMPTY (1070): è, ha, sono, era, e’, siamo, hanno, ho, essere, sarà
| Paradigm essere | Masc | Fem |
|---|---|---|
| Number=Sing | stato | stata |
| Number=Plur | stati | |
ADV
2 ADV tokens (0% of all ADV tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADV and Gender co-occurred: PronType=Ind (2; 100%).
ADV tokens may have the following values of Gender:
Masc (2; 100% of non-empty Gender): tutti
EMPTY (1408): non, anche, più, ora, solo, poi, ancora, così, bene, già
PROPN
1 PROPN token (0% of all PROPN tokens) has a non-empty value of Gender.
PROPN tokens may have the following values of Gender:
Masc (1; 100% of non-empty Gender): Folletto
EMPTY (2013): monti, mario, renzi, italia, pd, Berlusconi, Roma, Salvini, Papa, giannini
SYM
1 SYM token (0% of all SYM tokens) has a non-empty value of Gender.
SYM tokens may have the following values of Gender:
Masc (1; 100% of non-empty Gender): #cambiaverso
EMPTY (2145): @user, #labuonascuola, #monti, @user1, @user2, #renzi, #scuola, @user3, http://t.co/oDPUtx2DvV, #Grillo
X
1 X token (1% of all X tokens) has a non-empty value of Gender.
X tokens may have the following values of Gender:
Masc (1; 100% of non-empty Gender): mal
EMPTY (109): e, i, o, partes, super, zan, #labuonascuola, #tassadopotassa, 10cent, 13.mo
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (2261; 85%),
NOUN –[amod]–> ADJ (637; 79%),
NOUN –[det:poss]–> DET (85; 88%),
VERB –[nsubj:pass]–> NOUN (48; 72%),
NOUN –[compound]–> NOUN (32; 63%),
ADJ –[nsubj]–> NOUN (30; 64%),
PRON –[det]–> DET (19; 66%),
NOUN –[nsubj]–> PRON (18; 53%),
NOUN –[parataxis]–> ADJ (16; 67%),
ADJ –[conj]–> ADJ (15; 65%).
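Agreement figures of this kind can be approximated by walking each dependency edge and comparing the Gender values of parent and child. The sketch below is one possible reading, not the page's actual procedure: it assumes the percentage is the share of all instances of a (parent tag, relation, child tag) triple in which both nodes carry the same Gender value, a denominator the page does not spell out. The `SENTENCE` data is hypothetical and `agreement_counts` is an illustrative name.

```python
from collections import Counter

# Hypothetical tokens: (id, upos, gender or None, head id, deprel).
SENTENCE = [
    (1, "DET", "Fem", 2, "det"),
    (2, "NOUN", "Fem", 0, "root"),
    (3, "ADJ", "Fem", 2, "amod"),
    (4, "DET", None, 5, "det"),
    (5, "NOUN", "Masc", 2, "conj"),
]


def agreement_counts(sentence):
    """Count agreeing and total edges per (parent UPOS, deprel, child UPOS)."""
    by_id = {tok[0]: tok for tok in sentence}
    agree, total = Counter(), Counter()
    for _tid, upos, gender, head, deprel in sentence:
        if head == 0:  # the root has no parent to agree with
            continue
        _, parent_upos, parent_gender, _, _ = by_id[head]
        key = (parent_upos, deprel, upos)
        total[key] += 1
        # Agreement requires a non-empty value on the child that matches the parent's.
        if gender is not None and gender == parent_gender:
            agree[key] += 1
    return agree, total


agree, total = agreement_counts(SENTENCE)
for key, count in agree.most_common():
    parent, rel, child = key
    print(f"{parent} -[{rel}]-> {child} ({count}; {100 * count / total[key]:.0f}%)")
```

For a whole treebank the counters would be accumulated over all sentences before ranking; restricting the denominator to edges where both nodes have a non-empty Gender would give higher percentages.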