Statistics of Gender in UD_Italian-KIParlaForest

This page pertains to UD version 2.

Treebank Statistics: UD_Italian-KIParlaForest: Features: `Gender`

This feature is universal. It occurs with 2 different values: Fem, Masc.

6263 tokens (34%) have a non-empty value of Gender. 1546 types (53%) occur at least once with a non-empty value of Gender. 1271 lemmas (60%) occur at least once with a non-empty value of Gender. The feature is used with 13 part-of-speech tags (count; share of all tokens in the treebank): NOUN (2495; 13%), DET (1812; 10%), ADJ (665; 4%), PRON (650; 3%), VERB (389; 2%), PROPN (111; 1%), AUX (35; 0%), INTJ (34; 0%), NUM (34; 0%), ADV (33; 0%), ADP (2; 0%), CCONJ (2; 0%), X (1; 0%).
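The per-tag counts above can be reproduced in principle by scanning the FEATS column of the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: the function names and the four-token `SAMPLE` sentence are hypothetical, and it assumes that counts are taken over syntactic words, so multiword-token ranges and empty nodes are skipped.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_counts_by_upos(conllu_text):
    """Count tokens that carry a Gender feature, grouped by UPOS tag."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # assumption: skip ranges (1-2) and empty nodes (1.1)
        if "Gender" in parse_feats(cols[5]):
            counts[cols[3]] += 1
    return counts


# Hypothetical toy sentence, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = la casa è grande",
    "1\tla\til\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_",
    "2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_",
    "3\tè\tessere\tAUX\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_",
    "4\tgrande\tgrande\tADJ\t_\tNumber=Sing\t0\troot\t_\t_",
])

print(gender_counts_by_upos(SAMPLE))
```

In the toy sentence only the DET and the NOUN carry Gender, which mirrors the EMPTY rows listed for each tag below: forms such as grande have no Gender value in this treebank.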

`NOUN`

2495 NOUN tokens (94% of all NOUN tokens) have a non-empty value of Gender.

The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (1699; 68%).

NOUN tokens may have the following values of Gender:

Fem (1179; 47% of non-empty Gender): città, realtà, casa, lingua, cosa, parte, università, cose, persone, storia
Masc (1316; 53% of non-empty Gender): tipo, arabo, centro, anni, dialetti, alfabeto, sud, sacco, senso, periodo
EMPTY (173): po’, nord, lingue, cazzo, femminile, okay, chewing, grazie, gum, ics

| Paradigm lingua | `Masc` | `Fem` |
|---|---|---|
| `Number=Sing` | lingue | lingua |
| `Number=Plur` | | lingue |

Gender seems to be a lexical feature of NOUN. 97% of lemmas (755) occur only with one value of Gender.
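The "lexical feature" observation rests on how many lemmas are seen with exactly one Gender value. A rough sketch of that check follows. It assumes the share is computed over lemmas that occur at least once with a non-empty Gender, which this page does not state explicitly, and the function name and toy pairs are hypothetical.

```python
from collections import defaultdict


def single_gender_share(pairs):
    """Return (lemmas with exactly one Gender value, lemmas with any Gender value).

    `pairs` is an iterable of (lemma, gender) tuples, one per token that
    carries a non-empty Gender feature.
    """
    values = defaultdict(set)
    for lemma, gender in pairs:
        values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)


# Hypothetical toy data; "lingua" appears with both values, as in the paradigm table.
toy = [
    ("casa", "Fem"),
    ("casa", "Fem"),
    ("tipo", "Masc"),
    ("lingua", "Fem"),
    ("lingua", "Masc"),
]
single, total = single_gender_share(toy)
print(f"{single} of {total} lemmas occur with only one value of Gender")
```

A lemma like lingua, attested with both Masc and Fem, falls outside the single-value group. In the real data such cases are the 3% remainder, and they are often worth inspecting as possible annotation errors.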

`DET`

1812 DET tokens (83% of all DET tokens) have a non-empty value of Gender.

The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (1555; 86%), Number=Sing (1335; 74%), Definite=Def (1167; 64%).

DET tokens may have the following values of Gender:

Fem (853; 47% of non-empty Gender): la, le, una, questa, un’, queste, delle, mia, quella, tutte
Masc (959; 53% of non-empty Gender): il, un, i, gli, questo, lo, questi, dei, uno, tutti
EMPTY (382): l’, che, tutto, loro, tutti, il, qualche, alcuni, la, tutta

| Paradigm il | `Masc` | `Fem` |
|---|---|---|
| `Definite=Def\|Number=Sing\|PronType=Art` | il, lo, l | la, le |
| `Definite=Def\|Number=Plur\|PronType=Art` | i, gli, il | le, lo |
| `Number=Sing\|Person=3\|PronType=Prs` | lo, l' | la |
| `Number=Sing\|PronType=Art` | | la |
| `Number=Plur\|PronType=Art` | | i |

`ADJ`

665 ADJ tokens (70% of all ADJ tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (511; 77%).

ADJ tokens may have the following values of Gender:

Fem (295; 44% of non-empty Gender): araba, mia, piccola, prima, semitica, bella, buona, tua, mezza, altra
Masc (370; 56% of non-empty Gender): esatto, arabo, miei, proto, stesso, strano, antico, bel, islamico, perfetto
EMPTY (282): grande, difficile, stessa, comune, standard, udenti, altra, certo, enorme, facile

| Paradigm arabo | `Masc` | `Fem` |
|---|---|---|
| `Number=Sing` | arabo | araba |
| `Number=Plur` | arabi | |

`PRON`

650 PRON tokens (35% of all PRON tokens) have a non-empty value of Gender.

The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (510; 78%), Person=EMPTY (383; 59%).

PRON tokens may have the following values of Gender:

Fem (180; 28% of non-empty Gender): questa, le, la, lei, quella, una, altra, queste, alcune, quelle
Masc (470; 72% of non-empty Gender): lo, quello, questo, l’, tutti, li, qualcuno, uno, questi, tutto
EMPTY (1232): c’, io, si, ci, mi, che, me, ti, cui, ne

| Paradigm lo | `Masc` | `Fem` |
|---|---|---|
| `Definite=Def\|Number=Sing\|PronType=Art` | lo | |
| `Definite=Def\|Number=Plur\|PronType=Prs` | | l' |
| `Number=Sing\|Person=3\|PronType=Prs` | lo, l', qual | |

`VERB`

389 VERB tokens (16% of all VERB tokens) have a non-empty value of Gender.

The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (389; 100%), Person=EMPTY (389; 100%), Number=Sing (336; 86%), Tense=Past (334; 86%), VerbForm=Part (334; 86%).

VERB tokens may have the following values of Gender:

Fem (102; 26% of non-empty Gender): fatta, trovata, datata, morte, scritte, andata, andate, basta, chiusa, coperta
Masc (287; 74% of non-empty Gender): detto, fatto, scritto, sentito, visto, imparato, parlato, trovato, usato, vissuto
EMPTY (1995): è, so, sono, abbiamo, fa, fare, era, ha, dire, va

| Paradigm essere | `Masc` | `Fem` |
|---|---|---|
| `Number=Sing` | stato | stata |
| `Number=Plur` | stati | |

`PROPN`

111 PROPN tokens (26% of all PROPN tokens) have a non-empty value of Gender.

The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (81; 73%).

PROPN tokens may have the following values of Gender:

Fem (59; 53% of non-empty Gender): arabia, siria, giordania, saudita, saba, turchia, arancioni, marina, palestina, palmira
Masc (52; 47% of non-empty Gender): rossi, oman, kitab, nabatei, qays, erodoto, sinai, arab, egitto, fermo
EMPTY (313): [TOWN_NAME], ancona, bologna, pesaro, cristo, [PLACE_NAME], fermo, gialli, imola, marche

Gender seems to be a lexical feature of PROPN. 100% of lemmas (56) occur only with one value of Gender.

`AUX`

35 AUX tokens (3% of all AUX tokens) have a non-empty value of Gender.

The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (35; 100%), Person=EMPTY (35; 100%), Tense=Past (27; 77%), VerbForm=Part (27; 77%), Number=Sing (26; 74%).

AUX tokens may have the following values of Gender:

Fem (18; 51% of non-empty Gender): stata, state, son, esser
Masc (17; 49% of non-empty Gender): stato, son, stati, abbiamo, avevo, ero, stavo
EMPTY (1009): è, sono, ho, ha, era, devi, hanno, possiamo, abbiamo, son

| Paradigm essere | `Masc` | `Fem` |
|---|---|---|
| `_` | son | son |
| `Number=Sing` | ero | |
| `Number=Sing\|Tense=Past\|VerbForm=Part` | stato | stata |
| `Number=Plur` | | esser |
| `Number=Plur\|Tense=Past\|VerbForm=Part` | stati | state |

`INTJ`

34 INTJ tokens (4% of all INTJ tokens) have a non-empty value of Gender.

INTJ tokens may have the following values of Gender:

Masc (34; 100% of non-empty Gender): mh, eh
EMPTY (757): eh, mh, okay, ah, no, sì, vabbè, mhmh, beh, ehm

`NUM`

34 NUM tokens (20% of all NUM tokens) have a non-empty value of Gender.

The most frequent other feature values with which NUM and Gender co-occurred: Number=Sing (25; 74%), NumType=Ord (19; 56%).

NUM tokens may have the following values of Gender:

Fem (10; 29% of non-empty Gender): prima, seconda, sedicimila, terza
Masc (24; 71% of non-empty Gender): primi, primo, seicento, trecentoventotto, duecento, duemiladiciotto, milleseicento, ottocento, secondo, sedici
EMPTY (138): due, quattro, tre, cinque, quattordici, sette, dieci, mille, undici, cinquanta

| Paradigm primo | `Masc` | `Fem` |
|---|---|---|
| `_` | primi | |
| `Number=Sing\|NumType=Ord` | primo | prima |
| `Number=Plur\|NumType=Ord` | primi | |

`ADV`

33 ADV tokens (1% of all ADV tokens) have a non-empty value of Gender.

The most frequent other feature values with which ADV and Gender co-occurred: PronType=EMPTY (29; 88%).

ADV tokens may have the following values of Gender:

Fem (12; 36% of non-empty Gender): cosa, etcetera, lì, più, invece, molte, quali, tutta, tutte, vicina
Masc (21; 64% of non-empty Gender): quanto, giusto, lì, meno, almeno, bene, esatto, fino, lontano, manco
EMPTY (2179): non, sì, no, anche, più, poi, molto, così, bene, adesso

| Paradigm lì | `Masc` | `Fem` |
|---|---|---|
| | lì | lì |

`ADP`

2 ADP tokens (0% of all ADP tokens) have a non-empty value of Gender.

ADP tokens may have the following values of Gender:

Fem (1; 50% of non-empty Gender): a
Masc (1; 50% of non-empty Gender): in
EMPTY (1901): di, in, a, per, da, con, su, come, secondo, tra

`CCONJ`

2 CCONJ tokens (0% of all CCONJ tokens) have a non-empty value of Gender.

CCONJ tokens may have the following values of Gender:

Fem (2; 100% of non-empty Gender): oppure
EMPTY (1000): e, cioè, ma, quindi, però, o, comunque, sia, che, infatti

`X`

1 X token (0% of all X tokens) has a non-empty value of Gender.

X tokens may have the following values of Gender:

Fem (1; 100% of non-empty Gender): s~
EMPTY (353): x, s~, no~, a~, day, di~, may, n~, p~, ti~

Relations with Agreement in `Gender`

The 10 most frequent relations where parent and child node agree in Gender: NOUN –[det]–> DET (1388; 81%), NOUN –[amod]–> ADJ (341; 69%), NOUN –[conj]–> NOUN (46; 56%), PROPN –[det]–> DET (34; 51%), ADJ –[det]–> DET (24; 59%), NOUN –[det:poss]–> DET (23; 66%), ADJ –[nsubj]–> NOUN (22; 76%), DET –[reparandum]–> DET (16; 57%), NOUN –[parataxis]–> NOUN (15; 63%), INTJ –[discourse]–> INTJ (13; 100%).
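One way to tally such agreement figures is sketched below, for illustration only. It assumes that a relation is considered only when both the parent and the child carry a Gender value, and that the percentage is the agreeing share of those cases. The page does not spell out its denominator, so that reading is an assumption. The function name, the token dictionaries, and the toy sentence (including the deliberately mismatched fourth token) are hypothetical.

```python
from collections import Counter


def agreement_counts(sentence):
    """Tally Gender agreement per (parent UPOS, deprel, child UPOS) triple.

    `sentence` is a list of dicts with keys: id, upos, gender (or None),
    head, deprel. Returns two Counters: agreeing cases and all cases where
    both parent and child have a Gender value.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, total = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])  # head 0 (root) has no parent token
        if parent is None or child["gender"] is None or parent["gender"] is None:
            continue
        key = (parent["upos"], child["deprel"], child["upos"])
        total[key] += 1
        if parent["gender"] == child["gender"]:
            agree[key] += 1
    return agree, total


# Hypothetical toy sentence; token 4 is deliberately mismatched to show a non-agreeing case.
toy_sentence = [
    {"id": 1, "upos": "DET", "gender": "Fem", "head": 2, "deprel": "det"},
    {"id": 2, "upos": "NOUN", "gender": "Fem", "head": 0, "deprel": "root"},
    {"id": 3, "upos": "ADJ", "gender": "Fem", "head": 2, "deprel": "amod"},
    {"id": 4, "upos": "DET", "gender": "Masc", "head": 2, "deprel": "det"},
]
agree, total = agreement_counts(toy_sentence)
for key in total:
    parent_upos, deprel, child_upos = key
    print(f"{parent_upos} -[{deprel}]-> {child_upos}: {agree[key]} of {total[key]} agree")
```

Keying the tally on the full triple rather than on the relation label alone keeps NOUN det DET separate from PROPN det DET, matching how the list above reports them.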
