Treebank Statistics: UD_Spanish-AnCora: Features: Gender
This feature is universal.
It occurs with 2 different values: Fem, Masc.
202799 tokens (36%) have a non-empty value of Gender.
17312 types (45%) occur at least once with a non-empty value of Gender.
11621 lemmas (45%) occur at least once with a non-empty value of Gender.
The feature is used with 8 part-of-speech tags: NOUN (88482; 16% instances), DET (78759; 14% instances), ADJ (24227; 4% instances), PRON (5804; 1% instances), VERB (4754; 1% instances), AUX (481; 0% instances), NUM (290; 0% instances), PROPN (2; 0% instances).
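Per-tag counts of this kind can be reproduced from a CoNLL-U file. The following is a minimal sketch, not the tool that generated this page: the `SAMPLE` fragment is invented for illustration (it is not AnCora data), and the helper names `parse_feats` and `gender_by_upos` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical CoNLL-U fragment, invented for illustration (not from AnCora).
SAMPLE = """\
1\tla\tel\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_
3\tnueva\tnuevo\tADJ\t_\tGender=Fem|Number=Sing\t2\tamod\t_\t_

1\tel\tel\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tlíder\tlíder\tNOUN\t_\tNumber=Sing\t0\troot\t_\t_
"""


def parse_feats(feats):
    """Turn a FEATS column such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_by_upos(conllu_text):
    """Count Gender values (or 'EMPTY') for each UPOS tag."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        feats = parse_feats(cols[5])
        counts[cols[3]][feats.get("Gender", "EMPTY")] += 1
    return counts


counts = gender_by_upos(SAMPLE)
```

Tokens without the feature are tallied under `EMPTY`, mirroring the label used in the per-tag breakdowns on this page.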
NOUN
88482 NOUN tokens (88% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (61626; 70%).
NOUN tokens may have the following values of Gender:
Fem (41292; 47% of non-empty Gender): pesetas, personas, parte, vida, situación, vez, forma, elecciones, empresa, decisión
Masc (47190; 53% of non-empty Gender): años, presidente, millones, equipo, partido, país, año, ministro, mundo, grupo
EMPTY (12054): parte, frente, portavoz, líder, respecto, vez, pese, policía, año, partir
| Paradigm candidato | Masc | Fem |
|---|---|---|
| Number=Sing | candidato | |
| Number=Plur | candidatos | CANDIDATAS |
Gender seems to be a lexical feature of NOUN: 99% of lemmas (7756) occur with only one value of Gender.
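The lexical-feature observation above rests on counting how many lemmas are attested with a single Gender value. A minimal sketch of that check follows, assuming the (lemma, Gender) pairs have already been extracted. The `OBSERVATIONS` list is invented for illustration and `single_gender_share` is a hypothetical helper, not part of any UD tool.

```python
from collections import defaultdict

# Hypothetical (lemma, Gender) observations for NOUN tokens; not AnCora data.
OBSERVATIONS = [
    ("empresa", "Fem"), ("empresa", "Fem"),
    ("país", "Masc"),
    ("candidato", "Masc"), ("candidato", "Fem"),
    ("vida", "Fem"),
]


def single_gender_share(observations):
    """Return (lemmas with one Gender value, all lemmas, their ratio)."""
    values = defaultdict(set)
    for lemma, gender in observations:
        values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values), single / len(values)


single, total, share = single_gender_share(OBSERVATIONS)
```

A lemma such as candidato, which is attested with both values, counts against the share. A ratio close to 1 suggests that the feature is a property of the lemma and not of the individual word form.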
DET
78759 DET tokens (93% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (71584; 91%), Number=Sing (62068; 79%), Definite=Def (62012; 79%).
DET tokens may have the following values of Gender:
Fem (32385; 41% of non-empty Gender): la, las, una, esta, esa, todas, estas, otras, toda, otra
Masc (46374; 59% of non-empty Gender): el, los, un, este, todo, ese, todos, otros, estos, unos
EMPTY (5672): su, sus, cada, mi, cualquier, qué, tal, mis, diferentes, tu
| Paradigm el | Masc | Fem |
|---|---|---|
| Definite=Def\|ExtPos=ADV\|Number=Sing\|PronType=Art | la | |
| Definite=Def\|ExtPos=SCONJ\|Number=Sing\|PronType=Art | el | |
| Definite=Def\|Foreign=Yes\|Number=Sing\|PronType=Art | la | |
| Definite=Def\|Foreign=Yes\|Number=Plur\|PronType=Art | les | les |
| Definite=Def\|Number=Sing\|PronType=Art | el | la |
| Definite=Def\|Number=Plur\|PronType=Art | los, els | las |
| Number=Sing\|PronType=Dem | el | la |
| Number=Plur\|PronType=Dem | los | las |
ADJ
24227 ADJ tokens (67% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: VerbForm=EMPTY (17726; 73%), Number=Sing (17367; 72%).
ADJ tokens may have the following values of Gender:
Fem (10124; 42% of non-empty Gender): primera, nueva, segunda, política, española, última, nuevas, única, buena, pública
Masc (14103; 58% of non-empty Gender): pasado, primer, nuevo, próximo, últimos, español, segundo, último, único, político
EMPTY (12200): gran, mayor, mejor, general, posible, ex, grandes, actual, electoral, internacional
| Paradigm primero | Masc | Fem |
|---|---|---|
| Number=Sing | primer, primero | primera |
| Number=Plur | primeros | primeras |
PRON
5804 PRON tokens (23% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Reflex=EMPTY (5803; 100%), Number=Sing (4355; 75%), Person=3 (3444; 59%), PronType=Prs (3344; 58%), PrepCase=EMPTY (3168; 55%).
PRON tokens may have the following values of Gender:
Fem (1191; 21% of non-empty Gender): la, una, ella, las, ellas, otra, ésta, unas, otras, algunas
Masc (4613; 79% of non-empty Gender): lo, uno, todo, él, ellos, ello, unos, los, otros, todos
EMPTY (19381): que, se, le, me, nos, quien, les, eso, nada, qué
| Paradigm él | Masc | Fem |
|---|---|---|
| Case=Acc,Nom\|Number=Sing\|PronType=Prs | él, ello | ella |
| Case=Acc,Nom\|Number=Plur\|PronType=Prs | ellos | ellas |
| Case=Acc\|Definite=Def\|Number=Sing\|PrepCase=Npr\|PronType=Prs | lo | |
| Case=Acc\|Definite=Ind\|Number=Sing\|PrepCase=Npr\|PronType=Prs | LO | |
| Case=Acc\|ExtPos=ADV\|Number=Sing\|PrepCase=Npr\|PronType=Prs | lo | |
| Case=Acc\|ExtPos=CCONJ\|Number=Sing\|PrepCase=Npr\|PronType=Prs | lo | |
| Case=Acc\|Number=Sing\|PrepCase=Npr\|PronType=Dem | lo | |
| Case=Acc\|Number=Sing\|PrepCase=Npr\|PronType=Prs | lo | la |
| Case=Acc\|Number=Plur\|PrepCase=Npr\|PronType=Prs | los | las |
| Case=Nom\|Number=Sing\|PronType=Prs | | Ella |
VERB
4754 VERB tokens (10% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (4753; 100%), Person=EMPTY (4753; 100%), Tense=Past (4753; 100%), VerbForm=Part (4753; 100%), Number=Sing (4434; 93%).
VERB tokens may have the following values of Gender:
Fem (333; 7% of non-empty Gender): aprobada, considerada, dada, utilizada, comprada, dadas, incluida, rechazada, recibida, violada
Masc (4421; 93% of non-empty Gender): hecho, tenido, dado, visto, conseguido, pasado, ganado, llegado, perdido, logrado
EMPTY (43432): tiene, dijo, hay, hace, hacer, tienen, aseguró, dar, explicó, tener
| Paradigm hacer | Masc | Fem |
|---|---|---|
| Number=Sing | hecho | hecha |
| Number=Plur | hechos | |
AUX
481 AUX tokens (4% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (481; 100%), Number=Sing (481; 100%), Person=EMPTY (481; 100%), Tense=Past (480; 100%), VerbForm=Part (480; 100%).
AUX tokens may have the following values of Gender:
Masc (481; 100% of non-empty Gender): sido, podido, estado, debido, ser
EMPTY (13084): es, ha, han, fue, ser, son, está, puede, había, era
NUM
290 NUM tokens (3% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Card (290; 100%), NumForm=Word (289; 100%), Number=Plur (176; 61%).
NUM tokens may have the following values of Gender:
Fem (93; 32% of non-empty Gender): ambas, media, una, DECENAS, quinientas
Masc (197; 68% of non-empty Gender): ambos, medio, un, doscientos, uno, miles, quinientos, dois, euros, ochenta
EMPTY (8884): dos, ciento, tres, cinco, cuatro, seis, 20, 30, siete, 10
| Paradigm ambos | Masc | Fem |
|---|---|---|
| | ambos | ambas |
PROPN
2 PROPN tokens (0% of all PROPN tokens) have a non-empty value of Gender.
PROPN tokens may have the following values of Gender:
Fem (2; 100% of non-empty Gender): Cuba, Lletres
EMPTY (42387): Gobierno, España, Madrid, Barcelona, José, Estado, PP, Juan, Nacional, Estados
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (58013; 86%),
NOUN –[amod]–> ADJ (16915; 63%),
NOUN –[conj]–> NOUN (2526; 54%),
NOUN –[appos]–> NOUN (928; 51%),
ADJ –[det]–> DET (683; 64%),
ADJ –[nsubj]–> NOUN (599; 57%),
ADJ –[conj]–> ADJ (569; 55%),
PRON –[nmod]–> NOUN (440; 74%),
ADJ –[det]–> PRON (158; 62%),
NOUN –[nmod]–> DET (154; 96%).
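
Agreement counts like those listed above can be approximated by comparing the Gender of each node with that of its parent. The sketch below is illustrative only. The `SENTENCE` rows are invented, the helpers `gender_of` and `agreement_counts` are hypothetical, and the denominator used here (pairs where both nodes carry Gender) is an assumption, since this page does not state how its percentages are defined.

```python
from collections import Counter

# Hypothetical rows (id, upos, feats, head, deprel) for one sentence;
# invented for illustration, not taken from AnCora.
SENTENCE = [
    (1, "DET", "Gender=Fem|Number=Sing", 2, "det"),
    (2, "NOUN", "Gender=Fem|Number=Sing", 0, "root"),
    (3, "ADJ", "Gender=Masc|Number=Sing", 2, "amod"),
    (4, "ADJ", "Number=Sing", 2, "amod"),
]


def gender_of(feats):
    """Return the Gender value from a FEATS string, or None if absent."""
    for pair in feats.split("|"):
        if pair.startswith("Gender="):
            return pair.split("=", 1)[1]
    return None


def agreement_counts(sentence):
    """Count parent-child pairs that agree in Gender, per relation type.

    Assumption: only pairs where both nodes have a Gender value are counted.
    """
    by_id = {tok[0]: tok for tok in sentence}
    agree, total = Counter(), Counter()
    for _tid, upos, feats, head, deprel in sentence:
        if head == 0:
            continue  # the root has no parent
        parent = by_id[head]
        g_child, g_parent = gender_of(feats), gender_of(parent[2])
        if g_child is None or g_parent is None:
            continue
        key = (parent[1], deprel, upos)  # (parent UPOS, relation, child UPOS)
        total[key] += 1
        if g_child == g_parent:
            agree[key] += 1
    return agree, total


agree, total = agreement_counts(SENTENCE)
```

Each key follows the same parent, relation, child order as the entries in the list, so `("NOUN", "det", "DET")` corresponds to NOUN –[det]–> DET.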