Treebank Statistics: UD_Spanish-GSD: Features: Gender
This feature is universal.
It occurs with 2 different values: Fem, Masc.
158718 tokens (37%) have a non-empty value of Gender.
20356 types (45%) occur at least once with a non-empty value of Gender.
14665 lemmas (43%) occur at least once with a non-empty value of Gender.
The feature is used with 10 part-of-speech tags: NOUN (70647; 16% of instances), DET (56090; 13% of instances), ADJ (15916; 4% of instances), VERB (6941; 2% of instances), PRON (4602; 1% of instances), PROPN (3418; 1% of instances), X (506; 0% of instances), AUX (269; 0% of instances), NUM (209; 0% of instances), SYM (120; 0% of instances).
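The summary figures above can be recomputed from the CoNLL-U files with a short script. What follows is a minimal Python sketch, not the tool that generated this page. It makes several assumptions. It uses the standard 10-column CoNLL-U layout. It counts syntactic words, so multiword-token ranges such as `1-2` and empty nodes are skipped. A "type" is treated as an exact FORM string. Each per-tag count is expressed as a share of all words. That last choice is consistent with the figures above: 70647 NOUN tokens is roughly 16% of the ~429,000 words implied by the 37% coverage figure. The sentence in `ROWS` and all function names are hypothetical.

```python
from collections import Counter

def parse_feats(feats):
    """Parse a CoNLL-U FEATS value ('_' or 'Name=Value|Name=Value') into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def iter_words(conllu):
    """Yield the 10 columns of each syntactic word, skipping comments, blank
    lines, multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '5.1')."""
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def feature_summary(conllu, feature="Gender"):
    """Token/type/lemma coverage of `feature` plus per-UPOS counts (assumed
    definitions: types = exact FORM strings, shares relative to all words)."""
    total = 0
    forms, lemmas = set(), set()
    forms_with, lemmas_with = set(), set()
    upos_with = Counter()
    for cols in iter_words(conllu):
        form, lemma, upos, feats = cols[1], cols[2], cols[3], parse_feats(cols[5])
        total += 1
        forms.add(form)
        lemmas.add(lemma)
        if feature in feats:
            forms_with.add(form)
            lemmas_with.add(lemma)
            upos_with[upos] += 1
    n_with = sum(upos_with.values())
    return {
        "tokens": (n_with, n_with / total),
        "types": (len(forms_with), len(forms_with) / len(forms)),
        "lemmas": (len(lemmas_with), len(lemmas_with) / len(lemmas)),
        # Per-tag counts, each also given as a share of *all* words.
        "upos": {u: (c, c / total) for u, c in upos_with.most_common()},
    }

# Hypothetical sentence "La ciudad ya tiene cinco barrios antiguos ."
ROWS = [
    ("1", "La", "el", "DET", "Definite=Def|Gender=Fem|Number=Sing|PronType=Art", "2", "det"),
    ("2", "ciudad", "ciudad", "NOUN", "Gender=Fem|Number=Sing", "4", "nsubj"),
    ("3", "ya", "ya", "ADV", "_", "4", "advmod"),
    ("4", "tiene", "tener", "VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root"),
    ("5", "cinco", "cinco", "NUM", "NumType=Card", "6", "nummod"),
    ("6", "barrios", "barrio", "NOUN", "Gender=Masc|Number=Plur", "4", "obj"),
    ("7", "antiguos", "antiguo", "ADJ", "Gender=Masc|Number=Plur", "6", "amod"),
    ("8", ".", ".", "PUNCT", "_", "4", "punct"),
]
SAMPLE = "\n".join(
    "\t".join([i, form, lemma, upos, "_", feats, head, rel, "_", "_"])
    for i, form, lemma, upos, feats, head, rel in ROWS
)

summary = feature_summary(SAMPLE)
print(summary["tokens"])  # (4, 0.5)
print(summary["upos"])    # {'NOUN': (2, 0.25), 'DET': (1, 0.125), 'ADJ': (1, 0.125)}
```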
NOUN
70647 NOUN tokens (89% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (50738; 72%).
NOUN tokens may have the following values of Gender:
- Fem (33076; 47% of non-empty Gender): parte, población, ciudad, personas, familia, vez, forma, vida, agua, región
- Masc (37571; 53% of non-empty Gender): años, año, municipio, nombre, lugar, equipo, tiempo, estado, grupo, país
- EMPTY (8474): habitantes, km, septiembre, enero, julio, junio, mayo, marzo, octubre, agosto
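Within each tag, the percentage after a value is relative to the tokens of that tag that carry *some* Gender value. EMPTY tokens are left out of that denominator, and they make up the rest of the tag's total. For NOUN, 33076 / (33076 + 37571) = 33076 / 70647 ≈ 0.468, reported as 47%. Likewise 70647 / (70647 + 8474) ≈ 0.893, reported as 89% of all NOUN tokens. The sketch below is a minimal version of this computation. It assumes words have already been parsed into (form, UPOS, feature dict) triples. The data is hypothetical, and forms are ranked by raw frequency.

```python
from collections import Counter, defaultdict

def value_distribution(words, upos, feature="Gender", top=10):
    """Return {value: (count, share_of_non_empty, top_forms)} for one UPOS tag.

    Tokens of `upos` without `feature` go into the "EMPTY" bucket, whose share
    is None because it is excluded from the denominator.
    """
    counts = Counter()
    forms = defaultdict(Counter)
    for form, pos, feats in words:
        if pos != upos:
            continue
        value = feats.get(feature, "EMPTY")
        counts[value] += 1
        forms[value][form] += 1
    non_empty = sum(c for v, c in counts.items() if v != "EMPTY")
    return {
        value: (
            c,
            None if value == "EMPTY" else c / non_empty,
            [f for f, _ in forms[value].most_common(top)],
        )
        for value, c in counts.items()
    }

# Hypothetical NOUN (and one DET) tokens as (form, UPOS, feats).
WORDS = [
    ("parte", "NOUN", {"Gender": "Fem", "Number": "Sing"}),
    ("ciudad", "NOUN", {"Gender": "Fem", "Number": "Sing"}),
    ("parte", "NOUN", {"Gender": "Fem", "Number": "Sing"}),
    ("año", "NOUN", {"Gender": "Masc", "Number": "Sing"}),
    ("habitantes", "NOUN", {"Number": "Plur"}),
    ("el", "DET", {"Gender": "Masc", "Number": "Sing"}),  # other tag: ignored
]

dist = value_distribution(WORDS, "NOUN")
print(dist["Fem"])    # (3, 0.75, ['parte', 'ciudad'])
print(dist["EMPTY"])  # (1, None, ['habitantes'])
```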
| Paradigm parte | Masc | Fem |
|---|---|---|
| Number=Sing | parte | parte |
| Number=Plur | partes |
Gender seems to be a lexical feature of NOUN: 97% of lemmas (8762) occur with only one value of Gender.
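The "lexical feature" remark compares the set of Gender values each lemma occurs with. The sketch below counts the lemmas that carry Gender with exactly one value. It assumes the denominator is the set of lemmas of that tag with at least one non-empty Gender value. This page states neither that denominator nor the threshold behind "seems to be lexical", so both are assumptions. The data and the function name are hypothetical. The sample reuses one fact visible in the *parte* paradigm table above: that lemma is annotated with both values.

```python
from collections import defaultdict

def lexical_share(words, upos, feature="Gender"):
    """Return (n_single, share): lemmas of `upos` seen with exactly one value
    of `feature`, among lemmas seen with at least one value (assumption)."""
    values = defaultdict(set)
    for lemma, pos, feats in words:
        if pos == upos and feature in feats:
            values[lemma].add(feats[feature])
    if not values:
        return 0, 0.0
    single = sum(1 for vals in values.values() if len(vals) == 1)
    return single, single / len(values)

# Hypothetical (lemma, UPOS, feats) triples.
WORDS = [
    ("parte", "NOUN", {"Gender": "Fem"}),
    ("parte", "NOUN", {"Gender": "Masc"}),  # same lemma, second value
    ("ciudad", "NOUN", {"Gender": "Fem"}),
    ("año", "NOUN", {"Gender": "Masc"}),
    ("año", "NOUN", {"Gender": "Masc"}),
    ("habitante", "NOUN", {}),  # no Gender: not counted at all
]

n_single, share = lexical_share(WORDS, "NOUN")
print(n_single, f"{share:.0%}")  # 2 67%
```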
DET
56090 DET tokens (92% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (51185; 91%), Number=Sing (44763; 80%), Definite=Def (43530; 78%).
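Each co-occurrence percentage is relative to the tokens of the tag that carry Gender. For DET, 51185 / 56090 ≈ 0.913, reported as 91% for PronType=Art. The sketch below is minimal and assumes (UPOS, feature dict) pairs with hypothetical data. It counts only features that are actually present. It therefore does not reproduce the Name=EMPTY entries listed for VERB, PRON and AUX, whose exact definition is not given on this page.

```python
from collections import Counter

def cooccurring(words, upos, feature="Gender", top=3):
    """Most frequent other Name=Value pairs on `upos` tokens that carry
    `feature`, as (pair, count, share of those tokens)."""
    pairs = Counter()
    n = 0
    for pos, feats in words:
        if pos != upos or feature not in feats:
            continue
        n += 1
        for name, value in feats.items():
            if name != feature:
                pairs[f"{name}={value}"] += 1
    return [(pair, c, c / n) for pair, c in pairs.most_common(top)]

# Hypothetical DET tokens as (UPOS, feats).
WORDS = [
    ("DET", {"Definite": "Def", "Gender": "Fem", "Number": "Sing", "PronType": "Art"}),   # la
    ("DET", {"Definite": "Def", "Gender": "Masc", "Number": "Plur", "PronType": "Art"}),  # los
    ("DET", {"Gender": "Fem", "Number": "Sing", "PronType": "Dem"}),                      # esta
    ("DET", {"Number": "Sing", "Poss": "Yes", "PronType": "Prs"}),  # su: no Gender, skipped
]

for pair, count, share in cooccurring(WORDS, "DET"):
    print(f"{pair} ({count}; {share:.0%})")
```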
DET tokens may have the following values of Gender:
- Fem (23947; 43% of non-empty Gender): la, las, una, esta, otras, toda, estas, esa, todas, otra
- Masc (32143; 57% of non-empty Gender): el, los, un, este, otros, ese, estos, todo, todos, unos
- EMPTY (4803): su, sus, cada, cualquier, mi, the, tu, qué, mis, a
| Paradigm el | Masc | Fem |
|---|---|---|
| Definite=Def|Number=Sing | el | la, l' |
| Definite=Def|Number=Sing|Typo=Yes | al, en, del, le | a, al |
| Definite=Def|Number=Plur | los | las |
| Number=Sing|Typo=Yes | al, en | a |
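The paradigm tables group the forms of one lemma two ways. Rows are the form's remaining features, columns are its Gender value, and a cell stays empty when no form was observed. The sketch below builds such a table in the same pipe-table layout. It is only an approximation. The page appears to omit some features from the row labels: PronType=Art is not shown for *el*, even though most DET tokens with Gender carry it. The hypothetical sample therefore includes only Definite, Number and VerbForm. Forms are also sorted alphabetically, which may differ from the order used on this page.

```python
from collections import defaultdict

def paradigm_table(words, lemma, feature="Gender", values=("Masc", "Fem")):
    """Render a pipe table: rows = other features of `lemma`'s forms,
    columns = values of `feature`, cells = forms observed (empty if none)."""
    rows = defaultdict(lambda: defaultdict(set))
    for form, lem, feats in words:
        if lem != lemma or feature not in feats:
            continue
        other = "|".join(f"{k}={v}" for k, v in sorted(feats.items()) if k != feature)
        rows[other or "_"][feats[feature]].add(form)  # "_" = no other features
    lines = [
        f"| Paradigm {lemma} | " + " | ".join(values) + " |",
        "|---" * (len(values) + 1) + "|",
    ]
    for other, cells in rows.items():  # rows in first-seen order
        forms = [", ".join(sorted(cells.get(v, ()))) for v in values]
        lines.append(f"| {other} | " + " | ".join(forms) + " |")
    return "\n".join(lines)

# Hypothetical (form, lemma, feats) triples; PronType is left out on purpose.
WORDS = [
    ("el", "el", {"Definite": "Def", "Gender": "Masc", "Number": "Sing"}),
    ("la", "el", {"Definite": "Def", "Gender": "Fem", "Number": "Sing"}),
    ("los", "el", {"Definite": "Def", "Gender": "Masc", "Number": "Plur"}),
    ("las", "el", {"Definite": "Def", "Gender": "Fem", "Number": "Plur"}),
    ("tenido", "tener", {"Gender": "Masc", "Number": "Sing", "VerbForm": "Part"}),
]

print(paradigm_table(WORDS, "el"))
```

Keeping an explicit empty cell, as in the *tener* case of the sample, preserves the column count, so every row stays aligned with the Masc/Fem header.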
ADJ
15916 ADJ tokens (62% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (11453; 72%).
ADJ tokens may have the following values of Gender:
- Fem (6917; 43% of non-empty Gender): primera, nueva, segunda, santa, buena, francesa, misma, alta, nuevas, pequeña
- Masc (8999; 57% of non-empty Gender): primer, san, mismo, nuevo, junto, segundo, español, buen, propio, primeros
- EMPTY (9631): gran, mayor, estadounidense, mejor, total, grandes, nacional, principal, importante, diferentes
| Paradigm primero | Masc | Fem |
|---|---|---|
| Number=Sing | primer, primero | primera |
| Number=Plur | primeros | primeras |
VERB
6941 VERB tokens (19% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Person=EMPTY (6940; 100%), Mood=EMPTY (6939; 100%), VerbForm=Part (6934; 100%), Number=Sing (5565; 80%), Tense=EMPTY (3853; 56%).
VERB tokens may have the following values of Gender:
- Fem (2024; 29% of non-empty Gender): situada, conocida, ubicada, llamada, dirigida, fundada, publicada, realizada, construida, creada
- Masc (4917; 71% of non-empty Gender): ubicado, conocido, debido, llamado, hecho, nacido, dado, compuesto, tenido, puesto
- EMPTY (29411): tiene, es, encuentra, hay, hacer, hace, tenía, tienen, era, tuvo
| Paradigm tener | Masc | Fem |
|---|---|---|
| Number=Sing | tenido | |
| Number=Plur | tenidos | tenidas |
PRON
4602 PRON tokens (33% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Reflex=EMPTY (4597; 100%), Number=Sing (3496; 76%), PronType=Prs (2881; 63%), Person=3 (2818; 61%), PrepCase=EMPTY (2421; 53%).
PRON tokens may have the following values of Gender:
- Fem (1171; 25% of non-empty Gender): la, una, ella, las, ellas, esta, otra, otras, ésta, muchas
- Masc (3431; 75% of non-empty Gender): lo, uno, los, él, todo, ellos, tanto, ello, este, otros
- EMPTY (9442): se, que, le, me, cual, nos, esto, quien, les, te
| Paradigm él | Masc | Fem |
|---|---|---|
| Case=Acc,Nom|Number=Sing | él, ello | ella |
| Case=Acc,Nom|Number=Plur | ellos | ellas |
| Case=Acc|Number=Sing|PrepCase=Npr | lo | la |
| Case=Acc|Number=Plur|PrepCase=Npr | los | las |
| Case=Dat|Number=Sing|PrepCase=Npr|Typo=Yes | la | |
| Case=Nom|Number=Sing | él | |
| Number=Sing|Typo=Yes | el |
PROPN
3418 PROPN tokens (9% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (2958; 87%).
PROPN tokens may have the following values of Gender:
- Fem (930; 27% of non-empty Gender): guerra, Europea, Ruta, Isla, española, TV, Aérea, batalla, Ciencias, División
- Masc (2488; 73% of non-empty Gender): Fernando, Unidos, Estados, Partido, censo, José, of, Club, Diego, País
- EMPTY (33714): España, Estados, Unidos, madrid, Juan, José, María, Argentina, Francia, Barcelona
| Paradigm Isla | Masc | Fem |
|---|---|---|
| _ | Islas | |
| Number=Sing | Isla | |
| Number=Plur | Islas |
Gender seems to be a lexical feature of PROPN: 99% of lemmas (2126) occur with only one value of Gender.
X
506 X tokens (28% of all X tokens) have a non-empty value of Gender.
The most frequent other feature values with which X and Gender co-occurred: Number=Sing (401; 79%).
X tokens may have the following values of Gender:
- Fem (105; 21% of non-empty Gender): ’s, C, B, i, pre, semi, ta, C., high, p.m.
- Masc (401; 79% of non-empty Gender): mm, msnm, ‘s, etc., n., of, co, cis, parking, and
- EMPTY (1291): ex, ya, ‘s, C, etc., x, C., and, d, i
| Paradigm 's | Masc | Fem |
|---|---|---|
| _ | 's | 's |
| Number=Sing | 's | 's |
| Number=Sing|Person=3 | 's |
Gender seems to be a lexical feature of X: 96% of lemmas (360) occur with only one value of Gender.
AUX
269 AUX tokens (3% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (269; 100%), Number=Sing (269; 100%), Person=EMPTY (269; 100%), VerbForm=Part (269; 100%), Tense=Past (268; 100%).
AUX tokens may have the following values of Gender:
- Masc (269; 100% of non-empty Gender): sido, estado, podido, debido
- EMPTY (10485): es, fue, ha, son, ser, eran, era, han, está, puede
NUM
209 NUM tokens (2% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Card (209; 100%), Number=Sing (177; 85%), NumForm=Word (169; 81%).
NUM tokens may have the following values of Gender:
- Fem (75; 36% of non-empty Gender): una, media, II, pocas, I, IV, XI, ocho, setenta, 2008-09
- Masc (134; 64% of non-empty Gender): un, uno, ciento, II, medio, cero, millones, V, VIII, XX
- EMPTY (10849): dos, tres, 2010, 0, cuatro, 3, 1, 2, 10, 4
| Paradigm uno | Masc | Fem |
|---|---|---|
|  | un, uno | una |
SYM
120 SYM tokens (7% of all SYM tokens) have a non-empty value of Gender.
SYM tokens may have the following values of Gender:
- Fem (36; 30% of non-empty Gender): h, $, &, m, €, +, http://redsismica.uprm.edu/spanish/informacion/terr1918.php, http://www.rumbo.es/disney/
- Masc (84; 70% of non-empty Gender): km, cm, $, &, m, º, mundo.com, www.delnuevo, www.dgt.es, ²
- EMPTY (1540): %, ², km, $, º, °, €, /, ª, a
| Paradigm $ | Masc | Fem |
|---|---|---|
| Number=Sing | $ | $ |
| Number=Sing|VerbForm=Part | $ | |
| Number=Plur|VerbForm=Part | $ |
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (42692; 84%),
NOUN –[amod]–> ADJ (11106; 58%),
NOUN –[conj]–> NOUN (2943; 54%),
NOUN –[acl]–> VERB (1935; 82%),
VERB –[nsubj:pass]–> NOUN (696; 86%),
PRON –[nmod]–> NOUN (509; 69%),
ADJ –[nsubj]–> NOUN (471; 57%),
ADJ –[conj]–> ADJ (448; 54%),
NOUN –[nsubj]–> NOUN (423; 51%),
NOUN –[det]–> PRON (186; 70%).
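The agreement counts above look at every dependency edge, keyed by (parent UPOS, relation, child UPOS). They count the edges where both ends carry the same Gender value. The sketch below assumes the percentage is the number of agreeing edges divided by all edges of that type, including edges where one end has no Gender. That is one plausible reading, since the page does not define the percentage. The sentence and the function are hypothetical. Full relation labels with subtypes, such as nsubj:pass, are kept as-is.

```python
from collections import Counter

def agreement(sentence, feature="Gender"):
    """sentence: list of (id, upos, feats, head, deprel) tuples, ids and heads
    as ints. Returns {(parent_upos, deprel, child_upos): (agreeing, total)}."""
    by_id = {word[0]: word for word in sentence}
    total, agree = Counter(), Counter()
    for _, upos, feats, head, deprel in sentence:
        if head == 0:
            continue  # the root has no parent to agree with
        parent = by_id[head]
        key = (parent[1], deprel, upos)
        total[key] += 1
        value = feats.get(feature)
        if value is not None and value == parent[2].get(feature):
            agree[key] += 1
    return {key: (agree[key], total[key]) for key in total}

# Hypothetical: "La ciudad tiene los barrios antiguos y su plaza"
SENTENCE = [
    (1, "DET", {"Gender": "Fem"}, 2, "det"),
    (2, "NOUN", {"Gender": "Fem"}, 3, "nsubj"),
    (3, "VERB", {}, 0, "root"),
    (4, "DET", {"Gender": "Masc"}, 5, "det"),
    (5, "NOUN", {"Gender": "Masc"}, 3, "obj"),
    (6, "ADJ", {"Gender": "Masc"}, 5, "amod"),
    (7, "CCONJ", {}, 9, "cc"),
    (8, "DET", {}, 9, "det"),  # "su" carries no Gender
    (9, "NOUN", {"Gender": "Fem"}, 5, "conj"),
]

stats = agreement(SENTENCE)
for (p, rel, c), (a, t) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    if a:
        print(f"{p} –[{rel}]–> {c} ({a}; {a / t:.0%})")
# NOUN –[det]–> DET (2; 67%)
# NOUN –[amod]–> ADJ (1; 100%)
```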