Gender: gender
Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.
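As a minimal sketch of how this feature is read in practice, assuming the annotation is stored in the FEATS column of a CoNLL-U file as `Name=Value` pairs joined by `|` (with `_` for an empty column), the helper below extracts the Gender value of a token. The function names `parse_feats` and `gender_of` are hypothetical, and the FEATS strings are illustrative rather than copied from a treebank.

```python
from typing import Dict, Optional


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing'."""
    if not feats or feats == "_":
        # "_" marks an empty FEATS column.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_of(feats: str) -> Optional[str]:
    """Return the Gender value, or None when the feature is absent."""
    return parse_feats(feats).get("Gender")


# Illustrative FEATS strings (assumed, not taken from the treebank).
print(gender_of("Gender=Fem|Number=Sing"))   # e.g. a token like "casa"
print(gender_of("Gender=Masc|Number=Sing"))  # e.g. a token like "castelo"
print(gender_of("_"))                        # token without any features
```

A token whose FEATS column lacks Gender is what the statistics on this page count under EMPTY.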
Masc: masculine gender
Nouns denoting male persons are masculine. Other nouns may also be grammatically masculine, without any relation to sex.
Examples
- castelo “castle”
Fem: feminine gender
Nouns denoting female persons are feminine. Other nouns may also be grammatically feminine, without any relation to sex.
Examples
- casa “house”
Unsp: unspecified
Unsp is used to tag words that can be masculine or feminine when the context is not enough to make their gender clear.
Examples
- você “you”
Treebank Statistics (UD_Portuguese)
This feature is universal.
It occurs with 2 different values: Fem, Masc.
104749 tokens (46%) have a non-empty value of Gender.
17718 types (69%) occur at least once with a non-empty value of Gender.
13097 lemmas (73%) occur at least once with a non-empty value of Gender.
The feature is used with 8 part-of-speech tags: NOUN (38942; 17% instances), DET (34335; 15% instances), ADJ (10649; 5% instances), PROPN (6096; 3% instances), PRON (5977; 3% instances), VERB (4225; 2% instances), NUM (4109; 2% instances), SYM (416; 0% instances).
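Counts of this kind can be reproduced with a short script. The following is only a sketch under stated assumptions: the token list is a hypothetical toy sample (not treebank data), each token is reduced to a (form, UPOS, FEATS) triple, and the names `gender_value` and `gender_coverage` are invented for illustration.

```python
from collections import Counter

# Hypothetical toy sample of (form, UPOS, FEATS) triples; not real treebank data.
TOKENS = [
    ("a", "DET", "Definite=Def|Gender=Fem|Number=Sing|PronType=Art"),
    ("casa", "NOUN", "Gender=Fem|Number=Sing"),
    ("é", "VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
    ("grande", "ADJ", "Gender=Fem|Number=Sing"),
    (".", "PUNCT", "_"),
]


def gender_value(feats):
    """Return the Gender value from a FEATS string, or None if absent."""
    if feats == "_":
        return None
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "Gender":
            return value
    return None


def gender_coverage(tokens):
    """Return (tokens with Gender, total tokens, per-UPOS counts)."""
    per_upos = Counter()
    for _form, upos, feats in tokens:
        if gender_value(feats) is not None:
            per_upos[upos] += 1
    return sum(per_upos.values()), len(tokens), per_upos


tagged, total, per_upos = gender_coverage(TOKENS)
print(f"{tagged} of {total} tokens ({round(100 * tagged / total)}%) have a non-empty value of Gender")
for upos, count in per_upos.most_common():
    print(f"{upos}: {count}")
```

On a full treebank, the same loop would run over every token line of the CoNLL-U file instead of the toy list.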
NOUN
38942 NOUN tokens (93% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (27451; 70%).
NOUN tokens may have the following values of Gender:
- Fem (17989; 46% of non-empty Gender): pessoas, parte, semana, empresa, empresas, cidade, forma, vida, casa, noite
- Masc (20953; 54% of non-empty Gender): anos, milhões, ano, dia, país, presidente, contos, grupo, governo, dias
- EMPTY (3110): cento, vez, partir, entanto, fora, presidente, relação, exemplo, lado, é
Paradigm milhão | Masc | Fem |
---|---|---|
Number=Sing | milhão | |
Number=Plur | milhões | milhões |
Number=Plur|Typo=Yes | mi |
Gender seems to be a lexical feature of NOUN. 98% of lemmas (6317) occur with only one value of Gender.
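The "lexical feature" observation rests on counting, for each lemma, how many distinct Gender values it occurs with. A minimal sketch of that count follows, assuming the input has already been reduced to (lemma, Gender) pairs for one part of speech. The observations are hypothetical toy data (only the two values for "milhão" echo the paradigm above), and `single_gender_share` is an invented name.

```python
from collections import defaultdict

# Hypothetical (lemma, Gender) observations for NOUN tokens; toy data only.
OBSERVATIONS = [
    ("casa", "Fem"),
    ("casa", "Fem"),
    ("castelo", "Masc"),
    ("milhão", "Masc"),
    ("milhão", "Fem"),
    ("ano", "Masc"),
]


def single_gender_share(observations):
    """Return (lemmas seen with exactly one Gender value, all lemmas seen)."""
    values = defaultdict(set)
    for lemma, gender in observations:
        values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)


single, total = single_gender_share(OBSERVATIONS)
print(f"{round(100 * single / total)}% of lemmas ({single} of {total}) occur with only one value of Gender")
```

A high share suggests the value is fixed per lemma (lexical); a low share suggests it varies with agreement (inflectional).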
DET
34335 DET tokens (96% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (30272; 88%), Number=Sing (27222; 79%), Definite=Def (27214; 79%).
DET tokens may have the following values of Gender:
- Fem (15108; 44% of non-empty Gender): a, as, uma, sua, esta, suas, essa, outras, algumas, mesma
- Masc (19227; 56% of non-empty Gender): o, os, um, seu, este, seus, esse, outros, alguns, todo
- EMPTY (1325): a, as, todos, todas, toda, dezenas, mais, cada, qualquer, L’
Paradigm o | Masc | Fem |
---|---|---|
Definite=Def | o(s) | |
Definite=Def|Number=Sing | o | a |
Definite=Def|Number=Sing|PronType=Art | o, Os, a | a, as |
Definite=Def|Number=Sing|PronType=Art|Typo=Yes | os | a |
Definite=Def|Number=Plur | os | as |
Definite=Def|Number=Plur|PronType=Art | os, o | as |
Definite=Ind|Number=Sing|PronType=Art | o | |
Number=Sing | o | |
Number=Sing|NumType=Card|PronType=Ind,Neg,Tot | a | |
Number=Sing|PronType=Art | o | a |
Number=Sing|PronType=Dem | o | a |
Number=Plur | os | |
Number=Plur|PronType=Art | os | as |
Number=Plur|PronType=Dem | os | as |
ADJ
10649 ADJ tokens (97% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (7554; 71%).
ADJ tokens may have the following values of Gender:
- Fem (4784; 45% of non-empty Gender): primeira, nova, maior, grande, última, segunda, boa, política, novas, próxima
- Masc (5865; 55% of non-empty Gender): primeiro, novo, grande, último, segundo, últimos, maior, novos, bom, próximo
- EMPTY (345): Nacional, mesmo, devido, outro, II, próximo, I, muitas, jovens, junto
Paradigm grande | Masc | Fem |
---|---|---|
Degree=Cmp|Number=Sing | maior | maior |
Degree=Cmp|Number=Plur | maiores | maiores |
Degree=Sup|Number=Sing | máximo | máxima |
Degree=Sup|Number=Plur | máximos | |
Number=Sing | grande, maior, máximo | grande, maior, máxima |
Number=Plur | grandes, máximos | grandes, maiores |
PROPN
6096 PROPN tokens (33% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (5871; 96%).
PROPN tokens may have the following values of Gender:
- Fem (2161; 35% of non-empty Gender): Lisboa, Folha, Alemanha, França, Espanha, Europa, Rússia, Itália, Internet, Bósnia
- Masc (3935; 65% of non-empty Gender): Portugal, Brasil, Governo, EUA, PÚBLICO, Rio, Porto, FHC, Benfica, PT
- EMPTY (12257): Paulo, São, José, João, Carlos, Estados, Unidos, Fernando, Silva, Porto
Paradigm EUA | Masc | Fem |
---|---|---|
Hyph=Yes | EUA | |
EUA | EUA |
Gender seems to be a lexical feature of PROPN. 97% of lemmas (2691) occur with only one value of Gender.
PRON
5977 PRON tokens (89% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Reflex=EMPTY (5313; 89%), Number=Sing (4303; 72%), Case=EMPTY (3839; 64%), Person=EMPTY (3817; 64%).
PRON tokens may have the following values of Gender:
- Fem (1869; 31% of non-empty Gender): que, se, -se, ela, a, elas, as, esta, eu, qual
- Masc (4108; 69% of non-empty Gender): que, se, -se, ele, o, isso, tudo, eles, os, lhe
- EMPTY (741): se, -se, quem, eu, você, nos, nós, me, -me, si
Paradigm que | Masc | Fem |
---|---|---|
Number=Sing|PronType=Ind | que | que |
Number=Sing|PronType=Int | que | |
Number=Sing|PronType=Rel | que | que |
Number=Sing|PronType=Rel|Typo=Yes | que | |
Number=Plur|PronType=Ind | que | |
Number=Plur|PronType=Int | que | que |
Number=Plur|PronType=Rel | que | que |
PronType=Rel | que |
VERB
4225 VERB tokens (16% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Tense=EMPTY (4225; 100%), Person=EMPTY (4224; 100%), Mood=EMPTY (4224; 100%), VerbForm=Part (4223; 100%), Number=Sing (2817; 67%).
VERB tokens may have the following values of Gender:
- Fem (1682; 40% of non-empty Gender): passada, feita, feitas, prevista, considerada, criada, realizada, aberta, apresentada, dada
- Masc (2543; 60% of non-empty Gender): passado, feito, eleito, aberto, considerado, previsto, entregue, ligados, realizado, acusado
- EMPTY (21658): é, foi, ser, são, tem, está, há, ter, disse, foram
Paradigm ter | Masc | Fem |
---|---|---|
Number=Sing | tido | tida |
Number=Plur | tidas |
NUM
4109 NUM tokens (96% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Card (4109; 100%), Number=Plur (3173; 77%).
NUM tokens may have the following values of Gender:
- Fem (689; 17% of non-empty Gender): uma, duas, três, quatro, mil, seis, cinco, 200, dez, sete
- Masc (3420; 83% of non-empty Gender): um, dois, mil, três, quatro, cinco, 15, 30, 20, dez
- EMPTY (180): 1, 2, mil, sete, 3, três, 24, 25, 94, 011
Paradigm um | Masc | Fem |
---|---|---|
Number=Sing | um | uma |
Number=Plur | um |
SYM
416 SYM tokens (92% of all SYM tokens) have a non-empty value of Gender.
The most frequent other feature values with which SYM and Gender co-occurred: Number=Plur (408; 98%).
SYM tokens may have the following values of Gender:
- Masc (416; 100% of non-empty Gender): %, US$, R$, CR$, $%, U$
- EMPTY (34): /
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (26415; 94%),
NOUN –[amod]–> ADJ (8131; 98%),
NOUN –[nummod]–> NUM (2432; 89%),
NOUN –[conj]–> NOUN (1265; 58%),
VERB –[nsubjpass]–> NOUN (633; 94%),
ADJ –[det]–> DET (552; 92%),
ADJ –[conj]–> ADJ (362; 96%),
ADJ –[nsubj]–> NOUN (354; 95%),
NUM –[nmod]–> NOUN (318; 87%),
SYM –[nummod]–> NUM (317; 100%).
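Agreement counts like those listed above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to generate the page: the sentence is a hypothetical toy example ("o castelo e a casa"), tokens are reduced to (id, form, UPOS, Gender, head, deprel) tuples, and a pair counts as comparable only when both parent and child carry a Gender value. Whether the percentages above use the same denominator is an assumption. The name `agreement_by_relation` is invented for illustration.

```python
from collections import Counter

# Hypothetical toy sentence "o castelo e a casa":
# (id, form, UPOS, Gender or None, head id, deprel)
TOKENS = [
    (1, "o", "DET", "Masc", 2, "det"),
    (2, "castelo", "NOUN", "Masc", 0, "root"),
    (3, "e", "CCONJ", None, 5, "cc"),
    (4, "a", "DET", "Fem", 5, "det"),
    (5, "casa", "NOUN", "Fem", 2, "conj"),
]


def agreement_by_relation(tokens):
    """Map (parent UPOS, deprel, child UPOS) to (agreeing pairs, comparable pairs)."""
    by_id = {tok[0]: tok for tok in tokens}
    agree, comparable = Counter(), Counter()
    for _id, _form, upos, gender, head, deprel in tokens:
        parent = by_id.get(head)  # None for the root (head id 0)
        if parent is None or gender is None or parent[3] is None:
            continue  # skip pairs where Gender is missing on either side
        key = (parent[2], deprel, upos)
        comparable[key] += 1
        if gender == parent[3]:
            agree[key] += 1
    return {key: (agree[key], comparable[key]) for key in comparable}


for (parent_upos, deprel, child_upos), (ok, n) in agreement_by_relation(TOKENS).items():
    print(f"{parent_upos} -[{deprel}]-> {child_upos} ({ok}; {round(100 * ok / n)}%)")
```

In the toy sentence both determiners agree with their nouns, while the coordinated nouns differ in Gender, which is the kind of case behind the lower agreement rate for conj.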
Treebank Statistics (UD_Portuguese-Bosque)
This feature is universal, but the value Unsp is language-specific.
It occurs with 3 different values: Fem, Masc, Unsp.
108919 tokens (48%) have a non-empty value of Gender.
18839 types (72%) occur at least once with a non-empty value of Gender.
14360 lemmas (80%) occur at least once with a non-empty value of Gender.
The feature is used with 10 part-of-speech tags: NOUN (40747; 18% instances), DET (34025; 15% instances), PROPN (11709; 5% instances), ADJ (10975; 5% instances), PRON (7034; 3% instances), VERB (4203; 2% instances), SYM (203; 0% instances), AUX (19; 0% instances), INTJ (3; 0% instances), PART (1; 0% instances).
NOUN
40747 NOUN tokens (97% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (28851; 71%).
NOUN tokens may have the following values of Gender:
- Fem (18236; 45% of non-empty Gender): pessoas, parte, semana, empresa, empresas, forma, cidade, casa, vida, vez
- Masc (22435; 55% of non-empty Gender): anos, milhões, ano, dia, presidente, país, US$, por, contos, tempo
- Unsp (76; 0% of non-empty Gender): especialistas, representantes, jornalistas, habitantes, visitantes, Presidente, artistas, clientes, estudantes, projectistas
- EMPTY (1170): partir, relação, causa, Estado, longo, parte, favor, termos, vez, torno
Paradigm presidente | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | presidente | presidente | Presidente |
Number=Plur | presidentes |
Gender seems to be a lexical feature of NOUN. 97% of lemmas (6428) occur with only one value of Gender.
DET
34025 DET tokens (97% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (29678; 87%), Number=Sing (26885; 79%), Definite=Def (26487; 78%).
DET tokens may have the following values of Gender:
- Fem (14862; 44% of non-empty Gender): a, as, uma, sua, esta, suas, essa, toda, outras, algumas
- Masc (19149; 56% of non-empty Gender): o, os, um, seu, este, seus, esse, todos, mesmo, outros
- Unsp (14; 0% of non-empty Gender): mais, cada, qual, qualquer, Que, tal
- EMPTY (1211): a, as, estas, o, pouco, uma
Paradigm muito | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | muito, mais | mais, muita | |
Number=Plur | muitos, mais | muitas, mais | mais |
Number=Unsp | mais |
PROPN
11709 PROPN tokens (61% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (11296; 96%).
PROPN tokens may have the following values of Gender:
- Fem (3720; 32% of non-empty Gender): Lisboa, Folha, Câmara, Alemanha, Comissão, França, Espanha, Europa, Associação, Rússia
- Masc (7699; 66% of non-empty Gender): São, Portugal, Brasil, José, Governo, EUA, rio, Estados, João, PÚBLICO
- Unsp (290; 2% of non-empty Gender): Coimbra, Alvalade, Maastricht, Barcelos, Braga, Ermesinde, Aveiro, Drosnin, Frankfurt, Jacarta
- EMPTY (7352): Paulo, e, Nacional, Unidos, Silva, Porto, Henrique, a, Lisboa, Costa
Paradigm São | Masc | Fem | Unsp |
---|---|---|---|
São | São | São |
Gender seems to be a lexical feature of PROPN. 94% of lemmas (4438) occur with only one value of Gender.
ADJ
10975 ADJ tokens (99% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (7836; 71%).
ADJ tokens may have the following values of Gender:
- Fem (4790; 44% of non-empty Gender): primeira, nova, maior, grande, última, segunda, boa, política, novas, próxima
- Masc (6124; 56% of non-empty Gender): cento, primeiro, novo, último, segundo, bom, últimos, maior, grande, novos
- Unsp (61; 1% of non-empty Gender): jovens, especial, melhor, capaz, favorável, grandes, inconvenientes, mole, 2., I
- EMPTY (78): Estrangeiros, Nacional, regional, Europeia, geral, Central, Externo, Municipal, Portuguesa, Prisionais
Paradigm grande | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | maior, grande, máximo | maior, grande, máxima | |
Number=Plur | grandes, maiores, máximos | grandes, maiores | grandes |
PRON
7034 PRON tokens (100% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (4896; 70%), Person=EMPTY (4628; 66%), Case=EMPTY (4498; 64%).
PRON tokens may have the following values of Gender:
- Fem (1849; 26% of non-empty Gender): que, se, ela, a, as, elas, lhe, esta, eu, qual
- Masc (4482; 64% of non-empty Gender): que, se, o, ele, isso, tudo, eles, lhe, os, nada
- Unsp (703; 10% of non-empty Gender): se, quem, me, nos, eu, você, nós, que, lhe, mim
- EMPTY (24): si, se, que, a, o
Paradigm que | Masc | Fem | Unsp |
---|---|---|---|
Definite=Def|Number=Sing|PronType=Art | que | ||
Number=Sing|PronType=Dem | que | ||
Number=Sing|PronType=Ind | que | que | |
Number=Sing|PronType=Int | que | que | que |
Number=Sing|PronType=Rel | que | que | que |
Number=Plur|PronType=Ind | que | ||
Number=Plur|PronType=Int | que | que | |
Number=Plur|PronType=Rel | que | que | que |
Number=Unsp|PronType=Ind | que | ||
Number=Unsp|PronType=Rel | que | ||
PronType=Rel | que |
VERB
4203 VERB tokens (18% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Tense=EMPTY (4203; 100%), Person=EMPTY (4202; 100%), Mood=EMPTY (4202; 100%), VerbForm=Part (4201; 100%), Number=Sing (2802; 67%).
VERB tokens may have the following values of Gender:
- Fem (1679; 40% of non-empty Gender): passada, feita, feitas, prevista, considerada, criada, realizada, aberta, apresentada, dada
- Masc (2524; 60% of non-empty Gender): passado, feito, eleito, aberto, considerado, previsto, entregue, ligados, realizado, visto
- EMPTY (18576): é, foi, são, há, tem, disse, ser, está, era, fazer
Paradigm ter | Masc | Fem |
---|---|---|
Number=Sing | tido | |
Number=Sing|Voice=Pass | tido | tida |
Number=Plur | tidas |
SYM
203 SYM tokens (100% of all SYM tokens) have a non-empty value of Gender.
The most frequent other feature values with which SYM and Gender co-occurred: Number=Plur (203; 100%).
SYM tokens may have the following values of Gender:
- Masc (203; 100% of non-empty Gender): %
AUX
19 AUX tokens (1% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (19; 100%), Tense=EMPTY (19; 100%), VerbForm=Part (19; 100%), Person=EMPTY (19; 100%), Number=Sing (13; 68%).
AUX tokens may have the following values of Gender:
- Fem (5; 26% of non-empty Gender): convertidas, discutidas, feridas, rejeitada, volta
- Masc (14; 74% of non-empty Gender): sido, Acabadinho, acabados, aceite, atualizados, deslocado, interpelado, perdoados, proibido
- EMPTY (3551): foi, ser, vai, pode, foram, ter, é, está, tem, deve
Gender seems to be a lexical feature of AUX. 100% of lemmas (13) occur with only one value of Gender.
INTJ
3 INTJ tokens (7% of all INTJ tokens) have a non-empty value of Gender.
INTJ tokens may have the following values of Gender:
- Fem (2; 67% of non-empty Gender): Obrigada, rua
- Masc (1; 33% of non-empty Gender): Adeus
- EMPTY (43): não, Rarará, Deus, é, Ah, Ai, Alô, BINGO, Droga, Hein
PART
1 PART token (25% of all PART tokens) has a non-empty value of Gender.
The most frequent other feature values with which PART and Gender co-occurred: Number=Sing (1; 100%).
PART tokens may have the following values of Gender:
- Masc (1; 100% of non-empty Gender): pós
- EMPTY (3): anti-, ex, pré-
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (26659; 95%),
NOUN –[amod]–> ADJ (8360; 98%),
PROPN –[det]–> DET (4417; 80%),
NOUN –[acl]–> VERB (1803; 68%),
NOUN –[appos]–> PROPN (1204; 88%),
NOUN –[conj]–> NOUN (1198; 60%),
ADJ –[det]–> DET (498; 96%),
PROPN –[appos]–> NOUN (425; 77%),
NOUN –[appos]–> NOUN (400; 64%),
PROPN –[appos]–> PROPN (398; 79%).