Treebank Statistics: UD_Portuguese-Bosque: Features: Gender
This feature is universal.
It occurs with 2 different values: Fem, Masc.
109130 tokens (48%) have a non-empty value of Gender.
18851 types (73%) occur at least once with a non-empty value of Gender.
14461 lemmas (80%) occur at least once with a non-empty value of Gender.
The feature is used with 13 part-of-speech tags (token count; share of all treebank tokens, rounded): NOUN (41220; 18%), DET (34574; 15%), PROPN (11532; 5%), ADJ (11338; 5%), PRON (6713; 3%), VERB (3537; 2%), NUM (166; 0%), X (17; 0%), ADV (14; 0%), AUX (9; 0%), SCONJ (6; 0%), ADP (3; 0%), PART (1; 0%).
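Counts of this kind can be reproduced, roughly, by scanning the FEATS column of the treebank's CoNLL-U files. The following is a minimal sketch under stated assumptions: the sample sentence is hand-made rather than taken from Bosque, and the helper names `parse_feats`, `read_tokens`, and `feature_by_upos` are hypothetical, not part of any UD tool.

```python
from collections import Counter

# Hand-made CoNLL-U fragment for illustration only (not taken from Bosque).
SAMPLE = """\
1\tA\to\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_
3\tnova\tnovo\tADJ\t_\tGender=Fem|Number=Sing\t2\tamod\t_\t_
4\tcaiu\tcair\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_
"""


def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def read_tokens(text):
    """Yield (form, lemma, upos, feats) for each regular token line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        yield cols[1], cols[2], cols[3], parse_feats(cols[5])


def feature_by_upos(text, feature="Gender"):
    """Count, per UPOS tag, the tokens that carry `feature` and all tokens."""
    with_feature = Counter()
    total = Counter()
    for _form, _lemma, upos, feats in read_tokens(text):
        total[upos] += 1
        if feature in feats:
            with_feature[upos] += 1
    return with_feature, total


with_gender, total = feature_by_upos(SAMPLE)
for upos, n in with_gender.most_common():
    print(f"{upos}: {n} of {total[upos]} tokens carry Gender")
```

Run over the full treebank files instead of `SAMPLE`, the same two counters would give the per-tag token counts and the per-tag coverage reported in the sections below.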
NOUN
41220 NOUN tokens (100% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (29542; 72%).
NOUN tokens may have the following values of Gender:
- Fem (18805; 46% of non-empty Gender): pessoas, parte, semana, vez, empresa, forma, empresas, cidade, casa, vida
- Masc (22415; 54% of non-empty Gender): anos, presidente, ano, dia, país, estado, tempo, contos, grupo, governo
- EMPTY (166): partir, especialistas, representantes, jornalistas, jovens, estudantes, habitantes, par, visitantes, Esposende
| Paradigm dia | Masc | Fem |
|---|---|---|
| Number=Sing | dia | dia |
| Number=Plur | dias | |
Gender seems to be a lexical feature of NOUN. 98% of lemmas (6592) occur with only one value of Gender.
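The "lexical feature" judgement rests on how many lemmas are seen with a single value. A minimal sketch of that check follows; the function name `single_value_share` and the observation list are hypothetical, and the exact criterion used to generate this page is an assumption.

```python
from collections import defaultdict


def single_value_share(observations):
    """observations: iterable of (lemma, value) pairs with a non-empty value.

    Returns (number of lemmas seen with exactly one value, number of lemmas).
    """
    values = defaultdict(set)
    for lemma, value in observations:
        values[lemma].add(value)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)


# Hypothetical NOUN observations; 'dia' is given both values on purpose,
# mirroring the paradigm table in this section.
noun_obs = [
    ("casa", "Fem"), ("casa", "Fem"), ("ano", "Masc"),
    ("dia", "Masc"), ("dia", "Fem"), ("empresa", "Fem"),
]
single, total = single_value_share(noun_obs)
print(f"{single} of {total} lemmas ({100 * single / total:.0f}%) occur with only one value")
```

A high share suggests the value is a property of the lemma rather than of the inflected form.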
DET
34574 DET tokens (99% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (30779; 89%), Definite=Def (27460; 79%), Number=Sing (27254; 79%).
DET tokens may have the following values of Gender:
- Fem (15753; 46% of non-empty Gender): a, as, uma, sua, esta, suas, essa, toda, outras, algumas
- Masc (18821; 54% of non-empty Gender): o, os, um, seu, este, seus, esse, todos, outros, outro
- EMPTY (291): a, as, o, mais, qual, qualquer, tal, cada, que, um
| Paradigm o | Masc | Fem |
|---|---|---|
| Definite=Def\|ExtPos=PROPN\|Number=Plur\|PronType=Art | As | |
| Definite=Def\|Number=Sing\|PronType=Art | o, Os, a, o(s) | a |
| Definite=Def\|Number=Sing\|PronType=Art\|Typo=Yes | os | o |
| Definite=Def\|Number=Plur\|PronType=Art | os, o | as |
| Definite=Def\|Number=Plur\|PronType=Art\|Typo=Yes | o | a, As |
| Definite=Ind\|Number=Sing\|PronType=Art | o | |
| ExtPos=PROPN\|Number=Sing\|PronType=Art | O | |
| Number=Sing\|PronType=Art | o, A | a |
| Number=Sing\|PronType=Dem | o | a |
| Number=Plur\|PronType=Art | os | as |
| Number=Plur\|PronType=Dem | os | as |
PROPN
11532 PROPN tokens (61% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (11116; 96%), ExtPos=EMPTY (7643; 66%).
PROPN tokens may have the following values of Gender:
- Fem (3757; 33% of non-empty Gender): Lisboa, Folha, Câmara, Alemanha, França, Comissão, Espanha, Europa, Rússia, Itália
- Masc (7775; 67% of non-empty Gender): São, Portugal, Brasil, José, Governo, EUA, Rio, Estados, João, PÚBLICO
- EMPTY (7225): Paulo, Nacional, Unidos, Silva, Porto, Henrique, Lisboa, Sul, Costa, República
| Paradigm São | Masc | Fem |
|---|---|---|
| Abbr=Yes\|ExtPos=PROPN\|Number=Sing | S. | |
| Abbr=Yes\|Number=Sing | S. | |
| ExtPos=PROPN | SÃO | |
| ExtPos=PROPN\|Number=Sing | São, SÃO | São |
| Number=Sing | São | |
Gender seems to be a lexical feature of PROPN. 95% of lemmas (4385) occur with only one value of Gender.
ADJ
11338 ADJ tokens (99% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (8184; 72%).
ADJ tokens may have the following values of Gender:
- Fem (5213; 46% of non-empty Gender): primeira, nova, maior, grande, última, mesma, boa, segunda, política, passada
- Masc (6125; 54% of non-empty Gender): primeiro, novo, mesmo, passado, último, segundo, últimos, bom, maior, grande
- EMPTY (60): melhor, capaz, Nacional, contente, especial, favorável, inconvenientes, jovens, mole, Aérea
| Paradigm novo | Masc | Fem |
|---|---|---|
| Number=Sing | novo | nova |
| Number=Plur | novos | novas |
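A paradigm table like the one above groups the word forms of one lemma by their remaining features (rows) and by the value of Gender (columns). The sketch below shows one way to build such a grouping; the function name `paradigm` and the token list are hypothetical, and the page's own generator may differ in details such as case handling or row ordering.

```python
from collections import defaultdict


def paradigm(tokens, lemma, feature="Gender"):
    """tokens: iterable of (form, lemma, feats_dict).

    Returns {other-features string: {feature value: sorted list of forms}}.
    """
    table = defaultdict(lambda: defaultdict(set))
    for form, tok_lemma, feats in tokens:
        if tok_lemma != lemma or feature not in feats:
            continue
        # Row label: every feature except the one under study.
        rest = "|".join(f"{k}={v}" for k, v in sorted(feats.items()) if k != feature)
        table[rest][feats[feature]].add(form)
    return {
        row: {value: sorted(forms) for value, forms in cols.items()}
        for row, cols in table.items()
    }


# Hypothetical tokens, chosen to match the forms listed for 'novo'.
tokens = [
    ("novo", "novo", {"Gender": "Masc", "Number": "Sing"}),
    ("nova", "novo", {"Gender": "Fem", "Number": "Sing"}),
    ("novos", "novo", {"Gender": "Masc", "Number": "Plur"}),
    ("novas", "novo", {"Gender": "Fem", "Number": "Plur"}),
    ("casa", "casa", {"Gender": "Fem", "Number": "Sing"}),
]
for row, cols in paradigm(tokens, "novo").items():
    masc = ", ".join(cols.get("Masc", []))
    fem = ", ".join(cols.get("Fem", []))
    print(f"| {row} | {masc} | {fem} |")
```

Cells stay empty when a form was never observed with that value, which is why some rows in these tables have only one filled column.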
PRON
6713 PRON tokens (90% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (4951; 74%), Case=EMPTY (4722; 70%), Person=EMPTY (4570; 68%).
PRON tokens may have the following values of Gender:
- Fem (1961; 29% of non-empty Gender): que, se, a, ela, onde, as, elas, esta, lhe, eu
- Masc (4752; 71% of non-empty Gender): que, se, o, ele, isso, tudo, eles, os, lhe, onde
- EMPTY (753): se, quem, me, nos, que, eu, você, nós, si, onde
| Paradigm que | Masc | Fem |
|---|---|---|
| Case=Acc\|Number=Sing\|Person=3\|PronType=Int | Que | |
| Definite=Def\|Number=Sing\|PronType=Art | que | |
| Number=Sing\|PronType=Dem | que | |
| Number=Sing\|PronType=Ind | que | que |
| Number=Sing\|PronType=Int | que | que |
| Number=Sing\|PronType=Rel | que | que |
| Number=Sing\|PronType=Rel\|Typo=Yes | qu | |
| Number=Plur\|PronType=Ind | que | |
| Number=Plur\|PronType=Int | que | que |
| Number=Plur\|PronType=Rel | que | que |
VERB
3537 VERB tokens (17% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Person=EMPTY (3536; 100%), Tense=EMPTY (3536; 100%), Mood=EMPTY (3535; 100%), VerbForm=Part (3534; 100%), Number=Sing (2329; 66%).
VERB tokens may have the following values of Gender:
- Fem (1435; 41% of non-empty Gender): feita, feitas, considerada, criada, realizada, apresentada, dada, utilizada, marcada, aprovada
- Masc (2102; 59% of non-empty Gender): feito, eleito, aberto, considerado, ligados, realizado, acusado, divulgado, entregue, feitos
- EMPTY (17229): tem, há, disse, pode, fazer, diz, ter, é, deve, está
| Paradigm ter | Masc | Fem |
|---|---|---|
| Number=Sing | tido | |
| Number=Sing\|Voice=Pass | tido | tida |
| Number=Plur | tidas | |
NUM
166 NUM tokens (4% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Mult (131; 79%).
NUM tokens may have the following values of Gender:
- Fem (5; 3% of non-empty Gender): dezenas, 13, 16, 4ª
- Masc (161; 97% of non-empty Gender): cento, milhões, meia, dúzia, milhares, 1, 1., 14,667, 185/60, Mil
- EMPTY (4494): um, dois, três, mil, milhões, uma, duas, quatro, cinco, 15

Gender seems to be a lexical feature of NUM. 100% of lemmas (21) occur with only one value of Gender.
X
17 X tokens (10% of all X tokens) have a non-empty value of Gender.
The most frequent other feature values with which X and Gender co-occurred: Number=Sing (16; 94%).
X tokens may have the following values of Gender:
- Fem (5; 29% of non-empty Gender): made, Body, morcilla, natura
- Masc (12; 71% of non-empty Gender): Dream, Insight, MacMillan, consejero, dolce, godfather, kebab, killer, line, primitive
- EMPTY (146): in, pole, position, jet, art, body, center, computing, drag, dream

Gender seems to be a lexical feature of X. 100% of lemmas (16) occur with only one value of Gender.
ADV
14 ADV tokens (0% of all ADV tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADV and Gender co-occurred: Polarity=EMPTY (13; 93%).
ADV tokens may have the following values of Gender:
- Fem (2; 14% of non-empty Gender): quanto, tal
- Masc (12; 86% of non-empty Gender): quanto, entanto, inteligente-, menos, não, ontem, teatral, um
- EMPTY (8371): não, mais, já, também, ainda, ontem, só, depois, muito, como
| Paradigm quanto | Masc | Fem |
|---|---|---|
| PronType=Ind | quanto | |
| PronType=Int | quanto | |
| PronType=Rel | quanto | |
AUX
9 AUX tokens (0% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (9; 100%), Number=Sing (9; 100%), Person=EMPTY (9; 100%), Tense=EMPTY (9; 100%), VerbForm=Part (9; 100%).
AUX tokens may have the following values of Gender:
- Masc (9; 100% of non-empty Gender): sido
- EMPTY (5018): é, foi, ser, são, está, foram, vai, era, ter, será
SCONJ
6 SCONJ tokens (0% of all SCONJ tokens) have a non-empty value of Gender.
SCONJ tokens may have the following values of Gender:
- Fem (3; 50% of non-empty Gender): Uma, que, uns
- Masc (3; 50% of non-empty Gender): que
- EMPTY (5352): que, a, de, para, se, porque, como, por, em, quando
| Paradigm que | Masc | Fem |
|---|---|---|
| | que | |
| PronType=Rel | que | que |
ADP
3 ADP tokens (0% of all ADP tokens) have a non-empty value of Gender.
ADP tokens may have the following values of Gender:
- Masc (3; 100% of non-empty Gender): de, que
- EMPTY (33781): de, em, a, por, com, para, como, entre, sobre, até
PART
1 PART token (33% of all PART tokens) has a non-empty value of Gender.
The most frequent other feature values with which PART and Gender co-occurred: ExtPos=EMPTY (1; 100%), Number=Sing (1; 100%).
PART tokens may have the following values of Gender:
- Masc (1; 100% of non-empty Gender): pós
- EMPTY (2): anti, pré
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (28280; 100%),
NOUN –[amod]–> ADJ (8998; 100%),
PROPN –[det]–> DET (4454; 81%),
NOUN –[acl]–> VERB (1597; 67%),
NOUN –[conj]–> NOUN (1383; 60%),
NOUN –[appos]–> PROPN (1216; 90%),
PROPN –[conj]–> PROPN (811; 75%),
VERB –[nsubj:pass]–> NOUN (572; 79%),
ADJ –[nsubj]–> NOUN (435; 97%),
ADJ –[conj]–> ADJ (385; 98%).
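Agreement figures of this kind can be approximated by walking each dependency and comparing the Gender of parent and child. The sketch below assumes that the percentage is the share of agreeing pairs among pairs where both nodes have a non-empty Gender; that reading, the function names `gender_of` and `agreement_counts`, and the hand-made sentence are all assumptions, not taken from the page's generator.

```python
from collections import Counter

# Hand-made CoNLL-U sentence for illustration only (not taken from Bosque).
SAMPLE = """\
1\tA\to\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_
3\tnova\tnovo\tADJ\t_\tGender=Fem|Number=Sing\t2\tamod\t_\t_
4\tcaiu\tcair\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_
"""


def gender_of(feats):
    """Return the Gender value from a FEATS string, or None if absent."""
    for item in feats.split("|"):
        if item.startswith("Gender="):
            return item.split("=", 1)[1]
    return None


def agreement_counts(sentence):
    """Count, per (parent UPOS, deprel, child UPOS), agreeing and comparable pairs."""
    nodes = {}
    for line in sentence.splitlines():
        cols = line.split("\t")
        # Keep regular tokens only; comments, ranges and empty nodes are skipped.
        if len(cols) == 10 and cols[0].isdigit():
            nodes[cols[0]] = (cols[3], gender_of(cols[5]), cols[6], cols[7])
    agree, both = Counter(), Counter()
    for upos, gender, head, deprel in nodes.values():
        if head not in nodes:  # the root (head 0) has no parent token
            continue
        parent_upos, parent_gender = nodes[head][0], nodes[head][1]
        if gender is None or parent_gender is None:
            continue
        key = (parent_upos, deprel, upos)
        both[key] += 1
        if gender == parent_gender:
            agree[key] += 1
    return agree, both


agree, both = agreement_counts(SAMPLE)
for (parent, rel, child), n in agree.most_common():
    print(f"{parent} -[{rel}]-> {child} ({n}; {100 * n / both[(parent, rel, child)]:.0f}%)")
```

Summing the two counters over every sentence in the treebank and sorting by the agreeing count would give a ranking in the same shape as the list above.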