Treebank Statistics: UD_Portuguese-PetroGold: Features: Gender
This feature is universal.
It occurs with 2 different values: Fem, Masc.
131773 tokens (53%) have a non-empty value of Gender.
11833 types (78%) occur at least once with a non-empty value of Gender.
8286 lemmas (79%) occur at least once with a non-empty value of Gender.
The feature is used with 10 part-of-speech tags (token count; share of all tokens in the treebank): NOUN (57531; 23%), DET (36327; 14%), ADJ (17069; 7%), VERB (8783; 4%), PROPN (8285; 3%), PRON (3508; 1%), ADV (214; 0%), NUM (51; 0%), AUX (4; 0%), X (1; 0%).
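Per-tag counts of this kind can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, assuming the standard ten-column CoNLL-U layout (UPOS in column 4, FEATS in column 6); the helper names `parse_feats`, `iter_words` and `feature_by_upos` and the toy `SAMPLE` rows are invented for illustration and are not part of the treebank or of any UD tool.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def iter_words(conllu_text):
    """Yield the CoNLL-U columns of each syntactic word."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range or empty node
        yield cols

def feature_by_upos(conllu_text, feature="Gender"):
    """Return (tokens carrying the feature, all tokens), both keyed by UPOS."""
    with_feature, total = Counter(), Counter()
    for cols in iter_words(conllu_text):
        upos = cols[3]
        total[upos] += 1
        if feature in parse_feats(cols[5]):
            with_feature[upos] += 1
    return with_feature, total

# Invented toy sentence ("o óleo flui"), not taken from PetroGold.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "o", "o", "DET", "_",
     "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "2", "det", "_", "_"),
    ("2", "óleo", "óleo", "NOUN", "_", "Gender=Masc|Number=Sing", "3", "nsubj", "_", "_"),
    ("3", "flui", "fluir", "VERB", "_",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"),
])

with_feature, total = feature_by_upos(SAMPLE)
for upos in sorted(total):
    print(upos, with_feature[upos], "of", total[upos])
```

Running the same function over the full treebank files should give counts comparable to the figures listed here, provided multiword token ranges are skipped as in the sketch.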
NOUN
57531 NOUN tokens (100% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (41495; 72%).
NOUN tokens may have the following values of Gender:
- Fem (28733; 50% of non-empty Gender): água, figura, produção, área, argila, perfuração, forma, pressão, formação, tabela
- Masc (28798; 50% of non-empty Gender): óleo, fluido, petróleo, gás, fluidos, processo, dados, campo, sistema, tempo
- EMPTY (29): place, Figura, Offshore, ,, Argila, Captura, Equação, Etanol, Petróleo, Processo
| Paradigm óleo | Masc | Fem |
|---|---|---|
| Number=Sing | óleo | óleo |
| Number=Plur | óleos | |
Gender seems to be a lexical feature of NOUN: 97% of lemmas (3590) occur with only one value of Gender.
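The "lexical feature" judgment rests on how many lemmas are seen with a single value of the feature. A minimal sketch of that check follows, assuming that tokens with an empty value are ignored and that the percentage is taken over lemmas with at least one non-empty value (the report does not state the denominator explicitly). The function name `single_valued_lemmas` and the `PAIRS` data are invented for illustration.

```python
from collections import defaultdict

def single_valued_lemmas(lemma_value_pairs):
    """Return (lemmas seen with exactly one value, lemmas seen with any value)."""
    values_by_lemma = defaultdict(set)
    for lemma, value in lemma_value_pairs:
        if value:  # assumption: tokens with an empty value are ignored
            values_by_lemma[lemma].add(value)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)

# Invented (lemma, Gender) observations, not taken from PetroGold.
PAIRS = [
    ("óleo", "Masc"), ("óleo", "Masc"),
    ("água", "Fem"),
    ("cientista", "Masc"), ("cientista", "Fem"),
    ("place", None),
]

single, total = single_valued_lemmas(PAIRS)
print(f"{100 * single / total:.0f}% of lemmas ({single} of {total}) have one value")
```

A share close to 100% suggests the value is a property of the lemma, while a low share suggests inflection, as with adjectives and determiners.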
DET
36327 DET tokens (100% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (31749; 87%), Definite=Def (29007; 80%), Number=Sing (28112; 77%).
DET tokens may have the following values of Gender:
- Fem (18194; 50% of non-empty Gender): a, as, uma, esta, sua, estas, essa, suas, cada, essas
- Masc (18133; 50% of non-empty Gender): o, os, um, este, estes, esse, seu, esses, todos, cada
| Paradigm o | Masc | Fem |
|---|---|---|
| Definite=Def\|Number=Sing\|PronType=Art | o | a, á |
| Definite=Def\|Number=Plur\|PronType=Art | os | as, A |
| Number=Sing | o | |
| Number=Plur\|PronType=Art | os | |
ADJ
17069 ADJ tokens (100% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (11092; 65%).
ADJ tokens may have the following values of Gender:
- Fem (8545; 50% of non-empty Gender): maior, grande, magnética, alta, baixa, menor, mesma, magnéticas, aquosa, continental
- Masc (8524; 50% of non-empty Gender): magnético, maior, possível, necessário, magnéticos, natural, presente, diferentes, mesmo, total
- EMPTY (11): subsea, primeira, próximo
| Paradigm maior | Masc | Fem |
|---|---|---|
| Number=Sing | maior | maior |
| Number=Plur | maiores | maiores |
VERB
8783 VERB tokens (43% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (8782; 100%), Person=EMPTY (8782; 100%), Tense=EMPTY (8782; 100%), VerbForm=Part (8775; 100%), Number=Sing (5169; 59%), Voice=EMPTY (4792; 55%).
VERB tokens may have the following values of Gender:
- Fem (3791; 43% of non-empty Gender): utilizada, produzida, utilizadas, realizada, feita, obtidas, obtida, observada, associadas, observadas
- Masc (4992; 57% of non-empty Gender): devido, utilizado, utilizados, obtidos, apresentados, observado, realizados, obtido, associados, realizado
- EMPTY (11574): pode, podem, partir, apresenta, utilizando, tem, apresentam, deve, mostra, ocorre
| Paradigm utilizar | Masc | Fem |
|---|---|---|
| Number=Sing\|VerbForm=Ger | utilizado | |
| Number=Sing\|VerbForm=Part | utilizado | utilizada, utilizado |
| Number=Sing\|VerbForm=Part\|Voice=Pass | utilizado | utilizada |
| Number=Plur\|VerbForm=Part | utilizados | utilizadas |
| Number=Plur\|VerbForm=Part\|Voice=Pass | utilizados | utilizadas |
PROPN
8285 PROPN tokens (69% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (8066; 97%).
PROPN tokens may have the following values of Gender:
- Fem (2682; 32% of non-empty Gender): Bacia, Formação, NE-SW, MEG, ilha, Petrobras, ANP, NW-SE, Fm, Goma
- Masc (5603; 68% of non-empty Gender): CO2, C, Membro, Brasil, Rio, Grupo, Campos, PHPA, GX, MDL
- EMPTY (3712): et, al., Cabo, Frio, &, Santos, Grande, Romualdo, Campos, São
| Paradigm NE-SW | Masc | Fem |
|---|---|---|
| Number=Sing | NE-SW | NE-SW |
| Number=Plur | NE-SW | |
PRON
3508 PRON tokens (65% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (2396; 68%), PronType=Rel (1987; 57%).
PRON tokens may have the following values of Gender:
- Fem (1219; 35% of non-empty Gender): que, a, uma, esta, elas, ela, qual, as, estas, mesma
- Masc (2289; 65% of non-empty Gender): que, o, isso, isto, este, um, qual, eles, mesmo, estes
- EMPTY (1892): se, nos, que, nós, um
| Paradigm que | Masc | Fem |
|---|---|---|
| Number=Sing | que | que |
| Number=Plur | que | que |
ADV
214 ADV tokens (3% of all ADV tokens) have a non-empty value of Gender.
ADV tokens may have the following values of Gender:
- Fem (86; 40% of non-empty Gender): onde, SIM, melhor
- Masc (128; 60% of non-empty Gender): onde
- EMPTY (6225): mais, não, também, através, já, muito, assim, bem, ainda, além
| Paradigm onde | Masc | Fem |
|---|---|---|
| Number=Sing | onde | onde |
| Number=Plur | onde | onde |
NUM
51 NUM tokens (1% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=EMPTY (31; 61%).
NUM tokens may have the following values of Gender:
- Fem (4; 8% of non-empty Gender): II.7, II.7.2, II.8.1, nove
- Masc (47; 92% of non-empty Gender): III.2, 36º, 43º, 44,6º, 80º, 8º, II.1, II.2.3, II.3, II.4.1
- EMPTY (7238): 1, dois, 3, 2, 5, 10, duas, 4, três, 2005
Gender seems to be a lexical feature of NUM: 100% of lemmas (50) occur with only one value of Gender.
AUX
4 AUX tokens (0% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (4; 100%), Number=Sing (4; 100%), Person=EMPTY (4; 100%), Tense=EMPTY (4; 100%), VerbForm=Part (4; 100%).
AUX tokens may have the following values of Gender:
- Masc (4; 100% of non-empty Gender): sido
- EMPTY (6570): é, são, foi, ser, foram, sendo, estão, está, será, serão
X
1 X token (0% of all X tokens) has a non-empty value of Gender.
The most frequent other feature values with which X and Gender co-occurred: Foreign=EMPTY (1; 100%).
X tokens may have the following values of Gender:
- Masc (1; 100% of non-empty Gender): drill-in
- EMPTY (216): in, drill, n, flow, core, ., booster, pin, situ, stripe
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (32956; 100%),
NOUN –[amod]–> ADJ (14549; 100%),
NOUN –[acl]–> VERB (4069; 93%),
NOUN –[conj]–> NOUN (2654; 61%),
VERB –[nsubj:pass]–> NOUN (2123; 77%),
PROPN –[det]–> DET (2098; 99%),
NOUN –[nmod]–> PROPN (1914; 61%),
ADJ –[obl]–> NOUN (713; 54%),
ADJ –[nsubj]–> NOUN (663; 91%),
PROPN –[conj]–> PROPN (661; 71%).
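Agreement counts like those above can be approximated by walking the dependency edges of each sentence. The sketch below assumes standard CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, ...) and assumes the percentage is the share of agreeing pairs among parent-child pairs in which both nodes carry Gender; the report does not spell out that denominator. The names `parse_feats`, `agreement_counts` and the `ROWS` sentence are invented for illustration.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def agreement_counts(sentence_rows, feature="Gender"):
    """Count parent-child pairs that both carry the feature, and those that agree.

    sentence_rows holds the CoNLL-U columns of the syntactic words of one sentence.
    Keys are (parent UPOS, relation, child UPOS).
    """
    by_id = {row[0]: row for row in sentence_rows}
    agree, both = Counter(), Counter()
    for child in sentence_rows:
        parent = by_id.get(child[6])
        if parent is None:
            continue  # root (HEAD = 0) has no parent word
        child_value = parse_feats(child[5]).get(feature)
        parent_value = parse_feats(parent[5]).get(feature)
        if child_value is None or parent_value is None:
            continue  # assumption: pairs lacking the feature on either side are skipped
        key = (parent[3], child[7], child[3])
        both[key] += 1
        if child_value == parent_value:
            agree[key] += 1
    return agree, both

# Invented toy sentence ("a água utilizada e óleo"), not taken from PetroGold.
ROWS = [
    ["1", "a", "o", "DET", "_", "Definite=Def|Gender=Fem|Number=Sing|PronType=Art", "2", "det"],
    ["2", "água", "água", "NOUN", "_", "Gender=Fem|Number=Sing", "0", "root"],
    ["3", "utilizada", "utilizar", "VERB", "_", "Gender=Fem|Number=Sing|VerbForm=Part", "2", "acl"],
    ["4", "e", "e", "CCONJ", "_", "_", "5", "cc"],
    ["5", "óleo", "óleo", "NOUN", "_", "Gender=Masc|Number=Sing", "2", "conj"],
]

agree, both = agreement_counts(ROWS)
for key in sorted(both):
    parent_upos, relation, child_upos = key
    print(f"{parent_upos} -[{relation}]-> {child_upos}: {agree[key]} of {both[key]} agree")
```

Coordinated nouns illustrate why relations such as conj show lower agreement than det or amod: the conjuncts are independent lexical items, so their genders need not match.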