Treebank Statistics: UD_Upper_Sorbian-UFAL: Features: Gender
This feature is universal.
It occurs with 3 different values: Fem, Masc, Neut.
This is a layered feature with the following layers: Gender, Gender[psor].
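In the CoNLL-U FEATS column, a layered feature's layers carry a bracketed suffix. Here Gender[psor] records the gender of a possessor, for example on possessive determiners, separately from the word's own Gender. The minimal sketch below shows how the two layers can be separated. The function names and the example feature string are hypothetical and do not come from the treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS value such as "Case=Nom|Gender=Fem" into a dict.

    "_" stands for "no features". Multiple values (e.g. "Masc,Neut") are
    kept as a single comma-separated string.
    """
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_layers(feats: str) -> dict:
    """Keep only Gender and its layers, e.g. Gender[psor]."""
    return {
        name: value
        for name, value in parse_feats(feats).items()
        if name == "Gender" or name.startswith("Gender[")
    }


# Hypothetical FEATS value for a possessive determiner.
example = "Case=Nom|Gender=Fem|Gender[psor]=Masc|Number=Sing|Poss=Yes"
print(gender_layers(example))  # {'Gender': 'Fem', 'Gender[psor]': 'Masc'}
```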
4917 tokens (44%) have a non-empty value of Gender.
3298 types (76%) occur at least once with a non-empty value of Gender.
2112 lemmas (69%) occur at least once with a non-empty value of Gender.
The feature is used with 9 part-of-speech tags (token count; share of all tokens in the treebank): NOUN (2521; 23%), ADJ (1381; 12%), PROPN (539; 5%), DET (269; 2%), PRON (120; 1%), VERB (48; 0%), NUM (36; 0%), AUX (2; 0%), ADV (1; 0%).
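The percentages in the part-of-speech breakdown match a denominator of all tokens in the treebank. If 4917 tokens make up 44%, the treebank has roughly 11,000 tokens, and 2521 of those is about 23%. The minimal sketch below shows one way to derive such counts from a CoNLL-U file. It is not the script that generated this page, and it rests on several assumptions:

- types are distinct, case-sensitive word forms;
- multiword-token ranges and empty nodes are skipped;
- only the plain Gender feature counts, not the Gender[psor] layer.

The function names are hypothetical.

```python
from collections import Counter


def has_gender(feats: str) -> bool:
    """True if FEATS contains a plain Gender feature (not the [psor] layer)."""
    if feats == "_":
        return False
    return any(pair.split("=", 1)[0] == "Gender" for pair in feats.split("|"))


def gender_coverage(conllu_text: str) -> dict:
    """Coverage of Gender over tokens, types and lemmas, plus counts per UPOS."""
    tokens = gender_tokens = 0
    types, gender_types = set(), set()
    lemmas, gender_lemmas = set(), set()
    per_upos = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence boundaries and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        form, lemma, upos, feats = cols[1], cols[2], cols[3], cols[5]
        tokens += 1
        types.add(form)
        lemmas.add(lemma)
        if has_gender(feats):
            gender_tokens += 1
            gender_types.add(form)
            gender_lemmas.add(lemma)
            per_upos[upos] += 1
    return {
        "tokens": (gender_tokens, tokens),
        "types": (len(gender_types), len(types)),
        "lemmas": (len(gender_lemmas), len(lemmas)),
        "per_upos": per_upos,
    }


# Toy three-token sentence (not from the treebank).
rows = [
    ["1", "wulka", "wulki", "ADJ", "_", "Case=Nom|Degree=Pos|Gender=Fem|Number=Sing", "2", "amod", "_", "_"],
    ["2", "rěč", "rěč", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Sing", "0", "root", "_", "_"],
    ["3", "tež", "tež", "ADV", "_", "_", "2", "advmod", "_", "_"],
]
stats = gender_coverage("\n".join("\t".join(row) for row in rows))
total = stats["tokens"][1]
for upos, n in stats["per_upos"].most_common():
    print(f"{upos} ({n}; {round(100 * n / total)}% of all tokens)")
```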
NOUN
2521 NOUN tokens (99% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (1684; 67%), Animacy=EMPTY (1380; 55%).
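A line such as Animacy=EMPTY (1380; 55%) counts the NOUN tokens that carry Gender but lack another feature. The percentages match a denominator of NOUN tokens with Gender: 1380 / 2521 ≈ 55% and 1684 / 2521 ≈ 67%. The minimal sketch below reproduces this kind of count. It assumes that EMPTY is recorded for every feature name observed anywhere with the same part of speech. That assumption and the function names are hypothetical.

```python
from collections import Counter


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS value; "_" means no features."""
    return dict(p.split("=", 1) for p in feats.split("|")) if feats != "_" else {}


def cooccurring_values(tokens, upos, feature="Gender"):
    """Count other feature values on tokens of `upos` that carry `feature`.

    tokens: iterable of (UPOS, FEATS) pairs. A feature name seen with `upos`
    but missing on a token is counted as NAME=EMPTY.
    """
    same_pos = [parse_feats(feats) for tag, feats in tokens if tag == upos]
    other_names = {name for feats in same_pos for name in feats} - {feature}
    carrying = [feats for feats in same_pos if feature in feats]
    counts = Counter()
    for feats in carrying:
        for name in other_names:
            counts[f"{name}={feats.get(name, 'EMPTY')}"] += 1
    return len(carrying), counts


# Toy data (not from the treebank).
toy = [
    ("NOUN", "Animacy=Inan|Gender=Masc|Number=Sing"),
    ("NOUN", "Gender=Fem|Number=Sing"),
    ("NOUN", "Gender=Neut|Number=Plur"),
]
n, counts = cooccurring_values(toy, "NOUN")
for value, c in counts.most_common(2):
    print(f"{value} ({c}; {round(100 * c / n)}%)")
```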
NOUN tokens may have the following values of Gender:
Fem (930; 37% of non-empty Gender): l, rěč, woda, rěčow, stolica, rostliny, wody, rěče, knihi, biblioteki
Masc (1143; 45% of non-empty Gender): př, kilometrow, nastawki, kraja, lěttysaca, čas, institut, stat, wobraz, časa
Neut (448; 18% of non-empty Gender): město, lěta, lěće, mócnarstwo, pismo, słowo, lět, města, hospodarstwo, knjejstwa
EMPTY (16): km, m, CEST, droždźemi, duri, hodź, jan, thumb
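Each value line gives a Gender value, its token count, and its share of the NOUN tokens with a non-empty Gender (930 / 2521 ≈ 37%). It then lists the most frequent word forms with that value. EMPTY tokens are listed but excluded from the percentages. The minimal sketch below produces the same kind of summary. It assumes forms are ranked by frequency, uses hypothetical names, and runs on toy counts.

```python
from collections import Counter, defaultdict


def value_summary(tokens, top_n=10):
    """Summarise Gender values for one part of speech.

    tokens: iterable of (form, gender) pairs, where gender is None when the
    token has no Gender. Returns one line per value, EMPTY last.
    """
    value_counts = Counter()
    forms = defaultdict(Counter)
    for form, gender in tokens:
        value = gender or "EMPTY"
        value_counts[value] += 1
        forms[value][form] += 1
    non_empty = sum(c for v, c in value_counts.items() if v != "EMPTY")
    lines = []
    for value in sorted(v for v in value_counts if v != "EMPTY"):
        top = ", ".join(f for f, _ in forms[value].most_common(top_n))
        share = round(100 * value_counts[value] / non_empty)
        lines.append(f"{value} ({value_counts[value]}; {share}% of non-empty Gender): {top}")
    if "EMPTY" in value_counts:
        top = ", ".join(f for f, _ in forms["EMPTY"].most_common(top_n))
        lines.append(f"EMPTY ({value_counts['EMPTY']}): {top}")
    return lines


toy = [("rěč", "Fem"), ("rěč", "Fem"), ("woda", "Fem"), ("čas", "Masc"), ("km", None)]
print("\n".join(value_summary(toy)))
```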
| Paradigm dataja | Fem | Neut |
|---|---|---|
| Case=Acc | dataje, daty | daty |
| Case=Gen | datow |
Gender seems to be a lexical feature of NOUN: 99% of lemmas (1012) occur with only one value of Gender.
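The page does not say which threshold triggers the "seems to be a lexical feature" wording. The minimal sketch below only computes the share it reports: lemmas whose tokens all have the same Gender value. The function name and data are hypothetical. The toy lemma "dataja" is given both Fem and Neut, mirroring its paradigm table above.

```python
from collections import defaultdict


def single_value_lemmas(pairs):
    """pairs: (lemma, gender) for tokens with a non-empty Gender.

    Returns (lemmas seen with exactly one Gender value, all such lemmas).
    """
    values = defaultdict(set)
    for lemma, gender in pairs:
        values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single, len(values)


toy = [("rěč", "Fem"), ("rěč", "Fem"), ("město", "Neut"), ("dataja", "Fem"), ("dataja", "Neut")]
single, total = single_value_lemmas(toy)
print(f"{round(100 * single / total)}% of lemmas ({single}) occur with only one value of Gender")
```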
ADJ
1381 ADJ tokens (97% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Animacy=EMPTY (1216; 88%), Voice=EMPTY (1198; 87%), VerbForm=EMPTY (1197; 87%), Number=Sing (884; 64%), Degree=EMPTY (870; 63%).
ADJ tokens may have the following values of Gender:
Fem (555; 40% of non-empty Gender): serbskeje, druhe, wulku, serbska, wotpowědne, dalše, druhich, hornjej, kruta, němskej
Masc (581; 42% of non-empty Gender): serbski, prěni, Serbskeho, wulki, Ekscelentny, Serbskim, Třećeho, Zjednoćenych, ablawtowych, cyłym
Neut (245; 18% of non-empty Gender): najwjetše, wulke, klinowe, wuznamne, prěnje, Kaspiske, Kaspiskeho, aktualne, běłe, dołhe
EMPTY (38): němsko, Awstro, Planowane, Tibeto, al, d, dołho, druhich, duchowno, hornjo
| Paradigm serbski | Masc | Fem | Neut |
|---|---|---|---|
| Animacy=Inan\|Case=Acc\|Degree=Pos\|Number=Dual | serbskej | | |
| Case=Acc\|Degree=Pos\|Number=Sing | serbski | serbske | |
| Case=Acc\|Number=Sing | serbsku | | |
| Case=Dat\|Number=Sing | serbskemu | | |
| Case=Dat\|Number=Plur | serbskim | | |
| Case=Gen\|Degree=Pos\|Number=Sing | serbskeje | | |
| Case=Gen\|Number=Sing | Serbskeho | serbskeje | |
| Case=Gen\|Number=Plur | serbskich | | |
| Case=Ins\|Number=Sing | serbskej, serbsku | | |
| Case=Loc\|Degree=Pos\|Number=Sing | Serbskim | | |
| Case=Loc\|Number=Sing | Serbskim | serbskej | |
| Case=Nom\|Degree=Pos\|Number=Sing | Serbski, SERBSKI | serbska | |
| Case=Nom\|Number=Sing | serbska | | |
| Case=Nom\|Number=Plur | serbske | | |
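A paradigm table groups the forms of one lemma in two directions. Rows are the combinations of the remaining features, and columns are the Gender values. The minimal sketch below builds such a table. The names are hypothetical and the input is toy data. In Markdown, the `|` separators inside a feature bundle must be escaped as `\|`; otherwise they are read as column breaks.

```python
from collections import defaultdict


def build_paradigm(rows, lemma, feature="Gender"):
    """rows: (form, lemma, feats) triples, feats being a dict of features.

    Returns {feature bundle without `feature`: {value of `feature`: set of forms}}.
    """
    table = defaultdict(lambda: defaultdict(set))
    for form, lem, feats in rows:
        if lem != lemma or feature not in feats:
            continue
        bundle = "|".join(f"{k}={v}" for k, v in sorted(feats.items()) if k != feature)
        table[bundle][feats[feature]].add(form)
    return table


def to_markdown(table, lemma, columns):
    """Render the paradigm as a Markdown table, escaping | inside bundles."""
    lines = [
        f"| Paradigm {lemma} | " + " | ".join(columns) + " |",
        "|---" * (len(columns) + 1) + "|",
    ]
    for bundle in sorted(table):
        cells = [", ".join(sorted(table[bundle].get(col, ()))) for col in columns]
        escaped = bundle.replace("|", "\\|")
        lines.append(f"| {escaped} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


toy = [
    ("serbski", "serbski", {"Case": "Nom", "Gender": "Masc", "Number": "Sing"}),
    ("serbska", "serbski", {"Case": "Nom", "Gender": "Fem", "Number": "Sing"}),
    ("serbskeho", "serbski", {"Case": "Gen", "Gender": "Masc", "Number": "Sing"}),
]
md = to_markdown(build_paradigm(toy, "serbski"), "serbski", ["Masc", "Fem", "Neut"])
print(md)
```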
PROPN
539 PROPN tokens (90% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (484; 90%).
PROPN tokens may have the following values of Gender:
Fem (209; 39% of non-empty Gender): Mezopotamiskeje, Mezopotamiska, Mezopotamiskej, Wikimedia, Łužicy, Europje, Assyriska, Němskeje, Wikipedija, Africe
Masc (281; 52% of non-empty Gender): Sumeričanow, Assur, Aššur, Babylon, Budyšinje, Hammurabi, Jakub, Ur, Akkada, Aramejčanow
Neut (49; 9% of non-empty Gender): Commons, Esperanto, Nadu, Slepo, Łobjom, Aleppo, Baku, Bangalore, Bengaluru, Esperanće
EMPTY (57): Aššur, C, Adl, Angeles, Gasche, Los, Tamil, Tlustulimu, Beth, Bilād
| Paradigm Institut | Masc | Neut |
|---|---|---|
| Animacy=Inan\|Case=Acc | Institut | |
| Case=Nom | Institut |
Gender seems to be a lexical feature of PROPN: 99% of lemmas (319) occur with only one value of Gender.
DET
269 DET tokens (83% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: Abbr=EMPTY (234; 87%), Number[psor]=EMPTY (225; 84%), Person=EMPTY (225; 84%), Poss=EMPTY (196; 73%), Animacy=EMPTY (179; 67%), Number=Sing (164; 61%).
DET tokens may have the following values of Gender:
Fem (125; 46% of non-empty Gender): n, kotraž, kotrež, tuta, swoju, tutej, tutu, tute, kotrejž, někotrych
Masc (104; 39% of non-empty Gender): kotrež, kotryž, tutón, n, někotři, swoje, tute, tutym, kotrychž, někotre
Neut (40; 15% of non-empty Gender): kotrež, tute, kóžde, žane, swoje, tajke, twojim, Wobě, kajke, kotrejž
EMPTY (57): jeho, jich, wjele, jeje, mnoho, n, Někotre, Tutón, Wšě, mjenje
| Paradigm kotryž | Masc | Fem | Neut |
|---|---|---|---|
| Animacy=Anim\|Case=Dat\|Number=Plur | kotrymž | | |
| Animacy=Anim\|Case=Nom\|Number=Sing | kotryž | | |
| Animacy=Anim\|Case=Nom\|Number=Plur | kotřiž | | |
| Animacy=Inan\|Case=Acc\|Number=Sing | kotryž | | |
| Animacy=Inan\|Case=Gen\|Number=Plur | kotrychž | | |
| Animacy=Inan\|Case=Loc\|Number=Plur | kotrychž | | |
| Animacy=Inan\|Case=Nom\|Number=Sing | kotryž, kotrež | | |
| Animacy=Inan\|Case=Nom\|Number=Plur | kotrež | | |
| Case=Gen\|Number=Sing | kotrehož | kotrejež | |
| Case=Ins\|Number=Plur | kotrymiž | | |
| Case=Loc\|Number=Sing | kotrymž | kotrejž | |
| Case=Loc\|Number=Plur | kotrychž | kotrychž | |
| Case=Nom\|Number=Sing | kotryž | kotraž | kotrež |
| Case=Nom\|Number=Dual | kotrejž | | |
| Case=Nom\|Number=Plur | kotrež | kotrež | kotrež |
PRON
120 PRON tokens (36% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Reflex=EMPTY (120; 100%), Number=Sing (109; 91%), Person=EMPTY (71; 59%).
PRON tokens may have the following values of Gender:
Fem (18; 15% of non-empty Gender): wona, Jej, je, jeje, ju, njej, njeje, nju, wone
Masc (22; 18% of non-empty Gender): wón, jón, Woni, je, jeho, kiž, nich, nim
Neut (80; 67% of non-empty Gender): to, toho, tym, wone, wono, čimž, tomu, něšto, štož, nim
EMPTY (215): so, kiž, je, sej, nam, sobu, ty, Wonej, sebi
| Paradigm wón | Masc | Fem | Neut |
|---|---|---|---|
| Animacy=Anim\|Case=Nom\|Number=Plur | Woni | | |
| Animacy=Inan\|Case=Acc\|Number=Plur | je | | |
| Animacy=Nhum\|Case=Acc\|Number=Sing | jeho | | |
| Case=Acc\|Number=Sing | jón, jeho | ju, nju | |
| Case=Acc\|Number=Plur | je | | |
| Case=Dat\|Number=Sing | Jej, jeje, njej | | |
| Case=Gen\|Number=Sing | njeje | | |
| Case=Gen\|Number=Plur | nich | | |
| Case=Ins\|Number=Plur | nimi | | |
| Case=Loc\|Number=Sing | nim | nim | |
| Case=Nom\|Number=Sing | wón | wona | wono, wone |
| Case=Nom\|Number=Plur | wone | wone | |
| Number=Sing | jón |
VERB
48 VERB tokens (6% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Mood=EMPTY (46; 96%), Person=EMPTY (46; 96%), Tense=Past (46; 96%), VerbForm=Part (46; 96%), Number=Sing (30; 63%).
VERB tokens may have the following values of Gender:
Fem (12; 25% of non-empty Gender): dodźeržała, eksistowali, kontrolowali, móhła, předstaja, přeměniła, přełožili, přistupiła, rostła, stabilizowała
Masc (30; 63% of non-empty Gender): přewzali, wužiwali, započał, ilustrował, mał, mjenował, měł, nastał, poradźił, poznamjenili
Neut (6; 13% of non-empty Gender): móhli, poradźiło, předstajili, stali, stało, wočakowało
EMPTY (770): ma, leži, móže, wobsahuje, móžeš, su, hlej, maja, rěči, běchu
| Paradigm předstajić | Masc | Fem | Neut |
|---|---|---|---|
| Animacy=Inan\|Mood=Ind\|Number=Sing\|Person=3\|Tense=Pres\|VerbForm=Fin | předstaja | | |
| Mood=Ind\|Number=Sing\|Person=3\|Tense=Pres\|VerbForm=Fin | předstaja | | |
| Number=Plur\|Tense=Past\|VerbForm=Part\|Voice=Act | předstajili | | |
NUM
36 NUM tokens (9% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Card (35; 97%).
NUM tokens may have the following values of Gender:
Fem (12; 33% of non-empty Gender): jedna, jednu, štyri, dwaj, dwě, dwěmaj, miliardow, woběmaj, štyrjoch
Masc (20; 56% of non-empty Gender): jedyn, dwaj, Mio, dweju, jedneho, jedny, traje, štyrjoch
Neut (4; 11% of non-empty Gender): dwěmaj, jednym
EMPTY (346): 2, 1, 6, 4, 3, 5, 7, I, 000, 10
| Paradigm jedyn | Masc | Fem | Neut |
|---|---|---|---|
| Animacy=Anim\|Case=Nom | jedny, jedyn | | |
| Animacy=Inan\|Case=Acc | jedyn | | |
| Animacy=Inan\|Case=Gen | jedneho | | |
| Animacy=Inan\|Case=Nom | jedyn | | |
| Case=Acc | jedyn | jednu | |
| Case=Loc | jednym | ||
| Case=Nom | jedyn | jedna |
AUX
2 AUX tokens (1% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: Mood=EMPTY (2; 100%), Number=Sing (2; 100%), Person=EMPTY (2; 100%), Tense=Past (2; 100%), VerbForm=Part (2; 100%), Voice=Act (2; 100%).
AUX tokens may have the following values of Gender:
Fem (1; 50% of non-empty Gender): była
Masc (1; 50% of non-empty Gender): był
EMPTY (286): je, su, bu, bě, buchu, by, njeje, njejsu, běchu, buštej
| Paradigm być | Masc | Fem |
|---|---|---|
| | był | była |
ADV
1 ADV token (0% of all ADV tokens) has a non-empty value of Gender.
The most frequent other feature values with which ADV and Gender co-occurred: Degree=Pos (1; 100%), PronType=EMPTY (1; 100%).
ADV tokens may have the following values of Gender:
Fem (1; 100% of non-empty Gender): wuchodne
EMPTY (534): tež, tak, hišće, zwjetša, hač, něhdźe, hižo, tu, wjace, najprjedy
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[amod]–> ADJ (1048; 96%),
NOUN –[det]–> DET (169; 79%),
NOUN –[conj]–> NOUN (161; 68%),
ADJ –[nsubj]–> NOUN (75; 89%),
ADJ –[conj]–> ADJ (62; 97%),
PROPN –[conj]–> PROPN (52; 59%),
PROPN –[flat]–> PROPN (52; 73%),
PROPN –[amod]–> ADJ (41; 95%),
PROPN –[nmod]–> NOUN (22; 67%),
ADJ –[nsubj]–> DET (21; 95%).
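The page does not state the denominator behind these percentages. The minimal sketch below assumes it is the number of parent-child pairs of a given type (parent UPOS, relation, child UPOS) in which both nodes carry Gender. Under that assumption, it counts how many of those pairs agree. The function name and the toy sentence are hypothetical.

```python
from collections import Counter


def agreement_counts(sentence, feature="Gender"):
    """sentence: list of dicts with keys id, upos, feats (dict), head, deprel.

    Returns two Counters keyed by (parent UPOS, deprel, child UPOS): relations
    whose nodes agree in `feature`, and relations where both nodes carry it.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, both = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])  # None for the root (head 0)
        if parent is None:
            continue
        c_val, p_val = child["feats"].get(feature), parent["feats"].get(feature)
        if c_val is None or p_val is None:
            continue  # assumption: only pairs where both nodes carry the feature count
        key = (parent["upos"], child["deprel"], child["upos"])
        both[key] += 1
        if c_val == p_val:
            agree[key] += 1
    return agree, both


toy = [
    {"id": 1, "upos": "ADJ", "feats": {"Gender": "Fem"}, "head": 2, "deprel": "amod"},
    {"id": 2, "upos": "NOUN", "feats": {"Gender": "Fem"}, "head": 0, "deprel": "root"},
    {"id": 3, "upos": "NOUN", "feats": {"Gender": "Masc"}, "head": 2, "deprel": "conj"},
]
agree, both = agreement_counts(toy)
for (parent, deprel, child), n in agree.most_common(10):
    share = round(100 * n / both[(parent, deprel, child)])
    print(f"{parent} –[{deprel}]–> {child} ({n}; {share}%)")
```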