Treebank Statistics: UD_Sinhala-STB: Features: Gender
This feature is universal.
It occurs with 2 different values: Masc, Neut.
264 tokens (30%) have a non-empty value of Gender.
217 types (43%) occur at least once with a non-empty value of Gender.
187 lemmas (45%) occur at least once with a non-empty value of Gender.
The feature is used with 3 part-of-speech tags: NOUN (223; 25% of instances), PRON (21; 2% of instances), PROPN (20; 2% of instances).
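Counts like these can be recomputed from the treebank's CoNLL-U file. The following is a minimal sketch, not the script that generated this page: it assumes a standard 10-column CoNLL-U layout, and the function names `parse_feats` and `gender_coverage` and the toy rows are hypothetical illustrations, not data from UD_Sinhala-STB.

```python
from collections import Counter

# Toy CoNLL-U rows (hypothetical data, not taken from UD_Sinhala-STB).
ROWS = [
    ("1", "w1", "l1", "NOUN", "_", "Gender=Neut|Number=Sing", "2", "nsubj", "_", "_"),
    ("2", "w2", "l2", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "w3", "l3", "NOUN", "_", "Number=Plur", "2", "obj", "_", "_"),
    ("4", "w4", "l4", "PRON", "_", "Gender=Masc", "2", "obl", "_", "_"),
]
SAMPLE = "# sent_id = toy-1\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"


def parse_feats(column):
    """Turn a FEATS column such as 'Gender=Masc|Number=Sing' into a dict."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def gender_coverage(conllu_text, feature="Gender"):
    """Return {UPOS: (tokens carrying the feature, all tokens)}."""
    with_feature, total = Counter(), Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        upos = cols[3]
        total[upos] += 1
        if feature in parse_feats(cols[5]):
            with_feature[upos] += 1
    return {upos: (with_feature[upos], total[upos]) for upos in total}


coverage = gender_coverage(SAMPLE)
for upos, (n, total) in sorted(coverage.items()):
    print(f"{upos}: {n}/{total} ({100 * n / total:.0f}%)")
```

Applied to the real treebank file, the per-tag pairs should correspond to figures such as "223 NOUN tokens (72% of all NOUN tokens)", provided the counting conventions match those used for this page.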
NOUN
223 NOUN tokens (72% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Animacy=EMPTY (182; 82%), Number=Sing (141; 63%).
NOUN tokens may have the following values of Gender:
Masc (36; 16% of non-empty Gender): මහතා, ජනතාව, ප්රධානයකු, අධිපතිවරයාට, අස්සන්, ආරක්ෂක, කෙනාම, ත්රස්තවාදීන්, තැනැත්තන්, දෙන්නකු
Neut (187; 84% of non-empty Gender): අයවැය, ආණ්ඩුව, ආර්ථික, තත්ත්වය, දේශපාලන, යුද, අවස්ථාව, ආර්ථිකය, උද්ධමනය, ක්රමය
EMPTY (85): කිරීම, ජනතාවට, සිදු, අද, අහෝසි, ආර්ථික, කොටි, බොහෝ, හැඟීමක්, අංශ
Gender seems to be a lexical feature of NOUN. 100% of lemmas (173) occur with only one value of Gender.
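The "lexical feature" observation rests on a simple check: group the Gender values by lemma and count how many lemmas ever show more than one value. A minimal sketch of that check follows, assuming the input is a list of (lemma, Gender value) pairs already extracted for one part of speech. The function name `single_value_share` and the toy pairs are hypothetical.

```python
from collections import defaultdict


def single_value_share(pairs):
    """Count lemmas that occur with exactly one value of the feature.

    pairs: iterable of (lemma, value) for tokens with a non-empty value.
    Returns (lemmas with exactly one value, all lemmas seen).
    """
    values_by_lemma = defaultdict(set)
    for lemma, value in pairs:
        values_by_lemma[lemma].add(value)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)


# Toy input (hypothetical lemmas, not taken from the treebank).
pairs = [("a", "Neut"), ("a", "Neut"), ("b", "Masc"), ("c", "Masc"), ("c", "Neut")]
single, total = single_value_share(pairs)
print(f"{100 * single / total:.0f}% of lemmas ({single} of {total}) occur with only one value")
```

A share of 100%, as reported here for NOUN, means no lemma was ever seen with two different Gender values, which is what one expects when the value is a property of the word rather than of its inflected form.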
PRON
21 PRON tokens (48% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Poss=EMPTY (21; 100%), Animacy=EMPTY (19; 90%), Person=EMPTY (18; 86%), Number=Sing (16; 76%), Case=Nom (13; 62%), PronType=Dem (11; 52%).
PRON tokens may have the following values of Gender:
Masc (10; 48% of non-empty Gender): ඔහු, ඔහුට
Neut (11; 52% of non-empty Gender): ඒ, ඊට, එය, ඉන්, මෙය
EMPTY (23): එය, එහි, ඒ, සිය, අප, අපට, අපේ, එකිනෙකා, එම, ඔව්හු
PROPN
20 PROPN tokens (53% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Animacy=EMPTY (20; 100%), Number=Sing (20; 100%), Foreign=EMPTY (19; 95%), Case=Nom (13; 65%), Person=EMPTY (12; 60%).
PROPN tokens may have the following values of Gender:
Masc (8; 40% of non-empty Gender): මහින්ද, රනිල්, වික්රමසිංහ, ෆොන්සේකා
Neut (12; 60% of non-empty Gender): ලංකාව, ඉන්දියාව, ඉරානය, චීනය, ටැන්සානියාව, පලස්තීනය, පාකිස්ථානය, ලංකාවක්, ලංකාවට, සිංගප්පූරුව
EMPTY (18): ශ්රී, රාජපක්ෂ, අමෙරිකාවේ, කොසෝවෝ, ජුලියස්, නියරේරේ, මාඕ, යුනෙස්කෝ, ලිප්ටන්, ෂැවොලින්
Gender seems to be a lexical feature of PROPN. 100% of lemmas (12) occur with only one value of Gender.
Relations with Agreement in Gender
The most frequent relations (up to 10) in which the parent and child nodes agree in Gender:
NOUN –[compound]–> NOUN (9; 53%),
PROPN –[flat]–> NOUN (6; 86%),
NOUN –[dep]–> NOUN (5; 83%),
NOUN –[nmod:poss]–> NOUN (4; 80%),
NOUN –[nsubj]–> PROPN (3; 60%),
NOUN –[case]–> NOUN (1; 100%),
NOUN –[conj]–> NOUN (1; 100%),
PROPN –[conj]–> PROPN (1; 100%).
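Agreement counts of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method used to generate this page: it assumes each entry reads parent tag, relation, child tag, and that the percentage is the share of agreeing pairs among pairs in which both nodes carry a non-empty Gender (the page does not state the denominator). The function name `agreement_counts` and the toy sentence are hypothetical.

```python
from collections import Counter


def agreement_counts(sentences):
    """Count Gender agreement per (parent UPOS, relation, child UPOS).

    sentences: list of sentences, each a list of token dicts with the keys
    id, head, upos, deprel and gender (None when Gender is empty).
    Returns two Counters: agreeing pairs and comparable pairs.
    """
    agree, comparable = Counter(), Counter()
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for child in sentence:
            parent = by_id.get(child["head"])
            if parent is None:  # the root has head 0, so no parent token
                continue
            if child["gender"] is None or parent["gender"] is None:
                continue  # only pairs where both nodes have Gender are compared
            key = (parent["upos"], child["deprel"], child["upos"])
            comparable[key] += 1
            if child["gender"] == parent["gender"]:
                agree[key] += 1
    return agree, comparable


# Toy sentence (hypothetical, not taken from the treebank).
toy = [[
    {"id": 1, "head": 2, "upos": "NOUN", "deprel": "compound", "gender": "Neut"},
    {"id": 2, "head": 3, "upos": "NOUN", "deprel": "nsubj", "gender": "Neut"},
    {"id": 3, "head": 0, "upos": "VERB", "deprel": "root", "gender": None},
    {"id": 4, "head": 2, "upos": "NOUN", "deprel": "compound", "gender": "Masc"},
]]
agree, comparable = agreement_counts(toy)
for (parent, rel, child), n in agree.most_common(10):
    share = 100 * n / comparable[(parent, rel, child)]
    print(f"{parent} -[{rel}]-> {child} ({n}; {share:.0f}%)")
```

Because Gender is lexical for NOUN and PROPN in this treebank, matches in relations such as compound may reflect which words happen to combine rather than grammatical agreement, so the percentages are best read as descriptive counts.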