NameType
: type of named entity
Classification of named entities. The annotation is token-based and entities are not nested.
The feature applies mainly to the cs-pos/PROPN tag. In multi-word foreign names, adjectives may also carry it: they keep the ADJ tag, but they would not occur in Czech outside the named entity.
Conversion from the Prague Dependency Treebank
Lemmas in PDT contain features that also encode types of named entities. When converting the PDT annotation to UD, these lemma features are removed and the feature NameType is added to the universal features to preserve the type.
The following table lists the name types together with the most frequent examples. See http://ufal.mff.cuni.cz/techrep/tr27.pdf, page 8, section 2.1 (Lemma structure) for more details.
PDT lemma feature | Name type | Most frequent examples | English gloss |
---|---|---|---|
_;Y | given name | Jan, Jiří, Václav, Petr, Josef | “Jan, Jiří, Václav, Petr, Josef” |
_;S | surname | Klaus, Havel, Němec, Jelcin, Svoboda | “Klaus, Havel, Němec, Yeltsin, Svoboda” |
_;E | member of a particular nation, inhabitant of a particular territory | Němec, Čech, Srb, Američan, Slovák | “German, Czech, Serbian, American, Slovak” |
_;G | geographical name | Praha, ČR, Evropa, Německo, Brno | “Prague, CR, Europe, Germany, Brno” |
_;K | company, organization, institution | ODS, OSN, Sparta, ODA, Slavia | “ODS, UN, Sparta, ODA, Slavia” |
_;R | product | LN, Mercedes, Tatra, PC, MF | “LN, Mercedes, Tatra, PC, MF” |
_;m | other proper name: names of mines, stadiums, guerilla bases etc. | US, PVP, Prix, Rapaport, Tour | “US, PVP, Prix, Rapaport, Tour” |
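The conversion described above can be sketched in a few lines of Python. This is only an illustration, not the converter actually used for UD_Czech: the mapping from PDT lemma features to NameType values is an assumption inferred by matching the descriptions in the table with the value definitions on this page, the function name convert_lemma is hypothetical, and the lemma strings in the usage lines are made up for the example.

```python
import re

# ASSUMED mapping, inferred from the table descriptions; not taken from
# the actual conversion software.
PDT_TO_NAMETYPE = {
    "Y": "Giv",  # given name
    "S": "Sur",  # surname
    "E": "Nat",  # member of a nation, inhabitant of a territory
    "G": "Geo",  # geographical name
    "K": "Com",  # company, organization, institution
    "R": "Pro",  # product
    "m": "Oth",  # other proper name
}

# A lemma feature looks like "_;" followed by a single letter.
MARKER = re.compile(r"_;(\w)")


def convert_lemma(pdt_lemma):
    """Return (lemma without name-type features, NameType value or None)."""
    found = MARKER.findall(pdt_lemma)
    values = sorted({PDT_TO_NAMETYPE[m] for m in found if m in PDT_TO_NAMETYPE})

    def strip_known(match):
        # Remove only the features listed in the table; leave others alone.
        return "" if match.group(1) in PDT_TO_NAMETYPE else match.group(0)

    bare = MARKER.sub(strip_known, pdt_lemma)
    return bare, (",".join(values) or None)


# Hypothetical lemma strings, for illustration only.
print(convert_lemma("Praha_;G"))
print(convert_lemma("Paris_;G_;S"))
print(convert_lemma("dům"))
```

Lemma features that are not in the table are left untouched, since this page does not say how they are handled.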
Geo
: geographical name
Names of cities, countries, rivers, mountains etc.
Examples
- Praha “Prague”, Kostelec nad Černými lesy, Německo “Germany”
Prs
: name of person
This value is used when a token is known to be a personal name but it is not known whether it is a given name or a family name.
Giv
: given name of person
Given name (not family name). This is usually the first name in European and American names. In Chinese names, the last two syllables (of three) are usually the given name.
Examples
- Jan, Jiří, Václav
Sur
: surname / family name of person
Family name (surname). This is usually the last name in European and American names. In Chinese names, the first syllable (of three) is usually the surname.
Examples
- Klaus, Havel, Němec
Nat
: nationality
Name denoting a member of a particular nation, or inhabitant of a particular territory. This does not include derived adjectives, nor nouns denoting languages (both groups are written in lowercase). Thus Čech “Czech [man]” belongs here but český “Czech” and čeština “Czech [language]” do not.
Examples
- Čech “Czech”, Němec “German”, Pražan “Praguer”
Com
: company, organization
Pro
: product
Oth
: other
Names of stadiums, guerilla bases, events etc.
Treebank Statistics (UD_Czech)
This feature is language-specific.
It occurs with 7 different values: Com, Geo, Giv, Nat, Oth, Pro, Sur.
Some words have combined values of the feature; 21 combinations have been observed: Com|Geo, Com|Giv, Com|Giv|Sur, Com|Nat, Com|Oth, Com|Pro, Com|Pro|Sur, Com|Sur, Geo|Giv, Geo|Giv|Sur, Geo|Oth, Geo|Pro, Geo|Sur, Giv|Nat, Giv|Oth, Giv|Pro, Giv|Pro|Sur, Giv|Sur, Nat|Sur, Oth|Sur, Pro|Sur.
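Combined values can be read from a token's feature string as follows. This is a minimal sketch, assuming the treebank is stored in CoNLL-U, where features in the FEATS column are separated by a vertical bar and multiple values of one feature are joined by commas, so that a combination of Giv and Sur would appear as NameType=Giv,Sur. The function name name_types is hypothetical.

```python
def name_types(feats):
    """Return the list of NameType values found in a CoNLL-U FEATS string."""
    if feats == "_":  # "_" marks an empty feature set in CoNLL-U
        return []
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "NameType":
            return value.split(",")  # one or more values
    return []


print(name_types("Case=Nom|Gender=Masc|NameType=Giv,Sur|Number=Sing"))
print(name_types("Case=Nom|Number=Sing"))
```

A token with a single value yields a one-element list, so callers can treat single and combined values uniformly.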
88937 tokens (6%) have a non-empty value of NameType.
24371 types (19%) occur at least once with a non-empty value of NameType.
17019 lemmas (29%) occur at least once with a non-empty value of NameType.
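Counts of this kind could be reproduced roughly as in the sketch below. It assumes CoNLL-U input and treats a "type" as a distinct, case-sensitive word form; the script that produced the figures on this page may define types differently. The function name nametype_coverage and the sample sentence with its annotation are hypothetical.

```python
def nametype_coverage(conllu_text):
    """Count tokens, word forms (types) and lemmas that carry NameType.

    Returns a dict mapping "tokens", "types" and "lemmas" to a pair
    (count with NameType, total count).
    """
    total_tokens = tagged_tokens = 0
    forms, tagged_forms = set(), set()
    lemmas, tagged_lemmas = set(), set()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and an integer ID.
        # This skips comments, blank lines, multiword-token ranges (1-2)
        # and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, feats = cols[1], cols[2], cols[5]
        has_nametype = any(f.startswith("NameType=") for f in feats.split("|"))
        total_tokens += 1
        forms.add(form)
        lemmas.add(lemma)
        if has_nametype:
            tagged_tokens += 1
            tagged_forms.add(form)
            tagged_lemmas.add(lemma)
    return {
        "tokens": (tagged_tokens, total_tokens),
        "types": (len(tagged_forms), len(forms)),
        "lemmas": (len(tagged_lemmas), len(lemmas)),
    }


# Hypothetical four-token sentence, built with explicit tab separators.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "Václav", "Václav", "PROPN", "_", "Case=Nom|NameType=Giv", "3", "nsubj", "_", "_"],
    ["2", "Havel", "Havel", "PROPN", "_", "Case=Nom|NameType=Sur", "1", "flat", "_", "_"],
    ["3", "navštívil", "navštívit", "VERB", "_", "Tense=Past", "0", "root", "_", "_"],
    ["4", "Prahu", "Praha", "PROPN", "_", "Case=Acc|NameType=Geo", "3", "obj", "_", "_"],
])

print(nametype_coverage(SAMPLE))
```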
The feature is used with 11 part-of-speech tags: cs-pos/PROPN (84031; 6% instances), cs-pos/ADJ (4756; 0% instances), cs-pos/ADP (71; 0% instances), cs-pos/NUM (20; 0% instances), cs-pos/ADV (17; 0% instances), cs-pos/PRON (15; 0% instances), cs-pos/VERB (13; 0% instances), cs-pos/PART (8; 0% instances), cs-pos/INTJ (4; 0% instances), cs-pos/CONJ (1; 0% instances), cs-pos/DET (1; 0% instances).
PROPN
84031 cs-pos/PROPN tokens (100% of all PROPN tokens) have a non-empty value of NameType.
The most frequent other feature values with which PROPN and NameType co-occurred: Negative=Pos (84031; 100%), Abbr=EMPTY (70989; 84%), Number=Sing (63182; 75%), Gender=Masc (48949; 58%).
PROPN tokens may have the following values of NameType:
- Com (12393; 15% of non-empty NameType): ODS, OSN, ODA, ČSSD, NATO, Sparta, ČT, HZDS, EU, FS
- Com,Geo (46; 0% of non-empty NameType): Chelsea, Bergen, Europe, Kladno, Prague, Aral, Bay, California, Canada, Deutschland
- Com,Giv (34; 0% of non-empty NameType): KOVO, Kovo, Konstruktiva, Poldi, Fiorentina, Michael, Ringo, Světozor, Kovohutě, Nšoči
- Com,Giv,Sur (1; 0% of non-empty NameType): Winston
- Com,Nat (5; 0% of non-empty NameType): Jihlavanu, Jihlavan
- Com,Pro (34; 0% of non-empty NameType): Bild, Canon, Fiat, Honda, Fiatu, Canonu, Hondy, CANON, Fiaty, HONDA
- Com,Sur (44; 0% of non-empty NameType): Benetton, Benettonu, Mates, Winston, Maxwell, Biederstein, Bradstreet, Daimler, Dohme, Dun
- Geo (26520; 32% of non-empty NameType): Praha, ČR, Praze, USA, Evropy, Brno, Prahy, ČSFR, Evropě, Německu
- Geo,Giv (31; 0% of non-empty NameType): Amos, Gyula, Gyuly, Karin, Alma, AMOS, Amosem, Gyulu, José, Josému
- Geo,Giv,Sur (18; 0% of non-empty NameType): Butrus, Butruse, Keith, Kozák
- Geo,Oth (1; 0% of non-empty NameType): Saturn
- Geo,Pro (2; 0% of non-empty NameType): Mountain, RENOVA
- Geo,Sur (241; 0% of non-empty NameType): Breda, Paisley, Petrov, Wallis, Powell, Bihače, Wallise, Warren, Lichtenbergu, Lom
- Giv (15099; 18% of non-empty NameType): J, Jiří, Jan, Václav, Jana, Petr, M, Josef, Pavel, Vladimír
- Giv,Nat (3; 0% of non-empty NameType): Hun
- Giv,Oth (5; 0% of non-empty NameType): Miranda, David, John, MIRANDY
- Giv,Pro (1; 0% of non-empty NameType): Pascal
- Giv,Pro,Sur (1; 0% of non-empty NameType): Figaro
- Giv,Sur (139; 0% of non-empty NameType): Perry, Perryho, Charlie, Diega, Othello, Diego, Ricardo, Rút, Heřman, Johan
- Nat (2286; 3% of non-empty NameType): Němci, Češi, Němců, Američané, američan, Slováci, Srbové, Rusové, Srby, Čechů
- Nat,Sur (7; 0% of non-empty NameType): Uher, Maye, UHER
- Oth (555; 1% of non-empty NameType): PVP, Prix, Tour, ECU, Garden, München, line, Rapaportu, VC, Age
- Pro (2054; 2% of non-empty NameType): LN, MF, PC, Škoda, mercedes, favorit, Mir, ford, polo, Welt
- Pro,Sur (25; 0% of non-empty NameType): Kozel, Stock, Burda, Johnnie, Hornet, Walker, Walkerem
- Sur (24486; 29% of non-empty NameType): Klaus, Havel, Klause, Svoboda, Mečiar, Havla, Jelcin, John, Zeman, Němec
Paradigm Paris | Com | Geo | Giv | Oth | Pro | Sur |
---|---|---|---|---|---|---|
Animacy=Anim|Case=Acc|Gender=Masc|Number=Sing | Parise | |||||
Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing | Paris | |||||
Foreign=Foreign|Gender=Fem | Paris | |||||
Foreign=Foreign|Gender=Fem|Number=Sing | Paris | |||||
Gender=Fem | Paris | Paris | ||||
Paris |
NameType seems to be a lexical feature of PROPN. 96% of lemmas (14774) occur with only one value of NameType.
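The "lexical feature" observation rests on the share of lemmas that are always seen with the same value. A minimal sketch of that computation follows. It assumes that a combined value such as Giv,Sur counts as one distinct value and that lemmas never seen with NameType are excluded from the denominator; neither detail is stated on this page. The function name single_value_share and the sample pairs are hypothetical.

```python
from collections import defaultdict


def single_value_share(pairs):
    """Return (number of lemmas with exactly one NameType value, their share).

    `pairs` is an iterable of (lemma, nametype) tuples, one per token;
    nametype is a string such as "Geo" or "Giv,Sur", or None if absent.
    """
    values_by_lemma = defaultdict(set)
    for lemma, name_type in pairs:
        if name_type:  # ignore tokens without NameType
            values_by_lemma[lemma].add(name_type)
    if not values_by_lemma:
        return 0, 0.0
    single = sum(1 for vals in values_by_lemma.values() if len(vals) == 1)
    return single, single / len(values_by_lemma)


# Hypothetical observations: "Paris" is seen with two different values.
observations = [
    ("Paris", "Geo"), ("Paris", "Sur"),
    ("Praha", "Geo"),
    ("Klaus", "Sur"), ("Klaus", "Sur"),
    ("být", None),
]
print(single_value_share(observations))
```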
ADJ
4756 cs-pos/ADJ tokens (3% of all ADJ tokens) have a non-empty value of NameType.
The most frequent other feature values with which ADJ and NameType co-occurred: Animacy=EMPTY (3630; 76%), Negative=Pos (2450; 52%), Degree=Pos (2437; 51%).
ADJ tokens may have the following values of NameType:
- Com (1199; 25% of non-empty NameType): RM, Pink, K, J, Deutsche, United, Die, I, U, British
- Com,Geo (17; 0% of non-empty NameType): York, Covent, Abbey, Amsterdam, Bradford, Brooklyn, Louis, New, Oak, Ridge
- Com,Giv (1; 0% of non-empty NameType): Konrád
- Com,Oth (8; 0% of non-empty NameType): Al, Black, Box, Mute
- Com,Pro (2; 0% of non-empty NameType): Apple, Microsoft
- Com,Pro,Sur (1; 0% of non-empty NameType): Sun
- Com,Sur (10; 0% of non-empty NameType): Gordon, Binder, Cocteau, Goethe, Mandel, Rambert, Random, Warner, Wellesley
- Geo (733; 15% of non-empty NameType): New, Č, Flushing, Los, San, Tchaj, Horní, Devils, Twin, Buenos
- Geo,Giv (3; 0% of non-empty NameType): Karl, Karlovy
- Geo,Oth (1; 0% of non-empty NameType): Salem
- Geo,Pro (4; 0% of non-empty NameType): York, Denver, Washington
- Geo,Sur (11; 0% of non-empty NameType): Marx, Špindlerově, Lounských, Powellovo, Powellovy, Powellových, Santa, Spenglerův, Wallisově
- Giv (290; 6% of non-empty NameType): Karlovy, Karlových, Karlova, Karlově, Heinrichovy, Janova, Jindřichově, Heinrichových, Jindřichův, Ježíšova
- Giv,Sur (28; 1% of non-empty NameType): Eukleidových, Eukleidovy, Damoklův, Heřmanův, Alláhovým, Berijova, Eukleidova, Eukleidově, Franckův, Hésiodovy
- Nat (10; 0% of non-empty NameType): Američanovy, Američanův, Australanovo, Brazilcovy, Florenťanův, Indian, Irův, Němcův, Pražákovo, Taliánův
- Oth (222; 5% of non-empty NameType): US, New, Made, Sex, al, Australian, French, Miranda, Inspiral, Mute
- Oth,Sur (1; 0% of non-empty NameType): Shea
- Pro (167; 4% of non-empty NameType): Financial, coca, Super, Chem, Eng, Prágai, Wyborcza, der, pepsi, Magyar
- Sur (2048; 43% of non-empty NameType): Milíčova, Masarykově, Benešových, Schrödingerova, Casimirův, Klausův, Masarykova, Mečiarova, Benešovy, Janáčkovy
Paradigm New | Com,Geo | Geo | Oth |
---|---|---|---|
New | New, NEW | New |
NameType seems to be a lexical feature of ADJ. 95% of lemmas (1730) occur with only one value of NameType.
ADP
71 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of NameType.
The most frequent other feature values with which ADP and NameType co-occurred: AdpType=Prep (71; 100%), Case=EMPTY (70; 99%).
ADP tokens may have the following values of NameType:
- Com (16; 23% of non-empty NameType): Pro, PRO, dei, des, po
- Geo (3; 4% of non-empty NameType): Unter, del, Átha
- Geo,Giv,Sur (35; 49% of non-empty NameType): di
- Oth (6; 8% of non-empty NameType): for, Into, Pour, Pro, To
- Pro (10; 14% of non-empty NameType): ex, della, Quantum
- Sur (1; 1% of non-empty NameType): zum
Paradigm Pro | Com | Oth |
---|---|---|
Pro, PRO | Pro |
NameType seems to be a lexical feature of ADP. 94% of lemmas (15) occur with only one value of NameType.
NUM
20 cs-pos/NUM tokens (0% of all NUM tokens) have a non-empty value of NameType.
The most frequent other feature values with which NUM and NameType co-occurred: Gender=EMPTY (20; 100%), NumType=Card (20; 100%), NumForm=Word (20; 100%), Number=Plur (19; 95%), Case=EMPTY (19; 95%), NumValue=1,2,3 (19; 95%).
NUM tokens may have the following values of NameType:
- Com (20; 100% of non-empty NameType): Four, Seven, Twenty, Six, Tre
ADV
17 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of NameType.
The most frequent other feature values with which ADV and NameType co-occurred: Degree=EMPTY (14; 82%), Negative=EMPTY (14; 82%).
ADV tokens may have the following values of NameType:
- Com (5; 29% of non-empty NameType): More, Nahoru, dolů, achšav
- Oth (7; 41% of non-empty NameType): COSI, Down, How, Live, So, Up, Why
- Pro (5; 29% of non-empty NameType): Ahead, Inside, Live, Today, Weekly
Paradigm Live | Oth | Pro |
---|---|---|
Degree=Pos|Negative=Pos | Live | |
Live |
NameType seems to be a lexical feature of ADV. 93% of lemmas (14) occur with only one value of NameType.
PRON
15 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NameType.
The most frequent other feature values with which PRON and NameType co-occurred: Variant=EMPTY (15; 100%), Reflex=EMPTY (15; 100%), Gender=EMPTY (10; 67%), Number=Sing (8; 53%), Case=EMPTY (8; 53%), Person=EMPTY (8; 53%).
PRON tokens may have the following values of NameType:
- Com (4; 27% of non-empty NameType): All
- Oth (5; 33% of non-empty NameType): All, Everything, This, You
- Pro (6; 40% of non-empty NameType): Ty, Your, It, man
Paradigm All | Com | Oth |
---|---|---|
Case=Acc|Gender=Neut|Number=Sing | All | |
All |
VERB
13 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of NameType.
The most frequent other feature values with which VERB and NameType co-occurred: Gender=EMPTY (13; 100%), Aspect=EMPTY (13; 100%), Negative=Pos (13; 100%), Number=EMPTY (8; 62%), Person=EMPTY (8; 62%), Voice=Act (7; 54%), Mood=EMPTY (7; 54%).
VERB tokens may have the following values of NameType:
- Com (2; 15% of non-empty NameType): Can, Dance
- Oth (9; 69% of non-empty NameType): Porter, Can, Comes, FAN, Feels, Said, Takes, Want
- Pro (2; 15% of non-empty NameType): Check, Lean
Paradigm Can | Com | Oth |
---|---|---|
Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act | Can | |
VerbForm=Inf | Can |
NameType seems to be a lexical feature of VERB. 91% of lemmas (10) occur with only one value of NameType.
PART
8 cs-pos/PART tokens (0% of all PART tokens) have a non-empty value of NameType.
PART tokens may have the following values of NameType:
- Com (2; 25% of non-empty NameType): Non, we
- Oth (5; 63% of non-empty NameType): L, Not, at, el, t
- Sur (1; 13% of non-empty NameType): ka
INTJ
4 cs-pos/INTJ tokens (4% of all INTJ tokens) have a non-empty value of NameType.
INTJ tokens may have the following values of NameType:
- Com (1; 25% of non-empty NameType): Halo
- Oth (3; 75% of non-empty NameType): Bang, Boom, Crash
CONJ
1 cs-pos/CONJ token (0% of all CONJ tokens) has a non-empty value of NameType.
CONJ tokens may have the following values of NameType:
- Com (1; 100% of non-empty NameType): und
DET
1 cs-pos/DET token (0% of all DET tokens) has a non-empty value of NameType.
The most frequent other feature values with which DET and NameType co-occurred: Gender=Fem (1; 100%), Gender[psor]=EMPTY (1; 100%), Person=1 (1; 100%), Poss=Yes (1; 100%), PronType=Prs (1; 100%), Number=Sing (1; 100%), Reflex=EMPTY (1; 100%), Case=EMPTY (1; 100%), Number[psor]=Plur (1; 100%).
DET tokens may have the following values of NameType:
- Oth (1; 100% of non-empty NameType): Notre
Relations with Agreement in NameType
The 10 most frequent relations where parent and child node agree in NameType:
PROPN –[conj]–> PROPN (5798; 87%),
PROPN –[foreign]–> ADJ (661; 72%),
PROPN –[foreign]–> PROPN (247; 85%),
ADJ –[foreign]–> PROPN (84; 88%),
ADJ –[amod]–> ADJ (51; 61%),
ADJ –[conj]–> ADJ (50; 75%),
PROPN –[nsubj]–> PROPN (12; 55%),
PROPN –[xcomp]–> PROPN (10; 67%),
PROPN –[cc]–> PROPN (5; 56%),
ADV –[foreign]–> PROPN (3; 60%).
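Agreement figures like these could be computed along the lines of the sketch below. The page does not define the denominator of the percentages, so the sketch makes an explicit assumption: a relation is counted only when both parent and child carry a non-empty NameType, and agreement means the two value strings are identical. The token dictionaries, their keys and the function name agreement_by_relation are hypothetical.

```python
from collections import Counter


def agreement_by_relation(sentences):
    """Map (parent UPOS, relation, child UPOS) to (agreeing count, share).

    Each sentence is a list of dicts with the keys "id", "upos", "head",
    "deprel" and "nametype" (a string, or None if the feature is absent).
    ASSUMPTION: only edges where both nodes have NameType are counted.
    """
    agree, total = Counter(), Counter()
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for child in sentence:
            parent = by_id.get(child["head"])  # None for the root (head 0)
            if parent is None:
                continue
            if not parent["nametype"] or not child["nametype"]:
                continue
            key = (parent["upos"], child["deprel"], child["upos"])
            total[key] += 1
            if parent["nametype"] == child["nametype"]:
                agree[key] += 1
    return {key: (agree[key], agree[key] / total[key]) for key in total}


# Hypothetical coordination: two city names and one surname.
sentence = [
    {"id": 1, "upos": "PROPN", "head": 0, "deprel": "root", "nametype": "Geo"},
    {"id": 2, "upos": "PROPN", "head": 1, "deprel": "conj", "nametype": "Geo"},
    {"id": 3, "upos": "PROPN", "head": 1, "deprel": "conj", "nametype": "Sur"},
]
print(agreement_by_relation([sentence]))
```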