
NameType: type of named entity

Classification of named entities (token-based, without nesting of entities etc.). The feature applies mainly to the cs-pos/PROPN tag. In multi-word foreign names, adjectives may also carry this feature: they keep the ADJ tag, but they would not occur in Czech outside the named entity.

Conversion from the Prague Dependency Treebank

Lemmas in PDT contain features that also encode types of named entities. When converting the PDT annotation to UD, these lemma features are removed and the feature NameType is added to the universal features to preserve the type.

The following table lists the name types together with the most frequent examples. See http://ufal.mff.cuni.cz/techrep/tr27.pdf, page 8, section 2.1 (Lemma structure) for more details.

PDT lemma marker | Name type | Most frequent examples | English gloss
_;Y | given name | Jan, Jiří, Václav, Petr, Josef | “Jan, Jiří, Václav, Petr, Josef”
_;S | surname | Klaus, Havel, Němec, Jelcin, Svoboda | “Klaus, Havel, Němec, Yeltsin, Svoboda”
_;E | member of a particular nation, inhabitant of a particular territory | Němec, Čech, Srb, Američan, Slovák | “German, Czech, Serbian, American, Slovak”
_;G | geographical name | Praha, ČR, Evropa, Německo, Brno | “Prague, CR, Europe, Germany, Brno”
_;K | company, organization, institution | ODS, OSN, Sparta, ODA, Slavia | “ODS, UN, Sparta, ODA, Slavia”
_;R | product | LN, Mercedes, Tatra, PC, MF | “LN, Mercedes, Tatra, PC, MF”
_;m | other proper name: names of mines, stadiums, guerilla bases etc. | US, PVP, Prix, Rapaport, Tour | “US, PVP, Prix, Rapaport, Tour”
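
The conversion step described above can be sketched in a few lines. This is a minimal illustration, not the actual PDT-to-UD converter: the marker-to-value mapping is inferred from the table and from the value descriptions on this page, the function name `convert_lemma` is hypothetical, and joining several values with a comma in sorted order is assumed from the general CoNLL-U convention for multi-valued features.

```python
import re

# Assumed mapping from PDT lemma markers to NameType values, inferred
# from the table on this page (not copied from the real converter).
PDT_TO_NAMETYPE = {
    "Y": "Giv",  # given name
    "S": "Sur",  # surname
    "E": "Nat",  # member of a nation, inhabitant of a territory
    "G": "Geo",  # geographical name
    "K": "Com",  # company, organization, institution
    "R": "Pro",  # product
    "m": "Oth",  # other proper name
}

# A marker is "_;" followed by a single character, e.g. "_;G".
MARKER = re.compile(r"_;(\w)")


def convert_lemma(pdt_lemma):
    """Return (lemma without name-type markers, NameType feature or None)."""
    codes = MARKER.findall(pdt_lemma)
    values = sorted({PDT_TO_NAMETYPE[c] for c in codes if c in PDT_TO_NAMETYPE})
    # Strip only the markers that were mapped; leave any other marker intact.
    clean = MARKER.sub(
        lambda m: "" if m.group(1) in PDT_TO_NAMETYPE else m.group(0),
        pdt_lemma,
    )
    feature = "NameType=" + ",".join(values) if values else None
    return clean, feature
```

For example, `convert_lemma("Praha_;G")` returns `("Praha", "NameType=Geo")`, and a lemma without any marker is returned unchanged with `None` as the feature.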

Geo: geographical name

Names of cities, countries, rivers, mountains etc.

Examples

Prs: name of person

This value is used if it is not known whether it is a given or a family name, but it is known that it is a personal name.

Giv: given name of person

Given name (not family name). This is usually the first name in European and American names. In Chinese names, the last two syllables (of three) are usually the given name.

Examples

Sur: surname / family name of person

Family name (surname). This is usually the last name in European and American names. In Chinese names, the first syllable (of three) is usually the surname.

Examples

Nat: nationality

Name denoting a member of a particular nation, or inhabitant of a particular territory. This does not include derived adjectives, nor nouns denoting languages (both groups are written in lowercase). Thus Čech “Czech [man]” belongs here but český “Czech” and čeština “Czech [language]” do not.

Examples

Com: company, organization

Pro: product

Oth: other

Names of stadiums, guerilla bases, events etc.


Treebank Statistics (UD_Czech)

This feature is language-specific. It occurs with 7 different values: Com, Geo, Giv, Nat, Oth, Pro, Sur. Some words have combined values of the feature; 21 combinations have been observed: Com|Geo, Com|Giv, Com|Giv|Sur, Com|Nat, Com|Oth, Com|Pro, Com|Pro|Sur, Com|Sur, Geo|Giv, Geo|Giv|Sur, Geo|Oth, Geo|Pro, Geo|Sur, Giv|Nat, Giv|Oth, Giv|Pro, Giv|Pro|Sur, Giv|Sur, Nat|Sur, Oth|Sur, Pro|Sur.

88937 tokens (6%) have a non-empty value of NameType. 24371 types (19%) occur at least once with a non-empty value of NameType. 17019 lemmas (29%) occur at least once with a non-empty value of NameType. The feature is used with 11 part-of-speech tags: cs-pos/PROPN (84031; 6% of instances), cs-pos/ADJ (4756; 0% of instances), cs-pos/ADP (71; 0% of instances), cs-pos/NUM (20; 0% of instances), cs-pos/ADV (17; 0% of instances), cs-pos/PRON (15; 0% of instances), cs-pos/VERB (13; 0% of instances), cs-pos/PART (8; 0% of instances), cs-pos/INTJ (4; 0% of instances), cs-pos/CONJ (1; 0% of instances), cs-pos/DET (1; 0% of instances).
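
Counts of this kind could be reproduced from a treebank file roughly as follows. This is a minimal sketch, assuming the standard ten-column, tab-separated CoNLL-U layout (UPOS in the fourth column, FEATS in the sixth); the function name `nametype_counts` is hypothetical, and this is not the script that generated the statistics on this page.

```python
from collections import Counter


def nametype_counts(conllu_lines):
    """Count tokens with a non-empty NameType, per UPOS tag and per value."""
    per_pos = Counter()
    per_value = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        if feats == "_":
            continue  # token has no features at all
        # FEATS is a "|"-separated list of Name=Value pairs.
        fdict = dict(f.split("=", 1) for f in feats.split("|"))
        value = fdict.get("NameType")
        if value:
            per_pos[upos] += 1
            per_value[value] += 1  # combined values such as "Com,Geo" stay as one key
    return per_pos, per_value
```

Passing an open file object (for instance a UD_Czech `.conllu` file) as `conllu_lines` yields two counters that can be compared with the figures above.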

PROPN

84031 cs-pos/PROPN tokens (100% of all PROPN tokens) have a non-empty value of NameType.

The most frequent other feature values with which PROPN and NameType co-occurred: Negative=Pos (84031; 100%), Abbr=EMPTY (70989; 84%), Number=Sing (63182; 75%), Gender=Masc (48949; 58%).

PROPN tokens may have the following values of NameType:

Paradigm Paris (NameType values: Com, Geo, Giv, Oth, Pro, Sur)
Animacy=Anim|Case=Acc|Gender=Masc|Number=Sing: Parise
Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing: Paris
Foreign=Foreign|Gender=Fem: Paris
Foreign=Foreign|Gender=Fem|Number=Sing: Paris
Gender=Fem: Paris; Paris
(no features): Paris

NameType seems to be a lexical feature of PROPN. 96% of lemmas (14774) occur with only one value of NameType.
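
The "lexical feature" observation rests on how many lemmas are seen with a single value. A minimal sketch of that computation follows, assuming the input is a sequence of (lemma, NameType value) pairs already extracted for one part-of-speech tag; the function name `single_value_share` is hypothetical, and whether the page's own statistics treat a combined value such as Com,Geo as one value is not known, so here it simply counts as one distinct string.

```python
from collections import defaultdict


def single_value_share(pairs):
    """Return (lemmas seen with exactly one NameType value, all lemmas)."""
    values_by_lemma = defaultdict(set)
    for lemma, nametype in pairs:
        values_by_lemma[lemma].add(nametype)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)
```

Dividing the first number by the second gives the percentage reported in sentences like the one above.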

ADJ

4756 cs-pos/ADJ tokens (3% of all ADJ tokens) have a non-empty value of NameType.

The most frequent other feature values with which ADJ and NameType co-occurred: Animacy=EMPTY (3630; 76%), Negative=Pos (2450; 52%), Degree=Pos (2437; 51%).

ADJ tokens may have the following values of NameType:

Paradigm New (NameType values: Com,Geo; Geo; Oth)
(no features): New; New, NEW; New

NameType seems to be a lexical feature of ADJ. 95% of lemmas (1730) occur with only one value of NameType.

ADP

71 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of NameType.

The most frequent other feature values with which ADP and NameType co-occurred: AdpType=Prep (71; 100%), Case=EMPTY (70; 99%).

ADP tokens may have the following values of NameType:

Paradigm Pro (NameType values: Com, Oth)
(no features): Pro, PRO; Pro

NameType seems to be a lexical feature of ADP. 94% of lemmas (15) occur with only one value of NameType.

NUM

20 cs-pos/NUM tokens (0% of all NUM tokens) have a non-empty value of NameType.

The most frequent other feature values with which NUM and NameType co-occurred: Gender=EMPTY (20; 100%), NumType=Card (20; 100%), NumForm=Word (20; 100%), Number=Plur (19; 95%), Case=EMPTY (19; 95%), NumValue=1,2,3 (19; 95%).

NUM tokens may have the following values of NameType:

ADV

17 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of NameType.

The most frequent other feature values with which ADV and NameType co-occurred: Degree=EMPTY (14; 82%), Negative=EMPTY (14; 82%).

ADV tokens may have the following values of NameType:

Paradigm Live (NameType values: Oth, Pro)
Degree=Pos|Negative=Pos: Live
(no features): Live

NameType seems to be a lexical feature of ADV. 93% of lemmas (14) occur with only one value of NameType.

PRON

15 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NameType.

The most frequent other feature values with which PRON and NameType co-occurred: Variant=EMPTY (15; 100%), Reflex=EMPTY (15; 100%), Gender=EMPTY (10; 67%), Number=Sing (8; 53%), Case=EMPTY (8; 53%), Person=EMPTY (8; 53%).

PRON tokens may have the following values of NameType:

Paradigm All (NameType values: Com, Oth)
Case=Acc|Gender=Neut|Number=Sing: All
(no features): All

VERB

13 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of NameType.

The most frequent other feature values with which VERB and NameType co-occurred: Gender=EMPTY (13; 100%), Aspect=EMPTY (13; 100%), Negative=Pos (13; 100%), Number=EMPTY (8; 62%), Person=EMPTY (8; 62%), Voice=Act (7; 54%), Mood=EMPTY (7; 54%).

VERB tokens may have the following values of NameType:

Paradigm Can (NameType values: Com, Oth)
Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act: Can
VerbForm=Inf: Can

NameType seems to be a lexical feature of VERB. 91% of lemmas (10) occur with only one value of NameType.

PART

8 cs-pos/PART tokens (0% of all PART tokens) have a non-empty value of NameType.

PART tokens may have the following values of NameType:

INTJ

4 cs-pos/INTJ tokens (4% of all INTJ tokens) have a non-empty value of NameType.

INTJ tokens may have the following values of NameType:

CONJ

1 cs-pos/CONJ token (0% of all CONJ tokens) has a non-empty value of NameType.

CONJ tokens may have the following values of NameType:

DET

1 cs-pos/DET token (0% of all DET tokens) has a non-empty value of NameType.

The most frequent other feature values with which DET and NameType co-occurred: Gender=Fem (1; 100%), Gender[psor]=EMPTY (1; 100%), Person=1 (1; 100%), Poss=Yes (1; 100%), PronType=Prs (1; 100%), Number=Sing (1; 100%), Reflex=EMPTY (1; 100%), Case=EMPTY (1; 100%), Number[psor]=Plur (1; 100%).

DET tokens may have the following values of NameType:

Relations with Agreement in NameType

The 10 most frequent relations where parent and child node agree in NameType: PROPN –[conj]–> PROPN (5798; 87%), PROPN –[foreign]–> ADJ (661; 72%), PROPN –[foreign]–> PROPN (247; 85%), ADJ –[foreign]–> PROPN (84; 88%), ADJ –[amod]–> ADJ (51; 61%), ADJ –[conj]–> ADJ (50; 75%), PROPN –[nsubj]–> PROPN (12; 55%), PROPN –[xcomp]–> PROPN (10; 67%), PROPN –[cc]–> PROPN (5; 56%), ADV –[foreign]–> PROPN (3; 60%).
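
Agreement counts like those listed above can be gathered by comparing each token with its parent. The following is a minimal sketch under stated assumptions: each sentence is a list of dictionaries with the hypothetical keys `id`, `head`, `upos`, `deprel` and `nametype` (with `None` when the feature is absent), and a relation counts as agreeing when both nodes have the same non-empty NameType. How the percentages on this page are normalized is not stated, so only raw counts are computed.

```python
from collections import Counter


def agreement_counts(sentences):
    """Count (parent UPOS, relation, child UPOS) triples that agree in NameType."""
    agree = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            parent = by_id.get(tok["head"])
            if parent is None:
                continue  # the root token has head 0 and no parent node
            if tok["nametype"] and tok["nametype"] == parent["nametype"]:
                agree[(parent["upos"], tok["deprel"], tok["upos"])] += 1
    return agree
```

Each key corresponds to one entry in the notation used above, for example `("PROPN", "conj", "PROPN")` for PROPN –[conj]–> PROPN.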