
DET: determiner

Definition

Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

Note that traditional Czech grammar does not define determiners as a separate word class, and Czech has no articles. Most determiners are traditionally classified as pronouns, so a UD-conformant annotation of Czech must distinguish between substantive pronouns (UD tag PRON) and attributive pronouns (UD tag DET).

Also note that the DET tag includes (pronominal) quantifiers (words like mnoho, málo “many, few”), which traditional grammar classifies as a special subclass of numerals. However, cardinal numerals in the narrow sense (jeden, pět, sto “one, five, hundred”) are not tagged DET, even though some authors would count them among quantifiers. Cardinal numerals have their own tag, NUM.

Conversion from the Prague Dependency Treebank

Since the PDT tagset (like all other Czech tagsets) does not distinguish between substantive and attributive pronouns, morphological tags alone are not enough to find the correct universal POS tag. Morphological rules could help, as the inflection patterns of some pronouns resemble adjectival inflection; nevertheless, other cases cannot be solved this way, so we have to examine the dependency tree. If a pronoun modifies a noun, it is tagged DET; otherwise it is tagged PRON. As a result, all words that can be tagged DET can also be tagged PRON, but some words can only be tagged PRON. (We cannot recognize cases where the pronoun is in fact attributive but the modified noun has been elided and is not represented in the tree.)

For instance, tohle “this” is either a pronoun (Tohle jsem viděl včera. “I saw this yesterday.”) or a determiner (Tohle auto jsem viděl včera. “I saw this car yesterday.”).
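The tree-based rule above can be sketched in a few lines of Python. This is only an illustration, not the actual conversion code: the `Token` class, the `pronoun_upos` function and the one-letter coarse classes ("P" for pronoun, "N" for noun, taken here to mirror the first position of a PDT tag) are assumptions made for the example, and the head indices are hand-built for the two sentences, with punctuation omitted.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str
    coarse: str   # hypothetical coarse class: "P" pronoun, "N" noun, "V" verb, "D" adverb
    head: int     # 1-based index of the parent token, 0 for the root


def pronoun_upos(token, sentence):
    """Return 'DET' if the pronoun's parent is a noun, otherwise 'PRON'."""
    if token.coarse != "P":
        raise ValueError(f"{token.form!r} is not a pronoun")
    if token.head == 0:
        # A pronoun at the root modifies nothing.
        return "PRON"
    parent = sentence[token.head - 1]
    return "DET" if parent.coarse == "N" else "PRON"


# "Tohle jsem viděl včera." (tohle depends on the verb)
substantive = [
    Token(1, "Tohle", "P", 3),
    Token(2, "jsem", "V", 3),
    Token(3, "viděl", "V", 0),
    Token(4, "včera", "D", 3),
]

# "Tohle auto jsem viděl včera." (tohle depends on the noun auto)
attributive = [
    Token(1, "Tohle", "P", 2),
    Token(2, "auto", "N", 4),
    Token(3, "jsem", "V", 4),
    Token(4, "viděl", "V", 0),
    Token(5, "včera", "D", 4),
]

print(pronoun_upos(substantive[0], substantive))
print(pronoun_upos(attributive[0], attributive))
```

The sketch shares the limitation noted above: an attributive pronoun whose noun has been elided has no noun parent in the tree and therefore comes out as PRON.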


Treebank Statistics (UD_Czech)

There are 55 DET lemmas (0%), 325 DET types (0%) and 27813 DET tokens (2%). Out of 17 observed tags, DET ranks 10th in number of lemmas, 8th in number of types and 11th in number of tokens.

The 10 most frequent DET lemmas: tento, jeho, svůj, můj, ten, některý, několik, takový, žádný, jenž

The 10 most frequent DET types: jeho, jejich, své, této, její, tento, tohoto, svou, tato, těchto

The 10 most frequent ambiguous lemmas: tento (DET 6202, PRON 99), jeho (DET 5790, PRON 46), svůj (DET 4767, PRON 113, ADJ 4), můj (DET 2581, PRON 71), ten (PRON 11968, DET 1312), některý (DET 1096, PRON 234), několik (DET 871, PRON 26), takový (DET 866, PRON 169), žádný (DET 744, PRON 87), jenž (PRON 2211, DET 648)

The 10 most frequent ambiguous types: jeho (DET 2456, PRON 33), jejich (DET 1697, PRON 12), své (DET 1366, PRON 40, ADJ 1), této (DET 993, PRON 3), její (DET 711, PRON 8), tento (DET 585, PRON 10), svou (DET 607, PRON 7), tato (DET 377, PRON 7), těchto (DET 581, PRON 8), tyto (DET 432, PRON 1)

Morphology

The form / lemma ratio of DET is 5.909091 (the average of all parts of speech is 2.195970).
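The reported ratio is consistent with dividing the number of DET types by the number of DET lemmas given in the statistics above. The short check below assumes that "forms" corresponds to the type count; the variable names are only illustrative.

```python
# Counts taken from the treebank statistics above.
det_types = 325   # distinct DET word forms (assumed to be the "forms")
det_lemmas = 55   # distinct DET lemmas

ratio = det_types / det_lemmas
print(f"{ratio:.6f}")
```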

The highest number of forms (27) was observed with the lemma “můj”: Mí, moje, moji, mojí, mou, má, mé, mého, mém, mému, mých, mýho, mým, mými, můj, n, naše, našeho, našem, našemu, naši, našich, našim, našimi, naší, naším, náš

The 2nd highest number of forms (19) was observed with the lemma “jakýkoliv”: jakoukoli, jakoukoliv, jakákoli, jakákoliv, jakéhokoli, jakéhokoliv, jakékoli, jakékoliv, jakémkoli, jakémkoliv, jakémukoli, jakémukoliv, jakýchkoli, jakýchkoliv, jakýkoli, jakýkoliv, jakýmikoliv, jakýmkoli, jakýmkoliv

The 3rd highest number of forms (16) was observed with the lemma “ten”: ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, těch, těm, těma, těmi

DET occurs with 16 features: cs-feat/PronType (27813; 100% instances), cs-feat/Case (22385; 80% instances), cs-feat/Number (21264; 76% instances), cs-feat/Gender (17996; 65% instances), cs-feat/Poss (14044; 50% instances), cs-feat/Number[psor] (9276; 33% instances), cs-feat/Person (9276; 33% instances), cs-feat/Reflex (4768; 17% instances), cs-feat/Gender[psor] (4331; 16% instances), cs-feat/Animacy (2621; 9% instances), cs-feat/NumType (1552; 6% instances), cs-feat/Negative (744; 3% instances), cs-feat/Abbr (15; 0% instances), cs-feat/Style (14; 0% instances), cs-feat/Foreign (1; 0% instances), cs-feat/NameType (1; 0% instances)

DET occurs with 40 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NameType=Oth, Negative=Neg, NumType=Card, NumType=Ord, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, Reflex=Yes, Style=Coll

DET occurs with 273 feature combinations. The most frequent feature combination is Gender[psor]=Masc,Neut|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs (2718 tokens). Examples: jeho

Relations

DET nodes are attached to their parents using 8 different relations: cs-dep/det (26231; 94% instances), cs-dep/det:numgov (978; 4% instances), cs-dep/det:nummod (564; 2% instances), cs-dep/advcl (24; 0% instances), cs-dep/acl (9; 0% instances), cs-dep/nmod (3; 0% instances), cs-dep/ccomp (2; 0% instances), cs-dep/csubj (2; 0% instances)

Parents of DET nodes belong to 6 different parts of speech: NOUN (27483; 99% instances), PROPN (108; 0% instances), ADJ (106; 0% instances), PRON (97; 0% instances), NUM (16; 0% instances), DET (3; 0% instances)

27203 (98%) DET nodes are leaves.

370 (1%) DET nodes have one child.

181 (1%) DET nodes have two children.

59 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 12.
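For readers unfamiliar with the terms, the child degree of a node is the number of tokens that name it as their head, and a leaf is a node with child degree 0. A minimal sketch of how such counts could be derived from a list of head indices follows; the function `child_degrees` and the hand-built head list are assumptions for illustration, not the tool that produced these statistics.

```python
from collections import Counter


def child_degrees(heads):
    """heads[i] is the 1-based head index of token i+1; 0 marks the root.

    Returns the number of children of each token, in sentence order.
    """
    counts = Counter(h for h in heads if h != 0)
    return [counts.get(i, 0) for i in range(1, len(heads) + 1)]


# "Tohle auto jsem viděl včera." (punctuation omitted)
forms = ["Tohle", "auto", "jsem", "viděl", "včera"]
heads = [2, 4, 4, 0, 4]

degrees = child_degrees(heads)
for form, degree in zip(forms, degrees):
    print(form, degree)
```

In this sentence the determiner Tohle has no children, which is the typical case reported above.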

Children of DET nodes are attached using 18 different relations: cs-dep/advmod:emph (180; 19% instances), cs-dep/punct (130; 13% instances), cs-dep/conj (117; 12% instances), cs-dep/cc (99; 10% instances), cs-dep/case (93; 10% instances), cs-dep/acl (90; 9% instances), cs-dep/advmod (62; 6% instances), cs-dep/advcl (34; 4% instances), cs-dep/mark (30; 3% instances), cs-dep/nmod (28; 3% instances), cs-dep/amod (26; 3% instances), cs-dep/appos (26; 3% instances), cs-dep/cop (17; 2% instances), cs-dep/dep (12; 1% instances), cs-dep/nsubj (12; 1% instances), cs-dep/xcomp (4; 0% instances), cs-dep/det:nummod (2; 0% instances), cs-dep/neg (1; 0% instances)

Children of DET nodes belong to 13 different parts of speech: ADV (175; 18% instances), PUNCT (130; 13% instances), CONJ (129; 13% instances), VERB (114; 12% instances), ADP (92; 10% instances), ADJ (88; 9% instances), NOUN (73; 8% instances), PRON (67; 7% instances), PART (46; 5% instances), SCONJ (30; 3% instances), PROPN (12; 1% instances), NUM (4; 0% instances), DET (3; 0% instances)

