This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Uppsala Group on Determiners and Pronouns

(Dan Zeman, Jenna Kanerva, Mojgan Seraji, Nizar Habash, Petya Osenova, Simonetta Montemagni, Teresa Lynn)

This topic is related to the Github issues #159, #157, #154 and #132.

The issues discussed in this group are extremely complex and we could not hope for an ultimate solution to be found within 90 minutes. We started collecting the input and we are going to go on collecting it online. Then hopefully there will be more insight and we can find a way of modifying the current guidelines.

The starting point

Some languages lack the category of determiners. What we mean by that is really just the category, not the words themselves. These languages have words that are closely comparable to what are called determiners in other languages; they are just not called determiners in the “traditional” grammar of the language. Instead, they are often called pronouns.

In order to increase cross-lingual comparability, it is desirable to use the same labels for these words across languages. Consequently, we have to partially abandon the traditional grammars and to define determiners in these languages. (Not necessarily in all languages. But when we have words that are parallel to English or Romance determiners, we want parallel analysis for them.)

The current borderline between u-pos/PRON and u-pos/DET in UD, simplified, says that if it replaces a noun phrase, it is a pronoun; if it modifies a noun phrase, it is a determiner. This definition dates back at least to the EAGLES multi-language annotation project in the 1990s. The basic idea here is that pronouns share properties with nouns, and analogically, determiners share properties with adjectives. Examples:

Thus the borderline is defined functionally. It means that context matters: we classify these words according to how they are used rather than what they are. One reason is that we cannot easily tell “what the words are”. The existing taggers and tagsets are not going to help us because they do not distinguish determiners. On the other hand, this functional approach contrasts with what we do elsewhere in UD. For instance, we say that prepositions remain tagged u-pos/ADP even if they are used as verbal particles (cf. English come on), which is a usage quite different from the prototypical function of prepositions.

There is another and perhaps more important objection to the current definition: in languages that traditionally do distinguish determiners, our definition does not precisely match the borderline already established by their tagsets. Conforming to the UD guidelines thus means that in these languages many words must be fixed too.

Finally, while the definition may seem robust at first glance, its applicability is also limited. A pronoun might be replacing a noun phrase but modifying another noun phrase at the same time (as a genitive post-modifier). If a word does not modify a noun, it could mean that it is a pronoun, but it could also be a determiner whose noun head has been elided. So the definition does not cover all possible situations and we need either more freedom, or more elaborate guidelines.

Example of ellipsis in [cs]:

What are the options?

Note that the functional definition is the only one which might guarantee comparability and consistency across languages. However, if this option is selected there are other fuzzy distinctions - e.g. that between nouns and adjectives acting as nouns (as in the old and the young), or adjectives and verbs acting as adjectives (as in written text or smiling person) in specific constructions - which should be dealt with similarly: so, whatever decision is taken for dealing with the det/pron distinction, this might require a revision of the treatment of these other categories across the different languages.

Interaction between POS tags and dependency relations

The current UD guidelines almost imply that the u-pos/DET POS tag and the u-dep/det relation label occur at the same places. Functional multi-word expressions are an exception. A determiner inside an MWE will be attached to the previous token with the label u-dep/mwe. If it happens to be the first token of the MWE, and the whole MWE behaves like something other than a determiner, then it will also have a different label.

This gives us a good device to search for irregularities. Even when we ignore MWEs, there are a number of points in the data where the DET <=> det constraint is currently violated.

When det does not imply DET

There are no occurrences in bg, cs, da, fi-ftb. The other languages:

PML-TQ (http://lindat.mff.cuni.cz/services/pmltq/#!/treebanks) was used to collect examples:

a-node $d := [deprel="det", tag!="DET"] >> for $d.tag give $1, count() sort by $2 desc, $1

When DET does not imply det

PML-TQ (http://lindat.mff.cuni.cz/services/pmltq/#!/treebanks) was used to collect examples:

a-node $d := [tag="DET", deprel!="det"] >> for $d.deprel give $1, count() sort by $2 desc, $1

There are no occurrences in: hr, da, fi, fi-ftb, el.
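
For readers without access to PML-TQ, the two searches above can be approximated directly on a CoNLL-U file. The following is only a minimal sketch, not the procedure the group used: the function name `det_mismatches` and the sample sentence are invented for illustration, relation subtypes such as det:predet are assumed to count as det, and DET tokens attached as mwe are skipped, following the remark above about ignoring MWEs.

```python
from collections import Counter


def det_mismatches(conllu_text):
    """Count tokens where the DET tag and the det relation disagree.

    Hypothetical helper: returns two Counters, one keyed by the UPOS tag
    of non-DET tokens attached as det, one keyed by the relation of DET
    tokens not attached as det.
    """
    det_not_DET = Counter()
    DET_not_det = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges such as "1-2"
        upos, deprel = cols[3], cols[7]
        base = deprel.split(":")[0]  # assumption: det:predet counts as det
        if base == "det" and upos != "DET":
            det_not_DET[upos] += 1
        elif upos == "DET" and base not in ("det", "mwe"):
            DET_not_det[deprel] += 1
    return det_not_DET, DET_not_det


# Invented four-token sample; the annotation is deliberately inconsistent.
sample = "\n".join([
    "# text = this book is that",
    "1\tthis\tthis\tPRON\t_\t_\t2\tdet\t_\t_",
    "2\tbook\tbook\tNOUN\t_\t_\t4\tnsubj\t_\t_",
    "3\tis\tbe\tVERB\t_\t_\t4\tcop\t_\t_",
    "4\tthat\tthat\tDET\t_\t_\t0\troot\t_\t_",
])

det_without_DET, DET_without_det = det_mismatches(sample)
print(det_without_DET)
print(DET_without_det)
```

Grouping the first counter by tag and the second by relation mirrors what one would want to inspect in each direction of the DET <=> det constraint.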

Miscellaneous

We are discussing how different languages encode “determination”.

More than one determiner per NP? There are currently no restrictions but grammars of some languages assume at most one determiner per noun phrase. This is probably why we have det:predet in English and Italian, to mark that the additional determiner is exceptional.

Pronouns, determiners and pronominal adverbs should always have a non-empty value of the feature u-feat/PronType. In particular, articles should be tagged PronType=Art.
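
The PronType requirement can be checked mechanically on CoNLL-U data. The sketch below is an illustration under stated assumptions rather than an official validator: the function name `missing_prontype` and the sample tokens are invented, and only tokens tagged PRON or DET are inspected, because pronominal adverbs cannot be recognized from the ADV tag alone.

```python
def missing_prontype(conllu_text):
    """Return (ID, FORM, UPOS) for PRON/DET tokens whose FEATS lack PronType.

    Hypothetical helper; pronominal adverbs are not covered because the
    ADV tag alone does not identify them.
    """
    missing = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges such as "1-2"
        form, upos, feats = cols[1], cols[3], cols[5]
        if upos not in ("PRON", "DET"):
            continue
        # FEATS is "_" when empty, otherwise Name=Value pairs joined by "|".
        names = set()
        if feats != "_":
            names = {pair.split("=", 1)[0] for pair in feats.split("|")}
        if "PronType" not in names:
            missing.append((cols[0], form, upos))
    return missing


# Invented sample: the article carries PronType=Art, the pronoun has no PronType.
prontype_sample = "\n".join([
    "1\tthe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tbook\tbook\tNOUN\t_\tNumber=Sing\t0\troot\t_\t_",
    "3\tmine\tmine\tPRON\t_\tNumber=Sing\t2\tdep\t_\t_",
])

print(missing_prontype(prontype_sample))
```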

The big table

We thought it would be useful to get a broad picture of pronominal words in various languages, how they behave and how they are usually classified in grammars of those languages. It is a space of several dimensions and it is not clear what would be the best way of visualizing it but let’s start with a table and see what we get.

Legend: TPOS = traditional part of speech, i.e. what category it belongs to in the grammatical tradition used in this language. OPOS = part of speech coming from the original / native tagset (but translated to universal POS tags, if possible); this is likely, but not necessarily, the same as TPOS. Similar = what non-pronominal part of speech (if any) does this word resemble? For a concrete example, if the language has genders and the word takes different forms for different genders in order to agree with a modified noun, it is probably like an adjective. Amod = is it possible or even likely that it modifies a noun in a similar way to how adjectives modify nouns?

Lang Word Gloss TPOS OPOS Similar Amod Note
en the, a, an DET ADJ Mandatory Articles.
de der, die, das, ein, eine the, the, the, a, a DET ADJ Mandatory Articles. The indefinite article ein is homonymous with the numeral “one” but they have different tags.
bg един, една, едно, едни PRON ADJ Mandatory Articles. The indefinite article един is homonymous with the numeral “one” but they have different tags.
en I, you, he, she, it, we, they, one, myself, yourself, himself, herself, itself, ourselves, yourselves, themselves, oneself PRON PRON NOUN Impossible Personal pronouns have two cases (direct/nominative, and oblique/accusative). We do not count English possessive pronouns as the genitive case of personal pronouns.
de ich, du, er, sie, es, wir, ihr, sie, man, einander I, you (Sing), he, she, it, we, you (Plur), they, one, each other PRON PRON NOUN Impossible Pronouns inflect for case but out of the four German cases for nouns, only three are used for pronouns. Personal pronouns do not have a genitive form and possessive pronouns (with adjective-like forms) are used instead.
cs já, ty, on, ona, ono, my, vy, oni, ony, ona, se I, you (Sing), he, she, it, we, you (Plur), they (Masc), they (Fem), they (Neut), oneself (Reflex) PRON PRON NOUN Impossible Pronouns inflect for case (7 different cases in Czech) but regardless of the case, personal (non-possessive) pronouns are never determiners. Note that Czech allows noun phrases to be post-modified by genitive noun phrases; this construction is one of the possible means to express possession, but the genitive noun phrase cannot be a genitive personal pronoun. A possessive pronoun must be used instead.
bg аз, ти, той, тя, то, ние, вие, те, себе си, се, на себе си, си I, you (Sing), he, she, it, we, you (Plur), they (Plur), oneself (Reflex) PRON PRON NOUN Impossible Only 3rd singular pronouns have gender; others inflect for number and case (non-reflexive pronouns have nominative, accusative and dative, while reflexive pronouns have only accusative and dative).
en my, your, his, her, its, our, their PRON PRON ADJ Mandatory Possessive pronouns (adjectival forms).
en mine, yours, his, hers, its, ours, theirs PRON PRON NOUN Impossible Possessive pronouns (standalone forms). (Or should we say that this is the genitive of personal pronouns?)
de mein, dein, sein, ihr, unser, euer my, your (Sing), his / its, her / their, our, your (Plur) PRON DET ADJ Likely Possessive pronouns.
cs můj, tvůj, jeho, její, náš, váš, jejich, svůj my, your (Sing), his/its, her, our, your (Plur), their, oneself's (Reflex) PRON PRON ADJ Likely These are not genitive forms of personal pronouns! (They exist but they are different.) These are nominative forms of possessive pronouns, which behave like adjectives. They have different forms for different genders; one must use the form that agrees with the modified (possessed) noun in gender, number and case.
bg мой, твой, негов, неин, наш, ваш, техен, свой my, your (Sing), his/its, her, our, your (Plur), their, oneself's (Reflex) PRON PRON ADJ Likely The possessive pronouns can take the definite article (моя(т), твоя(т) etc.). These forms inflect for gender and number. They also have clitic counterparts: ми, ти, му, й, ни, ви, им.
mt tiegħi, tiegħek, tiegħu, tagħha, tagħna, tagħkom, tagħhom my, your (Sing), his, her, our, your (Plur), their PRON PRON ADP+NOUN Unlikely These words originated as combinations of the preposition ta' “of” and personal pronoun suffixes. Thus they literally correspond to “of me, of you” etc. Their only similarity to adjectives is that they are also placed after the noun they modify (the possessed noun), but this may be a pure coincidence. Unlike Maltese adjectives, the possessive pronouns do not agree with the possessed noun in gender. Besides these possessive pronouns, Maltese can also express possession by inflection of the possessed noun. That actually means that the personal pronoun suffixes are attached directly to the noun: dar “house”, dari “my house”, darek “your (Sing) house”, daru “his house”, darha “her house”, darna “our house”, darkom “your (Plur) house”, darhom “their house”.
en this, that DET ADJ Possible The that that works as subordinating conjunction is considered a different word, homonymous with the determiner.
de dies this PRON PRON NOUN Impossible Lemma is dieser and there are also adjectival forms that work like determiners.
de dieser, jener, solcher, derselbe this, that, such, the same PRON DET ADJ Likely For dieser there is also the substantive form dies that is used as a standalone noun phrase. It is described in a separate row of this table.
de derjenige the one PRON PRON ADJ Possible Morphology is adjectival (gender and number inflection) but it usually appears without a parent noun. Instead, it is itself modified by a relative clause, e.g. Derjenige, der den Pythagoras nicht kapiert… “The one who does not understand Pythagoras…”
cs ten (to), tento, tenhle, tamten, onen, takový, týž, tentýž the (it), this, this, that, that, such, same, same PRON PRON ADJ Likely Demonstratives mostly inflect for gender and modify nouns adjectively. Only the neuter gender of a subset of these words (to, toto, tohle, tamto) can be used alone as a true pronoun (it, this, that).
bg този, тази, това, тези, онзи, онази, онова, онези this.MASC, this.FEM, this.NEUT, these.PL, that.MASC, that.FEM, that.NEUT, those.PL PRON PRON ADJ Likely Demonstratives inflect for gender in singular, and number. The neuter forms easily substantivize.
en who, what, whoever, whatever PRON NOUN Impossible What can also work as a determiner (what an opportunity; what mosques) similar to which. That is considered a different homonymous word and gets a different tag even in the original tagset (see below).
de wer, was who, what PRON PRON NOUN Impossible Interrogative and relative pronouns.
cs kdo, co who, what PRON PRON NOUN Impossible
en which, what, whatever DET ADJ Likely What can also work as a pronoun. That is considered a different homonymous word and gets a different tag even in the original tagset (see above).
de welcher, der which, that PRON PRON ADJ Possible Interrogative and relative pronouns. The relative pronoun der is homonymous with the definite article but they have different tags.
en whose PRON ADJ Likely In the Penn Treebank, it is tagged WP$, which means interrogative / relative possessive pronoun.
de wessen, dessen whose PRON PRON ADJ Likely Interrogative / relative possessive pronouns.
cs jaký, který, čí, jenž which (quality), which (selection), whose, that (Rel) PRON PRON ADJ Possible
en somebody, something, anybody, anything, everybody, everything, nobody, nothing NOUN NOUN NOUN Impossible
de jemand, etwas, niemand, nichts somebody, something, nobody, nothing NOUN NOUN NOUN Impossible
cs někdo, něco, kdokoli, cokoli, nikdo, nic somebody, something, anybody, anything, nobody, nothing PRON PRON NOUN Impossible
en some, any, every, each, all, no, another, both, such, either, neither DET ADJ Likely
de irgendeiner, irgendwelcher, mancher, anderer, jeder, alle, beide, keiner some (quality), some (selection), some (selection), other, every, all, both, no DET ADJ Likely Anderer “other” is sometimes tagged as adjective, sometimes as substantive pronoun and sometimes as attributive pronoun (determiner).
cs nějaký, některý, něčí, jakýkoli, kterýkoli, číkoli, každý, nijaký, žádný, ničí some (quality), some (selection), someone's, any (quality), any (selection), anyone's, every / each, no such (quality), no (selection), no one's PRON PRON ADJ Likely
cs všechen all / everybody / everything PRON PRON ADJ Unlikely This word can be used as a determiner but most of the time it is used as a standalone pronoun that would be translated as “everyone, everything”.
de wieviel how much PRON / DET ADJ Possible
cs kolik how many / how much NUM NUM NUM > 4 See note This word may actually govern the counted noun by dictating that it be in genitive. In other situations it agrees with the noun in case. This is a morpho-syntactic behavior parallel to higher-value cardinal numerals, but definitely not to Czech adjectives. Nevertheless, we have been treating this word as a language-specific subtype of determiner, to be parallel with English many. It may also occur without the counted noun but one could argue that it is ellipsis.
de soviel so much PRON / DET ADJ Possible
cs tolik that many / that much NUM NUM NUM > 4 See note This word may actually govern the counted noun by dictating that it be in genitive. In other situations it agrees with the noun in case. This is a morpho-syntactic behavior parallel to higher-value cardinal numerals, but definitely not to Czech adjectives. Nevertheless, we have been treating this word as a language-specific subtype of determiner, to be parallel with English many. It may also occur without the counted noun but one could argue that it is ellipsis.
en many, few, little, much, several, more, most, less, least ADJ / ADV ADJ / ADV ADJ / ADV Likely The words much, more, most and little, less, least can also function as adverbs and if they do, they are also tagged so in the original tagset.
de einiger, vieler, weniger, meister some (quantity or selection), many, few, most DET ADJ Possible
de bißchen, viel, wenig, mehr bit, much, little, more PRON NUM? Possible These indefinite quantifiers modify a quantified uncountable noun (possibly elided) without taking an adjective-like suffix for gender/number agreement. This makes their behavior similar to cardinal numbers, but they are not used with countable nouns (with countable nouns the adjectival suffixes would be needed).
cs několik some (quantity) / a few / several NUM NUM NUM > 4 See note This word may actually govern the counted noun by dictating that it be in genitive. In other situations it agrees with the noun in case. This is a morpho-syntactic behavior parallel to higher-value cardinal numerals, but definitely not to Czech adjectives. Nevertheless, we have been treating this word as a language-specific subtype of determiner, to be parallel with the treatment of cardinal numerals and English determiners. It may also occur without the counted noun but one could argue that it is ellipsis.
cs mnoho, málo, hodně, více, méně many / much, few / little, many / much, more, less / fewer NUM / ADV NUM / ADV NUM > 4 See note These words share properties of the indefinite numerals introduced above, including the ambivalent relation to counted nouns. But they are also similar to adverbs, in that they can be compared. And they are also used as either numerals (quantity) or adverbs (when they modify adjectives, adverbs or verbs).
cs kolikátý what rank NUM NUM ADJ Likely Interrogative / relative ordinal numeral. Kolikáté pivo máš? means “How many beers have you had?” but literally it is something like “How-many-th beer do-you-have?”
cs tolikátý that rank NUM NUM ADJ Likely Demonstrative ordinal numeral.
cs několikátý some rank / umpteenth NUM NUM ADJ Likely Indefinite ordinal numeral.
cs kolikrát how many times NUM NUM ADV Impossible Interrogative / relative multiplicative numeral.
cs tolikrát so many times NUM NUM ADV Impossible Demonstrative multiplicative numeral.
cs několikrát several times NUM NUM ADV Impossible Indefinite multiplicative numeral.
cs pokolikáté after how many times NUM NUM ADV Impossible Interrogative / relative multiplicative-ordinal numeral. Pokolikáté už se to stalo? “How many times has this happened?” lit. approx. “How-many-th-time already itself this happened?”
cs potolikáté after that many times NUM NUM ADV Impossible Demonstrative multiplicative-ordinal numeral.
cs poněkolikáté after several times NUM NUM ADV Impossible Indefinite multiplicative-ordinal numeral.
en where, when, how, why, wherever, whenever ADV ADV ADV Impossible Interrogative / relative adverb.
de wo, wohin, woher, wann, wie, warum, weshalb, wonach, wobei, womit, wozu, wofür, wodurch, woran, worüber, worin, wogegen, worauf, woraus, worum, wohinein, woraufhin, wovor where, where to, where from, when, how, why, hence, after which, by which, with which, to which, for which, through which, on which, over which, in which, against which, on which, from which, about which, into which, onto which, before which ADV ADV ADV Impossible Interrogative / relative adverb.
cs kde, kam, odkud, kudy, kdy, odkdy, dokdy, jak, proč where, where to, where from, where through, when, since when, until when, how, why ADV ADV ADV Impossible Interrogative / relative adverb.
en here, there, now, then, so ADV ADV ADV Impossible Demonstrative adverb.
de da, dahin, daher, hier, dort, dorthin, jetzt, dann, so, darum, deshalb, damit, dabei, dafür, dazu, davon, darauf, dagegen, darüber, daran, zudem, darin, außerdem, danach, darunter, dadurch, daraus, trotzdem, davor, deswegen, demnach, daraufhin, seitdem, dahinter, hierzu, daneben, … here, here (to), from here, here, there, there (to), now, then, so, that's why, thus / therefore, with that, by that, for that, to that, from that, on that, against that, over that, on that, to that, in that, except that, after that, under that, through that, from that, though, before that, therefore, thus, then, since then, behind that, to this, besides that, … ADV ADV ADV Impossible Demonstrative adverb.
cs tady, tam, sem, odsud, odtamtud, tudy, tadytudy, tamtudy, teď, tehdy, potom, tentokrát, tak, proto here, there, here (to), from here, from there, through here, through here, through there, now, then, then, this time, so, because of that ADV ADV ADV Impossible Demonstrative adverb.
en somewhere, sometimes, somewhat, somehow, anywhere, anytime, anyhow, anyway, anyways, everywhere, always ADV ADV ADV Impossible Indefinite adverb.
de irgendwo, irgendwann, irgendwie, überall, immer, jederzeit, manchmal somewhere, sometime, somehow, everywhere, always, anytime, sometimes ADV ADV ADV Impossible Indefinite adverb.
cs někde, někam, odněkud, někudy, někdy, nějak, kdekoli, kamkoli, odkudkoli, kudykoli, kdykoli, jakkoli, všude, všudy, vždy somewhere, to somewhere, from somewhere, through somewhere, sometime(s), somehow, anywhere, to anywhere, from anywhere, through anywhere, anytime, anyhow, everywhere, through everywhere, always ADV ADV ADV Impossible Indefinite adverb.
en nowhere, never, nohow ADV ADV ADV Impossible Negative adverb.
de nirgendwo, nirgends, nie(mals), keineswegs nowhere, nowhere, never, in no way ADV ADV ADV Impossible Negative adverb.
cs nikde, nikam, odnikud, nikudy, nikdy, nijak nowhere, to nowhere, from nowhere, through nowhere, never, nohow ADV ADV ADV Impossible Negative adverb.