
This page pertains to UD version 2.

PRON: pronoun

Traditional grammars of Slavic languages do not distinguish pronouns from pro-adjectives (determiners, sla-pos/DET), hence it is important to define a consistent borderline here. (Some authors, e.g. Sussex and Cubberley (2006), do use the term determiner for Slavic languages, but they rely on a common understanding of the term without delimiting it precisely.)

In order to provide the broader picture, we describe both pronouns and determiners here; the page sla-pos/DET is empty.

Personal pronouns

Non-possessive personal pronouns are tagged PRON PronType=Prs (see also the sla-feat/PronType feature). Third-person pronouns are formed as inflections on one stem and should have one lemma, the masculine singular nominative form. (In fact there are two stems: one for the nominative and the other for the remaining cases. But the point is that the stems do not change with gender or number.) The first and second person pronouns are formed from different stems in singular and plural. However, to be consistent, the singular nominative form should be used as lemma for both (all) numbers in the given person. Thus in [cs], the lemma of my is já and the lemma of vy is ty. Reflexive pronouns have their own lemma (one lemma for both clitic and non-clitic forms). Since they lack the nominative form, the lemma should be the clitic accusative form, which is arguably the most frequent one.
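The lemma rule above can be sketched as a small lookup. This is a minimal illustration for Czech only; the function name `pronoun_lemma` and the layout of the feature dictionary are assumptions made for this example and do not come from any UD tool.

```python
# Sketch of the lemma rule for non-possessive personal pronouns (Czech only).
# The function name and the feature layout are illustrative assumptions.

# Singular nominative form per person; third person uses the masculine form.
LEMMA_BY_PERSON = {"1": "já", "2": "ty", "3": "on"}

# Reflexives have no nominative, so the clitic accusative serves as lemma.
REFLEXIVE_LEMMA = "se"


def pronoun_lemma(feats):
    """Return the lemma of a personal pronoun given UD-style features."""
    if feats.get("Reflex") == "Yes":
        return REFLEXIVE_LEMMA
    # Gender and number are deliberately ignored: one lemma per person.
    return LEMMA_BY_PERSON[feats["Person"]]


# "my" (we) is first person plural, "vy" (you) is second person plural.
print(pronoun_lemma({"Person": "1", "Number": "Plur"}))  # já
print(pronoun_lemma({"Person": "2", "Number": "Plur"}))  # ty
print(pronoun_lemma({"Reflex": "Yes", "Case": "Dat"}))   # se
```

Ignoring `Number` and `Gender` in the lookup is what makes plural and non-masculine forms share the lemma of the singular nominative.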

List of nominative forms of personal pronouns (accusative for reflexives) in various languages:

Possessives

The words that are traditionally called possessive pronouns are in fact possessive determiners and should be tagged DET Poss=Yes | PronType=Prs. First and second person possessives, and the reflexive possessive, function like adjectives. They precede the modified (possessed) noun and concord with it in gender, number and case. In the South Slavic languages the same can be said also about third person possessives. In the north, third person possessives evolved from (or are still identical to) the genitive form of the personal pronoun, and they do not inflect. However, they are traditionally distinguished from the personal pronoun, they are placed before the possessed noun (unlike nominal genitive modifiers), and for consistency we tag them DET as well. Similar to adjectives, one lemma covers all inflections for gender, number and case, governed by the modified noun. The inherent gender, number and person of the possessor are not treated as inflection and do not affect the choice of inflected form, i.e. můj “my” and náš “our” are two distinct lemmas.
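As a hedged illustration of the tagging described above, the following sketch builds the annotation for a possessive. The helper names `format_feats` and `tag_possessive` are hypothetical; the only convention assumed from CoNLL-U is that features are sorted alphabetically and joined with `|`.

```python
# Sketch: annotation of possessive determiners (helper names are hypothetical).

def format_feats(feats):
    """Serialize features the way the CoNLL-U FEATS column expects:
    sorted alphabetically (case-insensitively) and joined with '|'."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def tag_possessive(lemma):
    """Return (lemma, UPOS, FEATS) for a possessive determiner."""
    return lemma, "DET", format_feats({"PronType": "Prs", "Poss": "Yes"})


# "můj" (my) and "náš" (our) keep separate lemmas; both receive the same tag.
for lemma in ("můj", "náš"):
    print(tag_possessive(lemma))
```

In a real treebank the FEATS column would also carry the agreement features (gender, number, case) and the possessor's features; they are left out here to keep the sketch focused on the tag itself.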

Demonstratives

All demonstrative “pronouns” inflect for gender and can modify nouns, which places them in the DET category. If the noun phrase is missing, it can be explained by ellipsis, at least for the masculine and feminine forms. Certain neuter singular forms ([cs] to, toto, tohle, tamto) are also frequently used to refer to unspecified or general entities, that is, they are used in these situations more like pronouns than like determiners.

There are two possible solutions:

  1. Tag all demonstratives DET PronType=Dem. The lemma is always masculine singular nominative.
  2. As 1., with the exception that selected neuter singular forms are ambiguous and may also appear as PRON PronType=Dem. Then the lemma is neuter singular nominative. Disambiguation has to be done by context: if the word pre-modifies a noun phrase and concords with it in gender, number and case, it is a determiner; otherwise it is a pronoun.
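The contextual test in option 2 can be sketched as follows. This is a simplified illustration, not a production tagger: the `Token` class and the function `tag_demonstrative` are assumptions made for the example, and every neuter singular form is treated as potentially ambiguous, whereas the text speaks only of selected forms.

```python
# Sketch of option 2 for demonstratives (all names here are hypothetical).
from dataclasses import dataclass
from typing import Optional

AGREEMENT_FEATURES = ("Gender", "Number", "Case")


@dataclass
class Token:
    form: str
    index: int                      # position in the sentence
    upos: str
    feats: dict
    head: Optional["Token"] = None  # syntactic parent, if any


def tag_demonstrative(tok):
    """Return 'DET' or 'PRON' for a demonstrative, following option 2."""
    neuter_singular = (
        tok.feats.get("Gender") == "Neut" and tok.feats.get("Number") == "Sing"
    )
    if not neuter_singular:
        return "DET"  # only neuter singular forms are treated as ambiguous
    head = tok.head
    premodifies_noun = (
        head is not None
        and head.upos in ("NOUN", "PROPN")
        and tok.index < head.index
    )
    agrees = premodifies_noun and all(
        tok.feats.get(f) == head.feats.get(f) for f in AGREEMENT_FEATURES
    )
    return "DET" if agrees else "PRON"


# "to auto" (that car): the demonstrative pre-modifies an agreeing noun.
auto = Token("auto", 2, "NOUN", {"Gender": "Neut", "Number": "Sing", "Case": "Nom"})
to_det = Token("to", 1, "DET", {"Gender": "Neut", "Number": "Sing", "Case": "Nom"}, auto)
print(tag_demonstrative(to_det))   # DET

# "To nevím" (I do not know that): the demonstrative depends on a verb.
nevim = Token("nevím", 2, "VERB", {})
to_pron = Token("To", 1, "DET", {"Gender": "Neut", "Number": "Sing", "Case": "Acc"}, nevim)
print(tag_demonstrative(to_pron))  # PRON
```

The sketch does not handle a demonstrative subject of a nominal predicate that happens to agree with it; a fuller heuristic would also inspect the dependency relation.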

Pronouns derived from “who, what”

These are always PRON and never DET. They fall into various pronominal types: interrogatives, relatives (“who, what”), indefinites (“somebody, something, anybody, anything”) and negatives (“nobody, nothing”). They inflect for case but not for gender and number; “who” functions as singular masculine, “what” as singular neuter.

Note: Bulgarian is an exception. Instead of the kt/čt roots found in the other languages, Bulgarian uses кой / koj, which inflects for gender and number like adjectives, and while it predominantly occurs in pronoun position, it can be used as a determiner too: Кой текст четете? / Koj tekst četete? “What text are you reading?” The substantive root survives in Bulgarian нещо / nešto “something” and нищо / ništo “nothing”; even the relative що / što “what” exists, but it is very rare.

Determiners derived from “which, whose”, total and other determiners

In some Slavic languages there are two interrogative pronouns/determiners corresponding to [en] “which”: one that represents a selection, “which one” ([cs] který); and one that queries a quality, “what kind of” ([cs] jaký). Both can be used as relative pronouns/determiners, too. Their inflection is fully adjectival, therefore they should be tagged DET, despite the fact that when they are used as relative determiners, the modified noun is not there and its absence cannot be explained by ellipsis (but it is the noun modified by the entire relative clause).

Bulgarian кой / koj etymologically corresponds to the “what kind of” determiner in other languages. As noted above, it can be used as a determiner but it is much more likely to replace a noun phrase, i.e. to be used as a pronoun. It seems to be a good candidate to allow both tags and disambiguate by context.

In addition, there is a possessive interrogative determiner corresponding to [en] “whose” ([cs] čí).

There are also derived indefinite and negative determiners, using the same affixes as with “who, what”; only the negative determiner “no” corresponding to “which” contains a different stem ([cs] žádný).

We also include the total determiner “every” ([cs] každý) here, although it is quite frequently used without the modified noun, with the meaning “everybody, everyone”; the decisive factor here is its undoubtedly adjectival inflection. In contrast, we do not include the total pronoun “all / everything” ([cs] všichni / všechno), see below.

Note: In [sl], the pronoun kar corresponds to [cs] který. Its inflection is not adjectival (the treebank contains only four forms: kar (Nom, Acc), česar (Gen), čemer (Loc) and čimer (Ins)), hence it is a pronoun and not a determiner.

All, everything

The total pronouns with the root vs/vš/sv are another problematic group, with some parallels to the demonstratives.

In Czech, všechen can be used adjectivally and has forms for different genders and numbers, but usually only a subset of the forms is used, and quite often they are used without a modified noun:

The plural forms can be used as determiners, including forms of other genders, if it is known that the group of referents has only that gender: všichni lidé “all people”, všechny domy “all houses” (masculine inanimate), všechny ženy “all women” (feminine), všechna ujednání “all provisions” (neuter). Much more rarely, even singular forms can be used, in the sense “all / entire”.

In UD Czech, 717 instances may be determiners (the heuristic we use: they must agree with their parent in gender and case, and they must not be labeled as subjects, which would mean that the parent is a non-verbal predicate). In addition there are 113 instances of two-pronoun expressions like to všechno “all this” and kdo/co všechno “who/what all”, where one may argue for a determiner analysis as well. This contrasts with the total number of occurrences of lemma všechen, 2520. If we limit the search to singular neuters, and exclude the two-pronoun expressions, there are only 8 instances where všechno is a determiner, mostly with mass nouns (všechno světlo “all light”); this contrasts with the total number of singular neuter occurrences, 600.
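The heuristic and the proportions quoted above can be sketched as follows. The function names `may_be_determiner` and `share` are hypothetical; the counts are the ones given in the text, and the code only recomputes the ratios they imply.

```python
# Sketch of the determiner heuristic and the resulting proportions.
# Function names are hypothetical; the counts are taken from the text.

def may_be_determiner(feats, parent_feats, deprel):
    """True if the word agrees with its parent in gender and case
    and is not attached as a subject (nsubj or a subtype of it)."""
    if deprel.split(":")[0] == "nsubj":
        return False
    return all(
        feats.get(f) is not None and feats.get(f) == parent_feats.get(f)
        for f in ("Gender", "Case")
    )


def share(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"


# "všechny ženy" (all women): agreement in gender and case, attached as det.
print(may_be_determiner(
    {"Gender": "Fem", "Case": "Nom"}, {"Gender": "Fem", "Case": "Nom"}, "det"
))  # True

print(share(717, 2520))        # 28.5% of all occurrences of lemma všechen
print(share(717 + 113, 2520))  # 32.9% when two-pronoun expressions are added
print(share(8, 600))           # 1.3% of the singular neuter occurrences
```

The last figure is what motivates treating the neuter singular form separately: it is almost never used as a determiner.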

The evidence here is similar to that for demonstratives, which in general behave like adjectives, but some neuter singular forms are used to represent general or unspecified entities, hence they are closer to pronouns. The solution should be the same for demonstratives and for the equivalents of all, choosing one of the following options:

  1. Tag all occurrences DET PronType=Tot. The lemma is always masculine singular nominative.
  2. As 1., with the exception that selected neuter singular forms are ambiguous and may also appear as PRON PronType=Tot. Then the lemma is neuter singular nominative. Disambiguation has to be done by context: if the word pre-modifies a noun phrase and concords with it in gender, number and case, it is a determiner; otherwise it is a pronoun.

It remains to be determined how the cognate words in the other Slavic languages behave.

Pronominal quantifiers

Terminological note: For the purpose of this chapter, the term quantifier does not include words with adjectival declension, even if their meaning has to do with quantity ([cs] každý, mnohý, nejeden, žádný). We now focus on words that resemble high-value numerals (5 and above) or nouns like group, batch and combine with a quantified noun in the genitive.

All pronominal quantifiers are tagged DET. They are morphologically and syntactically different from adjectives and other determiners. They are much closer to cardinal numerals but they cannot get the NUM tag, which is reserved for definite quantities.

Note that the meaning of [pl] tylko has shifted towards “only”, which makes it an adverb rather than a demonstrative quantifier. In [hr], toliko is used sometimes as a quantifier and sometimes as an adverb. A similar shift may have happened in some of the other languages, too. The interrogative kolik may be used as a relative, except in [hsb] and [bg]. Occasionally it may also be used as an indefinite ([pl] kilka).

Indefinite quantifiers and adverbs of degree

There is a relatively small group of words that lie on the borderland between adverbs, numerals and pronouns/determiners: [cs] mnoho “many”, hodně “much”, málo “little, few”. They may denote the degree of an adjective or verb, and they can be compared: více “more”, nejvíce “most”, méně “less, fewer”, nejméně “least, fewest”. These are typical properties of adverbs. However, they can also denote an indefinite quantity when they take a genitive nominal argument (plural for countable nouns, or singular in the partitive sense). This follows the typical behavior of numerals. The whole phrase (numeral + noun) works like a noun phrase and can become an argument of a verb, and some of the numerals even inflect for case: s mnoha body “with many points” (Case=Ins). When the phrase acts as subject, it is regarded as neuter singular for the purpose of subject-verb agreement.

[cs]

Trenér sázel mnohem více na herní stránku než na kondici . \n Coach bet much more on game aspect than on physical-condition .
advmod(více, mnohem)
advmod(sázel, více)
obj(sázel, stránku)
nmod(více, kondici)

As an adverb, více is the comparative form of lemma hodně (but it could as well be assigned the lemma mnoho; the comparative form is irregular, without direct morphological relation to the basic positive form). As an indefinite numeral, it is its own lemma (but there are only two occurrences in UD Czech).

Bude vybráno více zájemců . \n Will-be selected more applicants .
nsubj:pass(vybráno, zájemců)
det:numgov(zájemců, více)

The two syntactic functions are not compatible. The words in this group should thus receive two different tags, disambiguated by context. When they denote quantity, their tag will be DET NumType=Card | PronType=Ind. When they denote degree, their tag will be ADV.
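A minimal sketch of this two-way split, assuming that a dependency analysis is already available and that the relation label alone is enough to tell the two functions apart. The function name `tag_quantity_or_degree` is hypothetical, and the relation labels are limited to those that appear in the examples on this page.

```python
# Sketch: choosing between the quantity and the degree reading.
# The function name is hypothetical; labels come from the page's examples.

QUANTITY_RELATIONS = {"det", "det:numgov"}  # word quantifies a nominal
DEGREE_RELATIONS = {"advmod"}               # word modifies a verb, adjective or adverb


def tag_quantity_or_degree(deprel):
    """Return (UPOS, features added by this rule) for words like [cs] více."""
    if deprel in QUANTITY_RELATIONS:
        return "DET", {"NumType": "Card", "PronType": "Ind"}
    if deprel in DEGREE_RELATIONS:
        return "ADV", {}
    raise ValueError(f"relation not covered by this sketch: {deprel}")


# det:numgov(zájemců, více) -> quantity; advmod(sázel, více) -> degree.
print(tag_quantity_or_degree("det:numgov"))
print(tag_quantity_or_degree("advmod"))
```

Relying on the relation label presupposes a parse; when tagging precedes parsing, the same decision would have to be made from surface context instead.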

[sl]

Kolesca morajo biti mnogo večja od tistih \n Wheels must be much larger from those
advmod(večja, mnogo)
nmod(večja, tistih)
case(tistih, od)

skozi mnogo let \n over many years
case(let, skozi)
det(let, mnogo)

Domovanja so raztresena na več kilometrih \n Dwellings are scattered on more kilometers
det(kilometrih, več)
case(kilometrih, na)
nmod(raztresena, kilometrih)

Partitive usage:

Imeli več časa za priprave \n They-had more time for preparations
det(časa, več)
obj(Imeli, časa)

References

