POS tags
| Open class words | Closed class words | Other |
|---|---|---|
| ADJ | ADP | PUNCT |
| ADV | AUX | SYM |
| INTJ | CONJ | X |
| NOUN | DET | |
| PROPN | NUM | |
| VERB | PART | |
| | PRON | |
| | SCONJ | |
ADJ: adjective
This document is a placeholder for the language-specific documentation for ADJ.
ADP: adposition
This document is a placeholder for the language-specific documentation for ADP.
ADV: adverb
This document is a placeholder for the language-specific documentation for ADV.
AUX: auxiliary verb
This document is a placeholder for the language-specific documentation for AUX.
CONJ: coordinating conjunction
This document is a placeholder for the language-specific documentation for CONJ.
DET: determiner
Traditional grammars of Slavic languages do not distinguish pronouns (PRON) from pro-adjectives (determiners, DET), so a consistent borderline must be defined here. (Some authors, e.g., Sussex and Cubberley (2006), do use the term determiner for Slavic languages, but they rely on a common understanding of it without precisely delimiting the class.)
To provide the broader picture, we describe both pronouns and determiners on one page: sla-pos/PRON.
INTJ: interjection
This document is a placeholder for the language-specific documentation for INTJ.
NOUN: noun
This document is a placeholder for the language-specific documentation for NOUN.
NUM: numeral
This document is a placeholder for the language-specific documentation for NUM.
PART: particle
This document is a placeholder for the language-specific documentation for PART.
PRON: pronoun
Traditional grammars of Slavic languages do not distinguish pronouns from pro-adjectives (determiners, sla-pos/DET), so a consistent borderline must be defined here. (Some authors, e.g., Sussex and Cubberley (2006), do use the term determiner for Slavic languages, but they rely on a common understanding of it without precisely delimiting the class.)
To provide the broader picture, we describe both pronouns and determiners here; the page sla-pos/DET only points to this page.
Personal pronouns
Non-possessive personal pronouns are tagged PRON with PronType=Prs (see also the sla-feat/PronType feature).
Third-person pronouns are inflected forms of one stem and should have one lemma, the masculine singular nominative form.
(In fact there are two stems: one for the nominative and another for the remaining cases. The point is that the stems
do not change with gender or number.)
The first and second person pronouns use different stems in the singular and plural. However, for consistency,
the singular nominative form should be used as the lemma for all numbers (including the dual, where it exists) in the given person. Thus in [cs], the lemma
of my is já and the lemma of vy is ty.
Reflexive pronouns have their own lemma (one lemma for both clitic and non-clitic forms).
Since they lack a nominative form, the lemma should be the clitic accusative form, which is arguably the most frequent one.
List of nominative forms of personal pronouns (accusative for reflexives) in various languages:
- [cs] já, ty, on, ona, ono, my, vy, oni, ony, ona, se
- [sk] ja, ty, on, ona, ono, my, vy, oni, ony, sa
- [hsb] ja, ty, wón, wona, wono/wone, mój, wój, wonaj, wonej, my, wy, woni, wone, so
- [pl] ja, ty, on, ona, ono, my, wy, oni, one, się
- [ru] я, ты, он, она, оно, мы, вы, они, ся
- [sl] jaz, ti, on, ona, ono, midva, vidva, onadva, mi, vi, oni, one, ona, se
- [hr] ja, ti, on, ona, ono, mi, vi, oni, one, ona, se
- [bg] аз, ти, той, тя, то, ние, вие, те, се
- [cu] азъ, тꙑ, мꙑ, вꙑ, и, сѧ
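The lemma rules above can be made concrete with a small lookup over the [cs] forms in the list. This is a hypothetical sketch (the dictionary and function names are illustrative): it covers only the listed nominative forms plus the reflexive se, whereas a real lemmatizer would also have to cover all the oblique forms.

```python
# Hypothetical lookup: Czech personal pronoun forms from the [cs] list above,
# mapped to the lemma prescribed by the guidelines. Oblique forms are not covered.
CS_PERSONAL_LEMMA = {
    # 1st person: the singular nominative is the lemma for both numbers
    "já": "já", "my": "já",
    # 2nd person: likewise
    "ty": "ty", "vy": "ty",
    # 3rd person: masculine singular nominative for all genders and numbers
    # ("ona" is both feminine singular and neuter plural; both map to "on")
    "on": "on", "ona": "on", "ono": "on", "oni": "on", "ony": "on",
    # reflexive: no nominative exists, so the clitic accusative is the lemma
    "se": "se",
}

def personal_pronoun_lemma(form: str) -> str:
    """Return the guideline lemma for one of the listed Czech forms."""
    return CS_PERSONAL_LEMMA[form.lower()]

for form in ["My", "vy", "ony", "se"]:
    print(form, "->", personal_pronoun_lemma(form))
```

Collapsing my onto já keeps Number as a feature of the token rather than of the lemma, which is exactly the consistency argument made above.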
Possessive pronouns
The words that are traditionally called possessive pronouns are in fact possessive determiners and should be tagged
DET with Poss=Yes|PronType=Prs. First and second person possessives, and the reflexive possessive, function like adjectives:
they precede the modified (possessed) noun and agree with it in gender, number and case.
In the South Slavic languages, the same holds for third person possessives.
In the northern (West and East Slavic) languages, third person possessives evolved from (or are still identical to) the genitive form of the personal pronoun,
and most of them do not inflect.
However, they are traditionally distinguished from the personal pronoun and, unlike nominal genitive modifiers, they are placed before the possessed noun;
for consistency we therefore tag them DET as well.
As with adjectives, one lemma covers all inflections for gender, number and case, which are governed by the modified noun.
The inherent gender, number and person of the possessor, however, are not inflectional categories, so possessives that differ in them are distinct lemmas: e.g., můj “my” and náš “our”.
- [cs] můj, tvůj, jeho, její, náš, váš, jejich, svůj
- [sk] môj, tvoj, jeho, jej, náš, váš, ich, svoj
- [hsb] mój, twój, jeho, jeje, naš, waš, jich, swój
- [pl] mój, twój, jego, jej, nasz, wasz, ich, swój
- [ru] мой, твой, его, её, наш, ваш, их, свой
- [sl] moj, tvoj, njegov, njen, najin, vajin, njun, naš, vaš, njihov, svoj
- [hr] moj, tvoj, njegov, njezin/njen, naš, vaš, njihov, svoj
- [bg] мой, твой, негов, неин, наш, ваш, техен, свой
- [cu] мои, твои, его, еѩ, ею, нашь, вашь, ихъ, свои
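To contrast this with the personal pronouns (where my gets the lemma já), here is a hypothetical lookup for a few Czech possessive forms. It is a sketch only: the table and function names are illustrative, and it shows just the tag and the two features named in the text, while a real treebank would also mark the agreement features of each token.

```python
# Hypothetical lookup covering a handful of Czech possessive forms only.
# Inflection for the possessed noun collapses onto one lemma, but the
# possessor's person/number does not: "náš" is not merged into "můj".
CS_POSSESSIVE_LEMMA = {
    "můj": "můj", "mého": "můj", "moje": "můj", "mou": "můj",
    "náš": "náš", "našeho": "náš", "naše": "náš", "naši": "náš",
    # 3rd person possessive: uninflected, still tagged DET.
    # (As a form of the personal pronoun, the same string would get lemma "on".)
    "jeho": "jeho",
}

def analyze_possessive(form: str) -> tuple[str, str, str]:
    """Return (lemma, UPOS, FEATS) for a listed possessive determiner form."""
    return CS_POSSESSIVE_LEMMA[form.lower()], "DET", "Poss=Yes|PronType=Prs"

print(analyze_possessive("našeho"))  # ('náš', 'DET', 'Poss=Yes|PronType=Prs')
print(analyze_possessive("mou"))     # ('můj', 'DET', 'Poss=Yes|PronType=Prs')
```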
Demonstratives
All demonstrative “pronouns” inflect for gender and can modify nouns, which places them in the DET category.
If the noun phrase is missing, its absence can be explained by ellipsis, at least for the masculine and feminine forms.
Certain neuter singular forms ([cs] to, toto, tohle, tamto)
are also frequently used to refer to unspecified or general entities; in these situations they behave more
like pronouns than like determiners.
There are two possible solutions:
1. Tag all demonstratives DET with PronType=Dem. The lemma is always the masculine singular nominative.
2. As 1., except that selected neuter singular forms are ambiguous and may also appear as PRON with PronType=Dem; their lemma is then the neuter singular nominative. Disambiguation is done by context: if the form pre-modifies a noun phrase and agrees with it in gender, number and case, it is a determiner; otherwise it is a pronoun.
- [cs] ten, tento, tenhle, tamten, onen, takový, týž, tentýž
- [sk] ten, tento, tamten, onen, taký, takýto
- [hsb] tón, tutón, tónle, tamny, wony, tajki
- [pl] ten, tamten, taki
- [ru] этот, тот, такой
- [sl] tisti, ta, oni, takšen
- [hr] taj, ovaj, onaj, takav
- [bg] този, онзи, такъв
- [cu] тъ, онъ, такъ, таковъ
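The context rule of the second solution can be sketched as a check on a parsed sentence. This is an illustration under stated assumptions, not an existing tool: the Token class, the dictionary-of-tokens sentence representation and the function names are hypothetical, the ambiguous forms are the [cs] ones named above, and "pre-modifies" is read as "its head is a following noun".

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    id: int                      # 1-based position in the sentence
    form: str
    upos: str                    # "_" while still undecided
    head: int                    # 0 = root
    feats: dict = field(default_factory=dict)

# Solution 2: only these neuter singular forms may surface as PRON.
AMBIGUOUS_NEUT_SG = {"to", "toto", "tohle", "tamto"}
NOMINAL = {"NOUN", "PROPN"}

def agrees(a: Token, b: Token) -> bool:
    """True if a and b share Gender, Number and Case (all must be present on a)."""
    return all(k in a.feats and a.feats.get(k) == b.feats.get(k)
               for k in ("Gender", "Number", "Case"))

def tag_demonstrative(tok: Token, sent: dict) -> str:
    if tok.form.lower() not in AMBIGUOUS_NEUT_SG:
        return "DET"             # all other demonstratives are always DET
    head = sent.get(tok.head)
    premodifies = head is not None and head.upos in NOMINAL and tok.id < head.id
    return "DET" if premodifies and agrees(tok, head) else "PRON"

neut = {"Gender": "Neut", "Number": "Sing", "Case": "Nom"}
# "to auto" (that car): "to" depends on the following agreeing noun
s1 = {1: Token(1, "to", "_", 2, neut), 2: Token(2, "auto", "NOUN", 0, neut)}
# "To je pravda" (That is true): "to" is the subject of feminine "pravda"
s2 = {1: Token(1, "To", "_", 3, neut), 2: Token(2, "je", "AUX", 3),
      3: Token(3, "pravda", "NOUN", 0, {"Gender": "Fem", "Number": "Sing", "Case": "Nom"})}
print(tag_demonstrative(s1[1], s1))  # DET
print(tag_demonstrative(s2[1], s2))  # PRON
```

The sketch presupposes that the dependency attachment is already known; in practice the tag and the attachment are usually decided together.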
Pronouns derived from “who, what”
These are always PRON and never DET. They fall into various pronominal types:
interrogatives and relatives (“who, what”), indefinites (“somebody, something, anybody, anything”) and negatives (“nobody, nothing”).
They inflect for case but not for gender and number;
“who” functions as masculine singular and “what” as neuter singular.
Note: Bulgarian is an exception. Instead of the kt/čt roots found in the other languages, Bulgarian uses кой / koj, which inflects for gender and number like an adjective; while it predominantly occurs in pronoun position, it can be used as a determiner too: Кой текст четете? / Koj tekst četete? “What text are you reading?” The substantive root survives in Bulgarian нещо / nešto “something” and нищо / ništo “nothing”; even relative що / što “what” exists, but it is very rare.
- [cs] kdo, co, což, někdo, něco, kdokoli, cokoli, nikdo, nic
- [sk] kto, čo, niekto, niečo, nikto, nič
- [hsb] štó, što, štóž, štož, kiž, něchtó, něšto, něchtóžkuli, něštožkuli, nichtó, ničo
- [pl] kto, co, ktoś, coś, nikt, nic
- [ru] кто, что, кто-нибудь, что-нибудь, никто, ничто
- [sl] kdo, kaj, nekdo, nekaj, nihče, nič
- [hr] tko, što, neki, nešto, nitko, ništa
- [bg] що, нещо, нищо
- [cu] къто, чьто, нѣкъто, нѣчьто, никътоже, ничьтоже
Determiners derived from “which, whose”, total and other determiners
In some Slavic languages there are two interrogative pronouns/determiners corresponding to [en] “which”:
one that represents a selection, “which one” ([cs] který),
and one that queries a quality, “what kind of” ([cs] jaký).
Both can also be used as relative pronouns/determiners.
Their inflection is fully adjectival, therefore they should be tagged DET,
even though in their relative use the modified noun is not there and its absence cannot be explained by ellipsis
(the noun in question is the one modified by the entire relative clause).
Bulgarian кой / koj etymologically corresponds to the “what kind of” determiner in other languages. As noted above, it can be used as a determiner, but it is much more likely to replace a noun phrase, i.e., to be used as a pronoun. It therefore seems a good candidate for allowing both tags and disambiguating by context.
In addition, there is a possessive interrogative determiner corresponding to [en] “whose” ([cs] čí).
There are also derived indefinite and negative determiners, formed with the same affixes as those derived from “who, what”; only the negative determiner “no”, corresponding to “which”, has a different stem ([cs] žádný).
We also include the total determiner “every” ([cs] každý) here, although it is quite frequently used without a modified noun, meaning “everybody, everyone”; the decisive factor is its undoubtedly adjectival inflection. In contrast, we do not include the total pronoun “all / everything” ([cs] všichni / všechno); see below.
- [cs] který, jaký, čí, některý, nějaký, něčí, kterýkoli, jakýkoli, každý, nijaký, ničí, žádný
- [sk] ktorý, aký, čí, niektorý, nejaký, niečí, ktorýkoľvek, akýkoľvek, každý, nijaký, ničí, žiaden/žiadny
- [hsb] kotry, kajki, čeji, kotryž, kajkiž, čejiž, někotry, někajki, něčeji, někotryžkuli, někajkižkuli, něčejižkuli, kóždy, ničeji, žadyn
- [pl] który, jaki, któryś, jakiś, każdy, żaden
- [ru] который, какой, чей, некоторый, который-нибудь, какой-нибудь, чей-нибудь, каждый, никакой, ничей
- [sl] kateri, kak, čigav, kakršen, nekateri, nek, nekakšen, nikakršen, noben
- [hr] koji, kakav, čiji, nekakav, nikakav
- [bg] кой, какъв, чий, който, какъвто, чийто, някой, някакъв, нечий, никой, никакъв
- [cu] которꙑи, кꙑи, каковъ, чии, нѣкꙑи, никоторꙑиже, никꙑиже
Note: In [sl], the pronoun kar corresponds to [cs] který. Its inflection is not adjectival
(the treebank contains only four forms: kar (Nom, Acc), česar (Gen), čemer (Loc) and čimer (Ins)),
hence it is a pronoun and not a determiner.
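The criterion applied to kar (and to každý above) is inflectional rather than distributional. A rough sketch of it follows, under the loud assumption that "adjectival inflection" can be approximated by whether the attested forms of a lemma vary in Gender; the function name and the toy feature sets are hypothetical, not taken from UD Slovenian.

```python
def tag_by_inflection(analyses: list[dict]) -> str:
    """Guess DET vs PRON for one lemma from the FEATS observed on its forms.

    Assumption: inflection counts as adjectival if the forms vary in Gender.
    """
    genders = {a["Gender"] for a in analyses if "Gender" in a}
    return "DET" if len(genders) > 1 else "PRON"

# Hypothetical feature sets for [sl] kar: kar (Nom, Acc), česar, čemer, čimer
kar = [{"Case": c} for c in ("Nom", "Acc", "Gen", "Loc", "Ins")]
# Hypothetical feature sets for [sl] kateri in three genders
kateri = [{"Gender": g, "Number": "Sing", "Case": "Nom"} for g in ("Masc", "Fem", "Neut")]

print(tag_by_inflection(kar))     # PRON
print(tag_by_inflection(kateri))  # DET
```

The approximation misfires for a lemma attested in only one gender in a small treebank, so it is at best a first filter before manual review.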
All, everything
The total pronouns with the root vs/vš are another problematic group.
In Czech, všechen can be used adjectivally and has forms for different genders and numbers, but usually only a subset of the forms is used, and quite often they are used without a modified noun:
- všichni (masculine animate plural), meaning “all, everybody,” may include non-masculine referents
- všechno (neuter singular), meaning “everything”
The plural forms can be used as determiners, including forms of other genders, if it is known that the group of referents has only that gender: všichni lidé “all people”, všechny domy “all houses” (masculine inanimate), všechny ženy “all women” (feminine), všechna ujednání “all provisions” (neuter). Much more rarely, even singular forms can be used, in the sense “all / entire”.
In UD Czech, 717 instances may be determiners (the heuristic we use: they must agree with their parent in gender and case, and they must not be labeled as subjects, which would mean that the parent is a non-verbal predicate). In addition, there are 113 instances of two-pronoun expressions like to všechno “all this” and kdo/co všechno “who/what all”, where one may argue for a determiner analysis as well. This contrasts with 2520 occurrences of the lemma všechen in total. If we limit the search to singular neuters and exclude the two-pronoun expressions, there are only 8 instances where všechno is a determiner, mostly with mass nouns (všechno světlo “all light”), out of 600 singular neuter occurrences in total.
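The counting heuristic can be sketched as a per-token test. This is not the script that produced the counts: the Token class, the field names and the toy sentences are illustrative assumptions, and subjects are detected by a relation name starting with nsubj (which covers nsubjpass as well).

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    id: int
    form: str
    lemma: str
    head: int                      # 0 = root
    deprel: str
    feats: dict = field(default_factory=dict)

def may_be_determiner(tok: Token, sent: dict) -> bool:
    """True if an occurrence of všechen agrees with its parent in Gender and
    Case and is not attached as a subject."""
    if tok.lemma != "všechen":
        return False
    parent = sent.get(tok.head)
    if parent is None:             # attached to the root: no parent to agree with
        return False
    agrees = all(k in tok.feats and tok.feats.get(k) == parent.feats.get(k)
                 for k in ("Gender", "Case"))
    is_subject = tok.deprel.startswith("nsubj")
    return agrees and not is_subject

fem_acc = {"Gender": "Fem", "Number": "Plur", "Case": "Acc"}
masc_nom = {"Gender": "Masc", "Animacy": "Anim", "Number": "Plur", "Case": "Nom"}
# "Vidím všechny ženy" (I see all women)
s1 = {1: Token(1, "Vidím", "vidět", 0, "root"),
      2: Token(2, "všechny", "všechen", 3, "det", fem_acc),
      3: Token(3, "ženy", "žena", 1, "dobj", fem_acc)}
# "Všichni jsou studenti" (All are students): subject of a nominal predicate
s2 = {1: Token(1, "Všichni", "všechen", 3, "nsubj", masc_nom),
      2: Token(2, "jsou", "být", 3, "cop"),
      3: Token(3, "studenti", "student", 0, "root", masc_nom)}
print(may_be_determiner(s1[2], s1))  # True
print(may_be_determiner(s2[1], s2))  # False
```

For scale: 717 of 2520 occurrences is about 28%, adding the 113 two-pronoun expressions (830) gives about 33%, and 8 of 600 singular neuters is about 1.3%.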
To summarize, we may want to grant this lemma special treatment. At least the singular neuter form, všechno, would
deserve the PRON tag, unless it occurs in a clearly attributive position.
It remains to be determined how the cognate words in the other Slavic languages behave.
- [cs] všechen, všecek, všichni, všechno, všecko, vše
- [sk] lemma všetok, most frequent forms všetko, všetci, other forms všetkých, všetkým, všetky, všetkými, všetkého, všetkému, všetku
- [hsb] wšě, wšěch, wšěm, wšykne?, wšeho, wšemu, wšitke, wšitkim, wšitkich (DET; only wšitkim was used without a modified noun), wšitko (PRON)
- [pl] lemma wszystko, wszyscy (PRON); lemma wszystek (DET), forms wszystkie, wszystkich, wszyscy, wsze (but the pronoun-determiner distinction is probably caused by the conversion procedure)
Pronominal quantifiers
Terminological note: For the purpose of this chapter, the term quantifier does not include words with adjectival declension, even if their meaning has to do with quantity ([cs] každý, mnohý, nejeden, žádný). Here we focus on words that resemble high-value numerals (5 and above) or nouns like group and batch, and that combine with a quantified noun in the genitive.
All pronominal quantifiers are tagged DET.
They are morphologically and syntactically different from adjectives and other determiners.
They are much closer to cardinal numerals, but they cannot get the NUM tag, which is reserved for definite quantities.
Note that the meaning of [pl] tylko has shifted towards “only”, which makes it an adverb rather than a demonstrative quantifier. A similar shift may have happened in some of the other languages, too. The interrogative kolik may also be used as a relative, except in [hsb] and [bg]. Occasionally it may also be used as an indefinite ([pl] kilka).
- [cs] kolik, tolik, několik
- [sk] koľko, toľko, niekoľko
- [hsb] kelko, kelkož, telko
- [pl] kilka
- [ru] сколько, столько, несколько
- [sl] koliko
- [hr] koliko
- [bg] колко, колкото
- [cu] колико
Indefinite quantifiers and adverbs of degree
There is a relatively small group of words that lie on the border between adverbs, numerals
and pronouns/determiners: [cs] mnoho “many”, hodně “much”, málo “little, few”. They may denote the degree of
an adjective or verb, and they can be compared: více “more”, nejvíce “most”, méně “less, fewer”, nejméně “least, fewest”.
These are typical properties of adverbs.
However, they can also denote an indefinite quantity when they take a genitive nominal argument
(plural for countable nouns, or singular in the partitive sense).
This follows the typical behavior of numerals: the whole phrase (numeral + noun) works like a noun phrase and can become
an argument of a verb, and some of these words even inflect for case: s mnoha body “with many points” (Case=Ins).
When such a phrase acts as a subject, it is treated as neuter singular for the purpose of subject-verb agreement.
[cs]
Trenér sázel mnohem více na herní stránku než na kondici .
Coach bet much more on game aspect than on physical-condition .
advmod(více, mnohem)
advmod(sázel, více)
dobj(sázel, stránku)
nmod(více, kondici)
As an adverb, více is the comparative form of the lemma hodně (but it could equally be assigned the lemma mnoho, as the comparative form is irregular, without a direct morphological relation to the basic positive form). As an indefinite numeral, it is its own lemma (but there are only two occurrences in UD Czech).
Bude vybráno více zájemců .
Will-be selected more applicants .
nsubjpass(vybráno, zájemců)
det:numgov(zájemců, více)
The two syntactic functions are not compatible.
The words in this group should thus receive two different tags, disambiguated by context.
When they denote quantity, their tag is DET with NumType=Card|PronType=Ind.
When they denote degree, their tag is ADV.
- [cs] mnoho, moc, nemálo, dost, příliš, hodně, více, nejvíce, málo, nemnoho, méně, nejméně
- [sk] veľa, viac, najviac, málo, menej, najmenej
- [hsb] mnoho, wjele, wjac, wjace, najwjace, najbóle, dosć, mało, mjenje, najmjenje
- [pl] dużo, wiele, więcej, najwięcej, mało, mniej, najmniej
- [ru] много, немало, больше, более, наиболее, достаточно, мало, немного, меньше, менее, наименее
- [sl] mnogo, veliko, več, največ, malo, manj, najmanj, zelo, bolj, najbolj, dosti
- [hr] mnogo, više, najviše, malo, manje, najmanje, vrlo, dosta
- [bg] много, повече, най-вече
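Once the attachment is known, the two-tag rule reduces to a check on the head. The sketch below is hypothetical (names and the treatment of heads are assumptions): it reads "denotes quantity" as "attached to a noun" and "denotes degree" as "attached to a verb, adjective or adverb", and uses the [cs] list and the Czech examples above. It does not resolve genuinely ambiguous attachments; that decision has to be made first.

```python
# [cs] words from the list above that can denote either quantity or degree.
CS_QUANT_OR_DEGREE = {
    "mnoho", "moc", "nemálo", "dost", "příliš", "hodně",
    "více", "nejvíce", "málo", "nemnoho", "méně", "nejméně",
}
NOMINAL_HEADS = {"NOUN", "PROPN"}  # assumption: quantity reading = nominal head

def tag_quant_or_degree(form: str, head_upos: str) -> tuple[str, str]:
    """Return (UPOS, FEATS) for a word from the list, given the UPOS of its head."""
    if form.lower() not in CS_QUANT_OR_DEGREE:
        raise ValueError(f"{form!r} is not in the quantifier/degree list")
    if head_upos in NOMINAL_HEADS:
        # quantity, e.g. det:numgov(zájemců, více) in "Bude vybráno více zájemců ."
        return "DET", "NumType=Card|PronType=Ind"
    # degree, e.g. advmod(sázel, více) in "Trenér sázel mnohem více ..."
    return "ADV", "_"

print(tag_quant_or_degree("více", "NOUN"))  # ('DET', 'NumType=Card|PronType=Ind')
print(tag_quant_or_degree("více", "VERB"))  # ('ADV', '_')
```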
[sl]
Kolesca morajo biti mnogo večja od tistih
Wheels must be much larger from those
advmod(večja, mnogo)
nmod(večja, tistih)
case(tistih, od)
skozi mnogo let
over many years
case(let, skozi)
det(let, mnogo)
Slovenija potrebuje več urejenih informacij na internetu
Slovenia needs more orderly informations on internet
advmod(urejenih, več)
amod(informacij, urejenih)
The above sentence seems ambiguous. Več is annotated as an adverb modifying the adjective urejenih, but it could also be a quantifier for the whole phrase urejenih informacij.
Domovanja so raztresena na več kilometrih
Dwellings are scattered on more kilometers
det(kilometrih, več)
case(kilometrih, na)
nmod(raztresena, kilometrih)
Partitive usage:
Imeli več časa za priprave
They-had more time for preparations
det(časa, več)
dobj(Imeli, časa)
ki so terjali življenja več kot sto civilistov
that have lost lives more than hundred civilians
mwe(več, kot)
det(civilistov, več)
nummod(civilistov, sto)
The above annotation is taken from UD Slovenian 1.3, but I think that več kot should be attached to sto and the relation should be sla-dep/advmod.
References
- Roland Sussex, Paul Cubberley. 2006. The Slavic Languages. Cambridge: Cambridge University Press.
PROPN: proper noun
This document is a placeholder for the language-specific documentation for PROPN.
PUNCT: punctuation
This document is a placeholder for the language-specific documentation for PUNCT.
SCONJ: subordinating conjunction
This document is a placeholder for the language-specific documentation for SCONJ.
SYM: symbol
This document is a placeholder for the language-specific documentation for SYM.
VERB: verb
This document is a placeholder for the language-specific documentation for VERB.
X: other
This document is a placeholder for the language-specific documentation for X.