Universal features

For core part-of-speech categories, see the universal POS tags. The features listed here distinguish additional lexical and grammatical properties of words, not covered by the POS tags.

Lexical features
Inflectional features

Nominal     Verbal
Gender      VerbForm
Animacy     Mood
Number      Tense
Case        Aspect
Definite    Voice
Degree      Person

Animacy: animacy

Similarly to Gender (and to the African noun classes), animacy is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. It is independent of gender, and therefore it is encoded separately in some tagsets (e.g. all the Multext-East tagsets). On the other hand, in Czech (almost) the only grammatical implications occur within the masculine gender, which is why the PDT tagset does not have animacy as a separate feature and instead defines four genders: masculine animate, masculine inanimate, feminine and neuter. We follow the two-feature approach used in Multext-East (many languages) because it is safer: it also covers languages in which animacy is not confined to a single gender.
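The relation between the four-gender encoding and the two-feature encoding can be sketched as a small conversion. This is only an illustration: it assumes single-letter gender codes M (masculine animate), I (masculine inanimate), F (feminine) and N (neuter), and the name `split_pdt_gender` is hypothetical.

```python
# Hypothetical mapping from a four-way gender code to the two-feature
# encoding (Gender + Animacy). The single-letter codes are an assumption
# made for this sketch.
PDT_GENDER = {
    "M": {"Gender": "Masc", "Animacy": "Anim"},
    "I": {"Gender": "Masc", "Animacy": "Inan"},
    "F": {"Gender": "Fem"},   # no animacy distinction recorded
    "N": {"Gender": "Neut"},  # no animacy distinction recorded
}


def split_pdt_gender(code):
    """Return a copy of the Gender/Animacy features for one gender code."""
    if code not in PDT_GENDER:
        raise ValueError(f"unsupported gender code: {code!r}")
    return dict(PDT_GENDER[code])


print(split_pdt_gender("I"))
```

Keeping Animacy as its own key means that a language with animacy distinctions outside the masculine gender needs no new gender values, only additional Animacy entries.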

Polish is special in that it also grammatically distinguishes human from non-human animates. This can be demonstrated by the inflection of the example word który “which” (forms marked with * differ from the animate non-human column):

form     animate human   animate non-human   inanimate
sg-nom   który           który               który
sg-gen   którego         którego             którego
sg-dat   któremu         któremu             któremu
sg-acc   którego         którego             który*
sg-ins   którym          którym              którym
sg-loc   którym          którym              którym
pl-nom   którzy*         które               które
pl-gen   których         których             których
pl-dat   którym          którym              którym
pl-acc   których*        które               które
pl-ins   którymi         którymi             którymi
pl-loc   których         których             których
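
The comparison in the paradigm above can be reproduced mechanically. The following sketch stores the three paradigms of który exactly as listed and reports the case/number slots in which a given class differs from the animate non-human forms. The names `SLOTS`, `PARADIGM` and `differing_slots` are hypothetical.

```python
# Case/number slots in the order used by the paradigm above.
SLOTS = ["sg-nom", "sg-gen", "sg-dat", "sg-acc", "sg-ins", "sg-loc",
         "pl-nom", "pl-gen", "pl-dat", "pl-acc", "pl-ins", "pl-loc"]

# Forms of Polish "który" for the three animacy classes.
PARADIGM = {
    "animate human": ("który którego któremu którego którym którym "
                      "którzy których którym których którymi których").split(),
    "animate non-human": ("który którego któremu którego którym którym "
                          "które których którym które którymi których").split(),
    "inanimate": ("który którego któremu który którym którym "
                  "które których którym które którymi których").split(),
}


def differing_slots(gender, reference="animate non-human"):
    """List (slot, form) pairs where `gender` differs from the reference class."""
    ref_forms = PARADIGM[reference]
    return [(slot, form)
            for slot, form, ref_form in zip(SLOTS, PARADIGM[gender], ref_forms)
            if form != ref_form]


print(differing_slots("animate human"))
print(differing_slots("inanimate"))
```

The human class differs only in the plural nominative and accusative, and the inanimate class only in the singular accusative, which is why three animacy values are needed to describe the paradigm.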

Anim: animate

Human beings, animals, fictional characters, names of professions etc. are all animate. Even nouns that are normally inanimate can be inflected as animate if they are personified. For instance, in a children’s story in which cars live and talk like people, the cars may become animate and be inflected accordingly.

Nhum: animate but non-human

Attested in Polish. In languages where Nhum is used, Anim is restricted to human beings (complement of Nhum).

Inan: inanimate

Nouns that are not animate are inanimate. (If Nhum is used, nouns that are neither Anim nor Nhum are Inan.)

Aspect: aspect

Aspect is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.

Aspect is a feature that specifies duration of the action in time, whether the action has been completed etc. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.

In Czech and other Slavic languages, aspect is a lexical feature. Pairs of imperfective and perfective verbs exist and are often morphologically related but the space is highly irregular and the verbs are considered to belong to separate lemmas.

Since we proceed bottom-up, the current standard covers only a few aspect values found in corpora. See Wikipedia (http://en.wikipedia.org/wiki/Grammatical_aspect) for a long list of other possible aspects.

Imp: imperfect aspect

The action took / takes / will take some time span and there is no information whether and when it was / will be completed.


Perf: perfect aspect

The action has been / will have been completed. Since there is emphasis on one point on the time scale (the point of completion), this aspect does not work well with the present tense. For example, Czech morphology can create present forms of perfective verbs but these actually have a future meaning.


Pro: prospective aspect

Used in Basque. A combination of tense and aspect that indicates the action is in preparation to take place.

Prog: progressive aspect

English progressive tenses (I am eating, I have been doing …) have this aspect. They are constructed analytically (auxiliary + present participle) but the -ing participle is so bound to progressive meaning that it seems a good idea to annotate it with this feature (we have to distinguish it from the past participle somehow; we may use both the “Tense” and the “Aspect” features).

In languages other than English, the progressive meaning may be expressed by morphemes bound to the main verb, which makes this value even more justified. Turkish is an example.

Case: case

Case is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In some tagsets it is also a valency feature of adpositions (saying that the adposition requires its argument to be in that case).

Case helps specify the role of the noun phrase in the sentence, especially in free-word-order languages. For example, the nominative and accusative cases often distinguish subject and object of the verb, while in fixed-word-order languages these functions would be distinguished merely by the positions of the nouns in the sentence.

Here on the level of morphosyntactic features we are dealing with case expressed morphologically, i.e. by bound morphemes (affixes). Note that on a higher level case can be understood more broadly as the role, and it can be also expressed by adding an adposition to the noun. What is expressed by affixes in one language can be expressed using adpositions in another language. Cf. the u-dep/case dependency label.


The descriptions of the individual case values below include semantic hints about the prototypical meaning of the case. Bear in mind that quite often a case will be used for a meaning that is totally unrelated to the meaning mentioned here. Valency of verbs, adpositions and other words will determine that the noun phrase must be in a particular grammatical case to fill a particular valency slot (semantic role). It is much the same as trying to explain the meaning of prepositions: most people would agree that the central meaning of English in is location in space or time but there are phrases where the meaning is less locational: In God we trust. Say it in English.

Note that Indian corpora based on the so-called Paninian model use a related feature called vibhakti. It is a merger of the Case feature described here and of various postpositions. Values of the feature are language-dependent because they are copies of the relevant morphemes (either bound morphemes or postpositions). Vibhakti can be mapped onto the Case values described here if we know (1) which source values are bound morphemes (postpositions are separate nodes for us) and (2) what their meaning is. For instance, the genitive case (Gen) in Bengali is marked using the suffix -ra (-র), i.e. vib=era. In Hindi, the suffix has been split off the noun and it is now written as a separate word – the postposition kā/kī/ke (का/की/के). Even if the postpositional phrase can be understood as a genitive noun phrase, the noun is not in the genitive. Instead, the postposition requires that the noun take one of the three case forms that are marked directly on the noun: the oblique case (Acc).

Nom: nominative / direct

The base form of the noun, typically used as citation form (lemma). In many languages this is the word form used for subjects of clauses. If the language has only two cases, which are called “direct” and “oblique”, the direct case will be marked Nom.

Acc: accusative / oblique

Perhaps the second most widely spread morphological case. In many languages this is the word form used for direct objects of verbs. If the language has only two cases, which are called “direct” and “oblique”, the oblique case will be marked Acc.

Abs: absolutive

Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.

The absolutive case marks the subject of an intransitive verb and the direct object of a transitive verb.

Erg: ergative

Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.

The ergative case marks the subject of a transitive verb.

Dat: dative

In many languages this is the word form used for indirect objects of verbs.


Gen: genitive

The prototypical meaning of the genitive is that the noun phrase somehow belongs to its governor; it would often be translated by the English preposition of. English has the “Saxon genitive” formed by the suffix ’s, but we will normally not need the feature in English because the suffix gets separated from the noun during tokenization.

Note that despite considerable semantic overlap, the genitive case is not the same as the feature of possessivity (Poss). Possessivity is a lexical feature, i.e. it applies to lemma and its whole paradigm. Genitive is a feature of just a subset of word forms of the lemma. Semantics of possessivity is much more clearly defined while the genitive (as many other cases) may be required in situations that have nothing to do with possessing. For example, [cs] bez prezidentovy dcery “without the president’s daughter” is a prepositional phrase containing the preposition bez “without”, the possessive adjective prezidentovy “president’s” and the noun dcery “daughter”. The possessive adjective is derived from the noun prezident but it is really an adjective (with separate lemma and paradigm), not just a form of the noun. In addition, both the adjective and the noun are in their genitive forms (the nominative would be prezidentova dcera). There is nothing possessive about this particular occurrence of the genitive. It is there because the preposition bez always requires its argument to be in genitive.


Note that in Basque, Gen should be used for possessive genitive (as opposed to locative genitive): diktadorearen erregimena “dictator’s regime”; diktadore “dictator”.

Voc: vocative

The vocative case is a special form of noun used to address someone. Thus it predominantly appears with animate nouns (see the feature of Animacy). Nevertheless this is not a grammatical restriction and inanimate things can be addressed as well.


Loc: locative

The locative case often expresses location in space or time, which gave it its name. As elsewhere, non-locational meanings also exist and they are not rare. Uralic languages have a complex set of fine-grained locational and directional cases (see below) instead of the locative. Even in languages that have locative, some location roles may be expressed using other cases (e.g. because those cases are required by a preposition).

In Slavic languages this is the only case that is used exclusively in combination with prepositions (but such a restriction may not hold in other languages that have locative).


Ins: instrumental / instructive

The role from which the name of the instrumental case is derived is that the noun is used as instrument to do something (as in [cs] psát perem “to write using a pen”). Many other meanings are possible, e.g. in Czech the instrumental is required by the preposition s “with” and thus it includes the meaning expressed in other languages by the comitative case.

In Czech the instrumental is also used for the agent-object in passive constructions (cf. the English preposition by).


A semantically similar case called instructive is used rarely in Finnish to express “with (the aid of)”. It can be applied to infinitives that behave much like nouns in Finnish. We propose one label for both instrumental and instructive (instrumental is not defined in Finnish).


Par: partitive

In Finnish the partitive case expresses indefinite identity and unfinished actions without result.


Examples comparing partitive with accusative: ammuin karhun “I shot a bear.Acc” (and I know that it is dead); ammuin karhua “I shot at a bear.Par” (but I may have missed).

Using accusative instead of partitive may also substitute the missing future tense: luen kirjan “I will read the book.Acc”; luen kirjaa “I am reading the book.Par”.

Dis: distributive

The distributive case conveys that something happened to every member of a set, one at a time. It may also express frequency.


Ess: essive / prolative

The essive case expresses a temporary state; it often corresponds to English “as a …” A similar case in Basque is called prolative and it should be tagged Ess too.


Tra: translative / factive

The translative case expresses a change of state (“it becomes X”, “it changes to X”). Also used for the phrase “in language X”. In the Szeged Treebank, this case is called factive.


Com: comitative / associative

The comitative (also called associative) case corresponds to English “together with …”


Abe: abessive

The abessive case corresponds to the English preposition without.


Ine: inessive

The inessive case expresses location inside of something.


Ill: illative

The illative case expresses direction into something.


Ela: elative

The elative case expresses direction out of something.


Add: additive

Distinguished by some scholars in Estonian but not recognized by traditional grammar; it exists in the Multext-East Estonian tagset and in the Eesti keele puudepank. Reportedly it has the same or a similar meaning as the illative. Forms of this case exist only in the singular and not for all nouns.

Ade: adessive

The adessive case expresses location at or on something. The corresponding directional cases are allative (towards something) and ablative (from something).


Note that adessive is used to express location on the surface of something in Finnish and Estonian, but does not carry this meaning in Hungarian.

All: allative

The allative case expresses direction to something (destination is adessive, i.e. at or on that something).


Abl: ablative

Prototypical meaning: direction from some point.


Sup: superessive

Used, chiefly in Hungarian, to indicate location on top of something or on the surface of something.


Sub: sublative

The sublative case is used in Finno-Ugric languages to express the destination of movement, originally to the surface of something (e.g. “to climb a tree”), and, by extension, in other figurative meanings as well (e.g. “to university”).


Del: delative

Used, chiefly in Hungarian, to express the movement from the surface of something (like “moved off the table”). Other meanings are possible as well, e.g. “about something”.


Lat: lative / directional allative

The lative case denotes movement towards/to/into/onto something. A similar case in Basque is called directional allative (Spanish adlativo direccional). However, lative is typically thought of as a union of allative, illative and sublative, while in Basque it is derived from allative, which also exists independently.


Tem: temporal

The temporal case is used to indicate time.


Ter: terminative / terminal allative

The terminative case specifies where something ends in space or time. A similar case in Basque is called terminal allative (Spanish adlativo terminal).


Cau: causative / motivative

A noun in this case is the cause of something. In Hungarian it also seems to be used frequently with currency (“to buy something for the money”) and it can also express the goal of something.


Ben: benefactive / destinative

The benefactive case corresponds to the English preposition for.


Definite: definiteness or state

Definiteness is typically a feature of nouns, adjectives and articles. Its value distinguishes whether we are talking about something known and concrete, or something general or unknown. It can be marked on definite and indefinite articles, or directly on nouns, adjectives etc. In Arabic, definiteness is also called the “state”.

Ind: indefinite


Def: definite


Red: reduced

Used in the construct state in Arabic. If two nouns are in a genitive relation, the first one (the “nomen regens”) has “reduced definiteness”; the second is the genitive and can be either definite or indefinite. The reduced form has neither the definite morpheme (article) nor the indefinite morpheme (nunation).


Com: complex

Used in improper annexation in Arabic. The genitive construction described above normally consists of two nouns (first reduced, second genitive). That is called proper annexation or iḍāfa. If the first member is an adjective or adjectivally used participle and the second member is a definite noun, the construction is called improper annexation or false iḍāfa. The result is a compound adjective that is usually used as an attributive adjunct and thus must agree in definiteness with the noun it modifies. Its first part (the adjective or participle) may get again the definite article. Although it may look the same as the form for the definite state, it is assigned a special value of complex state to reflect the different origin. See also Hajič et al. page 3.


Degree: degree of comparison

Degree of comparison is typically an inflectional feature of some adjectives and adverbs.

Pos: positive, first degree

This is the base form that merely states a quality of something, without comparing it to qualities of others. Note that although this degree is traditionally called “positive”, negative properties can be compared, too.


Cmp: comparative, second degree

The quality of one object is compared to the same quality of another object.


Sup: superlative, third degree

The quality of one object is compared to the same quality of all other objects within a set.


Abs: absolute superlative

Some languages can express morphologically that the studied quality of the given object is so strong that there is hardly any other object exceeding it. The quality is not actually compared to any particular set of objects.


Gender: gender

Gender is usually a lexical feature of nouns and inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In English gender affects only the choice of the personal pronoun (he / she / it) and the feature is usually not encoded in English tagsets.

See also the related feature of Animacy.

African languages have an analogous feature of noun classes: there might be separate grammatical categories for flat objects, long thin objects etc. African noun classes are not covered in the current proposal because none of the tagsets on which the proposal is based are for a language with noun classes. They might be added in future.

Masc: masculine gender

Nouns denoting male persons are masculine. Other nouns may be also grammatically masculine, without any relation to sex.


Fem: feminine gender

Nouns denoting female persons are feminine. Other nouns may be also grammatically feminine, without any relation to sex.


Neut: neuter gender

Some languages have only the masculine/feminine distinction while others also have this third gender for nouns that are neither masculine nor feminine (grammatically).


Com: common gender

Some languages do not distinguish masculine/feminine most of the time but they do distinguish neuter vs. non-neuter (Swedish neutrum / utrum). The non-neuter is called common gender.

Note that it could also be expressed as a combined value Gender=Fem,Masc. Nevertheless we keep Com also as a separate value. Combined feature values should only be used in exceptional, undecided cases, not for something that occurs systematically in the grammar. Language-specific extensions to these guidelines should determine whether the Com value is appropriate for a particular language.

Note further that the Com value is not intended for cases where we just cannot derive the gender from the word itself (without seeing the context), while the language actually distinguishes Masc and Fem. For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context.
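
To make the difference between a dedicated value and a combined value concrete, here is a minimal sketch of reading the Gender values out of a feature string. It assumes that features are written as Name=Value pairs joined by `|`, that combined values are separated by commas, and that `_` or an empty string stands for "no features". The function name `gender_values` is hypothetical.

```python
def gender_values(feats):
    """Return the set of Gender values found in a feature string.

    An empty set means the word carries no Gender feature at all.
    """
    if feats in ("", "_"):
        return set()
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "Gender":
            # A combined value such as "Fem,Masc" yields two members.
            return set(value.split(","))
    return set()


print(gender_values("Gender=Com|Number=Sing"))
print(gender_values("Gender=Fem,Masc|Number=Sing"))
print(gender_values("Number=Sing"))
```

Under this reading, a Spanish form like grande would either produce an empty set (the gender feature is dropped) or a single value chosen from context, but never `{"Com"}`.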

Mood: mood

Mood is a feature that expresses modality and subclassifies finite verb forms.

Ind: indicative

The indicative can be considered the default mood. A verb in indicative merely states that something happens, has happened or will happen, without adding any attitude of the speaker.


Imp: imperative

The speaker uses imperative to order or ask the addressee to do the action of the verb.


Cnd: conditional

The conditional mood is used to express actions that would have taken place under some circumstances but they actually did not / do not happen. Grammars of some languages may classify conditional as tense (rather than mood) but e.g. in Czech it combines with two different tenses (past and present).


Pot: potential

The action of the verb is likely but not certain. Used e.g. in Finnish.

Sub: subjunctive / conjunctive

The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain. In German, it may be also used to convey the conditional meaning.


Jus: jussive

The jussive mood expresses the desire that the action happens. Used e.g. in Arabic.

Qot: quotative

The quotative mood is used e.g. in Estonian to denote reported speech, i.e. information that the speaker relays from someone else rather than asserting it directly.

Opt: optative

Used e.g. in Turkish in exclamations like “May you have a long life!” or “If only I were rich!”

Des: desiderative

The desiderative mood corresponds to the modal verb “want to”: “He wants to come.” Used e.g. in Turkish.

Nec: necessitative

The necessitative mood corresponds to the modal verbs “must, should, have to”: “He must come.” Used e.g. in Turkish.

Negative: whether the word can be or is negated

Negativeness is typically a feature of verbs, adjectives, sometimes also adverbs and nouns in languages that negate using bound morphemes. For instance, all Czech verbs and adjectives can be negated using the prefix ne-. In English, verbs are negated using the particle not and adjectives are also negated using prefixes, although the process is less productive than in Czech (wise – unwise, probable – improbable).

Note that Negative=Neg is not the same thing as PronType=Neg. For pronouns and other pronominal parts of speech there is no such binary opposition as for verbs and adjectives. (There is no such thing as “affirmative pronoun”.)

The negativeness feature could be also used to distinguish response interjections yes and no.

Pos: positive, affirmative


Neg: negative
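
Negative=Neg and PronType=Neg are not the only pair of this kind: several short value codes recur under different features with unrelated meanings, so a value is only interpretable together with its feature name. The sketch below lists the values of some of the features described on this page (Case and a few others are left out for brevity) and looks up which features share a code. The names `VALUES` and `features_using` are hypothetical.

```python
# Partial inventory: feature name -> value codes described on this page.
VALUES = {
    "Animacy": {"Anim", "Nhum", "Inan"},
    "Aspect": {"Imp", "Perf", "Pro", "Prog"},
    "Definite": {"Ind", "Def", "Red", "Com"},
    "Degree": {"Pos", "Cmp", "Sup", "Abs"},
    "Gender": {"Masc", "Fem", "Neut", "Com"},
    "Mood": {"Ind", "Imp", "Cnd", "Pot", "Sub", "Jus", "Qot", "Opt", "Des", "Nec"},
    "Negative": {"Pos", "Neg"},
    "PronType": {"Prs", "Rcp", "Art", "Int", "Rel", "Dem", "Tot", "Neg", "Ind"},
    "Tense": {"Past", "Pres", "Fut", "Imp", "Nar", "Pqp"},
}


def features_using(value):
    """Return the sorted names of all listed features that define `value`."""
    return sorted(name for name, codes in VALUES.items() if value in codes)


def is_valid(name, value):
    """Check a Name=Value pair against the partial inventory."""
    return value in VALUES.get(name, set())


print(features_using("Neg"))
print(features_using("Imp"))
```

Because the lookup is keyed by feature name first, `is_valid("Negative", "Neg")` and `is_valid("PronType", "Neg")` are checked independently of each other.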


NumType: numeral type

Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, the main part of speech is “numeral”, it includes almost everything where counting is involved and there are various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so at the same time we may have a non-empty value of PronType. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)

From the syntactic point of view, some numeral types behave like adjectives and some behave like adverbs. We tag them u-pos/ADJ and u-pos/ADV respectively. Thus the NumType feature applies to several different parts of speech.

Card: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word

Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.


Ord: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adjective or (in some languages) of adverb.


Mult: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adverb.


Frac: fraction

This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In various languages these words may behave morphologically and syntactically as nouns or ordinal numerals.


Sets: number of sets of things

Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum.


Dist: distributive numeral

Used to express that the same quantity is distributed to each member in a set of targets.


Range: range of values

This could be considered a subtype of cardinal numbers, occasionally distinguished in corpora.


Gen: generic numeral, i.e. a numeral that is none of the above

Czech school grammar distinguishes this subclass, which is why it appears in Czech tagsets. Other Slavic languages may have similar words but their traditional classification may differ. (Note that “generic numerals” in Czech grammar also include the Sets subclass mentioned above.)


Number: number

Number is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.

Sing: singular number

A singular noun denotes one person, animal or thing.


Plur: plural number

A plural noun denotes several persons, animals or things.


Dual: dual number

A dual noun denotes two persons, animals or things.


Ptan: plurale tantum

Some nouns appear only in the plural form even though they denote one thing (semantic singular); some tagsets mark this distinction. Grammatically they behave like plurals, so Plur is obviously the back-off value here; however, if the language also marks gender, the non-existence of a singular form sometimes means that the gender is unknown. In Czech, a special type of numeral is used when counting nouns that are pluralia tantum (NumType=Sets).


Coll: collective / mass / singulare tantum

Collective or mass or singulare tantum is a special case of singular. It applies to words that use grammatical singular to describe sets of objects, i.e. semantic plural. Although in theory they might be able to form plural, in practice it would be rarely semantically plausible. Sometimes, the plural form exists and means “several sorts of” or “several packages of”.
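
The back-off relation between the fine-grained and the basic Number values can be written down as a small lookup. This is a sketch: Plur as the back-off for Ptan is stated above, whereas mapping Coll to Sing is an assumption derived from the description of Coll as a special case of singular. The names `NUMBER_BACKOFF` and `coarse_number` are hypothetical.

```python
# Fine-grained Number value -> basic value it backs off to.
# Ptan -> Plur follows the text; Coll -> Sing is an assumption based on
# Coll being described as a special case of singular.
NUMBER_BACKOFF = {"Ptan": "Plur", "Coll": "Sing"}


def coarse_number(value):
    """Map a Number value to Sing/Plur/Dual, leaving basic values unchanged."""
    return NUMBER_BACKOFF.get(value, value)


print(coarse_number("Ptan"))
print(coarse_number("Dual"))
```

A tool that only knows Sing, Plur and Dual could apply such a mapping before checking agreement, at the cost of losing the finer distinction.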


Person: person

Person is typically a feature of personal and possessive pronouns / determiners, and of verbs. On verbs it is in fact an agreement feature that marks the person of the verb’s subject (some languages, e.g. Basque, can also mark the person of objects). Person marked on verbs makes it unnecessary to always add a personal pronoun as subject and thus subjects are sometimes dropped (pro-drop languages).

1: first person

In singular, the first person refers just to the speaker / author. In plural, it must include the speaker and one or more additional persons. Some languages (e.g. Taiwanese) distinguish inclusive and exclusive 1st person plural pronouns: the former include the addressee of the utterance (i.e. I + you), the latter exclude them (i.e. I + they).


2: second person

In singular, the second person refers to the addressee of the utterance / text. In plural, it may mean several addressees and optionally some third persons too.


3: third person

The third person refers to one or more persons that are neither speakers nor addressees.


Poss: possessive

Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.

While many tagsets would have “possessive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types. Several of the pronominal types can be optionally possessive, and adjectives can too.

Yes: it is possessive

Note that there is no No value. If the word is not possessive, the Poss feature is simply not mentioned in the FEATS column (an empty value thus means No).
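
In practice this means that a boolean feature is tested by presence, not by comparing against a No value. A minimal sketch, assuming the features of a word are given as Name=Value pairs joined by `|`, with `_` or an empty string meaning "no features"; the function name `has_flag` is hypothetical.

```python
def has_flag(feats, name):
    """Return True if the boolean feature `name` is present as name=Yes.

    Absence of the feature is interpreted as No.
    """
    if feats in ("", "_"):
        return False
    return f"{name}=Yes" in feats.split("|")


print(has_flag("Poss=Yes|PronType=Prs", "Poss"))
print(has_flag("PronType=Prs", "Poss"))
```

Splitting on `|` before the membership test avoids accidental substring matches inside other feature names or values.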


PronType: pronominal type

This feature typically applies to pronouns, determiners, pronominal numerals (quantifiers) and pronominal adverbs.

Prs: personal or possessive personal pronoun or determiner

See also the Poss feature that distinguishes normal personal pronouns from possessives. Note that Prs also includes reflexive personal/possessive pronouns (e.g. [cs] se / svůj; see the Reflex feature).


Rcp: reciprocal pronoun


Art: article

Article is a special case of determiner that bears the feature of definiteness (in other languages, the feature may be marked directly on nouns).


Int: interrogative pronoun, determiner, numeral or adverb

Note that possessive interrogative determiners (whose) can be distinguished by the Poss feature.


Rel: relative pronoun, determiner, numeral or adverb

Note that in many languages this class heavily overlaps with interrogatives, yet there are pronouns that are only relative, and in some languages (Bulgarian, Hindi) the two classes seem to be distinct.


Dem: demonstrative pronoun, determiner, numeral or adverb

These are often parallel to interrogatives. Some tagsets might also distinguish a separate feature of distance (here / there; [es] aquí / ahí / allí).


Tot: total (collective) pronoun, determiner or adverb


Neg: negative pronoun, determiner or adverb


Ind: indefinite pronoun, determiner, numeral or adverb

Note that some tagsets might further subclassify this category to distinguish “some” from “any” etc. Such distinctions are not part of universal features but may be added in language-specific extensions.


Reflex: reflexive

Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.

While many tagsets would have “reflexive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types.

Note that while some languages also have reflexive verbs, these are in fact fused verbs with reflexive pronouns, as in Spanish despertarse or Russian проснуться (both meaning “to wake up”). Thus in these cases the fused token will be split to two syntactic words, one of them being a reflexive pronoun.

Yes: it is reflexive

Note that there is no No value. If the word is not reflexive, the Reflex feature is simply not mentioned in the FEATS column (an empty value thus means No).


Tense: tense

Tense is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.

Tense is a feature that specifies the time when the action took / takes / will take place, in relation to the current moment or to another action in the utterance. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.

Note that we are defining features that apply to a single word. If a tense is constructed periphrastically (two or more words, e.g. auxiliary verb indicative + participle of the main verb) and none of the participating words are specific to this tense, then the features will probably not directly reveal the tense. For instance, [en] I had been there is past perfect (pluperfect) tense, formed periphrastically by the simple past tense of the auxiliary to have and the past participle of the main verb to be. The auxiliary will be tagged VerbForm=Fin|Mood=Ind|Tense=Past and the participle will have VerbForm=Part|Tense=Past; neither of the two will have Tense=Pqp. On the other hand, Portuguese can form the pluperfect morphologically as just one word, such as estivera, which will thus be tagged VerbForm=Fin|Mood=Ind|Tense=Pqp.
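
The three feature strings from this example can be compared programmatically. The sketch below parses each string into a dictionary, assuming Name=Value pairs joined by `|`; the function name `parse_feats` is hypothetical.

```python
def parse_feats(feats):
    """Parse a 'Name=Value|Name=Value' string into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# English "had been": two words, neither of which is marked as pluperfect.
had = parse_feats("VerbForm=Fin|Mood=Ind|Tense=Past")
been = parse_feats("VerbForm=Part|Tense=Past")

# Portuguese "estivera": a single word carrying the pluperfect itself.
estivera = parse_feats("VerbForm=Fin|Mood=Ind|Tense=Pqp")

for word, feats in [("had", had), ("been", been), ("estivera", estivera)]:
    print(word, feats.get("Tense"))
```

A search for Tense=Pqp over word-level features would therefore find estivera but not the English construction, which has to be recognised from the combination of auxiliary and participle.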

Past: past tense / preterite / aorist

The past tense denotes actions that happened before the current moment. In English, this is the simple past form. In German, this is the Präteritum. In Turkish, this is the non-narrative past. In Bulgarian, this is aorist, the aspect-neutral past tense that can be used freely with both imperfective and perfective verbs (see also imperfect).


Pres: present tense

The present tense denotes actions that are happening right now or that usually happen.


Fut: future tense

The future tense denotes actions that will happen after the current moment.


Imp: imperfect

Used e.g. in Bulgarian and Croatian, the imperfect is a special case of the past tense. Note that, unfortunately, imperfect tense is not always the same as past tense + imperfective aspect. For instance, in Bulgarian, there is lexical aspect, inherent in verb meaning, and grammatical aspect, which does not necessarily always match the lexical one. In main clauses, imperfective verbs can have imperfect tense and perfective verbs have perfect tense. However, both rules can be violated in embedded clauses.

Nar: narrative

A special case of the past tense; this is the Turkish miş-past. The difference from the non-narrative past (see Past) is whether or not the speaker personally witnessed the action being described.

Pqp: pluperfect

The pluperfect denotes an action that happened before another action in the past. This value does not apply to English, where the pluperfect (past perfect) is constructed analytically. It applies e.g. to Portuguese.


VerbForm: form of verb or deverbative

Even though the name of the feature seems to suggest that it is used exclusively with verbs, this is not the case. Some verb forms in some languages actually form a gray zone between verbs and other parts of speech (nouns, adjectives and adverbs). For instance, participles may be classified either as verbs or as adjectives, depending on language and context. In both cases VerbForm=Part may be used to separate them from other verb forms or other types of adjectives.

Fin: finite verb

Rule of thumb: if it has non-empty Mood, it is finite. But beware that some tagsets conflate verb forms and moods into one feature.
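The rule of thumb can be written down as a small Python check. This is a heuristic sketch only, under the assumption that features use the Name=Value|Name=Value notation; the function names are hypothetical, and, as the caveat above says, the heuristic may misfire on data converted from tagsets that conflate verb forms and moods.

```python
def parse_feats(feats: str) -> dict:
    """Parse a 'Name=Value|Name=Value' string into a dict (assumed notation)."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_finite_by_mood(feats: str) -> bool:
    """Rule of thumb: a non-empty Mood suggests a finite form."""
    return bool(parse_feats(feats).get("Mood"))


def mood_form_conflict(feats: str) -> bool:
    """Flag a word that has Mood but an explicit non-finite VerbForm.

    Such a word is not necessarily wrong; it is merely worth a second look.
    """
    parsed = parse_feats(feats)
    return is_finite_by_mood(feats) and parsed.get("VerbForm", "Fin") != "Fin"
```

For example, is_finite_by_mood("VerbForm=Fin|Mood=Ind|Tense=Past") holds, is_finite_by_mood("VerbForm=Part|Tense=Past") does not, and mood_form_conflict would flag a hypothetical annotation such as "VerbForm=Part|Mood=Ind".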


Inf: infinitive

Infinitive is the citation form of verbs in many languages. Unlike in English, it often has a morphological form that is distinct from the finite forms. Infinitives may be used together with auxiliaries to form periphrastic tenses (e.g. future tense [cs] budu sedět v letadle “I will sit in a plane”), they appear as arguments of modal verbs etc. In some languages they behave similarly to nouns and are used as such (similar to the gerund in English).


Sup: supine

Supine is a rare verb form. It survives in some Slavic languages (Slovenian) and is used instead of the infinitive as the argument of motion verbs (old [cs] jdu spat lit. I-go sleep).

A form called “supine” also exists in Swedish, where it is a special form of the participle, used to form the composite past form of a verb. It is used after the auxiliary verb ha (to have) but not after vara (to be).

Part: participle

Participle is a non-finite verb form that shares properties of verbs and adjectives. Its usage varies across languages. It may be used to form various periphrastic verb forms such as complex tenses and passives; it may be also used purely adjectively.

Other features may help to distinguish past/present participles (English), active/passive participles (Czech), imperfect/perfect participles (Hindi) etc.


Trans: transgressive

The transgressive, also called adverbial participle, is a non-finite verb form that shares properties of verbs and adverbs. It appears e.g. in Slavic and Indo-Aryan languages.


Ger: gerund

Gerund is a non-finite verb form that shares properties of verbs and nouns. In English it shares its morphological form with the present participle, which may mean that the tagset will not distinguish it from the participle.



Voice: voice

Voice is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.

For Indo-European speakers, voice means mainly the active-passive distinction. In other languages, other shades of verb meaning are categorized as voice.

Act: active voice

The subject of the verb is the doer of the action (agent), the object is affected by the action (patient).


Pass: passive voice

The subject of the verb is affected by the action (patient). The doer (agent) is either unexpressed or appears as an object of the verb.


Rcp: reciprocal voice


Cau: causative voice

Documentation of the METU Sabanci treebank classifies causative as voice (page 26). Note that this is a feature of verbs. Some languages also have a causative case of nouns.

