For core part-of-speech categories, see the universal POS tags. The features listed here distinguish additional lexical and grammatical properties of words, not covered by the POS tags.
Abbr: abbreviation
Boolean feature. Is this an abbreviation? Note that the abbreviated word(s) typically belong to a part of speech other than u-pos/X.
Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1.
Yes: it is an abbreviation
Examples: [en] etc., J., UK
Animacy: animacy
Similarly to Gender (and to the African noun classes), animacy is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. It is independent of gender, therefore it is encoded separately in some tagsets (e.g. all the Multext-East tagsets). On the other hand, in Czech the (almost) only grammatical implications occur within the masculine gender, which is why the PDT tagset does not have animacy as a separate feature and instead defines four genders: masculine animate, masculine inanimate, feminine and neuter. We follow the two-feature approach used in Multext-East (many languages) because it is safer.
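The two-feature approach can be sketched as a lookup that splits each of the four PDT-style genders into a Gender value and an optional Animacy value. This is only an illustration: the dictionary keys are hypothetical labels rather than actual PDT tag codes, the function names are invented here, and the `Name=Value|Name=Value` serialization is assumed to follow the convention commonly used for feature strings.

```python
# Hypothetical labels for the four PDT-style genders (not real PDT tag codes).
PDT_GENDER_TO_UD = {
    "masculine animate": {"Gender": "Masc", "Animacy": "Anim"},
    "masculine inanimate": {"Gender": "Masc", "Animacy": "Inan"},
    "feminine": {"Gender": "Fem"},
    "neuter": {"Gender": "Neut"},
}


def split_gender(pdt_gender):
    """Return separate Gender and Animacy features for one PDT-style gender."""
    return dict(PDT_GENDER_TO_UD[pdt_gender])


def format_feats(feats):
    """Serialize features as Name=Value pairs joined by '|', sorted by name."""
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))


print(format_feats(split_gender("masculine animate")))  # Animacy=Anim|Gender=Masc
print(format_feats(split_gender("feminine")))           # Gender=Fem
```

Animacy is simply absent for feminine and neuter in this sketch, which mirrors the observation that in Czech the distinction matters almost only within the masculine gender.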
Polish is special in that it also distinguishes grammatically human vs. non-human animates. It can be demonstrated by inflection of the example word który “which” (boldface forms differ from the middle row):
More generally: Some languages distinguish animate vs. inanimate (e.g. Czech masculines), some languages distinguish human vs. non-human (e.g. Yuwan, a Ryukyuan language), and others distinguish three values, human vs. non-human animate vs. inanimate (e.g. Polish masculines).
Anim: animate
Human beings, animals, fictional characters, names of professions etc. are all animate. Even nouns that are normally inanimate can be inflected as animate if they are personified. For instance, consider a children’s story about cars where cars live and talk like people; then the cars may become animate and be inflected as animates.
Inan: inanimate
Nouns that are not animate are inanimate.
Hum: human
A subset of animates that only includes human beings (and personified characters) but not animals.
Nhum: non-human
In languages that only distinguish human from non-human, this value includes inanimates. In languages that distinguish human animates, non-human animates and inanimates, this value is used only for non-human animates, while Inan is used for inanimates.
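How the choice of value depends on the distinctions a language makes can be sketched as a small function. The system names (`"anim-inan"`, `"hum-nhum"`, `"three-way"`) and the function itself are hypothetical and introduced only for illustration; the value names Anim, Inan, Hum and Nhum are the UD labels.

```python
def animacy_value(is_human, is_animate, system):
    """Pick an Animacy value for a referent under a given language system.

    system (hypothetical names):
      "anim-inan"  animate vs. inanimate (e.g. Czech masculines)
      "hum-nhum"   human vs. non-human (e.g. Yuwan)
      "three-way"  human vs. non-human animate vs. inanimate (e.g. Polish masculines)
    """
    if system == "anim-inan":
        return "Anim" if is_animate else "Inan"
    if system == "hum-nhum":
        # Nhum covers inanimates too when only human vs. non-human is distinguished.
        return "Hum" if is_human else "Nhum"
    if system == "three-way":
        if is_human:
            return "Hum"
        # Nhum is restricted to non-human animates; inanimates get Inan.
        return "Nhum" if is_animate else "Inan"
    raise ValueError(f"unknown system: {system}")


print(animacy_value(False, False, "hum-nhum"))   # Nhum
print(animacy_value(False, False, "three-way"))  # Inan
```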
Aspect: aspect
Aspect is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.
Aspect is a feature that specifies duration of the action in time, whether the action has been completed etc. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.
In Czech and other Slavic languages, aspect is a lexical feature. Pairs of imperfective and perfective verbs exist and are often morphologically related but the space is highly irregular and the verbs are considered to belong to separate lemmas.
Since we proceed bottom-up, the current standard covers only a few aspect values found in corpora. See Wikipedia (http://en.wikipedia.org/wiki/Grammatical_aspect) for a long list of other possible aspects.
Imp: imperfect aspect
The action took / takes / will take some time span and there is no information whether and when it was / will be completed.
- [cs] péci “to bake” (Imp); pekl chleba “he baked / was baking a bread”
Perf: perfect aspect
The action has been / will have been completed. Since there is emphasis on one point on the time scale (the point of completion), this aspect does not work well with the present tense. For example, Czech morphology can create present forms of perfective verbs but these actually have a future meaning.
- [cs] upéci “to bake” (Perf); upekl chleba “he baked / has baked a bread”
Prosp: prospective aspect
In general, prospective aspect can be described as relative future: the action is/was/will be expected to take place at a moment that follows the reference point; the reference point itself can be in past, present or future. In the English sentence When I got home yesterday, John called and said he would arrive soon, the last clause (he would arrive soon) is in prospective aspect. Nevertheless, English does not have overt affixal morphemes dedicated to the prospective aspect, and we do not need the label in English. But other languages do; the -ko suffix in Basque is an example.
Note that this value was called Pro in UD v1 and it has been renamed Prosp in UD v2.
- [eu] Liburua irakurriko behar du. lit. book-a read-Prosp must AUX “He must go to read a book.”
Prog: progressive aspect
English progressive tenses (I am eating, I have been doing …) have this aspect. They are constructed analytically (auxiliary + present participle) but the -ing participle is so bound to progressive meaning that it seems a good idea to annotate it with this feature (we have to distinguish it from the past participle somehow; we may use both the “Tense” and the “Aspect” features).
In languages other than English, the progressive meaning may be expressed by morphemes bound to the main verb, which makes this value even more justified. Turkish is an example.
Hab: habitual aspect
English simple present has this aspect.
Iter: iterative / frequentative aspect
Denotes repeated action. Attested e.g. in Hungarian.
Iteratives also exist in Czech with this name and meaning but they can be formed only from imperfective verbs and they are usually not classified as a separate aspect; they are just Imp.
Note: This value is new in UD v2 but a similar value has been used in UD v1 as language-specific for Hungarian, though it was called frequentative there.
- [hu] üt “hit”, ütöget “hit several times”
Case: case
Case is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In some tagsets it is also a valency feature of adpositions (saying that the adposition requires its argument to be in that case). Annotating preposition valency case in UD treebanks would be superfluous because the same case feature can be found at the nominal to which the preposition belongs.
Case helps specify the role of the noun phrase in the sentence, especially in free-word-order languages. For example, the nominative and accusative cases often distinguish subject and object of the verb, while in fixed-word-order languages these functions would be distinguished merely by the positions of the nouns in the sentence.
Here on the level of morphosyntactic features we are dealing with case expressed morphologically, i.e. by bound morphemes (affixes). Note that on a higher level case can be understood more broadly as the role, and it can be also expressed by adding an adposition to the noun. What is expressed by affixes in one language can be expressed using adpositions in another language. Cf. the u-dep/case dependency label.
- [cs] nominative matka “mother”, genitive matky, dative matce, accusative matku, vocative matko, locative matce, instrumental matkou
- [de] nominative der Mann “the man”, genitive des Mannes, dative dem Mann, accusative den Mann
- [en] nominative/direct case he, she, accusative/oblique case him, her.
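The Czech paradigm above can be turned into a small form-to-case index, which also shows that one word form may be compatible with more than one Case value (matce is both dative and locative). This is a minimal sketch; the variable and function names are hypothetical, and the three-letter keys are the UD Case labels.

```python
from collections import defaultdict

# Singular paradigm of Czech matka "mother", taken from the example above.
MATKA = {
    "Nom": "matka",
    "Gen": "matky",
    "Dat": "matce",
    "Acc": "matku",
    "Voc": "matko",
    "Loc": "matce",
    "Ins": "matkou",
}


def cases_by_form(paradigm):
    """Invert a case -> form paradigm into a form -> list of cases index."""
    index = defaultdict(list)
    for case, form in paradigm.items():
        index[form].append(case)
    return dict(index)


index = cases_by_form(MATKA)
print(index["matce"])  # ['Dat', 'Loc']
print(index["matku"])  # ['Acc']
```

An annotator or tagger has to resolve such ambiguous forms from context, for example from the preposition or verb that governs the noun.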
The descriptions of the individual case values below include semantic hints about the prototypical meaning of the case. Bear in mind that quite often a case will be used for a meaning that is totally unrelated to the meaning mentioned here. Valency of verbs, adpositions and other words will determine that the noun phrase must be in a particular grammatical case to fill a particular valency slot (semantic role). It is much the same as trying to explain the meaning of prepositions: most people would agree that the central meaning of English in is location in space or time but there are phrases where the meaning is less locational: In God we trust. Say it in English.
Note that Indian corpora based on the so-called Paninian model use a related feature called vibhakti. It is a merger of the Case feature described here and of various postpositions. Values of the feature are language-dependent because they are copies of the relevant morphemes (either bound morphemes or postpositions). Vibhakti can be mapped on the Case values described here if we know 1. which source values are bound morphemes (postpositions are separate nodes for us) and 2. what is their meaning. For instance, the genitive case (Gen) in Bengali is marked using the suffix -ra (-র), i.e. vib=era. In Hindi, the suffix has been split off the noun and it is now written as a separate word, the postposition kā/kī/ke (का/की/के). Even if the postpositional phrase can be understood as a genitive noun phrase, the noun is not in genitive. Instead, the postposition requires that it takes one of three case forms that are marked directly on the noun: the oblique case (Acc).
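The two conditions for mapping vibhakti to Case suggest a lookup that records, for each source value, whether it is a bound morpheme and, if so, which Case it expresses. The sketch below is only an illustration: the table layout, the function name and the Hindi key are hypothetical, and only the Bengali entry (vib=era mapped to Gen) comes from the text.

```python
# (language code, vibhakti value) -> (is_bound_morpheme, Case value or None)
VIBHAKTI_TABLE = {
    ("bn", "era"): (True, "Gen"),   # Bengali genitive suffix, from the text
    ("hi", "kā"): (False, None),    # hypothetical key: Hindi postposition, a separate node
}


def case_from_vibhakti(lang, vib):
    """Return the Case value if the vibhakti is a known bound morpheme, else None."""
    is_bound, case = VIBHAKTI_TABLE.get((lang, vib), (False, None))
    return case if is_bound else None


print(case_from_vibhakti("bn", "era"))  # Gen
print(case_from_vibhakti("hi", "kā"))   # None
```

When the function returns None for a postposition, the Case of the noun has to come from the form of the noun itself rather than from the vibhakti value.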
Nom: nominative / direct
The base form of the noun, typically used as citation form (lemma). In many languages this is the word form used for subjects of clauses. If the language has only two cases, which are called “direct” and “oblique”, the direct case will be marked Nom.
Acc: accusative / oblique
Perhaps the second most widely spread morphological case. In many languages this is the word form used for direct objects of verbs. If the language has only two cases, which are called “direct” and “oblique”, the oblique case will be marked Acc.
Abs: absolutive
Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.
The absolutive case marks the subject of an intransitive verb and the direct object of a transitive verb.
Erg: ergative
Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.
The ergative case marks the subject of a transitive verb.
Dat: dative
In many languages this is the word form used for indirect objects of verbs.
- [de] Ich gebe meinem Bruder ein Geschenk. “I give my brother a present.” (meinem Bruder “my brother” is dative and ein Geschenk “a present” is accusative.)
Gen: genitive
The prototypical meaning of the genitive is that the noun phrase somehow belongs to its governor; it would often be translated by the English preposition of. English has the “Saxon genitive” formed by the suffix ’s; but we will normally not need the feature in English because the suffix gets separated from the noun during tokenization.
Note that despite considerable semantic overlap, the genitive case is not the same as the feature of possessivity (Poss). Possessivity is a lexical feature, i.e. it applies to lemma and its whole paradigm. Genitive is a feature of just a subset of word forms of the lemma. Semantics of possessivity is much more clearly defined while the genitive (as many other cases) may be required in situations that have nothing to do with possessing. For example, [cs] bez prezidentovy dcery “without the president’s daughter” is a prepositional phrase containing the preposition bez “without”, the possessive adjective prezidentovy “president’s” and the noun dcery “daughter”. The possessive adjective is derived from the noun prezident but it is really an adjective (with separate lemma and paradigm), not just a form of the noun. In addition, both the adjective and the noun are in their genitive forms (the nominative would be prezidentova dcera). There is nothing possessive about this particular occurrence of the genitive. It is there because the preposition bez always requires its argument to be in genitive.
- [cs] Praha je hlavní město České republiky. “Prague is the capital of the Czech Republic.”
Note that in Basque, Gen should be used for possessive genitive (as opposed to locative genitive): diktadorearen erregimena “dictator’s regime”; diktadore “dictator”.
Voc: vocative
The vocative case is a special form of noun used to address someone. Thus it predominantly appears with animate nouns (see the feature of Animacy). Nevertheless this is not a grammatical restriction and inanimate things can be addressed as well.
- [cs] Co myslíš, Filipe? “What do you think, Filip?”
Loc: locative
The locative case often expresses location in space or time, which gave it its name. As elsewhere, non-locational meanings also exist and they are not rare. Uralic languages have a complex set of fine-grained locational and directional cases (see below) instead of the locative. Even in languages that have locative, some location roles may be expressed using other cases (e.g. because those cases are required by a preposition).
In Slavic languages this is the only case that is used exclusively in combination with prepositions (but such a restriction may not hold in other languages that have locative).
- [cs] V červenci jsem byl ve Švédsku. “In July I was in Sweden.”
- [cs] Mluvili jsme tam o morfologii. “We talked there about morphology.” (Non-locational non-temporal example)
Ins: instrumental / instructive
The role from which the name of the instrumental case is derived is that the noun is used as instrument to do something (as in [cs] psát perem “to write using a pen”). Many other meanings are possible, e.g. in Czech the instrumental is required by the preposition s “with” and thus it includes the meaning expressed in other languages by the comitative case.
In Czech the instrumental is also used for the agent-object in passive constructions (cf. the English preposition by).
- [cs] Tento zákon byl schválen vládou. “This bill has been approved by the government.” (Passive example)
A semantically similar case called instructive is used rarely in Finnish to express “with (the aid of)”. It can be applied to infinitives that behave much like nouns in Finnish. We propose one label for both instrumental and instructive (instrumental is not defined in Finnish).
- [fi] lähteä “to leave”; 2003 lähtien “since 2003” (second infinitive in the instructive case)
- [fi] yllättää “to surprise”; sekaantui yllättäen valtataisteluun lit. was-involved-in by-surprise.Ins power-struggle.Ill.
Par: partitive
In Finnish the partitive case expresses indefinite identity and unfinished actions without result.
- [fi] kolme taloa “three houses”; (the -a suffix of talo)
- [fi] rakastan tätä taloa “I love this house”
- [fi] saanko lainata kirjaa? “can I borrow the book?” (the -a suffix of kirja)
- [fi] lasissa on maitoa “there is (some) milk in the glass”
Examples comparing partitive with accusative: ammuin karhun “I shot a bear.Acc” (and I know that it is dead); ammuin karhua “I shot at a bear.Par” (but I may have missed).
Using accusative instead of partitive may also substitute the missing future tense: luen kirjan “I will read the book.Acc”; luen kirjaa “I am reading the book.Par”.
Dis: distributive
The distributive case conveys that something happened to every member of a set, one at a time. It may also express frequency.
- [hu] fejenként “per capita”
- [hu] esetenként “in some cases”
- [hu] hetenként “once per week, weekly”
- [hu] tízpercenként “every ten minutes”
Ess: essive / prolative
The essive case expresses a temporary state; it often corresponds to English “as a …”. A similar case in Basque is called prolative and it should be tagged Ess.
- [fi] lapsi “child”; lapsena “as a child / when he/she was child”
- [et] laps “child”; lapsena “as a child”
- [eu] erreformista “reformer”; erreformistatzat “as a reformer”
Tra: translative / factive
The translative case expresses a change of state (“it becomes X”, “it changes to X”). Also used for the phrase “in language X”. In the Szeged Treebank, this case is called factive.
- [fi] pitkä “long”; kasvoi pitkäksi “grew long”
- [fi] englanti “English language”; englanniksi “in/into English”
- [fi] kello kuusi “six o’clock”; kello kuudeksi “by six o’clock”
- [et] kell kuus “six o’clock”; kella kuueks “by six o’clock”
- [hu] Oroszlány halott várossá válhat. lit. Oroszlány dead city.Tra could-become. “Oroszlány could become a dead city.”
Com: comitative / associative
The comitative (also called associative) case corresponds to English “together with …”
- [et] koer “dog”; koeraga “with dog”
Abe: abessive
The abessive case corresponds to the English preposition without.
- [fi] raha “money”; rahatta “without money”
Ine: inessive
The inessive case expresses location inside of something.
- [hu] ház “house”; házban “in the house”
- [fi] talo “house”; talossa “in the house”
- [et] maja “house”; majas “in the house”
Ill: illative
The illative case expresses direction into something.
- [hu] ház “house”; házba “into the house”
- [fi] talo “house”; taloon “into the house”
- [et] maja “house”; majasse “into the house”
Ela: elative
The elative case expresses direction out of something.
- [hu] ház “house”; házból “from the house”
- [fi] talo “house”; talosta “from the house”
- [et] maja “house”; majast “from the house”
Add: additive
The additive case is distinguished by some scholars in Estonian but is not recognized by traditional grammar; it exists in the Multext-East Estonian tagset and in the Eesti keele puudepank. It has the meaning of illative, and some grammars will thus consider the additive just an alternative form of illative. Forms of this case exist only in singular and not for all nouns.
- [et] riik “government”; riigisse “to the government” (singular illative); riiki “to the government” (singular additive)
Ade: adessive
The adessive case expresses location at or on something. The corresponding directional cases are allative (towards something) and ablative (from something).
- [hu] pénztár “cash desk”; pénztárnál “at the cash desk”
- [fi] pöytä “table”; pöydällä “on the table”
- [et] laud “table”; laual “on the table”
Note that adessive is used to express location on the surface of something in Finnish and Estonian, but does not carry this meaning in Hungarian.
All: allative
The allative case expresses direction to something (destination is adessive, i.e. at or on that something).
- [hu] pénztár “cash desk”; pénztárhoz “to the cash desk”
- [fi] pöytä “table”; pöydälle “onto the table”
Abl: ablative
Prototypical meaning: direction from some point.
- [hu] a barátomtól jövök “I’m coming from my friend”
- [fi] pöydältä “from the table”; katolta “from the roof”; rannalta “from the beach”
Sup: superessive
Used, chiefly in Hungarian, to indicate location on top of something or on the surface of something.
- [hu] asztal “table”; asztalon “on the table”
- [hu] könyvek “books”; könyveken “on books”
Sub: sublative
The sublative case is used in Finno-Ugric languages to express the destination of movement, originally to the surface of something (e.g. “to climb a tree”), and, by extension, in other figurative meanings as well (e.g. “to university”).
- [hu] Belgrádtól 150 kilométerre délnyugatra lit. Belgrade.Abl 150 kilometer.Sub southwest.Sub “150 kilometers southwest of Belgrade”
- [hu] hajó “ship”; hajóra “onto the ship”
- [hu] bokorra “on the shrub”
Del: delative
Used, chiefly in Hungarian, to express the movement from the surface of something (like “moved off the table”). Other meanings are possible as well, e.g. “about something”.
- [hu] asztal “table”; az asztalról “off the table”
- [hu] Budapestről jövök “I am coming from Budapest”
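The prototypical meanings of the nine Hungarian locational and directional cases described above can be arranged as a grid of region by direction. The grid is only an illustrative reading of those descriptions, not an official classification: the region and direction names and the function are hypothetical, while the Case labels and example forms are the ones given in the Hungarian examples.

```python
# (region, direction) -> (Case value, Hungarian example form from the text)
HU_LOCAL_CASES = {
    ("interior", "static"): ("Ine", "házban"),
    ("interior", "goal"): ("Ill", "házba"),
    ("interior", "source"): ("Ela", "házból"),
    ("vicinity", "static"): ("Ade", "pénztárnál"),
    ("vicinity", "goal"): ("All", "pénztárhoz"),
    ("vicinity", "source"): ("Abl", "barátomtól"),
    ("surface", "static"): ("Sup", "asztalon"),
    ("surface", "goal"): ("Sub", "hajóra"),
    ("surface", "source"): ("Del", "asztalról"),
}


def local_case(region, direction):
    """Return the Case value for a prototypical spatial meaning."""
    return HU_LOCAL_CASES[(region, direction)][0]


print(local_case("interior", "goal"))   # Ill
print(local_case("surface", "source"))  # Del
```

As the descriptions stress, actual usage is often not spatial at all, so a grid like this is a mnemonic for the labels rather than a tagging rule.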
Lat: lative / directional allative
The lative case denotes movement towards/to/into/onto something. A similar case in Basque is called directional allative (Spanish adlativo direccional). However, lative is typically thought of as a union of allative, illative and sublative, while in Basque it is derived from allative, which also exists independently.
- [eu] behe “low”; beherantz “down”
Tem: temporal
The temporal case is used to indicate time.
- [hu] hétkor “at seven (o’clock)”; éjfélkor “at midnight”; karácsonykor “at Christmas”
Ter: terminative / terminal allative
The terminative case specifies where something ends in space or time. A similar case in Basque is called terminal allative (Spanish adlativo terminal).
- [et] jõeni “down to the river”; kella kuueni “till six o’clock”
- [hu] a házig “up to the house”; hat óráig “till six o’clock”
- [eu] erdi “half”; erdiraino “up to the half”
Cau: causative / motivative / purposive
A noun in this case is the cause of something. In Hungarian it also seems to be used frequently with currency (“to buy something for the money”) and it can also mean the goal of something.
- [hu] Egy világcég benzinkútjánál 7183 forintért tankoltam. lit. a world-wide.company petrol.station.Ade 7183 forint.Cau refueled “I refueled my car at the petrol station of a world-wide company for 7183 forints.”
- [hu] Elmentem a boltba tejért. lit. went the shop.Ill milk.Cau “I went to the shop to buy milk.”
- [eu] jokaera “behavior”; jokaeragatik “because of behavior”
Ben: benefactive / destinative
The benefactive case corresponds to the English preposition for.
- [eu] mutil “boy”; mutilarentzat “for boys”
Cmp: comparative
The comparative case means “than X”. It marks the standard of comparison and it differs from the comparative Degree, which marks the property being compared. It occurs in Dravidian and Northeast-Caucasian languages.
Equ: equative
The equative case means “X-like”, “similar to X”, “same as X”. It marks the standard of comparison and it differs from the equative Degree, which marks the property being compared. It occurs in Turkish.
- [tr] ben “I”; bence “like me”
Definite: definiteness or state
Definiteness is typically a feature of nouns, adjectives and articles. Its value distinguishes whether we are talking about something known and concrete, or something general or unknown. It can be marked on definite and indefinite articles, or directly on nouns, adjectives etc. In Arabic, definiteness is also called the “state”.
Ind: indefinite
In languages where Spec is distinguished, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.
- [en] a dog
Spec: specific indefinite
Specific indefinite, e.g. “a certain stick”.
Occurs e.g. in Lakota.
In languages where it is used, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.
Def: definite
- [en] the dog
Cons: construct state / reduced definiteness
Used in construct state in Arabic. If two nouns are in genitive relation, the first one (the “nomen regens”) has “reduced definiteness,” the second is the genitive and can be either definite or indefinite. Reduced form has neither the definite morpheme (article), nor the indefinite morpheme (nunation).
Note that in UD v1 this value was called Red. It has been renamed Cons.
- [ar] indefinite state: حلوَةٌ ḥulwatun “a sweet”; definite state: الحلوَةُ al-ḥulwatu “the sweet”; construct state: حلوَةُ ḥulwatu “sweet of”.
Com: complex
Used in improper annexation in Arabic. The genitive construction described above normally consists of two nouns (first reduced, second genitive). That is called proper annexation or iḍāfa. If the first member is an adjective or adjectivally used participle and the second member is a definite noun, the construction is called improper annexation or false iḍāfa. The result is a compound adjective that is usually used as an attributive adjunct and thus must agree in definiteness with the noun it modifies. Its first part (the adjective or participle) may get again the definite article. Although it may look the same as the form for the definite state, it is assigned a special value of complex state to reflect the different origin. See also Hajič et al. page 3.
- [ar] مُخْتَلِفٌ muxtalifun “different/various” (active participle, Form VIII); نَوْعٌ ج أنْوَاعٌ nawˀun ja anwāˀun “kind”; مُخْتَلِفُ الأنْوَاعِ muxtalifu al-anwāˀi “of various kinds” (false iḍāfa); مَشَاكِلُ مُخْتَلِفَةُ الأنْوَاعِ mašākilu muxtalifatu al-anwāˀi “problems of various kinds”; اَلْمَشَاكِلُ الْمُخْتَلِفَةُ الأنْوَاعِ al-mašākilu al-muxtalifatu al-anwāˀi “the problems of various kinds”.
Degree: degree of comparison
Pos: positive, first degree
This is the base form that merely states a quality of something, without comparing it to qualities of others. Note that although this degree is traditionally called “positive”, negative properties can be compared, too.
- [en] young man
- [cs] mladý muž
Equ: equative
The quality of one object is compared to the same quality of another object, and the result is that they are identical or similar (“as X as”). Note that it marks the adjective and it is distinct from the equative Case, which marks the standard of comparison.
- [et] pikkune (pikkus+ne) “as tall as”
Cmp: comparative, second degree
The quality of one object is compared to the same quality of another object.
- [en] the man is younger than me
- [cs] ten muž je mladší než já
Sup: superlative, third degree
The quality of one object is compared to the same quality of all other objects within a set.
- [en] this is the youngest man in our team
- [cs] toto je nejmladší muž v našem týmu
Abs: absolute superlative
Some languages can express morphologically that the studied quality of the given object is so strong that there is hardly any other object exceeding it. The quality is not actually compared to any particular set of objects.
- [es] guapo “handsome”; guapísimo “indescribably handsome”
Evident: evidentiality
Evidentiality is the morphological marking of a speaker’s source of information (Aikhenvald, 2004). It is sometimes viewed as a category of mood and modality.
Many different values are attested in the world’s languages. At present we only cover the firsthand vs. non-firsthand distinction, needed in Turkish. It distinguishes there the normal past tense (firsthand, also definite past tense, seen past tense) from the so-called miş-past (non-firsthand, renarrative, indefinite, heard past tense).
Aikhenvald also distinguishes reported evidentiality, occurring in Estonian and Latvian, among others. We currently use the quotative Mood for this.
Evident is a new universal feature in UD version 2. It was used as a language-specific feature (under the name Evidentiality) in UD v1 for Turkish.
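The renamings between UD v1 and v2 that this document mentions (Aspect=Pro to Prosp, Definite=Red to Cons, and the feature name Evidentiality to Evident) could be applied mechanically when converting older annotation. The following is a minimal sketch covering only those three changes; the function and table names are hypothetical, and feature values are left untouched where the text does not say that they changed.

```python
# Value renamings mentioned in this document: (feature, v1 value) -> v2 value
V1_TO_V2_VALUES = {
    ("Aspect", "Pro"): "Prosp",
    ("Definite", "Red"): "Cons",
}

# Feature-name renamings mentioned in this document: v1 name -> v2 name
V1_TO_V2_NAMES = {
    "Evidentiality": "Evident",
}


def migrate_feats(feats):
    """Return a copy of a v1 feature dict with the known v2 renamings applied."""
    migrated = {}
    for name, value in feats.items():
        new_value = V1_TO_V2_VALUES.get((name, value), value)
        new_name = V1_TO_V2_NAMES.get(name, name)
        migrated[new_name] = new_value
    return migrated


print(migrate_feats({"Aspect": "Pro", "Definite": "Red"}))
# {'Aspect': 'Prosp', 'Definite': 'Cons'}
```

A real conversion would need the complete list of v1 to v2 changes, which is not given here.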
Fh: firsthand
- [tr] geldi (he/she/it came)
Nfh: non-firsthand
- [tr] gelmiş (he/she/it has come)
- Aikhenvald, Alexandra Y. 2004. Evidentiality. Oxford: Oxford University Press.
Foreign: is this a foreign word?
Boolean feature. Is this a foreign word? Not a loan word and not a foreign name but a genuinely foreign word appearing inside native text, e.g. inside direct speech, titles of books etc. This feature would apply either to the u-pos/X part of speech (unanalyzable token), or to other parts of speech if we know and are willing to annotate the class to which the word belongs in its original language.
Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1 but it was not considered boolean and three values were foreseen. Since the additional values were used extremely rarely, they are not part of the universal definition of this feature in UD v2.
Yes: it is foreign
Example: [en] He said I could “dra åt helvete!”
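Because the two Boolean features in this document (Abbr and Foreign) list only the value Yes, a consistency check can flag any other value, for instance one left over from the three-valued v1 usage of Foreign. This is a minimal sketch; the function name and the error message format are hypothetical, and treating an absent feature as "not an abbreviation / not foreign" is an assumption.

```python
# Boolean features described in this document; each lists Yes as its only value.
BOOLEAN_FEATURES = {"Abbr", "Foreign"}


def check_boolean_feats(feats):
    """Return error messages for Boolean features whose value is not Yes."""
    errors = []
    for name, value in feats.items():
        if name in BOOLEAN_FEATURES and value != "Yes":
            errors.append(f"{name}={value}: Boolean feature only takes the value Yes")
    return errors


print(check_boolean_feats({"Foreign": "Yes", "Abbr": "Yes"}))  # []
print(len(check_boolean_feats({"Foreign": "No"})))             # 1
```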
Gender: gender
Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In English gender affects only the choice of the personal pronoun (he / she / it) and the feature is usually not encoded in English tagsets.
See also the related feature of Animacy.
African languages have an analogous feature of noun classes: there might be separate grammatical categories for flat objects, long thin objects etc. African noun classes are not covered in the current guidelines because none of the languages covered by UD so far has such classes. They might be added in future.
Masc: masculine gender
Nouns denoting male persons are masculine. Other nouns may be also grammatically masculine, without any relation to sex.
- [cs] hrad “castle”
Fem: feminine gender
Nouns denoting female persons are feminine. Other nouns may be also grammatically feminine, without any relation to sex.
- [de] Burg “castle”
Neut: neuter gender
Some languages have only the masculine/feminine distinction while others also have this third gender for nouns that are neither masculine nor feminine (grammatically).
- [en] castle
- [cs] dítě “child”
Com: common gender
Some languages do not distinguish masculine/feminine most of the time but they do distinguish neuter vs. non-neuter (Swedish neutrum / utrum). The non-neuter is called common gender.
Note that it could also be expressed as a combined value Gender=Fem,Masc. Nevertheless we keep Com also as a separate value. Combined feature values should only be used in exceptional, undecided cases, not for something that occurs systematically in the grammar. Language-specific extensions to these guidelines should determine whether the Com value is appropriate for a particular language.
Note further that the Com value is not intended for cases where we just cannot derive the gender from the word itself (without seeing the context), while the language actually distinguishes Masc and Fem. For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context.
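The two options for a Spanish adjective such as grande can be sketched as a function that either omits Gender or copies it from the noun the adjective modifies, and never produces Com. The function name and its parameter are hypothetical; how the modified noun is found (for example via the dependency tree) is outside this sketch.

```python
def gender_for_invariant_adjective(noun_gender=None):
    """Return Gender features for an adjective with one form for both genders.

    noun_gender: "Masc" or "Fem" if the modified noun is known, otherwise None.
    """
    if noun_gender is None:
        # Option 1: drop the feature, suggesting the word does not inflect for gender.
        return {}
    if noun_gender not in ("Masc", "Fem"):
        raise ValueError(f"unexpected gender for a Spanish noun: {noun_gender}")
    # Option 2: tag the individual instance according to its context.
    return {"Gender": noun_gender}


print(gender_for_invariant_adjective("Fem"))   # una ciudad grande -> {'Gender': 'Fem'}
print(gender_for_invariant_adjective("Masc"))  # un puerto grande  -> {'Gender': 'Masc'}
print(gender_for_invariant_adjective())        # no context        -> {}
```

A treebank would normally pick one of the two options and apply it consistently.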
Mood: mood
Mood is a feature that expresses modality and subclassifies finite verb forms.
Ind: indicative
The indicative can be considered the default mood. A verb in indicative merely states that something happens, has happened or will happen, without adding any attitude of the speaker.
- [cs] Studuješ na univerzitě. “You study at the university.”
- [de] Du studierst an der Universität. “You study at the university.”
- [tr] eve gidiyor “she is going home”
- [tr] eve gitti “she went home”
Imp: imperative
The speaker uses the imperative to order or ask the addressee to do the action of the verb.
- [cs] Studuj na univerzitě! “Study at the university!”
- [de] Studiere an der Universität! “Study at the university!”
- [tr] eve git “go home!”
- [tr] eve gidin “go home!” (plural)
- [tr] eve gitsin “[let him] go home!” (3rd person imperative)
Cnd: conditional
The conditional mood is used to express actions that would have taken place under some circumstances but they actually did not / do not happen. Grammars of some languages may classify conditional as tense (rather than mood) but e.g. in Czech it combines with two different tenses (past and present).
- [cs] Kdybych byl chytrý, studoval bych na univerzitě. “If I were smart I would study at the university” (note that only the auxiliary bych is specific to conditional; the active participle byl is also needed to analytically form the conditional mood, however, it will only be tagged as participle because it can also be used to form past tense indicative.)
- [tr] eve gittiyse “if she went home”
- [tr] eve gidiyorsa “if she is going home”
- [tr] eve giderse “if she goes home”
- [tr] eve gidecekdiyse “if she was going to go home”
Pot: potential
The action of the verb is possible but not certain. This mood corresponds to the modal verbs can, might, be able to. Used e.g. in Finnish.
- [tr] eve gidebilir “she can go home”
- [tr] eve gidemeyebilir “she may not be able to go home”
Sub: subjunctive / conjunctive
The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain. In German, it may be also used to convey the conditional meaning.
- [fr] Je veux que tu le fasses “I want you to do it” lit. I want that you it do.Sub
Jus: jussive
The jussive mood expresses the desire that the action happens; it is thus close to both imperative and optative. Unlike in desiderative, it is the speaker, not the subject who wishes that it happens. Used e.g. in Arabic.
Prp: purposive
Means “in order to”; it occurs in Amazonian languages.
The quotative mood is used e.g. in Estonian to denote direct speech.
Expresses exclamations like “May you have a long life!” or “If only I were rich!” In Turkish it also expresses suggestions.
- [tr] eve gidelim ‘let’s go home’
The desiderative mood corresponds to the modal verb “want to”: “He wants to come.” Used e.g. in Turkish.
The necessitative mood expresses necessity and corresponds to the modal verbs “must, should, have to”: “He must come.”
- [tr] eve gitmeli ‘she should go home’
- [tr] eve gitmeliydi ‘she should have gone home’
Expresses surprise, irony or doubt. Occurs in Albanian, other Balkan languages, and in Caddo (Native American from Oklahoma).
NumType: numeral type
Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, the main part of speech is “numeral”; it includes almost everything where counting is involved, and there are various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so at the same time we may have a non-empty value of PronType. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)
From the syntactic point of view, some numtypes behave like adjectives and some behave like adverbs. We tag them u-pos/ADJ and u-pos/ADV respectively. Thus the NumType feature applies to several different parts of speech:
- u-pos/NUM: cardinal numerals
- u-pos/DET: quantifiers
- u-pos/ADJ: definite adjectival, e.g. ordinal numerals
- u-pos/ADV: adverbial (e.g. ordinal and multiplicative) numerals, both definite and pronominal
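As an illustration of how NumType cuts across parts of speech, here is a minimal Python sketch. The names `NUMTYPE_BY_UPOS` and `numtype_plausible` are hypothetical, invented for this example, and the sample rows simply reuse Czech words cited in this section; this is not an official validation rule.

```python
# Hypothetical lookup table summarising the list above: which kinds of
# NumType-bearing words are expected under each universal POS tag.
NUMTYPE_BY_UPOS = {
    "NUM": "cardinal numerals",
    "DET": "quantifiers",
    "ADJ": "definite adjectival numerals, e.g. ordinals",
    "ADV": "adverbial numerals, e.g. ordinal and multiplicative",
}


def numtype_plausible(upos: str) -> bool:
    """Return True if the list above mentions NumType for this POS tag."""
    return upos in NUMTYPE_BY_UPOS


# (form, UPOS, NumType) triples; the words come from this section's examples.
samples = [
    ("tři", "NUM", "Card"),      # cardinal numeral
    ("první", "ADJ", "Ord"),     # adjectival ordinal
    ("poprvé", "ADV", "Ord"),    # adverbial ordinal
    ("dvakrát", "ADV", "Mult"),  # multiplicative adverb
]

for form, upos, numtype in samples:
    print(form, upos, numtype, numtype_plausible(upos))
```

A check like this can only flag a NumType value on an unexpected part of speech; it says nothing about whether the value itself is right for the word.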
Card: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word
Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.
- [en] one, two, three
- [cs] jeden, dva, tři “one, two, three”; kolik “how many”; několik “some”; tolik “so many”; mnoho “many”; málo “few”
- [cs] čtvero, patero, desatero (specific forms of four, five, ten; they are morphologically, syntactically and stylistically distinct from the default forms čtyři, pět, deset; in Czech grammar they are classified as “generic numerals”, which also encompasses some other rare types; Card is the closest match for them among the universal types)
Ord: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or (in some languages) of adverb.
- [en] first, second, third
- [cs] adjectival: první “first”; druhý “second”, třetí “third”; kolikátý lit. how manieth “which rank”; několikátý “some rank”; tolikátý “this/that rank”
- [cs] adverbial: poprvé “for the first time”; podruhé “for the second time”; potřetí “for the third time”; pokolikáté “for which time”, poněkolikáté “for x-th time”, potolikáté
Mult: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or adverb.
- [sl] dvojen “double, twofold”; trojen “triple, threefold”; četveren “fourfold”
- [cs] dvojí “twofold”; trojí “threefold” (multiplicative adjectives)
- [cs] jednou “once”; dvakrát “twice”; třikrát “three times”; kolikrát “how many times”, několikrát “several times”; tolikrát “so many times” (multiplicative adverbs)
This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In various languages these words may behave morphologically and syntactically as nouns or ordinal numerals.
- [en] three-quarters
- [cs] půl / polovina “half”; třetina “one third”; čtvrt / čtvrtina “quarter”
Sets: number of sets of things; collective numeral
Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum. Some authors call this type collective numeral.
- [cs] dvoje / troje boty “two / three [pairs of] shoes”; as opposed to normal cardinal numbers: dvě / tři boty “two / three shoes”
Dist: distributive numeral
Used to express that the same quantity is distributed to each member in a set of targets.
- [hu] három-három in gyermekenként három-három ezer forinttal “three thousand forint per child”
Range: range of values
This could be considered a subtype of cardinal numbers, occasionally distinguished in corpora.
- [en] two-five “two to five” (provided tokenization leaves it as one token.)
Sing: singular number
A singular noun denotes one person, animal or thing.
- [en] car
Plur: plural number
A plural noun denotes several persons, animals or things.
- [en] cars
Dual: dual number
A dual noun denotes two persons, animals or things.
- [sl] singular glas “voice”, dual glasova “voices”, plural glasovi “voices”
- [ar] singular سَنَةٌ sanatun “year”, dual سَنَتَانِ sanatāni “years”, plural سِنُونَ sinūna “years”.
Tri: trial number
A trial pronoun denotes three persons, animals or things. It occurs in pronouns of several Austronesian languages.
Pauc: paucal number
A paucal noun denotes “a few” persons, animals or things.
Grpa: greater paucal number
A greater paucal noun denotes “more than several but not many” persons, animals or things. It occurs in Sursurunga, an Austronesian language.
Grpl: greater plural number
A greater plural noun denotes “many, all possible” persons, animals or things. Precise semantics varies across languages.
Inv: inverse number
Inverse number means non-default for that particular noun. (Some nouns are by default assumed to be singular, some plural.) Occurs e.g. in Kiowa.
Count: count plural
Attested in Bulgarian and Macedonian. It is known variously as “counting form”,
“count plural” or “quantitative plural” (Sussex and Cubberley 2006, p. 324).
It is a special plural form of nouns if they occur after numerals.
(The form originates in the Proto-Slavic dual but it should not be marked
Number=Dual because 1. the dual vanished from Bulgarian and 2. the form is
no longer semantically tied to the number two.)
- [bg] tri stola “three chairs” vs. stolove “chairs”
Ptan: plurale tantum
Some nouns appear only in the plural form even though they denote one thing (semantic singular); some tagsets mark this distinction. Grammatically they behave like plurals, so Plur is obviously the back-off value here; however, if the language also marks gender, the non-existence of a singular form sometimes means that the gender is unknown. In Czech, a special type of numeral is used when counting nouns that are plurale tantum (NumType=Sets).
- [en] scissors, pants
- [cs] nůžky, kalhoty
Coll: collective / mass / singulare tantum
Collective or mass or singulare tantum is a special case of singular. It applies to words that use grammatical singular to describe sets of objects, i.e. semantic plural. Although in theory they might be able to form plural, in practice it would be rarely semantically plausible. Sometimes, the plural form exists and means “several sorts of” or “several packages of”.
- [cs] lidstvo “mankind”
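The text describes Plur as the back-off value for Ptan and calls Coll a special case of singular. A tool that only knows the basic values could collapse the finer ones accordingly. The sketch below assumes that reading; `NUMBER_BACKOFF` and `coarse_number` are hypothetical names, and values for which the text gives no back-off are passed through unchanged.

```python
# Hypothetical back-off table derived from the descriptions above:
# Ptan behaves grammatically like a plural, Coll is a special case of singular.
NUMBER_BACKOFF = {
    "Ptan": "Plur",
    "Coll": "Sing",
}


def coarse_number(value: str) -> str:
    """Map a fine-grained Number value to its back-off, if the text names one."""
    return NUMBER_BACKOFF.get(value, value)


for value in ["Sing", "Plur", "Dual", "Ptan", "Coll"]:
    print(value, "->", coarse_number(value))
```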
- Sussex, Roland and Cubberley, Paul. 2006. The Slavic Languages. Cambridge University Press.
Person is typically a feature of personal and possessive pronouns / determiners, and of verbs. On verbs it is in fact an agreement feature that marks the person of the verb’s subject (some languages, e.g. Basque, can also mark person of objects). Person marked on verbs makes it unnecessary to always add a personal pronoun as subject and thus subjects are sometimes dropped (pro-drop languages).
0: zero person
Zero person is for impersonal statements; it appears in Finnish as well as in Santa Ana Pueblo Keres. (The construction is distinctive in Finnish but it does not use unique morphology that would necessarily require a feature. However, it is morphologically distinct in Keres (Davis 1964:75).)
1: first person
In singular, the first person refers just to the speaker / author. In plural, it must include the speaker and one or more additional persons. Some languages (e.g. Taiwanese) distinguish inclusive and exclusive 1st person plural pronouns: the former include the addressee of the utterance (i.e. I + you), the latter exclude them (i.e. I + they).
- [en] I, we
- [cs] dělám “I do”
2: second person
In singular, the second person refers to the addressee of the utterance / text. In plural, it may mean several addressees and optionally some third persons too.
- [en] you
- [cs] děláš “you do”
3: third person
The third person refers to one or more persons that are neither speakers nor addressees.
- [en] he, she, it, they
- [cs] dělá “he/she/it does”
4: fourth person
The fourth person can be understood as a third person argument morphologically distinguished from another third person argument, e.g. in Navajo.
- Davis, Irvine. 1964. The language of Santa Ana Pueblo (anthropological papers, no. 69). Smithsonian Institution Bureau of American Ethnology, Bulletin 191: Anthropological Papers, Numbers 68-74, Washington, DC: United States Government Printing Office, 53–190.
Polarity is typically a feature of verbs, adjectives, sometimes also adverbs and nouns in languages that negate using bound morphemes. In languages that negate using a function word, Polarity is used to mark that function word, unless it is a pro-form already marked with PronType=Neg (see below).
Positive polarity (affirmativeness) is rarely, if at all, encoded using overt
morphology. The feature value
Polarity=Pos is usually used to signal that a lemma
has negative forms but this particular form is not negative. Using the feature
in such cases is somewhat optional for words that can be negated but rarely are.
For instance, all Czech verbs and adjectives can be negated using the prefix
ne-. In theory, all nouns can be negated too, with the meaning “anything
except the entities denotable by the original noun”. However, negated nouns
are rare and it is not necessary to annotate every positive noun with
Polarity=Pos. Language-specific documentation should define under which
circumstances the positive polarity is annotated.
In English, verbs are negated using the particle not and adjectives are also negated using prefixes, although the process is less productive than in Czech (wise – unwise, probable – improbable).
Polarity=Neg is not the same thing as PronType=Neg. For pronouns and other pronominal parts of speech there is no such binary opposition as for verbs and adjectives. (There is no such thing as “affirmative pronoun”.)
The Polarity feature can also be used to distinguish the response interjections yes and no.
Polarity was called Negative in version 1 of the UD guidelines; it was renamed in version 2.
Pos: positive, affirmative
- [cs] přišel “he came”
- [cs] velký “big”
- [en] yes
Neg: negative
- [cs] nepřišel “he did not come”
- [cs] nevelký “not big”
- [en] not
- [en] no as in no, I don’t think so; but not as in we have no bananas
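The distinction between Polarity=Neg and PronType=Neg can be made concrete with a small Python sketch. It assumes features are written as `Name=Value` pairs joined by `|`, with `_` standing for an empty feature set; the helper names `parse_feats` and `negation_kind` and the returned labels are hypothetical, chosen only for this illustration.

```python
def parse_feats(feats: str) -> dict:
    """Split a 'Name=Value|Name=Value' string into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def negation_kind(feats: str):
    """Say which of the two features, if any, marks the word as negative."""
    parsed = parse_feats(feats)
    if parsed.get("PronType") == "Neg":
        return "negative pro-form"
    if parsed.get("Polarity") == "Neg":
        return "negated or negating word"
    return None


# Words taken from the examples in this document; only the features
# relevant to the contrast are shown.
print(negation_kind("Polarity=Neg"))  # e.g. [cs] nepřišel
print(negation_kind("PronType=Neg"))  # e.g. [cs] nikdo
print(negation_kind("Polarity=Pos"))  # e.g. [cs] přišel
```

PronType is checked first because the text says the function-word use of Polarity does not apply to pro-forms already marked with PronType=Neg.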
Various languages have various means to express politeness or respect; some
of the means are morphological. Three to four dimensions of politeness are
distinguished in linguistic literature. The
Polite feature currently covers
(and mixes) two of them; a more elaborate system of feature values may be
devised in future versions of UD if needed. The two axes covered are:
- speaker-referent axis (meant to include the addressee when he happens to be the referent)
- speaker-addressee axis (word forms depend on who is the addressee, although the addressee is not referred to)
Changing pronouns and/or person and/or number of the verb forms when respectable persons are addressed in Indo-European languages belongs to the speaker-referent axis because the honorific pronouns are used to refer to the addressee.
In Czech, formal second person has the same form for singular and plural, and is identical to informal second person plural. This involves both the pronoun and the finite verb but not a participle, which has no special formal form (that is, formal singular is identical to informal singular, not to informal plural).
In German, Spanish or Hindi, both number and person are changed (informal third person is used as formal second person) and in addition, special pronouns are used that only occur in the formal register ([de] Sie; [es] usted, ustedes; [hi] आप āpa).
In Japanese, verbs and other words have polite and informal forms but the polite
forms are not referring to the addressee (they are not in second person). They
are just used because of who the addressee is, even if the topic does not
involve the addressee at all. This kind of polite language is called teineigo (丁寧語)
and belongs to the speaker-addressee axis. Nevertheless, we currently use the
same values for both axes, i.e.
Polite=Form can be used for teineigo too.
This approach may be refined in future.
Infm: informal register
Usage varies but if the language distinguishes levels of politeness, then the informal register is usually meant for communication with family members and close friends.
- [cs] ty jdeš / vy jdete (you go.Sing/Plur)
- [de] du gehst / ihr geht (you go.Sing/Plur)
- [es] tú vas / vosotros vais (you go.Sing/Plur)
- [ja] 行かない ikanai (will not go)
Form: formal register
Usage varies but if the language distinguishes levels of politeness, then the polite register is usually meant for communication with strangers and people of higher social status than the speaker.
- [cs] vy jdete (you go.Sing/Plur)
- [de] Sie gehen (you go.Sing/Plur)
- [es] usted va / ustedes van (you go.Sing/Plur)
- [ja] 行きません ikimasen (will not go)
Elev: referent elevating
This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese sonkeigo (尊敬語) is a set of honorific forms that elevate the status of the referent.
- [ja] なさる nasaru, なさいます nasaimasu (to do; when talking about a customer or a superior)
Humb: speaker humbling
This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese kenjōgo (謙譲語) is a set of honorific forms that lower the speaker’s status, thereby raising the referent’s status by comparison.
- [ja] いたす itasu, いたします itashimasu (to do; when referring to one’s own actions or the actions of a group member)
- Brown, Penelope and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage. Studies in Interactional Sociolinguistics, Cambridge, UK: Cambridge University Press.
- Comrie, Bernard. 1976. Linguistic politeness axes: Speaker-addressee, speaker-referent, speaker-bystander. Pragmatics Microfiche 1.7(A3). Department of Linguistics, University of Cambridge.
- Wenger, James R. 1982. Some Universals of Honorific Language with Special Reference to Japanese. Ph.D. thesis, University of Arizona, Tucson, AZ.
Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.
While many tagsets would have “possessive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types. Several of the pronominal types can be optionally possessive, and adjectives can too.
Yes: it is possessive
Note that there is no No value. If the word is not possessive, the Poss feature will just not be mentioned in the FEATS column. (Which means that empty value has the No meaning.)
- [en] my, your, his, mine, yours, whose
- [cs] possessive determiners: můj, tvůj, jeho, její, náš, váš, svůj, čí, jejichž
- [cs] possessive adjectives: otcův “father’s”, matčin “mother’s”
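Because Poss has only the value Yes, code that reads the feature should treat its absence as "not possessive" rather than look for a No value. Here is a minimal Python sketch, assuming features arrive as a `Name=Value|Name=Value` string with `_` for an empty set; `parse_feats` and `is_possessive` are hypothetical helper names.

```python
def parse_feats(feats: str) -> dict:
    """Split a 'Name=Value|Name=Value' string into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_possessive(feats: str) -> bool:
    """Poss=Yes means possessive; a missing Poss feature means it is not."""
    return parse_feats(feats).get("Poss") == "Yes"


print(is_possessive("Poss=Yes|PronType=Prs"))  # e.g. [en] my
print(is_possessive("PronType=Prs"))           # e.g. [en] I
print(is_possessive("_"))                      # no features at all
```

The same pattern applies to any Boolean feature in this document that is only ever written with the value Yes.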
PronType: pronominal type
Prs: personal or possessive personal pronoun or determiner
See also the Poss feature that distinguishes normal personal pronouns from possessives. Note that Prs also includes reflexive personal/possessive pronouns (e.g. [cs] se / svůj; see the Reflex feature).
- [en] I, you, he, she, it, we, they, my, your, his, her, its, our, their, mine, yours, hers, ours, theirs
- [cs] já, ty, on, ona, ono, my, vy, oni, ony, se, můj, tvůj, jeho, její, náš, váš, jejich, svůj
Rcp: reciprocal pronoun
- [de] einander “each other”
- [da] hinanden “each other”
Article is a special case of determiner that bears the feature of definiteness (in other languages, the feature may be marked directly on nouns).
- [en] a, an, the
- [de] ein, eine, der, die, das
- [es] un, una, el, la
Int: interrogative pronoun, determiner, numeral or adverb
Note that possessive interrogative determiners (whose) can be distinguished by the Poss feature.
- [cs/en] kdo / who, co / what, který / which, čí / whose, kolik / how many, how much, kolikátý / how-maniest (ordinal quantifier), kolikrát / how many times, kde / where, kam / where to, kdy / when, jak / how, proč / why
Rel: relative pronoun, determiner, numeral or adverb
Note that in many languages this class heavily overlaps with interrogatives, yet there are pronouns that are only relative, and in some languages (Bulgarian, Hindi) the two classes are distinct.
- [cs] jenž, což “which”, “that” (relative but not interrogative pronouns); jehož “whose” (possessive relative pronoun)
Exc: exclamative determiner
Exclamative pro-adjectives (determiners) express the speaker’s surprise towards the modified noun, e.g. what in “What a surprise!” In many languages, exclamative determiners are recruited from the set of interrogative determiners. Therefore, not all tagsets distinguish them.
- [it] che
- [cs] jaký as in “Jaké překvapení!”
- [en] what as in “What a surprise!”
Dem: demonstrative pronoun, determiner, numeral or adverb
These are often parallel to interrogatives. Some tagsets might also distinguish a separate feature of distance (here / there; [es] aquí / ahí / allí).
- [cs/en] tento / this, tamten / that, takový / such, týž / same, tolik / so much, tolikátý / so-maniest (ordinal number), tolikrát / so many times, tady / here, tam / there, teď / now, tehdy / then, tak / so
Emp: emphatic determiner
Emphatic pro-adjectives (determiners) emphasize the nominal they depend on. There are similarities with reflexive and demonstrative pronouns / determiners.
- [ro] însuși
- [cs] sám
- [en] himself as in “He himself did it.”
Tot: total (collective) pronoun, determiner or adverb
- [cs/en] každý / every, everybody, everyone, each, všechno / everything, all, všude / everywhere, vždy / always
Neg: negative pronoun, determiner or adverb
Negative pronominal words are distinguished from negating particles and from words that inflect for polarity (verbs, adjectives etc.). Those words do not use PronType=Neg; they use Polarity=Neg instead. See the Polarity feature for further details.
- [cs/en] nikdo / nobody, nic / nothing, nijaký / no, ničí / no one’s (possessive negative determiner), žádný / no, none, nikde / nowhere, nikdy / never, nijak / no way (lit. “no-how”)
Ind: indefinite pronoun, determiner, numeral or adverb
Note that some tagsets might further subclassify this category to distinguish “some” from “any” etc. Such distinctions are not part of universal features but may be added in language-specific extensions.
- [cs/en] někdo / somebody, něco / something, některý / some, něčí / someone’s (possessive indefinite pronoun), několik / a few, several (indefinite numeral/quantifier), několikátý / “a fewth”, “severalth” (indefinite ordinal numeral), několikrát / a few times, several times, někde / somewhere, někdy / sometimes, nějak / somehow
- [cs/en] kdokoli / anybody, cokoli / anything, kterýkoli / any, číkoli / anyone’s (possessive indefinite pronoun), kdekoli / anywhere, kdykoli / any time, jakkoli / anyhow
- [cs/en] málokdo / few people, leckdo / quite a few people, kdosi / somebody…
While many tagsets would have “reflexive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types.
Note that while some languages also have reflexive verbs, these are in fact fused verbs with reflexive pronouns, as in Spanish despertarse or Russian проснуться (both meaning “to wake up”). Thus in these cases the fused token will be split to two syntactic words, one of them being a reflexive pronoun.
Yes: it is reflexive
Note that there is no No value. If the word is not reflexive, the Reflex feature will just not be mentioned in the FEATS column. (Which means that empty value has the No meaning.)
- [cs] reflexive personal pronouns: se, si; reflexive possessive pronoun: svůj
Tense is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.
Tense is a feature that specifies the time when the action took / takes / will take place, in relation to the current moment or to another action in the utterance. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.
Note that we are defining features that apply to a single word. If a tense is constructed periphrastically (two or more words, e.g. auxiliary verb indicative + participle of the main verb) and none of the participating words are specific to this tense, then the features will probably not directly reveal the tense. For instance, [en] I had been there is past perfect (pluperfect) tense, formed periphrastically by the simple past tense of the auxiliary to have and the past participle of the main verb to be. The auxiliary will be tagged VerbForm=Fin|Mood=Ind|Tense=Past and the participle VerbForm=Part|Tense=Past; neither of the two will have Tense=Pqp. On the other hand, Portuguese can form the pluperfect morphologically as just one word, such as estivera, which will thus be tagged Tense=Pqp.
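The contrast between a periphrastic and a morphological pluperfect can be written out as data. In this Python sketch the English feature strings follow the tags given above (reordered alphabetically); for the Portuguese form only the Tense feature is shown, and its remaining features are deliberately omitted rather than guessed. `parse_feats` and `tenses` are hypothetical helper names.

```python
def parse_feats(feats: str) -> dict:
    """Split a 'Name=Value|Name=Value' string into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# [en] "had been": the pluperfect is spread over two words.
english_pluperfect = [
    ("had", "Mood=Ind|Tense=Past|VerbForm=Fin"),
    ("been", "Tense=Past|VerbForm=Part"),
]

# [pt] "estivera": a single word carries the pluperfect (other features omitted).
portuguese_pluperfect = [("estivera", "Tense=Pqp")]


def tenses(tokens):
    """Collect the Tense value of each token, or None if it has none."""
    return [parse_feats(feats).get("Tense") for _, feats in tokens]


print(tenses(english_pluperfect))
print(tenses(portuguese_pluperfect))
```

No per-word lookup will find Tense=Pqp in the English clause; recovering the pluperfect there requires looking at the auxiliary and the participle together.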
Past: past tense / preterite / aorist
The past tense denotes actions that happened before the current moment. In English, this is the simple past form. In German, this is the Präteritum. In Turkish, this is the non-narrative past. In Bulgarian, this is aorist, the aspect-neutral past tense that can be used freely with both imperfective and perfective verbs (see also imperfect).
- [en] he went home
Pres: present tense
The present tense denotes actions that are happening right now or that usually happen.
- [en] he goes home
Fut: future tense
The future tense denotes actions that will happen after the current moment.
- [es] irá a la casa “he/she/it will go home”
Used in e.g. Bulgarian and Croatian, imperfect is a special case of the past tense. Note that, unfortunately, imperfect tense is not always the same as past tense + imperfective aspect. For instance, in Bulgarian, there is lexical aspect, inherent in verb meaning, and grammatical aspect, which does not necessarily always match the lexical one. In main clauses, imperfective verbs can have imperfect tense and perfective verbs have perfect tense. However, both rules can be violated in embedded clauses.
The pluperfect denotes an action that happened before another action in the past. This value does not apply to English, where the pluperfect (past perfect) is constructed analytically. It applies e.g. to Portuguese.
VerbForm: form of verb or deverbative
Even though the name of the feature seems to suggest that it is used
exclusively with verbs, it is not the case. Some verb
forms in some languages actually form a gray zone between verbs and
other parts of speech (nouns, adjectives
and adverbs). For instance, participles may be either
classified as verbs or as adjectives, depending on language and
context. In both cases
VerbForm=Part may be used to separate them
from other verb forms or other types of adjectives.
Fin: finite verb
Rule of thumb: if it has non-empty Mood, it is finite. But beware that some tagsets conflate verb forms and moods into one feature.
- [en] I do, he does
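The rule of thumb above translates directly into a heuristic check. This Python sketch assumes `Name=Value|Name=Value` feature strings with `_` for an empty set. `parse_feats` and `looks_finite` are hypothetical names, and the caveat about tagsets that conflate verb form and mood still applies, so the result is only a hint.

```python
def parse_feats(feats: str) -> dict:
    """Split a 'Name=Value|Name=Value' string into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def looks_finite(feats: str) -> bool:
    """Heuristic from the rule of thumb: a non-empty Mood suggests a finite verb."""
    return bool(parse_feats(feats).get("Mood"))


print(looks_finite("Mood=Ind|Tense=Past|VerbForm=Fin"))  # finite auxiliary
print(looks_finite("Tense=Past|VerbForm=Part"))          # participle
```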
Infinitive is the citation form of verbs in many languages. Unlike in English, it often has morphological form that is distinct from the finite forms. Infinitives may be used together with auxiliaries to form periphrastic tenses (e.g. future tense [cs] budu sedět v letadle “I will sit in a plane”), they appear as arguments of modal verbs etc. In some languages they behave similarly to nouns and are used as such (similar to the gerund in English).
- [de] ich muss gehen “I must go”
Supine is a rare verb form. It survives in some Slavic languages (Slovenian) and is used instead of infinitive as the argument of motion verbs (old [cs] jdu spat lit. I-go sleep).
A form called “supine” also exists in Swedish where it is a special form of the participle, used to form the composite past form of a verb. It is used after the auxiliary verb ha (to have) but not after vara (to be):
- Simple past: I ate (the) dinner = Jag åt maten (using preterite)
- Composite past: I have eaten (the) dinner = Jag har ätit maten (using supine)
- Past participle common: (The) dinner is eaten = Maten är äten (using past participle)
- Past participle neuter: (The) apple is eaten = Äpplet är ätet
- Past participle plural: (The) apples are eaten = Äpplena är ätna
Part: participle, verbal adjective
Participle is a non-finite verb form that shares properties of verbs and adjectives. Its usage varies across languages. It may be used to form various periphrastic verb forms such as complex tenses and passives; it may be also used purely adjectively.
Other features may help to distinguish past/present participles (English), active/passive participles (Czech), imperfect/perfect participles (Hindi) etc.
- [en] he could have been prepared if he had foreseen it; I will be driving home.
Conv: converb, transgressive, adverbial participle, verbal adverb
The converb, also called adverbial participle or transgressive, is a non-finite verb form that shares properties of verbs and adverbs. It appears e.g. in Slavic and Indo-Aryan languages.
Note that this value was called Trans in UD v1 and it has been renamed in UD v2.
- [cs] zírali na mne, pevně svírajíce své zbraně “they stared at me while gripping their guns firmly”; udělavši večeři, zavolala rodinu ke stolu “having prepared the dinner, she called her family to the table”
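This document mentions two renamings between UD v1 and v2: the value VerbForm=Trans became VerbForm=Conv, and the feature Negative became Polarity. A conversion script could apply both as sketched below. The function name `upgrade_feats` is hypothetical, the sketch assumes the values of Negative carry over unchanged, and it re-sorts the features by name (ignoring case), as CoNLL-U feature strings are conventionally ordered. It is not a complete v1-to-v2 converter.

```python
def upgrade_feats(feats: str) -> str:
    """Apply the two v1 -> v2 renamings mentioned in this document."""
    if feats == "_":  # empty feature set stays empty
        return feats
    upgraded = []
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        if name == "Negative":  # feature renamed to Polarity
            name = "Polarity"
        if name == "VerbForm" and value == "Trans":  # value renamed to Conv
            value = "Conv"
        upgraded.append((name, value))
    # Keep the features sorted by name, case-insensitively.
    upgraded.sort(key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in upgraded)


print(upgrade_feats("Negative=Neg|VerbForm=Trans"))
```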
Used in Latin and Ancient Greek. Not to be confused with the gerund.
Gerund is a non-finite verb form that shares properties of verbs and nouns. In English it shares the morphological form with present participle, which may mean that the tagset will not distinguish it from the participle.
VerbForm=Ger is discouraged and alternatives should be considered first because the term gerund is rather confusing: in Spanish (and other Romance languages) it denotes the present participle and should thus be labeled Tense=Pres|VerbForm=Part; some Slavists use it to denote converbs (adverbial participles), which should be labeled VerbForm=Conv; and UD version 1 recommended (inspired by English) to use it for verbal nouns, which in UD v2 should be labeled VerbForm=Vnoun.
However, the feature is still available in UD v2 and can be used if the alternatives do not seem acceptable. The feature may be removed in future versions but comprehensive investigation has to be done first.
- [en] I look forward to seeing you; he turns a blind eye to my being late
Vnoun: verbal noun, masdar
Verbal nouns other than infinitives. Also called masdars by some authors, e.g. Haspelmath, 1995.
- [cs] dělání “doing”
- Haspelmath, Martin. 1995. The converb as a cross-linguistically valid category. Converbs in Cross-Linguistic Perspective: Structure and Meaning of Adverbial Verb Forms – Adverbial Participles, Gerunds –, edited by Martin Haspelmath and Ekkehard König, Berlin: Mouton de Gruyter, Empirical Approaches to Language Typology, 1–56.
Voice is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.
For Indo-European speakers, voice means mainly the active-passive distinction. In other languages, other shades of verb meaning are categorized as voice.
Act: active voice
The subject of the verb is the doer of the action (agent), the object is affected by the action (patient).
- [cs] Napadli jsme nepřítele. “We attacked the enemy” (the active participle napadli can be used to form either past tense or conditional mood; here it forms the past tense.)
Mid: middle voice
Between active and passive, needed e.g. in Ancient Greek or Sanskrit.
Pass: passive voice
The subject of the verb is affected by the action (patient). The doer (agent) is either unexpressed or it appears as an object of the verb.
- [cs] Jsme napadeni nepřítelem. “We are attacked by the enemy” (the passive participle napadeni is used to form passive in all tenses; here it forms the present passive.)
Antip: antipassive voice
In ergative-absolutive languages, an ergative subject is demoted to an absolutive subject.
Dir: direct voice
Used in direct-inverse voice systems, e.g. in North American languages. Direct means that the argument that is higher in salience hierarchy is the subject. Example hierarchy: human 1st person – 2nd – 3rd – non-human animate – inanimate.
Inv: inverse voice
Used in direct-inverse voice systems, e.g. in North American languages. Inverse voice marking means that the argument lower in the hierarchy functions as subject.
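The direct/inverse choice can be sketched as a comparison of positions on the salience hierarchy. The Python example below uses the example hierarchy given above. The names `SALIENCE` and `voice_for` are hypothetical, and returning `None` when both arguments share a rank is an assumption of this sketch, since the text does not cover that case.

```python
# Example hierarchy from the text, from most to least salient.
SALIENCE = [
    "human 1st person",
    "2nd person",
    "3rd person",
    "non-human animate",
    "inanimate",
]


def voice_for(subject: str, obj: str):
    """Return 'Dir' if the subject outranks the object, 'Inv' if it ranks lower."""
    subject_rank = SALIENCE.index(subject)
    object_rank = SALIENCE.index(obj)
    if subject_rank < object_rank:  # more salient argument is the subject
        return "Dir"
    if subject_rank > object_rank:  # less salient argument is the subject
        return "Inv"
    return None  # same rank: not covered by the description above


print(voice_for("human 1st person", "3rd person"))
print(voice_for("inanimate", "2nd person"))
```

Actual hierarchies are language-specific, so in practice the list would be replaced by the one documented for the language being annotated.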
Rcp: reciprocal voice
- [tr] karıştı, tutuştular
Cau: causative voice
Documentation of the METU Sabanci treebank classifies causative as voice (page 26). Note that this is a feature of verbs. Some languages also have a causative case of nouns.
- [tr] karıştırıyor “is confusing”