
This page pertains to UD version 2.

Universal features

For core part-of-speech categories, see the universal POS tags. The features listed here distinguish additional lexical and grammatical properties of words, not covered by the POS tags.

Lexical features* | Inflectional features*: Nominal* | Inflectional features*: Verbal*
PronType | Gender | VerbForm
NumType | Animacy | Mood
Poss | NounClass | Tense
Reflex | Number | Aspect
Other | Case | Voice
Abbr | Definite | Evident
Typo | Deixis | Polarity
Foreign | DeixisRef | Person
ExtPos | Degree | Polite
 | | Clusivity
Index:
A: abbreviation, abessive, ablative, above, absolute superlative, absolutive, accusative, active, actor-focus voice, additive, adelative, adessive, adlative, admirative, adverbial participle, affirmative, allative, animate, antipassive, aorist, article, aspect, associative, augmentative
B: Bantu noun class, below, benefactive, beneficiary-focus voice
C: cardinal, caritive, case, causative case, causative voice, clusivity, collective noun, collective numeral, collective pronominal, comitative, common gender, comparative case, comparative degree, complex definiteness, conditional, conjunctive, considerative, construct state, converb, count plural, counting form
D: dative, definite, definiteness, degree of comparison, deixis, deixis reference person, delative, demonstrative, desiderative, destinative, diminutive, direct case, direct voice, directional allative, distal, distributive case, distributive numeral, dual
E: elative, elevated referent, emphatic, equative case, equative degree, ergative, essive, even, evidentiality, exclamative, exclusive, external part of speech
F: factive, feminine, finite verb, first person, firsthand, foreign word, formal, fourth person, fraction, frequentative, future
G: gender, genitive, gerund, gerundive, greater paucal, greater plural
H: habitual, human, humbled speaker
I: illative, imperative, imperfect tense, imperfective aspect, inanimate, inclusive, indefinite, indefinite pronominal, indicative, inelative, inessive, infinitive, informal, injunctive, inlative, instructive, instrumental, interrogative mood, interrogative pronominal, inverse number, inverse voice, irrealis, iterative
J: jussive
L: lative, location-focus voice, locative
M: masculine, masdar, mass noun, medial, middle voice, modality, mood, motivative, multiplicative numeral
N: narrative, necessitative, negative polarity, negative pronominal, neuter, nominative, non-finite verb, non-firsthand, non-human, non-past, non-specific indefinite, not visible, noun class, number, numeral type
O: oblique case, optative, ordinal
P: participle, partitive, passive, past, past perfect, patient-focus voice, paucal, perfective aspect, perlative, person, personal, pluperfect, plural, plurale tantum, polarity, politeness, positive degree, positive polarity, possessive, potential, present, preterite, privative, progressive, prolative, pronominal type, prospective, proximate, purposive case, purposive mood
Q: quantifier, quantitative plural, quotative
R: range numeral, realis, reciprocal pronominal, reciprocal voice, reduced definiteness, reflexive, register, relative, remote
S: second person, set numeral, singular, singulare tantum, specific indefinite, subelative, subessive, subjunctive, sublative, superelative, superessive, superlative case, superlative degree, supine
T: temporal, tense, terminal allative, terminative, third person, total, transgressive, translative, trial, typo
U: uter
V: verb form, verbal adjective, verbal adverb, verbal noun, vocative, voice
Z: zero person
* The labels Nominal and Verbal are used as approximate categories only. There is no universal rule that a particular feature can only occur with verbs or nominals (although language-specific rules may define such constraints). Even the boundary between lexical and inflectional features is sometimes blurred: for example, gender is a lexical feature of nouns but an inflectional feature of adjectives or verbs.
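
As a practical aside: in CoNLL-U files, the features of a word are conventionally serialized in the FEATS column as Name=Value pairs joined by "|", with "_" standing for an empty feature set and commas separating multiple values of one feature. The following is a minimal Python sketch of reading such a string. The function name parse_feats is hypothetical, and the sketch does not validate names or values against the inventories on this page.

```python
def parse_feats(feats):
    """Parse a CoNLL-U style FEATS string into {feature_name: [values]}."""
    feats = feats.strip()
    if feats in ("", "_"):
        # "_" is the conventional placeholder for "no features".
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if not name or not value:
            raise ValueError(f"malformed feature: {pair!r}")
        # A feature may carry several comma-separated values.
        result[name] = value.split(",")
    return result


print(parse_feats("Animacy=Anim|Case=Acc"))
```

Keeping every value in a list, even when there is only one, lets callers handle single-valued and multi-valued features uniformly.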

Abbr: abbreviation

Values: Yes

Boolean feature. Is this an abbreviation? Note that the abbreviated word(s) typically belong to a part of speech other than X.

Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1.

Yes: it is an abbreviation

Examples


AdpType: adposition type

Values: Circ Post Prep Voc

Prep: preposition

Examples

Post: postposition

Examples

Circ: circumposition

Examples

Voc: vocalized preposition

In Slavic languages, some prepositions are non-syllabic and their form has to be changed in some contexts to facilitate pronunciation.

Examples

The same phenomenon exists in Slovak and Russian, and probably elsewhere.


AdvType: adverb type

Semantic subclasses of adverbs. They are annotated in some tagsets (e.g. Bulgarian, Czech, Hindi, Japanese) and would probably apply to many other languages if their tagsets covered them. Note that the PronType feature also applies to some adverbs and is orthogonal to AdvType.

Man: adverb of manner

Examples

Loc: adverb of location

Examples

Tim: adverb of time

Examples

Deg: adverb of quantity or degree

Note that there is a fuzzy borderline between adverbs of degree and indefinite numerals (as they are called in some grammars).

Examples

Cau: adverb of cause

Examples

Mod: adverb of modal nature

The Czech examples below are similar to modal verbs: they take infinitives as arguments and add the meaning of possibility, necessity or recommendation. I suspect that the Bulgarian example (a transliteration of French “à propos”) is used differently, but its native tagset also calls it “modal”.

Examples


Animacy: animacy

Values: Anim Hum Inan Nhum

Similarly to Gender (and to the African noun classes), animacy is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. Some languages distinguish only gender, some only animacy, and in some languages both gender and animacy play a role in the grammar. (Some non-UD tagsets then combine the two features into an extended system of genders; however, in UD the two features are annotated separately.)

Similarly to gender, the values of animacy refer to semantic properties of the noun, but this is only an approximation, referring to the prototypical members of the category. There are nouns that are treated as grammatically animate, although semantically they are inanimate.

The following table is an example of a three-way animacy distinction (human – animate nonhuman – inanimate) in the declension of the masculine determiner który “which” in Polish (the upper row differs from the middle row in pl-nom and pl-acc; the lower row differs from it in sg-acc):

gender | sg-nom | sg-gen | sg-dat | sg-acc | sg-ins | sg-loc | pl-nom | pl-gen | pl-dat | pl-acc | pl-ins | pl-loc
animate human | który | którego | któremu | którego | którym | którym | którzy | których | którym | których | którymi | których
animate non-human | który | którego | któremu | którego | którym | którym | które | których | którym | które | którymi | których
inanimate | który | którego | któremu | który | którym | którym | które | których | którym | które | którymi | których

In the corresponding paradigm of Czech, only two values are distinguished: masculine animate and masculine inanimate:

gender | sg-nom | sg-gen | sg-dat | sg-acc | sg-ins | sg-loc | pl-nom | pl-gen | pl-dat | pl-acc | pl-ins | pl-loc
animate | který | kterého | kterému | kterého | kterým | kterém | kteří | kterých | kterým | které | kterými | kterých
inanimate | který | kterého | kterému | který | kterým | kterém | které | kterých | kterým | které | kterými | kterých

More generally: Some languages distinguish animate vs. inanimate (e.g. Czech masculines), some languages distinguish human vs. non-human (e.g. Yuwan, a Ryukyuan language), and others distinguish three values, human vs. non-human animate vs. inanimate (e.g. Polish masculines).

Anim: animate

Human beings, animals, fictional characters, names of professions etc. are normally animate. Even nouns that are normally inanimate can be inflected as animate if they are personified. And some words in some languages can grammatically behave like animates although there is no obvious semantic reason for that.

Examples

Inan: inanimate

Nouns that are not animate are inanimate.

Examples

Hum: human

A subset of animates where the prototypical member is a human being but not an animal. Again, there may be exceptions that do not fit the class semantically but belong to it grammatically.

Examples

Nhum: non-human

In languages that only distinguish human from non-human, this value includes inanimates. In languages that distinguish human animates, non-human animates and inanimates, this value is used only for non-human animates, while Inan is used for inanimates.
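
The dependence of Nhum on the animacy system of the language can be sketched in Python as follows. The system labels ("anim_inan", "hum_nhum", "hum_nhum_inan") and the function name are hypothetical, chosen only for this illustration; the language examples in the comments are the ones given above.

```python
# Hypothetical labels for the three kinds of animacy system described above.
ANIMACY_SYSTEMS = {
    "anim_inan": {"Anim", "Inan"},             # e.g. Czech masculines
    "hum_nhum": {"Hum", "Nhum"},               # e.g. Yuwan
    "hum_nhum_inan": {"Hum", "Nhum", "Inan"},  # e.g. Polish masculines
}


def nhum_includes_inanimates(system):
    """Return True if Nhum also covers inanimates in the given system."""
    values = ANIMACY_SYSTEMS[system]
    if "Nhum" not in values:
        raise ValueError(f"system {system!r} does not use Nhum")
    # Nhum absorbs the inanimates only when there is no separate Inan value.
    return "Inan" not in values


print(nhum_includes_inanimates("hum_nhum"))       # two-way system
print(nhum_includes_inanimates("hum_nhum_inan"))  # three-way system
```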

Examples


Aspect: aspect

Values: Hab Imp Iter Perf Prog Prosp

Aspect is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.

Aspect is a feature that specifies duration of the action in time, whether the action has been completed etc. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.

In Czech and other Slavic languages, aspect is a lexical feature. Pairs of imperfective and perfective verbs exist and are often morphologically related but the space is highly irregular and the verbs are considered to belong to separate lemmas.

Since we proceed bottom-up, the current standard covers only a few aspect values found in corpora. See Wikipedia (http://en.wikipedia.org/wiki/Grammatical_aspect) for a long list of other possible aspects.

Imp: imperfect aspect

The action took / takes / will take some time span and there is no information whether and when it was / will be completed.

Examples

Perf: perfect aspect

The action has been / will have been completed. Since there is emphasis on one point on the time scale (the point of completion), this aspect does not work well with the present tense. For example, Czech morphology can create present forms of perfective verbs but these actually have a future meaning.

Examples

Prosp: prospective aspect

In general, prospective aspect can be described as relative future: the action is/was/will be expected to take place at a moment that follows the reference point; the reference point itself can be in the past, present or future. In the English sentence “When I got home yesterday, John called and said he would arrive soon”, the last clause (“he would arrive soon”) is in prospective aspect. Nevertheless, English does not have overt affixal morphemes dedicated to the prospective aspect, and we do not need the label in English. But other languages do; the -ko suffix in Basque is an example.

Note that this value was called Pro in UD v1 and it has been renamed Prosp in UD v2.

Examples

Prog: progressive aspect

English progressive tenses (I am eating, I have been doing …) have this aspect. They are constructed analytically (auxiliary + present participle) but the -ing participle is so bound to progressive meaning that it seems a good idea to annotate it with this feature (we have to distinguish it from the past participle somehow; we may use both the “Tense” and the “Aspect” features).

In languages other than English, the progressive meaning may be expressed by morphemes bound to the main verb, which makes this value even more justified. An example is Turkish, with its two distinct progressive morphemes, -yor and -mekte.

Examples

Hab: habitual aspect

The action takes place habitually (daily, weekly, annually etc.) or is a usual occurrence.

Examples

Iter: iterative / frequentative aspect

Denotes repeated action. Attested e.g. in Hungarian. Iteratives also exist in Czech with this name but their meaning is rather habitual. They can be formed only from imperfective verbs and they are usually not classified as a separate aspect; they are just Aspect=Imp.

Note: This value is new in UD v2 but a similar value has been used in UD v1 as language-specific for Hungarian, though it was called frequentative there (Freq).
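
The two renamings noted in this section can be applied mechanically when converting older annotation. The following Python sketch is only an illustration: the names V1_TO_V2_ASPECT and upgrade_aspect are hypothetical, and mapping the Hungarian-specific Freq to Iter is an assumption based on the note above, which calls the two values similar rather than identical.

```python
# UD v1 Aspect value -> UD v2 Aspect value.
V1_TO_V2_ASPECT = {
    "Pro": "Prosp",  # renamed in UD v2
    "Freq": "Iter",  # assumption: v1 language-specific Hungarian value
}


def upgrade_aspect(value):
    """Map a v1 Aspect value to its v2 name; other values pass through."""
    return V1_TO_V2_ASPECT.get(value, value)


for old in ("Pro", "Freq", "Imp"):
    print(old, "->", upgrade_aspect(old))
```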

Examples


Case: case

Values: Core: Abs Acc Erg Nom
Non-core: Abe Ben Cau Cmp Cns Com Dat Dis Equ Gen Ins Par Tem Tra Voc
Local: Abl Add Ade All Del Ela Ess Ill Ine Lat Loc Per Sbe Sbl Spl Sub Sup Ter
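
For tools that need to know which group a case value belongs to, the three lists above can be stored as a small lookup table. This is a Python sketch under that assumption; CASE_GROUPS and case_group are hypothetical names, and the grouping is copied from the lists above.

```python
# Case values grouped as in the "Values" lists above.
CASE_GROUPS = {
    "core": "Abs Acc Erg Nom".split(),
    "non-core": "Abe Ben Cau Cmp Cns Com Dat Dis Equ Gen Ins Par Tem Tra Voc".split(),
    "local": "Abl Add Ade All Del Ela Ess Ill Ine Lat Loc Per Sbe Sbl Spl Sub Sup Ter".split(),
}


def case_group(value):
    """Return the group name of a Case value, or None if it is not listed."""
    for group, values in CASE_GROUPS.items():
        if value in values:
            return group
    return None


print(case_group("Erg"))
print(case_group("Ine"))
```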

Case is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.

Case can also be a lexical feature of adpositions and describe the case meaning that the adposition contributes to the nominal in which it appears. (This usage of the feature is typical for languages that do not have case morphology on nouns. For languages that have both adpositions and morphological case, the traditional set of cases is determined by the nominal forms and it does not cover adpositional meanings.) In some non-UD tagsets, case of adpositions is used as a valency feature (saying that the adposition requires its nominal argument to be in that morphological case); however, annotating adposition valency case in UD treebanks would be superfluous because the same case feature can be found at the nominal to which the adposition belongs.

Case helps specify the role of the noun phrase in the sentence, especially in free-word-order languages. For example, the nominative and accusative cases often distinguish subject and object of the verb, while in fixed-word-order languages these functions would be distinguished merely by the positions of the nouns in the sentence.

Here on the level of morphosyntactic features we are dealing with case expressed morphologically, i.e. by bound morphemes (affixes). Note that on a higher level case can be understood more broadly as the role, and it can also be expressed by adding an adposition to the noun. What is expressed by affixes in one language can be expressed using adpositions in another language. Cf. the case dependency label.

Examples

The descriptions of the individual case values below include semantic hints about the prototypical meaning of the case. Bear in mind that quite often a case will be used for a meaning that is totally unrelated to the meaning mentioned here. Valency of verbs, adpositions and other words will determine that the noun phrase must be in a particular grammatical case to fill a particular valency slot (semantic role). It is much the same as trying to explain the meaning of prepositions: most people would agree that the central meaning of English “in” is location in space or time, but there are phrases where the meaning is less locational: “In God we trust.” “Say it in English.”

Note that Indian corpora based on the so-called Paninian model use a related feature called vibhakti. It is a merger of the Case feature described here and of various postpositions. Values of the feature are language-dependent because they are copies of the relevant morphemes (either bound morphemes or postpositions). Vibhakti can be mapped on the Case values described here if we know 1. which source values are bound morphemes (postpositions are separate nodes for us) and 2. what is their meaning. For instance, the genitive case (Gen) in Bengali is marked using the suffix -ra (-র), i.e. vib=era. In Hindi, the suffix has been split off the noun and it is now written as a separate word – the postposition kā/kī/ke (का/की/के). Even if the postpositional phrase can be understood as a genitive noun phrase, the noun is not in genitive. Instead, the postposition requires that it takes one of three case forms that are marked directly on the noun: the oblique case (Acc).
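
Mapping the case names of a particular grammatical tradition to the labels used here is largely a table lookup. The headings of the value descriptions below pair several labels with alternative traditional names, and the Python sketch below builds a lookup table from a subset of those headings. The names CASE_HEADINGS, build_alias_table and CASE_ALIASES are hypothetical. Such a table is only a starting point, since the same traditional name can be used differently across languages (see the remarks on Hungarian under Sbl).

```python
# A subset of the value headings of this section, one per line.
CASE_HEADINGS = """
Nom: nominative / direct
Acc: accusative / oblique
Ins: instrumental / instructive
Ess: essive / prolative
Tra: translative / factive
Com: comitative / associative
Abe: abessive / caritive / privative
Ben: benefactive / destinative
Lat: lative / directional allative
Ter: terminative / terminal allative
Ill: illative / inlative
Ela: elative / inelative
All: allative / adlative
Abl: ablative / adelative
Del: delative / superelative
"""


def build_alias_table(headings):
    """Turn 'Label: name / name' lines into a {name: label} dictionary."""
    table = {}
    for line in headings.strip().splitlines():
        label, names = line.split(":", 1)
        for name in names.split("/"):
            table[name.strip()] = label.strip()
    return table


CASE_ALIASES = build_alias_table(CASE_HEADINGS)
print(CASE_ALIASES["oblique"], CASE_ALIASES["caritive"])
```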

Nom: nominative / direct

The base form of the noun, typically used as citation form (lemma). In many languages this is the word form used for subjects of clauses. If the language has only two cases, which are called “direct” and “oblique”, the direct case will be marked Nom.

Examples

Acc: accusative / oblique

Perhaps the second most widely spread morphological case. In many languages this is the word form used for direct objects of verbs. If the language has only two cases, which are called “direct” and “oblique”, the oblique case will be marked Acc.

Examples

Abs: absolutive

Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.

The absolutive case marks subject of intransitive verb and direct object of transitive verb.

Examples

Erg: ergative

Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.

The ergative case marks subject of transitive verb.

Examples

Dat: dative

In many languages this is the word form used for indirect objects of verbs.

Examples

Gen: genitive

The prototypical meaning of the genitive is that the noun phrase somehow belongs to its governor; it would often be translated by the English preposition of. English has the “Saxon genitive” formed by the suffix ’s, but we will normally not need the feature in English because the suffix gets separated from the noun during tokenization.

Note that despite considerable semantic overlap, the genitive case is not the same as the feature of possessivity (Poss). Possessivity is a lexical feature, i.e. it applies to lemma and its whole paradigm. Genitive is a feature of just a subset of word forms of the lemma. Semantics of possessivity is much more clearly defined while the genitive (as many other cases) may be required in situations that have nothing to do with possessing. For example, [cs] bez prezidentovy dcery “without the president’s daughter” is a prepositional phrase containing the preposition bez “without”, the possessive adjective prezidentovy “president’s” and the noun dcery “daughter”. The possessive adjective is derived from the noun prezident but it is really an adjective (with separate lemma and paradigm), not just a form of the noun. In addition, both the adjective and the noun are in their genitive forms (the nominative would be prezidentova dcera). There is nothing possessive about this particular occurrence of the genitive. It is there because the preposition bez always requires its argument to be in genitive.

Examples

Note that in Basque, Gen should be used for possessive genitive (as opposed to locative genitive): diktadorearen erregimena “dictator’s regime”; diktadore “dictator”.

Voc: vocative

The vocative case is a special form of noun used to address someone. Thus it predominantly appears with animate nouns (see the feature of Animacy). Nevertheless this is not a grammatical restriction and inanimate things can be addressed as well.

Examples

Ins: instrumental / instructive

The role from which the name of the instrumental case is derived is that the noun is used as instrument to do something (as in [cs] psát perem “to write using a pen”). Many other meanings are possible, e.g. in Czech the instrumental is required by the preposition s “with” and thus it includes the meaning expressed in other languages by the comitative case.

In Czech the instrumental is also used for the agent-object in passive constructions (cf. the English preposition by).

Examples

A semantically similar case called instructive is used rarely in Finnish to express “with (the aid of)”. It can be applied to infinitives that behave much like nouns in Finnish. We propose one label for both instrumental and instructive (instrumental is not defined in Finnish).

Examples

Par: partitive

In Finnish the partitive case expresses indefinite identity and unfinished actions without result.

Examples

Examples comparing partitive with accusative: ammuin karhun “I shot a bear.Acc” (and I know that it is dead); ammuin karhua “I shot at a bear.Par” (but I may have missed).

Using accusative instead of partitive may also substitute the missing future tense: luen kirjan “I will read the book.Acc”; luen kirjaa “I am reading the book.Par”.

Dis: distributive

The distributive case conveys that something happened to every member of a set, one at a time. It may also express frequency.

Examples

Ess: essive / prolative

The essive case expresses a temporary state; it often corresponds to English “as a …”. A similar case in Basque is called prolative and it should be tagged Ess too.

Examples

Tra: translative / factive

The translative case expresses a change of state (“it becomes X”, “it changes to X”). Also used for the phrase “in language X”. In the Szeged Treebank, this case is called factive.

Examples

Com: comitative / associative

The comitative (also called associative) case corresponds to English “together with …”

Examples

Abe: abessive / caritive / privative

The abessive case (also called caritive or privative) corresponds to the English preposition without.

Examples

Cau: causative / motivative / purposive

A noun in this case is the cause or purpose of something. In Hungarian it also seems to be used frequently with currency (“to buy something for the money”) and it can also mean the goal of something.

Examples

Ben: benefactive / destinative

The benefactive case corresponds to the English preposition for.

Examples

Cns: considerative

The considerative case denotes something that is given in exchange for something else. It is used in Warlpiri (Andrews 2007, p.164).

Examples

Cmp: comparative

The comparative case means “than X”. It marks the standard of comparison and it differs from the comparative Degree, which marks the property being compared. It occurs in Dravidian and Northeast-Caucasian languages.

Examples

Equ: equative

The equative case means “X-like”, “similar to X”, “same as X”. It marks the standard of comparison and it differs from the equative Degree, which marks the property being compared. It occurs in Turkish.

Examples

Location and direction

Loc: locative

The locative case often expresses location in space or time, which gave it its name. As elsewhere, non-locational meanings also exist and they are not rare. Uralic languages have a complex set of fine-grained locational and directional cases (see below) instead of the locative. Even in languages that have locative, some location roles may be expressed using other cases (e.g. because those cases are required by a preposition).

In Slavic languages this is the only case that is used exclusively in combination with prepositions (but such a restriction may not hold in other languages that have locative).

Examples

Lat: lative / directional allative

The lative case denotes movement towards/to/into/onto something. A similar case in Basque is called directional allative (Spanish adlativo direccional). However, lative is typically thought of as a union of allative, illative and sublative, while in Basque it is derived from allative, which also exists independently.

Examples

Ibarretxe-Antuñano (2004: 282) says about directional and terminal allative in Basque: “What crucially distinguishes these two cases from the allative is that, on top of profiling the goal, they also profile the path, or to be more precise, some of the components of the path.”

Ter: terminative / terminal allative

The terminative case specifies where something ends in space or time. A similar case in Basque is called terminal allative (Spanish adlativo terminal). While the lative (or directional allative) specifies only the general direction, the terminative (terminal allative) also says that the destination is reached.

Examples

Internal location

Ine: inessive

The inessive case expresses location inside of something.

Examples

Ill: illative / inlative

The illative case expresses direction into something.

Examples

Ela: elative / inelative

The elative case expresses direction out of something.

Examples

Add: additive

This case is distinguished by some scholars in Estonian but is not recognized by traditional grammar; it exists in the Multext-East Estonian tagset and in the Eesti keele puudepank. It has the meaning of illative, and some grammars will thus consider the additive just an alternative form of illative. Forms of this case exist only in singular and not for all nouns.

Examples

External location

Ade: adessive

The adessive case expresses location at, on the surface, or near something. The corresponding directional cases are allative (towards something) and ablative (from something).

Examples

Note that adessive is used to express location on the surface of something in Finnish and Estonian, but does not carry this meaning in Hungarian.

All: allative / adlative

The allative case expresses direction to something (destination is adessive, i.e. at or on that something).

Examples

Abl: ablative / adelative

Prototypical meaning: direction from some point. In systems that distinguish different source locations (e.g. in Uralic languages), this case corresponds to the “adelative”, that is, the source is adessive.

Examples

Higher location

Sup: superessive

Used to express location higher than a reference point (atop something or above something). Attested in Nakh-Dagestanian languages and also in Hungarian (while other Uralic languages express this location with the adessive case, Hungarian has both adessive and superessive).

Examples

Spl: superlative

The superlative case is used in Nakh-Dagestanian languages to express the destination of movement, originally to the top of something, and, by extension, in other figurative meanings as well.

Note that Hungarian assigns this meaning to the sublative case, which otherwise indicates that the destination is below (not above) something.

Examples

Del: delative / superelative

Used in Hungarian and in Nakh-Dagestanian languages to express the movement from the surface of something (like “moved off the table”).

Other meanings are possible as well, e.g. “about something”.

Examples

Lower location

Sub: subessive

Used to express location lower than a reference point (under something or below something). Attested in Nakh-Dagestanian languages.

Examples

Sbl: sublative

The original meaning of the sublative case is movement towards a place under or lower than something, that is, the destination is subessive. It is attested in Nakh-Dagestanian languages. Note however that like many other cases, it is now used in abstract senses that are not apparently connected to the spatial meaning: for example, in Lezgian it may indicate the cause of something.

Hungarian uses the sublative label for what would be better categorized as superlative, as it expresses the movement to the surface of something (e.g. “to climb a tree”), and, by extension, other figurative meanings as well (e.g. “to university”).

Examples

Sbe: subelative

Used to express movement or direction from under something.

Examples

Per: perlative

The perlative case denotes movement along something. It is used in Warlpiri (Andrews 2007, p.162). Note that Unimorph mentions the English preposition “along” in connection with what they call prolative/translative; but we have different definitions of those two cases.

Examples

Tem: temporal

The temporal case is used to indicate time.

Examples

References


Clusivity: clusivity

Values: Ex In

Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Plains Cree (Wolvengrey 2011 p. 66).

In: inclusive

Includes the listener, i.e. we = I + you (+ optionally they).

Examples

Ex: exclusive

Excludes the listener, i.e. we = I + they.

Examples

References


Clusivity[obj]: clusivity agreement with object

Values: Ex In

Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Mbyá Guaraní.

Some languages are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Clusivity of the argument, we have two layers of Clusivity on the verb: Clusivity[subj], and (for transitive verbs) Clusivity[obj]. While it would be possible to make the subject layer the default and use just Clusivity for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
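
Layered feature names such as Clusivity[subj] and Clusivity[obj] consist of a base feature name and a layer in square brackets. The following Python sketch splits such a name into its two parts. The function name split_layer and the regular expression are assumptions made for this illustration (base names starting with an uppercase letter, layer names in lowercase letters and digits); they are not an official definition of the syntax.

```python
import re

# Base name, optionally followed by a layer in square brackets.
_LAYERED_NAME = re.compile(r"([A-Z][A-Za-z0-9]*)(?:\[([a-z0-9]+)\])?")


def split_layer(feature_name):
    """Split 'Clusivity[obj]' into ('Clusivity', 'obj'); layer is None if absent."""
    match = _LAYERED_NAME.fullmatch(feature_name)
    if match is None:
        raise ValueError(f"not a valid feature name: {feature_name!r}")
    return match.group(1), match.group(2)


# Features of a hypothetical transitive verb form with both layers marked.
verb_feats = {"Clusivity[subj]": "In", "Clusivity[obj]": "Ex"}
for name, value in verb_feats.items():
    base, layer = split_layer(name)
    print(base, layer, value)
```

Splitting the name this way lets a tool check the value against the inventory of the base feature (here Ex and In) regardless of the layer.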

In: inclusive object

Includes the listener, i.e. we = I + you (+ optionally they).

Examples

Ex: exclusive object

Excludes the listener, i.e. we = I + they.

Examples


Clusivity[psor]: possessor’s clusivity

Values: Ex In

Clusivity is a feature of first-person plural personal pronouns. Clusivity[psor] is possessor’s clusivity, marked e.g. on nouns in Mbyá Guaraní. These noun forms would be translated to English as possessive pronoun + noun.

This layered feature is conveniently used for possessive inflections of nouns, although nouns normally do not have a Clusivity feature, meaning that no other layers are needed. Nevertheless, the possessive morphology typically also includes Number, which could be confused with the number of the noun, and we thus have Person[psor] together with Number[psor].

This layered feature is normally not used with possessive pronouns. They traditionally have just simple Clusivity. (And in some languages, possessive pronouns are actually identical to personal pronouns in the genitive case.)

In: inclusive possessor

Includes the listener, i.e. we = I + you (+ optionally they).

Examples

Ex: exclusive possessor

Excludes the listener, i.e. we = I + they.

Examples


Clusivity[subj]: clusivity agreement with subject

Values: Ex In

Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Mbyá Guaraní.

Some languages are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Clusivity of the argument, we have two layers of Clusivity on the verb: Clusivity[subj], and (for transitive verbs) Clusivity[obj]. While it would be possible to make the subject layer the default and use just Clusivity for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.

In: inclusive subject

Includes the listener, i.e. we = I + you (+ optionally they).

Examples

Ex: exclusive subject

Excludes the listener, i.e. we = I + they.

Examples


ConjType: conjunction type

Values: Comp Oper Pred

We already distinguished the two main types, coordinating and subordinating conjunctions, at the level of POS tags. However, there are other subtypes that are not yet accounted for.

Comp: comparing conjunction

Examples: [de] wie (as), als (than)

Oper: mathematical operator

Note that operators can be expressed either using symbols or using words.

Examples: [cs] krát (times), plus, minus

Pred: subordinating conjunction introducing a secondary predicate

Examples: [pl] jako (as)


Definite: definiteness or state

Values: Com Cons Def Ind Spec

Definiteness is typically a feature of nouns, adjectives and articles. Its value distinguishes whether we are talking about something known and concrete, or something general or unknown. It can be marked on definite and indefinite articles, or directly on nouns, adjectives etc. In Arabic, definiteness is also called the “state”.

Ind: indefinite

In languages where Spec is distinguished, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.

Examples

Spec: specific indefinite

Specific indefinite, e.g. “a certain stick”. Occurs e.g. in Lakota. In languages where it is used, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.

Examples

Def: definite

Examples

Cons: construct state / reduced definiteness

Used for the construct state in Arabic. If two nouns are in a genitive relation, the first one (the “nomen regens”) has “reduced definiteness”; the second is the genitive and can be either definite or indefinite. The reduced form has neither the definite morpheme (article) nor the indefinite morpheme (nunation).

Note that in UD v1 this value was called Red. It has been renamed Cons in UD v2.

Examples

Com: complex

Used in improper annexation in Arabic. The genitive construction described above normally consists of two nouns (first reduced, second genitive). That is called proper annexation or iḍāfa. If the first member is an adjective or adjectivally used participle and the second member is a definite noun, the construction is called improper annexation or false iḍāfa. The result is a compound adjective that is usually used as an attributive adjunct and thus must agree in definiteness with the noun it modifies. Its first part (the adjective or participle) may again take the definite article. Although it may look the same as the form for the definite state, it is assigned a special value of complex state to reflect the different origin. See also Hajič et al. page 3.

Examples:

Degree: degree

Values: Abs Aug Cmp Dim Equ Pos Sup

Degree of comparison is typically an inflectional feature of some adjectives and adverbs. A different flavor of degree is diminutives and augmentatives, which often apply to nouns but are not restricted to them.

Pos: positive, first degree

This is the base form that merely states a quality of something, without comparing it to qualities of others. Note that although this degree is traditionally called “positive”, negative properties can be compared, too.

Examples

Equ: equative

The quality of one object is compared to the same quality of another object, and the result is that they are identical or similar (“as X as”). Note that it marks the adjective and it is distinct from the equative Case, which marks the standard of comparison.

Examples

Cmp: comparative, second degree

The quality of one object is compared to the same quality of another object.

Examples

Sup: superlative, third degree

The quality of one object is compared to the same quality of all other objects within a set.

Examples

Abs: absolute superlative

Some languages can express morphologically that the studied quality of the given object is so strong that there is hardly any other object exceeding it. The quality is not actually compared to any particular set of objects.

Examples

Dim: diminutive

Morphologically derived form of a noun that indicates small size, or, metaphorically, affection towards the entity described by the noun. While nouns are the prototypical category in which diminutives are formed, the feature is not restricted to nouns and in some languages similar morphology can be observed with other categories (adjectives, verbs).

Examples

Aug: augmentative

Morphologically derived form of a noun that indicates large size or force. While nouns are the prototypical category in which augmentatives are formed, the feature is not restricted to nouns and in some languages similar morphology can be observed with other categories (adjectives, verbs).

Examples

Deixis: relative location encoded in demonstratives

Values: Abv Bel Even Med Nvis Prox Remt

Deixis is typically a feature of demonstrative pronouns, determiners, and adverbs. Its value classifies the location of the referred entity with respect to the location of the speaker or of the hearer. The common distinction is distance (proximate vs. remote entities); in some languages, elevation is distinguished as well (e.g., the entity is located higher or lower than the speaker).

If it is necessary to distinguish the person whose location is the reference point (speaker or hearer), the feature DeixisRef can be used in addition to Deixis. See also the Wolof examples below. DeixisRef is not needed if all deictic expressions in the language are relative to the same person (probably the speaker).

Prox: proximate

The entity is close to the reference point (e.g., to the speaker).

Examples

Med: medial

The entity is neither close nor far away from the reference point (e.g., from the speaker).

Examples

Remt: remote, distal

The entity is far away from the reference point (e.g., from the speaker).

Examples

Nvis: not visible

The entity is remote and not visible. In Khasi, where this distinction is made, the Remt value can be used to annotate “remote but visible”.

Examples

Abv: above the reference point

Occurs e.g. in Aghul [agx], Lak [lbe], and Khasi [kha]. The entity is both remote from the speaker and above them.

Examples

Even: at the same level as the reference point

Occurs e.g. in Lak [lbe]. The entity is both remote and at the same level as the speaker.

Examples

Bel: below the reference point

Occurs e.g. in Aghul [agx] and Khasi [kha]. The entity is both remote from the speaker and below them.

Examples

DeixisRef: person to which deixis is relative

Values: 1 2

DeixisRef is a feature of demonstrative pronouns, determiners, and adverbs, accompanying Deixis when necessary. Deixis encodes position of an entity relative to either the speaker or the hearer. If it is necessary to distinguish the person whose location is the reference point (speaker or hearer), DeixisRef is used. DeixisRef is not needed if all deictic expressions in the language are relative to the same person (probably the speaker), or if they do not distinguish the reference point.
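
As a minimal sketch of how Deixis and DeixisRef might appear together in an annotation, the snippet below serializes a feature dictionary into a `Name=Value|Name=Value` string. The helper name `format_feats` is hypothetical, and the case-insensitive alphabetical ordering and the underscore for an empty set are assumed from the usual CoNLL-U conventions rather than stated in this section. PronType=Dem is the demonstrative value of the PronType feature.

```python
def format_feats(feats):
    """Serialize a feature dict as a CoNLL-U style FEATS string."""
    if not feats:
        return "_"  # assumed convention: underscore for an empty feature set
    # Sort by feature name, ignoring case, then join Name=Value pairs with "|".
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


# A demonstrative that is close to the hearer (reference point = 2nd person).
near_hearer = {"PronType": "Dem", "DeixisRef": "2", "Deixis": "Prox"}
print(format_feats(near_hearer))
```

If every deictic expression in a language is relative to the same person, the `DeixisRef` key would simply be left out of the dictionary.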

1: deixis relative to the first person participant (speaker)

Examples

2: deixis relative to the second person participant (hearer)

Examples

Echo: is this an echo word or a reduplicative?

Is this a reduplicative or echo word? Such words occur in Hindi and other Indian languages. In the Hyderabad Dependency Treebank they get their own part-of-speech tags, RDP and ECH, respectively. We do not want to treat them as separate parts of speech because they could be assigned a POS independent of their RDP or ECH status (the same as the word that they echo). Perhaps we should merge this also with the “hyph” feature to something called “compound”?

Rdp: reduplicative

The word is a copy of a previous word. In Hindi, this would add the meaning of distribution (“one rupee each”), separation (“sit separately”), variety, diversity or just emphasis.

Examples: [hi] “कभी - कभी” = “kabhī - kabhī” = “sometimes”, “कभी” = “kabhī” = “sometimes”; “एक एक” = “eka eka” = “one each”, “एक” = “eka” = “one”

Ech: echo

The word rhymes with a previous word but it is not identical to it and typically it does not have any meaning of its own. In Hindi it generalizes the meaning of the previous word and eventually translates as “or something”, “etc.” etc.

Examples: [hi] “चाय वाय” = “čāya vāya” = “tea or something” (as in “Have some tea or something.”)

For more details see Rupert Snell and Simon Weightman: Teach Yourself Hindi, Section 16.4 and 16.5, pages 210 – 211.

Evident: evidentiality

Values: Fh Nfh

Evidentiality is the morphological marking of a speaker’s source of information (Aikhenvald, 2004). It is sometimes viewed as a category of mood and modality.

Many different values are attested in the world’s languages. At present we only cover the firsthand vs. non-firsthand distinction, needed in Turkish. It distinguishes there the normal past tense (firsthand, also definite past tense, seen past tense) from the so-called miş-past (non-firsthand, renarrative, indefinite, heard past tense).

Aikhenvald also distinguishes reported evidentiality, occurring in Estonian and Latvian, among others. We currently use the quotative Mood for this.

Note: Evident is a new universal feature in UD version 2. It was used as a language-specific feature (under the name Evidentiality) in UD v1 for Turkish.

Fh: firsthand

Examples

Nfh: non-firsthand

Examples

References

ExtPos: external part of speech

Values: ADJ ADP ADV CCONJ DET INTJ PRON PROPN SCONJ

This feature differs significantly from all other features: It describes neither the lexical category, nor the inflectional paradigm slot of the token it appears on. Rather than to the individual token, it pertains to a multiword expression and indicates the part of speech that the expression would get if it were analyzed as a single word. ExtPos is annotated at the head node of the multiword expression. The possible values are taken from the defined UPOS tags and no other values are allowed (not even at the language-specific level). The main motivation for ExtPos is that the multiword expression may behave like a part of speech different from the UPOS of the head node; however, ExtPos is sometimes used even if it is identical to the UPOS of the head node. Also, it is not strictly necessary that the expression is multiword – if one of the words of the expression is omitted by mistake, or if a single word has been coerced into a part of speech different from its lexical one, ExtPos may be used to signal it.

ExtPos is strongly recommended for fixed functional multiword expressions (the head node has one or more children attached via the fixed relation). These should normally lead to ExtPos values ADP, ADV, CCONJ, DET, PRON, SCONJ (the fixed relation should not be used for compounds that work like content words). However, ExtPos is occasionally useful in other situations, too: for example, when a multiword expression acts as a proper noun (although its parts behave like other words) or as an interjection.
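
The constraints in the two paragraphs above could be checked mechanically. The following is only a sketch: the function name `check_extpos`, its two parameters, and the warning texts are hypothetical, and how one finds out whether a node has a `fixed` child depends on the treebank tooling in use.

```python
# Values listed for ExtPos on this page; no other values are allowed.
EXTPOS_VALUES = {"ADJ", "ADP", "ADV", "CCONJ", "DET", "INTJ", "PRON", "PROPN", "SCONJ"}
# Values that fixed functional expressions should normally lead to.
FIXED_EXPECTED = {"ADP", "ADV", "CCONJ", "DET", "PRON", "SCONJ"}


def check_extpos(extpos, has_fixed_child):
    """Return a list of warnings for one head node (empty list = no complaints)."""
    warnings = []
    if extpos is None:
        # ExtPos is strongly recommended on heads of fixed expressions.
        if has_fixed_child:
            warnings.append("head of a fixed expression has no ExtPos")
        return warnings
    if extpos not in EXTPOS_VALUES:
        warnings.append(f"ExtPos={extpos} is not an allowed value")
    elif has_fixed_child and extpos not in FIXED_EXPECTED:
        warnings.append(f"ExtPos={extpos} is unusual for a fixed expression")
    return warnings
```

The last two cases are reported as warnings rather than errors because the text only says "strongly recommended" and "should normally".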

ADJ: adjective-like expression

Examples

ADP: adposition-like expression

Multiword adpositions occur in many languages. Often they are grammaticalized prepositional phrases.

Examples

ADV: adverb-like expression

Examples

CCONJ: coordinating conjunction-like expression

Examples

DET: determiner-like expression

Examples

INTJ: interjection-like expression

Examples

PRON: pronoun-like expression

Examples

PROPN: proper noun-like expression

Examples

SCONJ: subordinator-like expression

Examples

Foreign: is this a foreign word?

Values: Yes

Boolean feature. Is this a foreign word? Not a loan word and not a foreign name but a genuinely foreign word appearing inside native text, e.g. inside direct speech, titles of books etc. This feature would apply either to the X part of speech (unanalyzable token), or to other parts of speech if we know and are willing to annotate the class to which the word belongs in its original language.

See discussion at Foreign Expressions and Code-Switching.

Historical Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1 but it was not considered boolean and three values were foreseen. Since the additional values were used extremely rarely, they are not part of the universal definition of this feature in UD v2.

Yes: it is foreign

Example: [en] He said I could “dra åt helvete!”

Gender: gender

Values: Com Fem Masc Neut

Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In English, gender affects only the choice of the personal pronoun (he / she / it) and the feature is usually not encoded in English tagsets.

See also the related feature of Animacy.

African languages have an analogous feature of noun classes: there might be separate grammatical categories for flat objects, long thin objects etc.

Masc: masculine gender

Nouns denoting male persons are masculine. Other nouns may be also grammatically masculine, without any relation to sex.

Examples

Fem: feminine gender

Nouns denoting female persons are feminine. Other nouns may be also grammatically feminine, without any relation to sex.

Examples

Neut: neuter gender

Some languages have only the masculine/feminine distinction while others also have this third gender for nouns that are neither masculine nor feminine (grammatically).

Examples

Com: common gender

Some languages do not distinguish masculine/feminine most of the time but they do distinguish neuter vs. non-neuter (Swedish neutrum / utrum). The non-neuter is called common gender.

Note that it could also be expressed as a combined value Gender=Fem,Masc. Nevertheless we keep Com also as a separate value. Combined feature values should only be used in exceptional, undecided cases, not for something that occurs systematically in the grammar. Language-specific extensions to these guidelines should determine whether the Com value is appropriate for a particular language.

Note further that the Com value is not intended for cases where we just cannot derive the gender from the word itself (without seeing the context), while the language actually distinguishes Masc and Fem. For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context.
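
To make the difference between Gender=Com and the combined value Gender=Fem,Masc concrete, here is a small sketch that reads the Gender values out of a feature string. The function names are hypothetical; the only assumptions about the format are that features are separated by `|`, that multiple values of one feature are separated by commas, and that `_` stands for an empty feature set.

```python
def gender_values(feats):
    """Return the set of Gender values in a FEATS string (empty set if absent)."""
    if feats == "_":
        return set()
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "Gender":
            # Several values of one feature are separated by commas.
            return set(value.split(","))
    return set()


def is_combined_masc_fem(feats):
    """True if Gender is annotated as the combined value Fem,Masc."""
    return gender_values(feats) == {"Fem", "Masc"}
```

A treebank maintainer could, for instance, count how often `is_combined_masc_fem` fires: frequent hits would suggest a systematic pattern, which the guidelines above say should not be encoded with a combined value.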

Examples

Gender[dat]: gender agreement with the dative argument

Values: Fem Masc

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.

Masc: masculine dative argument

Examples

Fem: feminine dative argument

Examples

Gender[erg]: gender agreement with the ergative argument

Values: Fem Masc

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.

Masc: masculine ergative argument

Examples

Fem: feminine ergative argument

Examples

Gender[obj]: gender agreement with object

Values: Fem Masc

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Gender of the argument, we have two layers of Gender on the verb: Gender[subj], and (for transitive verbs) Gender[obj].

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.

Masc: masculine object

Examples: [eu] ukan ezak “have it” Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a man)

Fem: feminine object

Examples: [eu] ukan ezan “have it” Gender[erg]=Fem|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a woman)
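
The layered feature names in the Basque examples above follow the pattern `Name[layer]=Value`. As a sketch (the function name `parse_layered_feats` and the tuple-keyed dictionary are illustrative choices, not part of the guidelines), such a string can be split into feature, layer and value like this:

```python
import re

# "Gender[erg]" -> name "Gender", layer "erg"; "Mood" -> name "Mood", no layer.
LAYERED = re.compile(r"^(?P<name>[^\[\]=]+)(?:\[(?P<layer>[^\[\]]+)\])?$")


def parse_layered_feats(feats):
    """Map (feature, layer) -> value; layer is None for unlayered features."""
    result = {}
    if feats == "_":
        return result
    for pair in feats.split("|"):
        key, _, value = pair.partition("=")
        match = LAYERED.match(key)
        if match is None:
            raise ValueError(f"malformed feature name: {key!r}")
        result[(match.group("name"), match.group("layer"))] = value
    return result


feats = ("Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|"
         "Person[abs]=3|Person[erg]=2|Polite[erg]=Inf")
parsed = parse_layered_feats(feats)
# Collect everything the verb form says about its ergative argument.
ergative = {name: value for (name, layer), value in parsed.items() if layer == "erg"}
print(ergative)
```

Grouping by layer in this way makes it easy to see which properties of each cross-referenced argument a single verb form encodes.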

Gender[psor]: possessor’s gender

Values: Fem Masc Neut

Possessive adjectives and pronouns may have two different genders: that of the possessed object (gender agreement with modified noun) and that of the possessor (lexical feature, inherent gender). The Gender[psor] feature captures the possessor’s gender.

In the Czech examples below, the masculine Gender[psor] implies using one of the suffixes -ův, -ova, -ovo, and the feminine Gender[psor] implies using one of -in, -ina, -ino.

Masc: masculine possessor

Examples

Fem: feminine possessor

Examples

Neut: neuter possessor

Examples

Gender[subj]: gender agreement with subject

Values: Fem Masc

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Gender of the argument, we have two layers of Gender on the verb: Gender[subj], and (for transitive verbs) Gender[obj]. While it would be possible to make the subject layer the default and use just Gender for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.

Masc: masculine subject

Examples: [eu] ukan ezak “have it” Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a man)

Fem: feminine subject

Examples: [eu] ukan ezan “have it” Gender[erg]=Fem|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a woman)

Hyph: hyphenated compound or part of it

Boolean feature. Is this part of a hyphenated compound? Depending on tokenization, the compound may be one token or be split into several tokens; then the tokens need tags.

These are words corresponding to prefixes such as inter- (inter disciplinary), post- (post traumatic), un- (un avoidable), di- (di transitive) and so on in English, but which are realized as distinct tokens (without the hyphen) in different languages.

Yes: it is part of hyphenated compound

Note that this depends on the tokenization conventions used in the language. For example, in Czech (see below), česko-slovenský is tokenized as three tokens: česko, the hyphen, and slovenský. While slovenský is a normal adjective in Czech, česko is derived from an adjectival stem but it is in a form that can never occur as a separate word. On the other hand, it can be combined with many other adjectives denoting affiliation with a country or region: česko-moravský, česko-německý, česko-americký etc. If tokenization left it as one token, the whole word česko-slovenský would be simply an adjective and no Hyph=Yes would be used in the annotation.

Examples

Mood: mood

Values: Adm Cnd Des Imp Ind Int Irr Jus Nec Opt Pot Prp Qot Sub

Mood is a feature that expresses modality and subclassifies finite verb forms.

Ind: indicative or realis

The indicative can be considered the default mood. A verb in indicative merely states that something happens, has happened or will happen, without adding any attitude of the speaker.

Examples

Imp: imperative

The speaker uses imperative to order or ask the addressee to do the action of the verb.

Examples

Cnd: conditional

The conditional mood is used to express actions that would have taken place under some circumstances but they actually did not / do not happen. Grammars of some languages may classify conditional as tense (rather than mood) but e.g. in Czech it combines with two different tenses (past and present).

Examples

Pot: potential

The action of the verb is possible but not certain. This mood corresponds to the modal verbs can, might, be able to. Used e.g. in Finnish. See also the optative.

Examples

Sub: subjunctive / conjunctive

The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain. In German, it may be also used to convey the conditional meaning.

Examples

Jus: jussive / injunctive

The jussive mood expresses the desire that the action happens; it is thus close to both imperative and optative. Unlike in desiderative, it is the speaker, not the subject who wishes that it happens. Used e.g. in Arabic. We also map the Sanskrit injunctive to Mood=Jus.

Examples

Prp: purposive

Means “in order to”, occurs in Amazonian and Australian languages, such as Arabana.

Examples

Qot: quotative

The quotative mood is used e.g. in Estonian to denote direct speech. The boundary between this mood and the non-first-hand Evidentiality is blurred.

Examples

Opt: optative

Expresses exclamations like “May you have a long life!” or “If only I were rich!” In Turkish it also expresses suggestions. In Sanskrit it may express possibility (cf. the potential mood in other languages).

Examples

Des: desiderative

The desiderative mood corresponds to the modal verb “want to”: “He wants to come.” Used e.g. in Japanese or Turkish.

Examples

Nec: necessitative

The necessitative mood expresses necessity and corresponds to the modal verbs “must, should, have to”: “He must come.”

Examples

Int: interrogative

Verbs in some languages have a special interrogative form that is used in yes-no questions. This is attested, for instance, in the Turkic languages. Celtic languages have it for the copula but not for normal verbs.

Examples

Irr: irrealis

The irrealis mood denotes an action that is not known to have happened. As such, it is a roof term for a group of more specific moods such as conditional, potential, or desiderative. Some languages do not distinguish these finer shades of meaning but they do distinguish realis (which we tag with the same feature as indicative, Ind) and irrealis.

Examples

Adm: admirative

Expresses surprise, irony or doubt. Occurs in Albanian, other Balkan languages, and in Caddo (Native American from Oklahoma).

Examples

NameType: type of named entity

Values: Com Geo Giv Nat Oth Pat Pro Prs Sur

Classification of named entities (token-based, no nesting of entities etc.) The feature applies mainly to the PROPN tag; in multi-word foreign names, adjectives may also have this feature (they preserve the ADJ tag but at the same time they would not exist in the host language otherwise than in the named entity).

Geo: geographical name

Names of cities, countries, rivers, mountains etc.

Examples

Prs: name of person

This value is used if it is not known whether it is a given or a family name, but it is known that it is a personal name.

Examples

Giv: given name of person

Given name (not family name). This is usually the first name in European and American names. In Chinese names, the last two syllables (of three) are usually the given name.

Examples

Pat: patronymic in a name of a person

Patronymic (not given name and not family name). This is the middle name in East Slavic personal names.

Examples

Sur: surname / family name of person

Family name (surname). This is usually the last name in European and American names. In Chinese names, the first syllable (of three) is usually the surname.

Examples

Nat: nationality

Name denoting a member of a particular nation, or inhabitant of a particular territory.

Examples

Com: company, organization

Examples

Pro: product

Examples

Oth: other

Names of stadiums, guerilla bases, events etc.

Examples

NounClass: noun class

Values: Bantu1 Bantu2 Bantu3 Bantu4 Bantu5 Bantu6 Bantu7 Bantu8 Bantu9 Bantu10
Bantu11 Bantu12 Bantu13 Bantu14 Bantu15 Bantu16 Bantu17 Bantu18 Bantu19 Bantu20
Bantu21 Bantu22 Bantu23
Wol1 Wol2 Wol3 Wol4 Wol5 Wol6 Wol7 Wol8 Wol9 Wol10
Wol11 Wol12

NounClass is similar to Gender and Animacy because it is to a large part a lexical category of nouns and other parts of speech inflect for it to show agreement (pronouns, adjectives, determiners, numerals, verbs).

The distinction between gender and noun class is not sharp and is partially conditioned by the traditional terminology of a given language family. In general, the feature is called gender if the number of possible values is relatively low (typically 2-4) and the partition correlates with sex of people and animals. In language families where the number of categories is high (10-20), the feature is usually called noun class. No language family uses both features.

In Bantu languages, the noun class also encodes Number; therefore it is a lexical-inflectional feature of nouns. The words should be annotated with the Number feature in addition to NounClass, despite the fact that people who know Bantu could infer the number from the noun class. The lemma of the noun should be its singular form.

The set of values of this feature is specific for a language family or group. Within the group, it is possible to identify classes that have similar meaning across languages (although some classes may have merged or disappeared in some languages in the group). The value of the NounClass feature consists of a short identifier of the language group (e.g., Bantu), and the number of the class (there is a standardized class numbering system accepted by scholars of the various Bantu languages; similar numbering systems should be created for the other families that have noun classes).
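
Since a NounClass value is a language-group identifier followed by a class number, it can be taken apart with a short pattern. This is a sketch only; the function name `split_noun_class` is hypothetical, and the pattern assumes the identifier consists of ASCII letters, as in the Bantu and Wol values listed above.

```python
import re

# A language-group identifier (letters) followed by a class number (digits).
NOUNCLASS = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_noun_class(value):
    """Split e.g. 'Bantu12' into ('Bantu', 12)."""
    match = NOUNCLASS.match(value)
    if match is None:
        raise ValueError(f"not a group+number NounClass value: {value!r}")
    return match.group(1), int(match.group(2))


print(split_noun_class("Bantu12"))
print(split_noun_class("Wol2"))
```

Returning the number as an integer lets classes be sorted numerically, so that Bantu2 comes before Bantu10.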

List of noun classes in Swahili

(from https://en.wikipedia.org/wiki/Noun_class)

Class number Prefix Typical meaning
1 m-, mw-, mu- singular: persons
2 wa-, w- plural: persons (a plural counterpart of class 1)
3 m-, mw-, mu- singular: plants
4 mi-, my- plural: plants (a plural counterpart of class 3)
5 ji-, j-, Ø- singular: fruits
6 ma-, m- plural: fruits (a plural counterpart of class 5, 9, 11, seldom 1)
7 ki-, ch- singular: things
8 vi-, vy- plural: things (a plural counterpart of class 7)
9 n-, ny-, m-, Ø- singular: animals, things
10 n-, ny-, m-, Ø- plural: animals, things (a plural counterpart of class 9 and 11)
11 u-, w-, uw- singular: no clear semantics
15 ku-, kw- verbal nouns
16 pa- locative meanings: close to something
17 ku- indefinite locative or directive meaning
18 mu-, m- locative meanings: inside something
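
The singular/plural pairings in the Swahili table can be written down as a lookup, which also shows why Number can be inferred from the class yet is still annotated explicitly. The sketch below covers only the Swahili classes in the table; the names `PLURAL_OF`, `plural_classes` and `number_of` are hypothetical, and `Sing` / `Plur` are the values of the Number feature.

```python
# Plural class -> singular classes it can pair with, as listed in the table.
# Class 1 is only a rare singular counterpart of class 6 ("seldom 1").
PLURAL_OF = {
    2: [1],
    4: [3],
    6: [5, 9, 11, 1],
    8: [7],
    10: [9, 11],
}


def plural_classes(singular):
    """Return the plural classes that can correspond to a singular class."""
    return sorted(p for p, singulars in PLURAL_OF.items() if singular in singulars)


def number_of(cls):
    """Return 'Plur' or 'Sing' for classes 1-11; None for classes 15-18."""
    if cls in PLURAL_OF:
        return "Plur"
    if cls in {1, 3, 5, 7, 9, 11}:
        return "Sing"
    return None  # verbal nouns and locative classes carry no number here


print(plural_classes(9))
print(number_of(4))
```

Note that a singular class can have more than one plural counterpart, so the mapping from a singular form to its plural class is not always unique.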

Bantu1: singular, persons

The corresponding plural class is Bantu2.

Examples

Bantu2: plural, persons

The corresponding singular class is Bantu1.

Examples

Bantu3: singular, plants, thin objects

The corresponding plural class is Bantu4.

Examples

Bantu4: plural, plants, thin objects

The corresponding singular class is Bantu3.

Examples

Bantu5: singular, fruits, round objects, paired things

The corresponding plural class is Bantu6.

Examples

Bantu6: plural, fruits, round objects, paired things

The corresponding singular class is Bantu5, also Bantu9, Bantu11, seldom Bantu1.

Examples

Bantu7: singular, things, diminutives

The corresponding plural class is Bantu8.

Examples

Bantu8: plural, things, diminutives

The corresponding singular class is Bantu7.

Examples

Bantu9: singular, animals, things

The corresponding plural class is Bantu10 or Bantu6.

Examples

Bantu10: plural, animals, things

The corresponding singular class is Bantu9.

Examples

Bantu11: long thin objects, natural phenomena, abstracts

Examples

Bantu12: singular, small things, diminutives

The corresponding plural class is Bantu13 or Bantu14.

Examples

Bantu13: plural or mass, small amount of mass

Examples

Bantu14: plural, diminutives

In Ganda, this is the plural counterpart of Bantu12.

Examples

Bantu15: verbal nouns, infinitives

Examples

Bantu16: definite location, close to something

Examples

Bantu17: indefinite location, direction, movement

Examples

Bantu18: definite location, inside something

Examples

Bantu19: little bit of, pejorative plural

Bantu class 19 may signify “a little bit of” or a plural with a pejorative nuance, as in Hunde.

Examples

Bantu20: singular, augmentatives

In Ganda, the corresponding plural class is Bantu6 or Bantu22.

Examples

Bantu21: singular, augmentatives, derogatives

Examples

Bantu22: plural, augmentatives

The corresponding singular class is Bantu20.

Examples

Bantu23: location with place names

Examples

Noun Classes in Wolof

Wolof is a non-Bantu Niger-Congo language. It has noun classes but their semantics cannot be easily mapped on the Bantu classes. The class is morphologically unmarked on nouns (although it is an inherent property of the lexeme) but determiners have to show agreement with the class.

The Wolof noun class system lacks semantic coherence. One reason is that noun classification in Wolof is sometimes based on factors other than semantics, including phonology and morphology. Even these are only tendencies: in most cases no clear semantic, phonological or morphological criterion explains the classification.

Examples

The following table shows the forms of proximate demonstratives in the first ten noun classes; classes 2 and 8 are plural, the rest are singular.

Class Form English
Wol1 ki “this”
Wol2 ñi “these”
Wol3 gi “this”
Wol4 ji “this”
Wol5 bi “this”
Wol6 mi “this”
Wol7 li “this”
Wol8 yi “these”
Wol9 si “this”
Wol10 wi “this”

Wolof classes 11 and 12, although behaving like noun classes, have meanings that are adverbial rather than nominal: class 11 is for location, class 12 for manner.

Class Form English
Wol11 fi “here”
Wol12 ni “so”

Wol1: Wolof noun class 1/k (singular human)

Examples

Wol2: Wolof noun class 2/ñ (plural human)

Examples

Wol3: Wolof noun class 3/g (singular)

Examples

Wol4: Wolof noun class 4/j (singular)

Examples

Wol5: Wolof noun class 5/b (singular)

For example, “dog” is in the b class.

Examples

Wol6: Wolof noun class 6/m (singular)

For example, “sheep” is in the m class.

Examples

Wol7: Wolof noun class 7/l (singular)

Examples

Wol8: Wolof noun class 8/y (plural non-human)

Examples

Wol9: Wolof noun class 9/s (singular)

Examples

Wol10: Wolof noun class 10/w (singular)

Examples

Wol11: Wolof noun class 11/f (location)

Examples

Wol12: Wolof noun class 12/n (manner)

Examples

NounType: noun type

Values: Clf

We already split common and proper nouns at the level of UPOS tags but some tagsets mark other distinctions.

Clf: classifier

Chinese classifiers between cardinal numbers and nouns, or between determiners and nouns.

Examples

NumForm: numeral form

Values: Combi Digit Roman Word

Feature of cardinal and ordinal numbers. Is the number expressed by digits or as a word? This feature appears in a number of tagsets. Note that it is currently somewhat Euro-centric: it distinguishes (Euro-)Arabic digits from Roman numerals but does not address digits in other scripts. In texts in many Indian scripts and in the Arabic script, both native digits and Euro-Arabic digits can appear (e.g. 2014 vs. २०१४ in Devanagari).

Word: number expressed as word

Examples: one, two, three

Digit: number expressed using digits

Examples: 1, 2, 3

Combi: digits combined with a suffix

Examples: [lt] 15-oji (15th)

Roman: roman numeral

Examples: I, II, III
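
The four values can be approximated from the surface form alone. The sketch below is one possible heuristic, not a procedure prescribed by UD: the function name `guess_num_form` and the regular expressions are assumptions, and it is meant only for tokens already known to be numerals (a lone "I" would otherwise be ambiguous with the English pronoun). Because Python's `\d` matches decimal digits in any script, native digits such as Devanagari २०१४ are treated as Digit here, which is itself an assumption given the open question above.

```python
import re

# Plain digits, optionally with "." or "," as group or decimal separators.
DIGITS = re.compile(r"\d+(?:[.,]\d+)*")
# Digits followed by an optional hyphen or period and a letter-only suffix.
COMBI = re.compile(r"\d+[-.]?[^\W\d_]+")
# Upper-case Roman numerals up to 3999; the lookahead rejects the empty string.
ROMAN = re.compile(
    r"(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def guess_num_form(token):
    """Guess a NumForm value for a token that is already tagged as a numeral."""
    if DIGITS.fullmatch(token):
        return "Digit"
    if COMBI.fullmatch(token):
        return "Combi"
    if ROMAN.fullmatch(token):
        return "Roman"
    return "Word"
```

Applied to the examples above, `guess_num_form("15-oji")` gives `Combi`, `guess_num_form("III")` gives `Roman` and `guess_num_form("three")` gives `Word`.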

NumType: numeral type

Values: Card Dist Frac Mult Ord Range Sets

Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, the main part of speech is “numeral”; it includes almost everything where counting is involved, and there are various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so at the same time we may have a non-empty value of PronType. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)

From the syntactic point of view, some numeral types behave like adjectives and some behave like adverbs. We tag them ADJ and ADV respectively. Thus the NumType feature applies to several different parts of speech.

Card: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word

Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.

Examples

Ord: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adjective or (in some languages) of adverb.

Examples

Mult: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adjective or adverb.

Examples

Frac: fraction

This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In various languages these words may behave morphologically and syntactically as nouns or ordinal numerals.

Examples

Sets: number of sets of things; collective numeral

Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum. Some authors call this type collective numeral.

Examples

Dist: distributive numeral

Used to express that the same quantity is distributed to each member in a set of targets.

Examples

Range: range of values

This could be considered a subtype of cardinal numbers, occasionally distinguished in corpora.

Examples

Number: number

Values: Coll Count Dual Grpa Grpl Inv Pauc Plur Ptan Sing Tri

Number is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.

In languages where noun phrases are pluralized using a specific function word (pluralizer), this function word is tagged DET and Number=Plur is its lexical feature.

Sing: singular number

A singular noun denotes one person, animal or thing.

Examples

Plur: plural number

A plural noun denotes several persons, animals or things.

Examples

Dual: dual number

A dual noun denotes two persons, animals or things.

Examples

Tri: trial number

A trial pronoun denotes three persons, animals or things. It occurs in pronouns of several Austronesian languages, such as Biak.

Examples

Pauc: paucal number

A paucal noun denotes “a few” persons, animals or things.

Examples

Grpa: greater paucal number

A greater paucal noun denotes “more than several but not many” persons, animals or things. It occurs in Sursurunga, an Austronesian language.

Examples

Grpl: greater plural number

A greater plural noun denotes “many, all possible” persons, animals or things. Precise semantics varies across languages.

Examples

Inv: inverse number

Inverse number means non-default for that particular noun. (Some nouns are by default assumed to be singular, some dual or plural.) Occurs e.g. in Kiowa.

Examples

Count: count plural

A special plural form of nouns (and other parts of speech, such as adjectives) if they occur after numerals.

In Bulgarian and Macedonian, this form is known variously as “counting form”, “count plural” or “quantitative plural” (Sussex and Cubberley 2006, p. 324). (The form originates in the Proto-Slavic dual but it should not be marked Number=Dual because 1. the dual vanished from Bulgarian and 2. the form is no longer semantically tied to the number two.)

Other languages (e.g., Russian) have forms that are not necessarily related to dual, yet they are used exclusively with numerals.

Examples

Ptan: plurale tantum

Some nouns appear only in the plural form even though they denote one thing (semantic singular); some tagsets mark this distinction. Grammatically they behave like plurals, so Plur is obviously the back-off value here; however, if the language also marks gender, the non-existence of a singular form sometimes means that the gender is unknown. In Czech, a special type of numeral is used when counting nouns that are pluralia tantum (NumType=Sets).

Examples

Coll: collective / mass / singulare tantum

Collective or mass or singulare tantum is a special case of singular. It applies to words that use grammatical singular to describe sets of objects, i.e. semantic plural. Although in theory they might be able to form plural, in practice it would be rarely semantically plausible. Sometimes, the plural form exists and means “several sorts of” or “several packages of”.

Examples

References

Number[abs]: number agreement with absolutive argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Number instead of Number[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection. Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time. Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur). So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.
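
The split between plain Number and layered Number[abs] can be made concrete by parsing a FEATS string so that the layer is kept apart from the feature name. The following is a minimal sketch; the function name `parse_feats` and the choice of `(name, layer)` tuples as dictionary keys are assumptions made for illustration.

```python
import re

# A layered feature name such as "Number[abs]": a name followed by a bracketed layer.
LAYERED = re.compile(r"(?P<name>[A-Za-z0-9]+)\[(?P<layer>[a-z0-9]+)\]")


def parse_feats(feats):
    """Parse a FEATS string into {(name, layer): value}; layer is None for plain features."""
    parsed = {}
    if feats == "_":  # "_" marks an empty FEATS column in CoNLL-U
        return parsed
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        match = LAYERED.fullmatch(key)
        if match:
            parsed[(match.group("name"), match.group("layer"))] = value
        else:
            parsed[(key, None)] = value
    return parsed


# The form "dituena" from the text: nominal Number plus two agreement layers.
dituena = parse_feats("Number=Sing|Number[abs]=Plur|Number[erg]=Sing")
```

Here `dituena[("Number", None)]` is the nominal inflection (`Sing`), while `dituena[("Number", "abs")]` is the agreement with the absolutive argument (`Plur`), so the two values never collide.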

Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].

Sing: singular absolutive argument

Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing

Plur: plural absolutive argument

Examples: [eu] dakarkiogu Number[erg]=Plur

Number[dat]: number agreement with dative argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Number instead of Number[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection. Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time. Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur). So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.

Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].

Sing: singular dative argument

Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing

Plur: plural dative argument

Examples: [eu] dakarkiogu Number[erg]=Plur

Number[erg]: number agreement with ergative argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Number instead of Number[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection. Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time. Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur). So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.

Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].

Sing: singular ergative argument

Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing

Plur: plural ergative argument

Examples: [eu] dakarkiogu Number[erg]=Plur

Number[obj]: number agreement with object

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Number of the argument, we have two layers of Number on the verb: Number[subj], and (for transitive verbs) Number[obj].

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Number instead of Number[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection. Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time. Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur). So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.

Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].

Sing: singular object

Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing

Dual: dual object

Examples: [wbp] Nyanyi karnapalangu wawirrijarra. lit. see-NONPAST PRES-1SG(SUBJ)-3DU(OBJ) kangaroo-DU(ABS) “I see two kangaroos.”

Plur: plural object

Examples: [eu] dakarkiogu Number[erg]=Plur

Number[psed]: possessed object’s number

Number[psed] is the possessee’s (possessed, owned noun phrase’s) number. In Hungarian, possession can be marked on the possessor or on the possessed. It is possible, though rare, that a noun has three distinct number features: its own grammatical number, number of its possessor and number of its possession. Examples from the Multext-East Hungarian lexicon:

Words marked for plural possessions are very rare, though. Note that in the following example from Multext-East, Columbus is marked for plural possession, but not for his own owner.

Sing: singular possession

Examples

Plur: plural possession

Examples

Number[psor]: possessor’s number

Possessives may have two different numbers: that of the possessed object (number agreement with modified noun) and that of the possessor. The Number[psor] feature captures the possessor’s number.

Sing: singular possessor

Examples

Dual: dual possessor

Examples

Plur: plural possessor

Examples

Number[subj]: number agreement with subject

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Number of the argument, we have two layers of Number on the verb: Number[subj], and (for transitive verbs) Number[obj]. While it would be possible to make the subject layer the default and use just Number for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Number instead of Number[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection. Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time. Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur). So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.

Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].

Sing: singular subject

Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing

Plur: plural subject

Examples: [eu] dakarkiogu Number[erg]=Plur

PartType: particle type

Values: Emp Inf Int Mod Neg Vbp

Types of particles are found in various tagsets and are highly language-specific. The list here is not exhaustive. Language-specific documentation should provide a version of this page tailored to the given language.

Mod: modal particle

Examples: [bg] май (possibly), нека (let), [cs] ať, kéž, nechť (let)

Emp: particle of emphasis

Examples: [bg] даже (even)

Inf: infinitive marker

Examples: [en] to, [de] zu, [da] at, [sv] att

Int: question particle

Required in some languages to form a yes-no question.

Examples: [pl] czy

Neg: negation particle

Negates a clause or a smaller phrase.

Examples: [en] not, [de] nicht

Vbp: separated verb prefix in German

They are analogous to verbal particles in other Germanic languages, which again overlap with adpositions and adverbs. Do we want to tag them as adpositions/adverbs and add this feature?

Examples: [de] vor (in stellen Sie sich vor)

Person: person

Values: 0 1 2 3 4

Person is typically a feature of personal and possessive pronouns / determiners, and of verbs. On verbs it is in fact an agreement feature that marks the person of the verb’s subject (some languages, e.g. Basque, can also mark the person of objects). Person marked on verbs makes it unnecessary to always add a personal pronoun as subject, and thus subjects are sometimes dropped (pro-drop languages).

0: zero person

Zero person is used for impersonal statements. It appears in Finnish as well as in Santa Ana Pueblo Keres. The construction is distinctive in Finnish, but it does not use unique morphology that would necessarily require a feature. In Keres, however, it is morphologically distinct (Davis 1964:75): the fourth (zero) person is used “when the subject of the action is obscure, as when the speaker is telling of something that he himself did not observe. It is also used when the subject of the action is inferior to the object, as when an animal is the subject and a human being the object.”

Examples

1: first person

In singular, the first person refers just to the speaker / author. In plural, it must include the speaker and one or more additional persons. Some languages (e.g. Taiwanese) distinguish inclusive and exclusive 1st person plural pronouns: the former include the addressee of the utterance (i.e. I + you), the latter exclude them (i.e. I + they).

Examples

2: second person

In singular, the second person refers to the addressee of the utterance / text. In plural, it may mean several addressees and optionally some third persons too.

Examples

3: third person

The third person refers to one or more persons that are neither speakers nor addressees.

Examples

4: fourth person

The fourth person can be understood as a third person argument morphologically distinguished from another third person argument, e.g. in Navajo.

Examples

References

Person[abs]: person agreement with the absolutive argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Person instead of Person[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both features.
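
The idea that Person[abs] and Number[abs] belong to the same agreement layer can be illustrated by grouping layered features by their bracketed layer. This is a minimal sketch; the function name `group_by_layer` is hypothetical, and the FEATS string for dakarkiogu is assembled here from the individual examples given for this word on this page rather than copied from a treebank.

```python
from collections import defaultdict


def group_by_layer(feats):
    """Group FEATS pairs by layer; plain (unlayered) features go under the key ''."""
    layers = defaultdict(dict)
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        if key.endswith("]") and "[" in key:
            # "Person[abs]" -> name "Person", layer "abs"
            name, layer = key[:-1].split("[", 1)
        else:
            name, layer = key, ""
        layers[layer][name] = value
    return dict(layers)


# Assumed combination of the features listed for "dakarkiogu" on this page.
DAKARKIOGU = (
    "Number[abs]=Sing|Number[dat]=Sing|Number[erg]=Plur"
    "|Person[abs]=3|Person[dat]=3|Person[erg]=1"
)
layers = group_by_layer(DAKARKIOGU)
```

Each layer then describes one argument: `layers["erg"]` is `{"Number": "Plur", "Person": "1"}` (“we”), while `layers["abs"]` and `layers["dat"]` are both third person singular (“it”, “to him/her”).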

1: first person absolutive argument

Examples: [eu] dakarkiogu Person[erg]=1

2: second person absolutive argument

Examples: [eu] dakarkiozu Person[erg]=2

3: third person absolutive argument

Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3

Person[dat]: person agreement with the dative argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Person instead of Person[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both features.

1: first person dative argument

Examples: [eu] dakarkiogu Person[erg]=1

2: second person dative argument

Examples: [eu] dakarkiozu Person[erg]=2

3: third person dative argument

Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3

Person[erg]: person agreement with the ergative argument

Finite verbs in many Indo-European languages agree in person and number with their subject.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Person instead of Person[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both features.

1: first person ergative argument

Examples

2: second person ergative argument

Examples

3: third person ergative argument

Examples

Person[obj]: person agreement with object

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Person of the argument, we have two layers of Person on the verb: Person[subj], and (for transitive verbs) Person[obj].

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Person instead of Person[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both features.

1: first person object

Examples: [eu] dakarkiogu Person[erg]=1

2: second person object

Examples: [eu] dakarkiozu Person[erg]=2

3: third person object

Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3

Person[psor]: possessor’s person

Person[psor] is possessor’s person, marked e.g. on Hungarian nouns. These noun forms would be translated to English as possessive pronoun + noun.

This layered feature is conveniently used for possessive inflections of nouns, although nouns normally do not have a Person feature, meaning that no other layers are needed. Nevertheless, the possessive morphology typically also includes Number, which must be multi-layered on nouns, and we thus have Person[psor] together with Number[psor].

This layered feature is normally not used with possessive pronouns. They traditionally have just simple Person. (And in some languages, possessive pronouns are actually identical to personal pronouns in the genitive case.)

1: first person possessor

Examples: [hu] kutya = dog; kutyám = my dog; kutyánk = our dog.

2: second person possessor

Examples: [hu] kutya = dog; kutyád = your.Sing dog; kutyátok = your.Plur dog.

3: third person possessor

Examples: [hu] kutya = dog; kutyája = his/her/its dog; kutyájuk = their dog.
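
The six Hungarian forms above can be paired with the possessor layer they express. The sketch below is illustrative only: the table `POSSESSED_FORMS` and the function `psor_feats` are hypothetical names, the Number[psor] values are read off the English glosses (“my” vs. “our”, and so on), and the noun’s own Number and Case are deliberately left out.

```python
# Possessed forms of "kutya" (dog) from the examples above,
# mapped to (Person[psor], Number[psor]) as implied by the glosses.
POSSESSED_FORMS = {
    "kutyám": ("1", "Sing"),    # my dog
    "kutyánk": ("1", "Plur"),   # our dog
    "kutyád": ("2", "Sing"),    # your.Sing dog
    "kutyátok": ("2", "Plur"),  # your.Plur dog
    "kutyája": ("3", "Sing"),   # his/her/its dog
    "kutyájuk": ("3", "Plur"),  # their dog
}


def psor_feats(form):
    """Return the possessor-layer features of a form, or '_' if none are listed."""
    entry = POSSESSED_FORMS.get(form)
    if entry is None:
        return "_"
    person, number = entry
    feats = {"Person[psor]": person, "Number[psor]": number}
    # Sort feature names case-insensitively, as is conventional in the FEATS column.
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(name + "=" + value for name, value in ordered)
```

With this table, `psor_feats("kutyánk")` returns `Number[psor]=Plur|Person[psor]=1`, and the unpossessed form `kutya` returns `_`.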

Person[subj]: person agreement with subject

Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Person of the argument, we have two layers of Person on the verb: Person[subj], and (for transitive verbs) Person[obj]. While it would be possible to make the subject layer the default and use just Person for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.

In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Person instead of Person[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both features.

1: first person subject

Examples: [eu] dakarkiogu Person[erg]=1

2: second person subject

Examples: [eu] dakarkiozu Person[erg]=2

3: third person subject

Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3

Polarity: polarity

Values: Neg Pos

Polarity is typically a feature of verbs, adjectives, sometimes also adverbs and nouns in languages that negate using bound morphemes. In languages that negate using a function word, Polarity is used to mark that function word, unless it is a pro-form already marked with PronType=Neg (see below).

Positive polarity (affirmativeness) is rarely, if at all, encoded using overt morphology. The feature value Polarity=Pos is usually used to signal that a lemma has negative forms but this particular form is not negative. Using the feature in such cases is somewhat optional for words that can be negated but rarely are. Language-specific documentation should define under which circumstances the positive polarity is annotated.

In Czech, for instance, all verbs and adjectives can be negated using the prefix ne-.
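
As a rough illustration of prefixal negation, the sketch below assigns Polarity to a Czech verb or adjective by comparing the form with its lemma. It rests on assumptions that are not stated in this text: that the lemma is the affirmative form, and that a leading ne- absent from the lemma is the negative prefix. The function name `czech_polarity` and the word forms used (dělat “to do”, nedělat “not to do”, německý “German”) are chosen for illustration only.

```python
def czech_polarity(form, lemma):
    """Guess Polarity for a Czech verb or adjective from its form and lemma.

    Assumes the lemma is affirmative, so a form-initial "ne" that the
    lemma lacks is treated as the negative prefix.
    """
    form, lemma = form.lower(), lemma.lower()
    if form.startswith("ne") and not lemma.startswith("ne"):
        return "Polarity=Neg"
    return "Polarity=Pos"
```

The lemma comparison keeps words that merely begin with the letters ne, such as německý, from being marked as negative; real annotation would need a lexicon rather than a string test.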

In English, verbs are negated using the particle not. English adjectives can be negated with not, or sometimes using prefixes (wise – unwise, probable – improbable), although the use of prefixes is less productive than in Czech. In general, only the most grammatical (as opposed to lexical) forms of negation should receive Polarity=Neg.

Note that Polarity=Neg is not the same thing as PronType=Neg. For pronouns and other pronominal parts of speech there is no such binary opposition as for verbs and adjectives. (There is no such thing as “affirmative pronoun”.)

The Polarity feature can also be used to distinguish the response interjections yes and no.

Pos: positive, affirmative

Examples

Neg: negative

Examples

Polite: politeness

Values: Elev Form Humb Infm

Various languages have various means to express politeness or respect; some of the means are morphological. Three to four dimensions of politeness are distinguished in linguistic literature. The Polite feature currently covers (and mixes) two of them, the speaker-referent axis and the speaker-addressee axis; a more elaborate system of feature values may be devised in future versions of UD if needed.

Changing pronouns and/or person and/or number of the verb forms when respectable persons are addressed in Indo-European languages belongs to the speaker-referent axis because the honorific pronouns are used to refer to the addressee.

In Czech, formal second person has the same form for singular and plural, and is identical to informal second person plural. This involves both the pronoun and the finite verb but not a participle, which has no special formal form (that is, formal singular is identical to informal singular, not to informal plural).

In German, Spanish or Hindi, both number and person are changed (informal third person is used as formal second person) and in addition, special pronouns are used that only occur in the formal register ([de] Sie; [es] usted, ustedes; [hi] आप āpa).

In Japanese, verbs and other words have polite and informal forms but the polite forms are not referring to the addressee (they are not in second person). They are just used because of who the addressee is, even if the topic does not involve the addressee at all. This kind of polite language is called teineigo (丁寧語) and belongs to the speaker-addressee axis. Nevertheless, we currently use the same values for both axes, i.e. Polite=Form can be used for teineigo too. This approach may be refined in future.

Infm: informal register

Usage varies but if the language distinguishes levels of politeness, then the informal register is usually meant for communication with family members and close friends.

Examples

Form: formal register

Usage varies but if the language distinguishes levels of politeness, then the polite register is usually meant for communication with strangers and people of higher social status than the speaker.

Examples

Elev: referent elevating

This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese sonkeigo (尊敬語) is a set of honorific forms that elevate the status of the referent.

Examples

Humb: speaker humbling

This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese kenjōgo (謙譲語) is a set of honorific forms that lower the speaker’s status, thereby raising the referent’s status by comparison.

Examples

References


Polite[abs]: politeness agreement with absolutive argument


Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Polite instead of Polite[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word), and thus we keep Polite[abs] to show that both features belong to the same layer of agreement.
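The idea of agreement layers can be made concrete with a small sketch that groups features by their bracketed layer. The feature string below is a hypothetical annotation of dakarkiogu, derived only from the gloss given above ("it" absolutive, "him/her" dative, "we" ergative); it is not taken from an actual Basque treebank, and the helper name group_by_layer is likewise an assumption made for illustration.

```python
import re
from collections import defaultdict

# Feature names look like "Number" or "Number[abs]"; the bracketed part is the layer.
LAYERED_NAME = re.compile(r"^(?P<name>[A-Za-z0-9]+)(?:\[(?P<layer>[a-z0-9]+)\])?$")


def group_by_layer(feats):
    """Group 'Name[layer]=Value' pairs by layer; unlayered features go under None."""
    if feats in ("", "_"):
        return {}
    layers = defaultdict(dict)
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        match = LAYERED_NAME.match(key)
        if match is None:
            raise ValueError("unexpected feature name: " + key)
        layers[match.group("layer")][match.group("name")] = value
    return dict(layers)


# Hypothetical feature string for dakarkiogu "we bring it to him/her".
feats = (
    "Mood=Ind|Number[abs]=Sing|Number[dat]=Sing|Number[erg]=Plur|"
    "Person[abs]=3|Person[dat]=3|Person[erg]=1|VerbForm=Fin"
)
grouped = group_by_layer(feats)
for layer in ("abs", "dat", "erg"):
    print(layer, grouped[layer])
```

Grouping this way shows why Polite[abs] sits naturally beside Number[abs] and Person[abs]: all features of one argument end up in the same bucket.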

Infm: informal absolutive argument

Examples: [eu] ezan, ezak Polite[erg]=Infm

Form: polite, formal absolutive argument

Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)


Polite[dat]: politeness agreement with dative argument


Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Polite instead of a layered feature such as Polite[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word), and thus we keep the layered politeness features to show that they belong to the same layers of agreement as the number features.

Infm: informal dative argument

Examples: [eu] ezan, ezak Polite[erg]=Infm

Form: polite, formal dative argument

Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)


Polite[erg]: politeness agreement with ergative argument


Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).

One may want to use just Polite instead of a layered feature such as Polite[abs]. However, there are two issues with that (at least in Basque). First, the absolutive argument is not always the subject. For transitive verbs it is the object, so the parallelism with nominative-accusative languages would be weak anyway. Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur on one word), and thus we keep the layered politeness features to show that they belong to the same layers of agreement as the number features.

Infm: informal ergative argument

Examples: [eu] ezan, ezak Polite[erg]=Infm

Form: polite, formal ergative argument

Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)


Poss: possessive

Values: Yes

Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.

While many tagsets would have “possessive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types. Several of the pronominal types can be optionally possessive, and adjectives can too.

Yes: it is possessive

Note that there is no No value. If the word is not possessive, the Poss feature is simply not mentioned in the FEATS column. (This means that an empty value has the No meaning.)
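As a minimal sketch of how a consumer of the data might treat this Boolean feature, the following code reads a feature string and interprets a missing Poss as "not possessive". The helper names parse_feats and is_possessive and the sample feature strings are assumptions made for illustration, not part of any UD tool.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a feature string such as 'Poss=Yes|PronType=Prs' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_possessive(feats: str) -> bool:
    """Poss has no 'No' value, so absence of the feature means 'not possessive'."""
    return parse_feats(feats).get("Poss") == "Yes"


# Illustrative feature strings (hypothetical annotations).
print(is_possessive("Number=Sing|Person=1|Poss=Yes|PronType=Prs"))
print(is_possessive("Case=Nom|Number=Sing|Person=1|PronType=Prs"))
print(is_possessive("_"))
```

The same pattern applies to other Yes-only features described on this page, such as Reflex and Typo.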

Examples


PrepCase: case form sensitive to prepositions

Personal pronouns in some languages have different forms depending on whether they are objects of prepositions or not. For instance, Czech on (he) without prepositions has the forms jemu/DAT, jeho/ACC, jím/INS, while with a preposition it is němu/DAT, něho/ACC, ním/INS. Similarly, Portuguese pronouns in prepositional oblique case take forms different from oblique pronouns serving as direct objects of verbs: eu/NOM (I), me/ACC (give me that), mim/PREP-ACC (come to me).

Default empty value means that the word form is neutral w.r.t. prepositions.

Npr: non-prepositional case

This word form must not be used after a preposition.

Examples: [cs] jemu “him” (dative)

Pre: prepositional case

This word form must be used after a preposition.

Examples: [cs] k němu “to him” (dative)


PronType: pronominal type

Values: Art Dem Emp Exc Ind Int Neg Prs Rcp Rel Tot

This feature typically applies to pronouns, pronominal adjectives (determiners), pronominal numerals (quantifiers) and pronominal adverbs.

Prs: personal or possessive personal pronoun or determiner

See also the Poss feature that distinguishes normal personal pronouns from possessives. Note that Prs also includes reflexive personal/possessive pronouns (e.g. [cs] se / svůj; see the Reflex feature).

Examples

Rcp: reciprocal pronoun

This value is used for pronouns that are specifically reciprocal. If a reflexive pronoun can be used to convey reciprocal meaning, it is still labeled as reflexive (PronType=Prs|Reflex=Yes). It is not marked as reciprocal in contexts in which it is used reciprocally.

Reciprocal means that there is a plural subject and every member of the group does the thing described by the predicate to every other member of the group. A reciprocal pronoun is used in the object position to signal such a configuration.

Examples

Art: article

Article is a special case of determiner that bears the feature of definiteness (in other languages, the feature may be marked directly on nouns).

Examples

Int: interrogative pronoun, determiner, numeral or adverb

Note that possessive interrogative determiners (whose) can be distinguished by the Poss feature.

Examples:

Rel: relative pronoun, determiner, numeral or adverb

Note that in many languages this class heavily overlaps with interrogatives, yet there are pronouns that are only relative, and in some languages (Bulgarian, Hindi) the two classes are distinct.

Examples:

Exc: exclamative determiner

Exclamative pro-adjectives (determiners) express the speaker’s surprise towards the modified noun, e.g. what in “What a surprise!” In many languages, exclamative determiners are recruited from the set of interrogative determiners. Therefore, not all tagsets distinguish them.

Examples:

Dem: demonstrative pronoun, determiner, numeral or adverb

These are often parallel to interrogatives. Some tagsets might also distinguish a separate feature of distance (here / there; [es] aquí / ahí / allí).

Examples

Emp: emphatic determiner

Emphatic pro-adjectives (determiners) emphasize the nominal they depend on. There are similarities with reflexive and demonstrative pronouns / determiners.

Examples

Tot: total (collective) pronoun, determiner or adverb

Examples

Neg: negative pronoun, determiner or adverb

Negative pronominal words are distinguished from negating particles and from words that inflect for polarity (verbs, adjectives etc.). Those words do not use PronType=Neg; they use Polarity=Neg instead. See the Polarity feature for further details.

Examples:

Ind: indefinite pronoun, determiner, numeral or adverb

Note that some tagsets might further subclassify this category to distinguish “some” from “any” etc. Such distinctions are not part of universal features but may be added in language-specific extensions.

Examples


PunctSide: which side of paired punctuation is this?

Distinguishes between the initial and final forms of paired punctuation (brackets, quotation marks, question and exclamation marks in Spanish). Note that “initial” and “final” are better terms than “left” and “right”. The latter would be confusing in languages written from right to left, like Arabic.

Ini: initial (left bracket in English texts)

Examples

Fin: final (right bracket in English texts)

Examples


PunctType: punctuation type

Values: Brck Colo Comm Dash Elip Excl Peri Qest Quot Semi Slsh

Many tagsets have just one tag for punctuation. Others classify punctuation in more detail.

Peri: period at the end of sentence or clause

Examples

Elip: ellipsis

Examples

Qest: question mark

Examples

Excl: exclamation mark

Examples

Quot: quotation marks (various sorts in various languages)

Examples

Brck: bracket

Examples

Comm: comma

Examples

Colo: colon

Examples

Semi: semicolon

Examples

Dash: dash, hyphen

Examples

Slsh: slash or backslash

Examples


Reflex: reflexive

Values: Yes

Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.

While many tagsets would have “reflexive” as one of the various pronoun types, this feature is intentionally separate from PronType. When used with pronouns and determiners, it should be combined with PronType=Prs, regardless of whether they really distinguish the Person feature (in some languages they do, in others they do not).

Note that forms that are canonically reflexive sometimes have other functions in the language, too. The feature Reflex=Yes denotes the word type, not its actual function in context (which can be distinguished by dependency relation types). Hence the feature is not restricted to situations where the word is used truly reflexively.

For example, reflexive clitics in European languages often have a wide array of possible functions (middle, passive, inchoative, impersonal, or even as a lexical morpheme). Besides that, reflexives in some languages are also used for emphasis (while other languages have separate emphatic pronouns), and in some languages they signal reciprocity (while other languages have separate reciprocal pronouns). Using Reflex=Yes with all of them has the benefit that they can be easily identified (however, if it is possible for the annotators to distinguish contexts where a reflexive pronoun is used reciprocally or emphatically, it is possible to combine Reflex=Yes with PronType=Rcp or PronType=Emp, instead of PronType=Prs).
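The combinations described above can be sketched as a small consistency check: a pronoun or determiner with Reflex=Yes is expected to carry PronType=Prs, or PronType=Rcp or PronType=Emp where annotators distinguish those uses. This is only an illustration of the guideline under that reading; it is not an official validation rule, and the function names and sample annotations are hypothetical.

```python
from typing import Dict

# PronType values that the text above allows together with Reflex=Yes.
ALLOWED_WITH_REFLEX = {"Prs", "Rcp", "Emp"}


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a feature string such as 'PronType=Prs|Reflex=Yes' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def reflex_pron_type_ok(upos: str, feats: str) -> bool:
    """Return False only for a reflexive PRON/DET lacking a compatible PronType."""
    parsed = parse_feats(feats)
    if upos not in {"PRON", "DET"} or parsed.get("Reflex") != "Yes":
        return True  # the guideline says nothing about this word
    return parsed.get("PronType") in ALLOWED_WITH_REFLEX


# Hypothetical annotations.
print(reflex_pron_type_ok("PRON", "PronType=Prs|Reflex=Yes"))
print(reflex_pron_type_ok("PRON", "PronType=Dem|Reflex=Yes"))
print(reflex_pron_type_ok("PRON", "Reflex=Yes"))
```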

Note that while some languages also have reflexive verbs, these are in fact fused verbs with reflexive pronouns, as in Spanish despertarse or Russian проснуться (both meaning “to wake up”). Thus in these cases the fused token will be split into two syntactic words, one of them being a reflexive pronoun. In languages where the reflexive pronoun is not split off, it may be more appropriate to mark the verb with the middle voice (Voice=Mid) than to use Reflex=Yes with the verb.

Yes: it is reflexive

Note that there is no No value. If the word is not reflexive, the Reflex feature is simply not mentioned in the FEATS column. (This means that an empty value has the No meaning.)

Examples


Style: style or sublanguage to which this word form belongs

Values: Arch Coll Expr Form Rare Slng Vrnc Vulg

This may be a lexical feature (some words-lemmas are archaic, some are colloquial) or a morphological feature (inflectional patterns may systematically change between dialects or styles). English pronouns offer a useful case study: thou is archaic; whom is often somewhat formal; ya is colloquial, used in a casual/familiar way (See ya!); y’all is vernacular (especially associated with certain regions); and wtf is arguably an expressive variant of the pronoun what in contexts where a nominal is required (Wtf are you doing?!).

Besides real morphology, the choices that make a particular word form belong to a different style may also be orthographic.

This feature could be used in many languages but only a few choose to actually annotate it. Seen in Bulgarian, Czech, Danish, English, Finnish and Hungarian.

Arch: archaic, obsolete

This value should be used if it is desirable in a language to mark archaic lexemes or archaic morphological forms. Language-specific guidelines must define what exactly it means to be archaic. Note that there are theoretical problems, especially if we want to annotate diachronic corpora with various stages of the language. There is only one set of guidelines per language, which should accommodate all stages and genres. It would be unfortunate if most words in older texts had to be labeled as Style=Arch. Hence, the only useful application of the feature is probably for words that were already archaic at the time of production of the text.

Examples

Rare: rare

Examples

Form: formal, literary

Examples

Coll: colloquial

Examples

Vrnc: vernacular

Examples

Slng: slang

Examples

Expr: expressive, emotional

This indicates a distinctive morphological or spelling choice for added expressiveness (with respect to pronunciation or meaning).

In the case of an expressive spelling variant, this feature should be paired with a CorrectForm in the MISC column, as explained in the page on typos. Compare the Typo feature, which covers errors and typographical unexpectedness.

Examples

Vulg: vulgar

Examples


Subcat: subcategorization

Values: Ditr Indir Intr Tran

Lexical feature of verbs. Some tagsets distinguish intransitive and transitive verbs. In many languages however, subcategorization of verbs is much more complex than this.

Intr: intransitive verb

A verb that does not take arguments other than the subject.

Examples

Indir: indirect verb

A verb that does not require a direct object but requires an oblique argument.

Examples

Tran: transitive verb

A verb that takes a direct (accusative) object as argument (in addition to the subject). These verbs can be passivized; the direct object then becomes the subject.

Examples

Ditr: ditransitive verb

A verb that takes two core objects as arguments (in addition to the subject). These verbs can be passivized.

Examples


Tense: tense

Values: Fut Imp Past Pqp Pres

Tense is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as participles are classified as verbs or as the other category.

Tense is a feature that specifies the time when the action took / takes / will take place, in relation to a reference point. The reference point is often the moment of producing the sentence, but it can also be another event in the context. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.

Note that we are defining features that apply to a single word. If a tense is constructed periphrastically (two or more words, e.g. auxiliary verb indicative + participle of the main verb) and none of the participating words are specific to this tense, then the features will probably not directly reveal the tense. For instance, [en] I had been there is past perfect (pluperfect) tense, formed periphrastically by the simple past tense of the auxiliary to have and the past participle of the main verb to be. The auxiliary will be tagged VerbForm=Fin|Mood=Ind|Tense=Past and the participle will have VerbForm=Part|Tense=Past; neither of the two will have Tense=Pqp. On the other hand, Portuguese can form the pluperfect morphologically as just one word, such as estivera, which will thus be tagged VerbForm=Fin|Mood=Ind|Tense=Pqp.
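The contrast can be checked mechanically. The sketch below uses exactly the three feature strings quoted in the paragraph above and looks up the Tense value of each word; the helper name parse_feats is an assumption made for illustration.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a feature string such as 'VerbForm=Part|Tense=Past' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Word forms and feature strings as given in the text above.
tokens = [
    ("had", "VerbForm=Fin|Mood=Ind|Tense=Past"),       # English auxiliary
    ("been", "VerbForm=Part|Tense=Past"),              # English past participle
    ("estivera", "VerbForm=Fin|Mood=Ind|Tense=Pqp"),   # Portuguese synthetic pluperfect
]

tenses = {form: parse_feats(feats).get("Tense") for form, feats in tokens}
print(tenses)
```

Neither English word carries Tense=Pqp, so recovering the pluperfect reading of had been requires looking at the construction as a whole (for example via the dependency relation between the auxiliary and the participle), not at a single word's features.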

Past: past tense / preterite / aorist

The past tense denotes actions that happened before a reference point. In the prototypical case, the reference point is the moment of producing the sentence and the past event happened before the speaker speaks about it. However, Tense=Past is also used to distinguish past participles from other kinds of participles, and past converbs from other kinds of converbs; in these cases, the reference point may itself be in past or future, when compared to the moment of speaking. For instance, the Czech converb spatřivše “having seen” in the sentence spatřivše vojáky, velmi se ulekli “having seen the soldiers, they got very scared” describes an event that is anterior to the event of getting scared. It also happens to be anterior to the moment of speaking, but that fact is not encoded in the converb itself, it is rather a consequence of “getting scared” being in the past tense.

Among finite forms, the simple past in English is an example of Tense=Past. In German, this is the Präteritum. In Turkish, this is the non-narrative past. In Bulgarian, this is aorist, the aspect-neutral past tense that can be used freely with both imperfective and perfective verbs (see also imperfect).

Examples

Pres: present / non-past tense / aorist

The present tense denotes actions that are in progress (or states that are valid) in a reference point; it may also describe events that usually happen. In the prototypical case, the reference point is the moment of producing the sentence; however, Tense=Pres is also used to distinguish present participles from other kinds of participles, and present converbs from other kinds of converbs. In these cases, the reference point may be in past or future when compared to the moment of speaking. For instance, the English present participle may be used to form a past progressive tense: he was watching TV when I arrived.

Some languages (e.g. Uralic) only distinguish past vs. non-past morphologically, and then Tense=Pres can be used to represent the non-past form. (In some grammar descriptions, e.g. Turkic or Mongolic, this non-past form may be termed aorist, but note that in other languages the term is actually used for a past tense, as noted above. Therefore the term is better avoided in UD annotation.) Similarly, some Slavic languages (e.g. Czech), although they do distinguish the future tense, nevertheless have a subset of verbs where the morphologically present form has actually a future meaning.

Examples

Fut: future tense

The future tense denotes actions that will happen after a reference point; in the prototypical case, the reference point is the moment of producing the sentence.

Examples

Imp: imperfect

Used in e.g. Bulgarian and Croatian, imperfect is a special case of the past tense. Note that, unfortunately, imperfect tense is not always the same as past tense + imperfective aspect. For instance, in Bulgarian, there is lexical aspect, inherent in verb meaning, and grammatical aspect, which does not necessarily always match the lexical one. In main clauses, imperfective verbs can have imperfect tense and perfective verbs have perfect tense. However, both rules can be violated in embedded clauses.

Examples

Pqp: pluperfect

The pluperfect denotes an action that happened before another action in the past. This value does not apply to English, where the pluperfect (past perfect) is constructed analytically. It applies e.g. to Portuguese.

Examples


Typo: is this a misspelled word?

Values: Yes

Indicates an erroneous or typographically unexpected word form.

Most unexpected spellings are typographical errors (inadvertent on the part of the author). Also unexpected: creatively using special characters or spaces for visual effect; or unusual character encoding. For transcribed speech, no distinction is made between the original speaker and the transcriber, so a mispronunciation like shilly for silly is also treated like a typo. This feature can also encompass clear errors in word choice, such as learner errors and dysfluencies (e.g. lesser where fewer is appropriate, or eats instead of eat).

Note that “typographically unexpected” is interpreted in the context of the genre. Abbreviations or popular informal spellings are not necessarily unexpected. See Abbr.

Superfluous word-internal spaces are addressed using the goeswith relation to connect parts of the word. Typo=Yes should be used with the goeswith head (and this is enforced by validation for treebanks that use features).

The correct spelling can be indicated in the MISC column with the CorrectForm feature, as discussed in the page on typos.
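A minimal sketch of how the two columns work together: the function below reads one tab-separated CoNLL-U token line and returns the CorrectForm value from MISC when the word is marked Typo=Yes, and the surface form otherwise. The sample lines (a misspelling teh for the) and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict


def parse_pairs(column: str) -> Dict[str, str]:
    """Parse a '|'-separated FEATS or MISC column into a dict ('_' means empty)."""
    if column in ("", "_"):
        return {}
    pairs = {}
    for item in column.split("|"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def intended_form(conllu_line: str) -> str:
    """Return CorrectForm for a Typo=Yes token, else the surface form."""
    cols = conllu_line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line has 10 tab-separated columns")
    form, feats, misc = cols[1], cols[5], cols[9]
    if parse_pairs(feats).get("Typo") == "Yes":
        return parse_pairs(misc).get("CorrectForm", form)
    return form


# Hypothetical token lines: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
typo_line = "\t".join(
    ["1", "teh", "the", "DET", "DT", "Definite=Def|PronType=Art|Typo=Yes",
     "2", "det", "_", "CorrectForm=the"]
)
clean_line = "\t".join(
    ["2", "cat", "cat", "NOUN", "NN", "Number=Sing", "0", "root", "_", "_"]
)
print(intended_form(typo_line))
print(intended_form(clean_line))
```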

Capitalization, etc.: Cases where an unexpected form of a letter is used within a word—e.g., unexpected capitalization choices—should be handled on a language- and treebank-specific basis. In a social media treebank, for example, it may not be practical to flag all nonstandard capitalization choices as Typo=Yes given the wide variability of capitalization in unedited writing.

Stylistic choices: Typo=Yes is intended for specifically orthographic unexpectedness, not unexpected word variants in general. If the author is taken to be signaling an intentionally modified pronunciation of a word, inventing a new word, or making a pun, that is not Typo if the unexpectedness is reflected phonologically. The optional Style feature may be useful in such cases. Deliberate, well-established conventions of altering the written forms of words, e.g. censoring profanity with nonalphabetic symbols, should also be considered expressive stylistic choices rather than typographical unexpectedness.

Extra words: For extra or missing words, see the policy on errors. A valid word that is superfluous in the sentence and attached as reparandum does not receive Typo=Yes.

Yes: it is a typo

Examples


VerbForm: form of verb or deverbative

Values: Conv Fin Gdv Ger Inf Part Sup Vnoun

Even though the name of the feature seems to suggest that it is used exclusively with verbs, it is not the case. Some verb forms in some languages actually form a gray zone between verbs and other parts of speech (nouns, adjectives and adverbs). For instance, participles may be either classified as verbs or as adjectives, depending on language and context. In both cases VerbForm=Part may be used to separate them from other verb forms or other types of adjectives.

Fin: finite verb

Rule of thumb: if it has non-empty Mood, it is finite. But beware that some tagsets conflate verb forms and moods into one feature.
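The rule of thumb can be sketched as a simple cross-check between Mood and VerbForm. Since it is only a heuristic, a mismatch is a prompt for manual inspection and not necessarily an annotation error; the function names and the third sample string are hypothetical.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a feature string such as 'Mood=Ind|VerbForm=Fin' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def finiteness_mismatch(feats: str) -> bool:
    """True when exactly one of 'has Mood' and 'VerbForm=Fin' holds."""
    parsed = parse_feats(feats)
    has_mood = "Mood" in parsed
    is_finite = parsed.get("VerbForm") == "Fin"
    return has_mood != is_finite


print(finiteness_mismatch("Mood=Ind|Tense=Past|VerbForm=Fin"))  # finite with mood
print(finiteness_mismatch("Tense=Past|VerbForm=Part"))          # participle, no mood
print(finiteness_mismatch("Mood=Ind|VerbForm=Inf"))             # worth a second look
```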

Examples

Inf: infinitive

Infinitive is the citation form of verbs in many languages. Unlike in English, it often has a morphological form that is distinct from the finite forms. Infinitives may be used together with auxiliaries to form periphrastic tenses (e.g. future tense [cs] budu sedět v letadle “I will sit in a plane”), and they appear as arguments of modal verbs etc. In some languages, e.g. in Hindi, they behave similarly to nouns and are used as such (similar to the gerund in English). Nevertheless, this observation is not universal and, e.g. in Slavic languages, infinitives are quite distinct from verbal nouns.

Examples

Sup: supine

Supine is a rare verb form. It survives in some Slavic languages (Slovenian) and is used instead of infinitive as the argument of motion verbs (old [cs] jdu spat lit. I-go sleep).

A form called “supine” also exists in Swedish where it is a special form of the participle, used to form the composite past form of a verb. It is used after the auxiliary verb ha (to have) but not after vara (to be):

Examples

Part: participle, verbal adjective

Participle is a non-finite verb form that shares properties of verbs and adjectives. Its usage varies across languages. It may be used to form various periphrastic verb forms such as complex tenses and passives; it may also be used purely adjectivally.

Other features may help to distinguish past/present participles (English), active/passive participles (Czech), imperfect/perfect participles (Hindi) etc.

Examples

Conv: converb, transgressive, adverbial participle, verbal adverb

The converb, also called adverbial participle or transgressive, is a non-finite verb form that shares properties of verbs and adverbs. It appears e.g. in Slavic and Indo-Aryan languages.

Note that this value was called Trans in UD v1 and it has been renamed Conv in UD v2.

Examples

Gdv: gerundive

Used in Latin and Ancient Greek. Not to be confused with the gerund.

Examples

Ger: gerund

Using VerbForm=Ger is discouraged and alternatives should be considered first because the term gerund is rather confusing: the English gerund is a verbal noun or a converb, and it shares the morphological form with present participle (which may mean that the tagset will not distinguish it from the participle); the gerundio in Spanish and other Romance languages shows some similarities with present participles and with converbs, but not with verbal nouns; likewise, some Slavists use the English term gerund to denote converbs (adverbial participles), which should be labeled VerbForm=Conv; and UD version 1 recommended (inspired by English) to use it for verbal nouns, which in UD v2 should use VerbForm=Vnoun.

However, the value is still available in UD v2 and can be used if the alternatives do not seem acceptable. It may be removed in future versions, but a comprehensive investigation has to be done first.

Examples

Vnoun: verbal noun, masdar

Verbal nouns other than infinitives. Also called masdars by some authors, e.g. Haspelmath, 1995.

While in some languages verbal noun and infinitive may be two labels for the same category (and then the language-specific documentation must specify which label should be used), in other languages these categories are distinct. For example, most Slavic languages have infinitive as a specific, uninflected form of the verb, and they also have derived verbal nouns, which behave much like ordinary nouns, have a noun-like distribution (different from infinitives), and inflect for case and number.

Examples

References


VerbType: verb type

Values: Aux Cop Mod Light Quasi

We already split auxiliary and non-auxiliary verbs at the level of UPOS tags. The VerbType feature may be used to capture finer distinctions that some tagsets make.

Aux: auxiliary verb

Verb used to create periphrastic verb forms (tenses, passives etc.). In many languages there will be ambiguity between auxiliary and other usages, thus the same verb should get different feature values depending on context.

Examples

Cop: copula verb

Verb used to make nominal predicates from adjectives, nouns or participles. Some languages omit the copula or use other means to create nominal predicates. In languages that have copula, it is often the equivalent of the verb “to be”.

Examples

Mod: modal verb

A group of verbs traditionally distinguished in grammars of some languages. They take infinitive of another verb as argument (with or without infinitive-marking conjunction, in languages that have it) and add various modes of possibility, necessity etc. to the meaning of the infinitive. There are other verbs that take infinitives as arguments but they are not considered modal (e.g. phasal verbs such as “to begin to do something”). The set of modal verbs for a language is closed and can be enumerated. Depending on language-internal considerations, modal verbs may be considered a subset of auxiliaries (AUX) or non-auxiliary verbs (VERB).

Note that some languages (e.g. Turkish) use special forms of the main verb instead of combining it with a modal verb.

Examples

Light: light (support) verb

Light or support verb is used in verbo-nominal constructions where the main part of the meaning is contributed by a noun complement but it is not just a nominal predicate with a copula. An English example would be to take a nap, where take is the light verb. It is often the case that the light verb can also function as a normal verb in the language (cf. to take two dollars). If the light verb constructions are used frequently in a language (e.g. Hindi or Japanese) or if there is a dedicated light verb that cannot be used as normal verb, it makes sense to mark light verbs with a dedicated feature value.

Examples

Quasi: quasi-verb

A word that functions partially as a verb and is tagged VERB, yet is defective in some other aspects that are typical of verbs in the given language. For example, quasi-verbs in Polish function as predicates and take infinitives of regular verbs as complements, yet their morphology is not verbal: they are more like frozen forms of adjectives.

Examples


Voice: voice

Values: Act Antip Bfoc Cau Dir Inv Lfoc Mid Pass Rcp

Voice is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.

For Indo-European speakers, voice means mainly the active-passive distinction. In other languages, other shades of verb meaning are categorized as voice.

Act: active or actor-focus voice

The subject of the verb is the doer of the action (agent), the object is affected by the action (patient). This label is also used for the actor-focus voice of Austronesian languages.

Examples

Mid: middle voice

Between active and passive, needed e.g. in Ancient Greek or Sanskrit. The subject is both doer and undergoer in a sense: he is acting upon himself.

Examples


Rcp: reciprocal voice

In a plural subject, all members are doers and undergoers, acting upon each other.

Examples

Pass: passive or patient-focus voice

The subject of the verb is affected by the action (patient). The doer (agent) is either unexpressed or it appears as an oblique dependent or an object of the verb. This label is also used for the patient-focus voice of Austronesian languages.

Examples

Antip: antipassive voice

In ergative-absolutive languages, the absolutive P argument is demoted to an oblique dependent and the ergative A argument takes the absolutive form, thus transforming a transitive clause into intransitive.

Examples

Lfoc: location-focus voice

The subject of the verb indicates location or direction, while the doer and the undergoer/theme are coded as objects.

Examples

Bfoc: beneficiary-focus voice

The subject of the verb indicates the beneficiary, while the doer and the undergoer/theme are coded as objects.

Examples

Dir: direct voice

Used in direct-inverse voice systems, e.g. in Algonquian languages of North America. Direct means that the argument that is higher in salience hierarchy is the subject. Example hierarchy: human 1st person – 2nd – 3rd – non-human animate – inanimate.

Examples

Inv: inverse voice

Used in direct-inverse voice systems, e.g. in Algonquian languages of North America. Inverse voice marking means that the argument lower in the hierarchy functions as subject.

Examples

Cau: causative voice

Causative forms of verbs are classified as a voice category because, when compared to the basic active form, they change the number of participants and their mapping on semantic roles. (See, e.g., the documentation of the METU Sabanci treebank (page 26).) Note that this is a feature of verbs. Some languages also have a causative case of nouns.

Examples
