This page still pertains to UD version 1.

Features

Lexical features: PronType, NumType, Poss, Reflex

Inflectional features (nominal): Gender, Animacy, Number, Case, Definite, Degree

Inflectional features (verbal): VerbForm, Mood, Tense, Aspect, Voice, Person, Negative

Animacy: animacy

Similarly to Gender (and to the African noun classes), animacy is usually a lexical feature of nouns and an inflectional feature of other parts of speech that mark agreement with nouns. It is independent of gender and is therefore encoded separately in some tagsets (e.g. all the Multext-East tagsets).

In the BulTreeBank tagset, Animacy is not encoded as a separate feature. The relevant dichotomy is rather human vs. non-human. With very few exceptions, these distinctions are not marked grammatically.

Anim: animate

The following pronouns can be considered explicitly animate:

Nhum: animate but non-human

Unlike nouns denoting humans, these nouns have the so-called count form, but only in the masculine. The count form is a kind of plural that is used after numerals.

Inan: inanimate

Unlike nouns denoting humans, these nouns also have the so-called count form, but again only in the masculine. The count form is a kind of plural that is used after numerals.

Note that the symbol ‘#’, used in the Universal POS section, is a placeholder for an arbitrary number of features that are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.

Aspect: aspect

Aspect is a feature that specifies duration of the action in time, whether the action has been completed etc.

In Bulgarian aspect is a lexical feature, as in other Slavic languages. It comprises two grammemes: perfective and imperfective.

Imp: imperfect aspect

The action took / takes / will take some time span and there is no information whether and when it was / will be completed.

Examples

Perf: perfect aspect

The action has been / will have been completed. Since the emphasis is on one point on the time scale (the point of completion), this aspect does not work well with the present tense for ongoing activities.

Examples

Case: case

In Bulgarian only some nouns have special vocative forms (v):

Examples

Case distinctions are still alive in personal pronouns: nominative (n), accusative (a) and dative (d).

Examples

Accusative and dative cases are still present in the masculine singular forms of some other pronouns: interrogative, indefinite, collective, relative and negative. Please note that the dative forms are analytic, so only the accusative form is marked after the preposition ‘на’.

Examples

Our tagset marks one more, idiosyncratic case: the so-called ‘dative possessive case’ (s). It covers situations where the short possessive pronoun comes before the noun it modifies and therefore stands next to the verb.

Examples

The canonical sentence would be: Той взе шапката ми / Toy vze shapkata mi ‘He took hat.DEF my.POSS’ (He took my hat).

Definite: definiteness or state

Definiteness is typically a feature of nouns, adjectives and articles. Its value distinguishes whether we are talking about something known and concrete, or something general or unknown. It can be marked on definite and indefinite articles, or directly on nouns, adjectives etc.

In Bulgarian there are definite and indefinite articles. The definite article is part of the word, in postposition (жената / zhenata ‘woman-the’ (the woman)). Indefiniteness is expressed either by the form един / edin (one) or by a zero marker.

However, when added to a nominal phrase, the articles become phrasal affixes, i.e. Bulgarian does not have agreement in definiteness. For example, хубавата висока руса жена / hubavata visoka rusa zhena ‘pretty-the tall blond woman’ (the pretty tall blond woman).

Ind: indefinite

Examples

Def: definite

Examples

Degree: degree of comparison

Degree of comparison is typically an inflectional feature of some adjectives and adverbs.

In Bulgarian the comparative and superlative forms are created with the help of the particles по / po “more” and най / nay “most”, which are part of the word: they precede it and are joined to it by a hyphen.
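
The prefix rule described above can be sketched as a small heuristic. This is only an illustration, not part of the BulTreeBank tooling: the function name `degree_from_form` is hypothetical, and it assumes the input is already known to be a gradable adjective or adverb written with an ASCII hyphen.

```python
def degree_from_form(word: str) -> str:
    """Guess the Degree value from the written form of a gradable word.

    Assumption: the caller has already established that `word` is an
    adjective or adverb; other words starting with these prefixes are
    not handled here.
    """
    lowered = word.lower()
    if lowered.startswith("най-"):   # superlative particle
        return "Sup"
    if lowered.startswith("по-"):    # comparative particle
        return "Cmp"
    return "Pos"                     # base form


print(degree_from_form("висока"))      # base form
print(degree_from_form("по-висока"))   # comparative
print(degree_from_form("най-висока"))  # superlative
```

A real converter would more likely read the degree from the tag than from the surface form; the sketch only makes the orthographic convention concrete.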

Pos: positive, first degree

This is the base form that merely states a quality of something, without comparing it to qualities of others. Note that although this degree is traditionally called “positive”, negative properties can be compared, too.

Examples

Cmp: comparative, second degree

The quality of one object is compared to the same quality of another object.

Examples

Sup: superlative, third degree

The quality of one object is compared to the same quality of all other objects within a set.

Examples

Gender: gender

Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (adjectives, verbs) that mark agreement with nouns. In Bulgarian gender is grammatical.

There are three genders: masculine (m), feminine (f) and neuter (n).

Masc: masculine gender

Nouns denoting male persons are masculine. Other nouns may also be grammatically masculine, without any relation to sex.

Example: [bg] замък / zamak “castle”

Fem: feminine gender

Nouns denoting female persons are feminine. Other nouns may also be grammatically feminine, without any relation to sex.

Example: [bg] маса / masa “table”

Neut: neuter gender

Neither masculine nor feminine (grammatically).

Example: [bg] дете / dete “child”

Mood: mood

Mood is a feature that expresses modality and subclassifies finite verb forms. In Bulgarian there are three moods: Indicative, Imperative and Conditional.

Ind: indicative

The indicative can be considered the default mood. A verb in the indicative merely states that something happens, has happened or will happen, without adding any attitude of the speaker. In Bulgarian the indicative covers all nine tenses and their passive forms. It also covers the evidential forms.

Examples

Imp: imperative

The speaker uses imperative to order or ask the addressee to do the action of the verb. The forms in Bulgarian are synthetic.

Examples

Cnd: conditional

The conditional mood is used to express actions that might happen under certain circumstances, or that would have taken place but did not or do not actually happen. It usually presupposes volition. The forms in Bulgarian are analytic.

Examples

NumType: numeral type

Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, the main part of speech is “numeral”; it includes almost everything where counting is involved, and there are various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so at the same time we may have a non-empty value of PronType. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)

In this respect Bulgarian behaves like Czech.

From the syntactic point of view, some numeral types behave like adjectives and some behave like adverbs. We tag them ADJ and ADV respectively. Thus the NumType feature applies to several different parts of speech:

Card: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word

Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.

Examples

Ord: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adjective.

Examples

Mult: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word

This is a subtype of adverb.

Examples

Frac: fraction

This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In Bulgarian the numerator is a cardinal numeral and the denominator is an ordinal numeral.

Examples

Number: number

Number is an inflectional feature of nouns, adjectives and verbs. In the tagset it is encoded as singular (s), plural (p), count (c) or pluralia tantum (l). Singularia tantum are not encoded.

Sing: singular number

A singular noun denotes one person, animal or thing.

Examples: [bg] молив / moliv (pencil)

Plur: plural number

A plural noun denotes several persons, animals or things.

Examples: [bg] моливи / molivi (pencils)

Count: count plural form

A form that is used as plural for masculine non-person nouns after numerals. This is a remnant of the dual form.

Examples: [bg] 2 молива / (2) moliva (2 pencils-count)

Ptan: plurale tantum

Some nouns appear only in the plural form even though they denote one thing (semantic singular); some tagsets mark this distinction.

Examples: [bg] финанси, дънки / finansi, danki (finances, jeans)

Coll: collective / mass / singulare tantum

Collective or mass or singulare tantum is a special case of singular. It applies to words that use grammatical singular to describe sets of objects, i.e. semantic plural.

Examples: [bg] човечество / chovechestvo (mankind)
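
The tagset letters listed at the start of this section, and the condition under which the count form is used, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `NUMBER_MAP`, `ud_number` and `non_singular_form` are hypothetical and do not come from any BulTreeBank or UD tool, and the rule ignores lexical exceptions.

```python
from typing import Optional

# Tagset letter -> Number value, following the list at the start of this section.
# Singularia tantum are not encoded in the tagset, so Coll has no letter here.
NUMBER_MAP = {"s": "Sing", "p": "Plur", "c": "Count", "l": "Ptan"}


def ud_number(letter: str) -> Optional[str]:
    """Map a tagset number letter to a Number value, or None if unknown."""
    return NUMBER_MAP.get(letter)


def non_singular_form(gender: str, denotes_person: bool, after_numeral: bool) -> str:
    """Pick between the count form and the ordinary plural.

    Encodes the description above: the count form is used for masculine
    non-person nouns after numerals; everything else takes the plural.
    """
    if gender == "Masc" and not denotes_person and after_numeral:
        return "Count"
    return "Plur"


print(ud_number("c"))
print(non_singular_form("Masc", denotes_person=False, after_numeral=True))
print(non_singular_form("Masc", denotes_person=False, after_numeral=False))
```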

Person: person

Person is typically a feature of personal and possessive pronouns, and of verbs. On verbs it is in fact an agreement feature that marks the person of the verb’s subject. Person marked on verbs makes it unnecessary to always add a personal pronoun as subject and thus subjects are sometimes dropped (pro-drop languages).

Bulgarian is a pro-drop language, like other Slavic languages.

1: first person

In singular, the first person refers just to the speaker / author. In plural, it must include the speaker and one or more additional persons.

Examples

2: second person

In singular, the second person refers to the addressee of the utterance / text. In plural, it may mean several addressees and optionally some third persons too.

Examples

3: third person

The third person refers to one or more persons that are neither speakers nor addressees.

Examples

Polarity: whether the word can be or is negated

Polarity is typically a feature of verbs, adjectives, sometimes also adverbs and nouns in languages that negate using bound morphemes.

In Bulgarian, nouns, adjectives and attributive participles are negated with the bound morpheme не (except in clearly contrastive contexts). Verbs and transgressives, however, use the clitic не for negation.

The polarity feature is also used to distinguish the response interjections yes and no.

Pos: positive, affirmative

Examples

Neg: negative

Examples

Poss: possessive

Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.

While many tagsets would have “possessive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types. Several of the pronominal types can be optionally possessive, and adjectives can too.

In the BulTreeBank tagset, by contrast, “possessive” is one of the various pronoun types.

Yes: it is possessive

Note that there is no No value. If the word is not possessive, the Poss feature will just not be mentioned in the FEATS column. (Which means that an empty value has the No meaning.)
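
The "absent means No" convention can be sketched in code. The sketch assumes the usual CoNLL-U feature string, with `Name=Value` pairs separated by `|` and `_` for an empty column; the helper names `parse_feats` and `is_possessive` are hypothetical.

```python
def parse_feats(feats: str) -> dict:
    """Turn a CoNLL-U feature string into a dict; '_' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_possessive(feats: str) -> bool:
    """A word is possessive only if Poss=Yes is present; absence means No."""
    return parse_feats(feats).get("Poss") == "Yes"


print(is_possessive("Poss=Yes|PronType=Prs"))  # feature present
print(is_possessive("PronType=Prs"))           # feature absent
print(is_possessive("_"))                      # no features at all
```

Consumers should therefore test for the presence of `Poss=Yes` rather than look for a `Poss=No` value, which never occurs.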

Examples

PronType: pronominal type

This feature typically applies to pronouns, determiners, pronominal numerals (quantifiers) and pronominal adverbs.

Prs: personal or possessive personal pronoun or determiner

See also the Poss feature that distinguishes normal personal pronouns from possessives. Note that Prs also includes reflexive personal/possessive pronouns (e.g. [cs] se / svůj; see the Reflex feature).

Examples

Rcp: reciprocal pronoun

Examples

Int: interrogative pronoun, determiner, numeral or adverb

Note that possessive interrogative determiners (whose) can be distinguished by the Poss feature.

Examples:

Rel: relative pronoun, determiner, numeral or adverb

In Bulgarian this class is distinct from the class of interrogatives.

Examples:

Dem: demonstrative pronoun, determiner, numeral or adverb

The BulTreeBank tagset does not differentiate between demonstratives of nearness and distance, although Bulgarian does make this distinction.

Examples

Tot: total (collective) pronoun, determiner or adverb

Examples

Neg: negative pronoun, determiner or adverb

Examples:

Ind: indefinite pronoun, determiner, numeral or adverb

Examples

Reflex: reflexive

Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.

In Bulgarian the reflexive feature is not encoded as one of the pronoun types, but as a reference type (similarly to entity, attribute, possession, etc.).

Bulgarian has reflexive verbs, both in form and in meaning. The reflexive element is written separately from the verb: събуждам се / sabuzhdam se “to wake up”.

Yes: it is reflexive

Note that there is no No value. If the word is not reflexive, the Reflex feature will just not be mentioned in the FEATS column. (Which means that an empty value has the No meaning.)
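
Seen from the writing side, the same convention means that a boolean feature which does not hold is simply left out when the feature string is built. The following is a minimal sketch, assuming the CoNLL-U conventions of `Name=Value` pairs joined by `|`, sorted by feature name, with `_` for an empty column; the function name `format_feats` is hypothetical.

```python
def format_feats(features: dict) -> str:
    """Serialise a feature dict, omitting boolean features that are False."""
    pairs = []
    for name, value in features.items():
        if value is False or value is None:
            continue  # a boolean feature that does not hold is not written
        if value is True:
            value = "Yes"  # boolean features only ever carry the value Yes
        pairs.append((name, value))
    if not pairs:
        return "_"
    pairs.sort(key=lambda pair: pair[0].lower())
    return "|".join(f"{name}={value}" for name, value in pairs)


print(format_feats({"Reflex": True, "PronType": "Prs", "Poss": False}))
print(format_feats({"Reflex": False}))
```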

Examples

Tense: tense

Tense is a feature that specifies the time when the action took / takes / will take place, in relation to the current moment or to another action in the utterance. In Bulgarian aspect and tense are separate, although not completely independent of each other.

In Bulgarian there are 9 tenses: 3 synthetic and 6 analytic.

Since the feature Tense is assigned to a single word, i.e. it relates to synthetic forms, in Bulgarian it is applicable to only 3 tenses: Present, Aorist and Imperfect.

Past: past tense / preterite / aorist

The past tense denotes actions that happened before the current moment. In Bulgarian, this is aorist. It can be used with both imperfective and perfective verbs.

Examples

Pres: present tense

The present tense denotes actions that are happening right now, that are crossing the moment of speaking or that usually happen. In Bulgarian the present tense has many uses: for ongoing activities (where perfective verbs are blocked), for historical events, for habitual activities, etc.

Examples

Imp: imperfect

Imperfect is a special case of the past tense. It denotes actions that are happening during some past moment. These actions might continue after the moment of speaking, but also might not, i.e. the evidence is not in the form itself but in the context. Both perfective and imperfective verbs are used in the imperfect tense.
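
The restriction of Tense to the three synthetic tenses can be sketched as a lookup. This is an illustration only: the names `SYNTHETIC_TENSES` and `ud_tense` are hypothetical, and the English tense labels used as keys are an assumption about how the input is named.

```python
from typing import Optional

# Only the three synthetic tenses receive a Tense value on a single word.
SYNTHETIC_TENSES = {
    "present": "Pres",
    "aorist": "Past",
    "imperfect": "Imp",
}


def ud_tense(bulgarian_tense: str) -> Optional[str]:
    """Return the Tense value for a synthetic tense, or None otherwise.

    Analytic tenses span several words, so no single word carries a
    Tense value for them and the lookup returns None.
    """
    return SYNTHETIC_TENSES.get(bulgarian_tense.lower())


print(ud_tense("Aorist"))
print(ud_tense("Imperfect"))
```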

VerbForm: form of verb or deverbative

Even though the name of the feature seems to suggest that it is used exclusively with verbs, it is not the case. Some verb forms in some languages actually form a gray zone between verbs and other parts of speech (nouns, adjectives and adverbs). For instance, participles may be either classified as verbs or as adjectives, depending on language and context. In both cases VerbForm=Part may be used to separate them from other verb forms or other types of adjectives.

Bulgarian does not have an infinitive. It distinguishes finite verbs from non-finite verbs (participles and transgressives).

Fin: finite verb

Rule of thumb: if it has non-empty Mood, it is finite. This feature is encoded in the second position of verbal tags, with the following values: Vp# (personal verb); Vn# (impersonal verb); Vx#, Vy# and Vi# (auxiliary verbs).

Examples

Part: participle

Participle is a non-finite verb form that shares properties of verbs and adjectives. The participle in Bulgarian is encoded as c in the fifth position of the tag: V#c#.

In Bulgarian there are four types of participles: present active, past perfective active, past imperfective active and past passive. The present active participle can be used only adjectivally; the past imperfective participle can be used only in evidential verb forms; the others have both uses. The present active participle can be derived only from imperfective verbs.
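
The two positional conventions mentioned in this section (verb type in the second position, c in the fifth position for participles) can be sketched as string checks. This is a hedged illustration: the function names are hypothetical, the sample tags are illustrative strings rather than verified BulTreeBank tags, and the sketch assumes every position before the fifth is filled with exactly one character.

```python
from typing import Optional

# Second position of a verbal tag, as listed under "Fin" above.
VERB_TYPES = {
    "p": "personal",
    "n": "impersonal",
    "x": "auxiliary",
    "y": "auxiliary",
    "i": "auxiliary",
}


def verb_type(tag: str) -> Optional[str]:
    """Read the verb type from the second position of a verbal tag."""
    if len(tag) < 2 or tag[0] != "V":
        return None  # not a verbal tag
    return VERB_TYPES.get(tag[1])


def is_participle(tag: str) -> bool:
    """True if a verbal tag has 'c' in its fifth position (index 4)."""
    return len(tag) >= 5 and tag[0] == "V" and tag[4] == "c"


# Illustrative tag strings (assumed shapes, not taken from the treebank).
print(verb_type("Vpitc"), is_participle("Vpitc"))
print(verb_type("Vxitf"), is_participle("Vxitf"))
```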

Examples

Trans: transgressive

The transgressive, also called adverbial participle, is a non-finite verb form that shares properties of verbs and adverbs. It appears e.g. in Slavic and Indo-Aryan languages.

In Bulgarian it can be derived only from imperfective verbs.

Examples

Note that the symbol ‘#’, used in the Universal POS section, is a placeholder for an arbitrary number of features that are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.

Voice: voice

For Indo-European speakers, voice means mainly the active-passive distinction. In other languages, other shades of verb meaning are categorized as voice.

In Bulgarian linguistics there are various theories of voice distinctions: a 2-voice one (active vs. passive), a 3-voice one (active vs. passive vs. reflexive) and a 4-voice one (active vs. passive vs. reflexive vs. impersonal).

Here the 2-voice theory is adopted.

Act: active voice

The subject of the verb is the doer of the action (agent), the object is affected by the action (patient).

Examples

Pass: passive voice

The subject of the verb is affected by the action (patient). The doer (agent) is either unexpressed or it appears as an object of the verb. In Bulgarian there are two ways of forming the passive:

Examples
