POS tags
| Open class words | Closed class words | Other |
|---|---|---|
| ADJ | ADP | PUNCT |
| ADV | AUX | SYM |
| INTJ | CONJ | X |
| NOUN | DET | |
| PROPN | NUM | |
| VERB | PART | |
| | PRON | |
| | SCONJ | |
ADJ: adjective
Definition
Adjectives are words that typically modify nouns and specify their properties or attributes. They may also function as predicates, as in the following example:
Example: [bg] Колата е зелена / Kolata e zelena (The car is green.)
The ADJ tag is intended for ordinary adjectives only. See DET for determiners and NUM for numerals.
In Bulgarian, the following tags from the BulTreeBank tagset map to ADJ:
- A# (adjective)
Example: [bg] добър / dobar (good), 7-годишен / 7-godishen (seven-year-old)
- H# (family name adjective)
Example: [bg] Иванова книга / Ivanova kniga (Ivan’s book)
- Mo# (ordinal numeral)
Example: [bg] втори / vtori (second)
- V#car# (present participle)
Example: [bg] идващ / idvasht (coming)
- V#cv# (past passive participle)
Example: [bg] намерен / nameren (found)
- V#cao# (past perfective participle)
Example: [bg] направил / napravil (made)
Note that the symbol `#`, used throughout the Universal POS section, is a placeholder for an arbitrary number of features. These features are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.
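As a rough illustration of how the `#` placeholder can be read, the sketch below turns each ADJ pattern listed above into a regular expression and checks full tags against it. The helper names `pattern_to_regex` and `may_be_adj` are hypothetical, the sample tag strings are only illustrative, and reading `#` as "any run of characters" is an assumption that may overmatch on real BulTreeBank tags.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Assumption: `#` stands for zero or more suppressed feature characters.
    parts = [".*" if ch == "#" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))

# ADJ patterns exactly as listed in this section.
ADJ_PATTERNS = ["A#", "H#", "Mo#", "V#car#", "V#cv#", "V#cao#"]
ADJ_REGEXES = [pattern_to_regex(p) for p in ADJ_PATTERNS]

def may_be_adj(tag: str) -> bool:
    """Return True if a full tag matches any of the ADJ patterns."""
    return any(rx.fullmatch(tag) for rx in ADJ_REGEXES)

# Illustrative tag strings (assumed, not taken from this page).
print(may_be_adj("Amsi"))         # True
print(may_be_adj("Vpitcv--smi"))  # True
print(may_be_adj("Ncmsi"))        # False
```

A positive match only says that ADJ is a candidate: some of these patterns (for example H# and the participle tags) are also mapped to other universal tags elsewhere on this page.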
ADP: adposition
Definition
In Bulgarian, the adposition is a preposition.
In the BulTreeBank tagset it is encoded as R.
A preposition comes before its complement (a noun phrase, noun,
pronoun, or clause that functions as a noun phrase) and forms a single
structure with it, expressing the complement's grammatical and
semantic relation to another unit within the clause.
Examples
- в / v “in”
- към / kam “to”
In Bulgarian, prepositions can take the form of fixed multiword expressions, comparable to in spite of, because of, or thanks to in English. They are treated as multiword units in the treebank, but for the purposes of this mapping they are analyzed compositionally.
Examples
- по време на / po vreme na “during”
is analyzed as
- по / po “in”
- време / vreme “time”
- на / na “of”
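The compositional treatment above can be sketched as a word-by-word lookup. This is only a minimal illustration: the `LEXICON` and `UPOS` tables and the function name are hypothetical, and the noun tag assigned to време is an assumption (only the preposition tag R is given in this section).

```python
# Hypothetical single-word lexicon with BulTreeBank tag prefixes.
# Only R (preposition) comes from this section; Nc for "време" is assumed.
LEXICON = {"по": "R", "на": "R", "време": "Nc"}

# Mapping from BulTreeBank tag prefix to universal POS tag.
UPOS = {"R": "ADP", "Nc": "NOUN"}

def analyze_compositionally(expression: str) -> list[tuple[str, str]]:
    """Split a multiword preposition and tag each word on its own."""
    return [(word, UPOS[LEXICON[word]]) for word in expression.split()]

print(analyze_compositionally("по време на"))
# [('по', 'ADP'), ('време', 'NOUN'), ('на', 'ADP')]
```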
ADV: adverb
Definition
Bulgarian adverbs are words that typically modify verbs with respect to categories such as time, place, direction or manner. They may also modify adjectives and other adverbs, as in very briefly or arguably wrong. Some adverbs can even modify nouns (NOUN).
In the BulTreeBank tagset the corresponding POS tag is D.
There is a closed subclass of pronominal adverbs that refer to
circumstances in context, rather than naming them directly; similarly
to pronouns, these can be categorized as interrogative, relative,
demonstrative etc. Pronominal adverbs also get the ADV part-of-speech
tag, but they are differentiated by additional features.
In the BulTreeBank tagset the corresponding tags are as follows:
- Pdl, Pdm, Pdq, Pdt (Adverbial demonstrative pronouns for location, manner, quantity and time)
- Prl, Prm, Prq, Prt (Adverbial relative pronouns for location, manner, quantity and time)
- Pcl, Pcm, Pct (Adverbial collective pronouns for location, manner and time)
- Pil, Pim, Piq, Pit (Adverbial interrogative pronouns for location, manner, quantity and time)
- Pfl, Pfm, Pfq, Pft (Adverbial indefinite pronouns for location, manner, quantity and time)
- Pnl, Pnm, Pnq, Pnt (Adverbial negative pronouns for location, manner, quantity and time)
Examples
- demonstrative adverbs: тук, там, тогава / tuk, tam, togava “here, there, then”
- relative adverbs: когато, където, както, колкото / kogato, kadeto, kakto, kolkoto “when, where, as, as much as”
- collective adverbs: навсякъде, всякога, всякак / navsyakade, vsyakoga, vsyakak “everywhere, always, anyway”
- interrogative adverbs: кога, къде, как, колко / koga, kade, kak, kolko “when, where, how, how many”
- indefinite adverbs: някъде, някога, някак / nyakade, nyakoga, nyakak “somewhere, sometime, somehow”
- negative adverbs: никога, никъде, никак / nikoga, nikade, nikak “never, nowhere, not at all”
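The three-letter pronominal adverb tags listed above are regular enough to decode mechanically. The sketch below is one possible reading, with hypothetical names (`PRON_TYPE`, `ADV_KIND`, `decode_pronominal_adverb`); it covers only the tags listed in this section and returns the universal tag together with the two properties that the tag letters encode.

```python
# Second letter of the tag: pronoun type (as named in the list above).
PRON_TYPE = {"d": "demonstrative", "r": "relative", "c": "collective",
             "i": "interrogative", "f": "indefinite", "n": "negative"}

# Third letter of the tag: kind of circumstance.
ADV_KIND = {"l": "location", "m": "manner", "q": "quantity", "t": "time"}

def decode_pronominal_adverb(tag: str):
    """Return (UPOS, pronoun type, adverbial kind), or None if not listed."""
    if len(tag) != 3 or tag[0] != "P":
        return None
    ptype = PRON_TYPE.get(tag[1])
    kind = ADV_KIND.get(tag[2])
    # Pcq does not appear in the list above, so it is rejected explicitly.
    if ptype is None or kind is None or tag == "Pcq":
        return None
    return ("ADV", ptype, kind)

print(decode_pronominal_adverb("Pdl"))  # ('ADV', 'demonstrative', 'location')
print(decode_pronominal_adverb("Pnt"))  # ('ADV', 'negative', 'time')
print(decode_pronominal_adverb("Pcq"))  # None
```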
Note that some words that are traditionally called numerals in some
languages (e.g. Bulgarian) are treated as adverbs in the universal
tagging scheme. In particular, adverbial ordinal numerals
([bg] първо / parvo “for the first time”) are tagged ADV.
The mapped tag, Monsi, covers the neuter singular indefinite forms of the ordinal numerals.
As a result, these forms are ambiguous with the class of adjectives (ADJ).
Another adverbial numeral tag that maps to ADV is Md#:
Examples
- много / mnogo “very”
- малко / malko “little”
AUX: auxiliary verb
Definition
An auxiliary verb is a verb that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, and voice.
In Bulgarian the auxiliary verbs are forms of the verb ‘to be’:
- Vx# / съм / sam “to be”
- Vy# / бъда / bada “to be”
- Vi# / бивам / bivam “to be”
Modal verbs count as main verbs in the BulTreeBank tagset and are thus tagged VERB.
Examples
- Tense and passive auxiliaries: бях / byah “I was”
CONJ: coordinating conjunction
Definition
A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
In the BulTreeBank tagset there are three types of coordinating conjunctions:
Cc (single coordinating conjunction)
Examples
- но / no “but”
Cr (repetitive conjunction). These usually contain at least two parts.
Examples
- хем…, хем… / hem…, hem… “either… or…”
Cp (single and repetitive conjunction). These are usually used on their own, but they may also be used in a repetitive chain.
Examples
- и / i “and”
DET: determiner
Definition
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.
In Bulgarian the definite article is part of the word and is therefore not considered a determiner.
However, the following pronouns are mapped to determiners:
- demonstratives: Pda#, Pde#
- relatives: Pra#, Pre#, Prp#
- collectives: Pca#, Pce#
- interrogatives: Pia#, Pie#, Piy#, Pip#
- indefinites: Pfa#, Pfe#, Pfp#
- negatives: Pna#, Pne#, Pnp#
- possessives: Ps@l
Note that attributive uses (#a#) and possessive attributive uses (#p#) map directly to the DET category, while entity uses (#e#) can be either determiners or pronouns. Possessive pronouns (Ps#) are mapped only in their long forms (#l#). The short forms are clitics and are treated differently.
Examples
- possessive determiners: мой / moy “my”, твой / tvoy “your”
- demonstrative determiners: тази / tazi “this” as in Вчера видях тази кола / Vchera vidyah tazi kola “I saw this car yesterday.”
- interrogative determiners: какъв / kakav “which.MASC.SG”
- relative determiners: какъвто / kakavto “which.MASC.SG”
- indefinite determiners: някакъв / nyakakav “some.MASC.SG”
- collective determiners: всякакъв / vsyakakav “any.MASC.SG”
- negative determiners: никакъв / nikakav “no.MASC.SG”
The symbol `@` marks the suppression of exactly one feature in the tag.
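The determiner patterns above, including the ambiguity between DET and PRON, can be sketched as follows. The function names are hypothetical and the sample tag strings are made up for illustration. Two assumptions are built in: `#` is read as any number of characters and `@` as exactly one, and patterns are matched as prefixes of the full tag. The patterns placed in `DET_OR_PRON` are those that this page also lists under PRON.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Assumption: `#` is any number of feature characters, `@` is exactly one.
    wildcard = {"#": ".*", "@": "."}
    return re.compile("".join(wildcard.get(ch, re.escape(ch)) for ch in pattern))

# Patterns from the list above that map to DET only.
DET_ONLY = ["Pda#", "Pra#", "Prp#", "Pca#", "Pia#", "Pip#",
            "Pfa#", "Pfp#", "Pna#", "Pnp#", "Ps@l"]
# Patterns that this page lists under both DET and PRON.
DET_OR_PRON = ["Pde#", "Pre#", "Pce#", "Pie#", "Piy#", "Pfe#", "Pne#"]

def candidate_upos(tag: str) -> list[str]:
    """Return candidate universal tags for a pronoun tag (prefix match)."""
    if any(pattern_to_regex(p).match(tag) for p in DET_ONLY):
        return ["DET"]
    if any(pattern_to_regex(p).match(tag) for p in DET_OR_PRON):
        return ["DET", "PRON"]
    return []

# Made-up tag strings, for illustration only.
print(candidate_upos("Psol--sm"))  # ['DET']          long possessive form
print(candidate_upos("Psot"))      # []               no `l` in fourth position
print(candidate_upos("Pde-os-m"))  # ['DET', 'PRON']  entity use is ambiguous
```

Choosing between DET and PRON for the ambiguous tags needs syntactic context, which this sketch does not attempt.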
INTJ: interjection
Definition
An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language.
Note that words primarily belonging to another part of speech retain their original category when used in exclamations. For example, God is a NOUN even in exclamatory uses.
In the BulTreeBank annotation scheme, interjections are tagged as I.
These include two groups: exclamations and onomatopoeic words.
In cases like God, the lemmas are tagged as both Noun and Interjection.
Thus, only the Noun usage is kept in the universal setting.
Feedback particles such as yes and no are mapped to INTJ. Their BulTreeBank labels are:
- Ta (affirmative) - да / da “yes”
- Tn (negative) - не / ne “no”
NOUN: noun
Definition
Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
The NOUN tag is intended for common nouns only.
In BulTreeBank, common nouns are annotated with the tag Nc#.
Examples
- момиче / momiche “girl”
- котка / kotka “cat”
- дърво / darvo “tree”
- въздух / vazduh “air”
- красота / krasota “beauty”
NUM: numeral
Definition
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Note that cardinal numerals are covered by NUM whether they are used
as determiners or not (as in Windows Seven) and whether they
are expressed as words (four), digits (4) or Roman numerals
(IV). Other words functioning as determiners (including quantifiers
such as many and few) are tagged DET.
In the BulTreeBank tagset, the tag that maps to NUM is Mc#.
Examples
- 0, 1, 2, 3, 4, 5, 2014, 1000000, 3.14159265359
- едно, две, три, седемдесет и седем / edno, dve, tri, sedemdeset i sedem “one, two, three, seventy-seven”
- I, II, III, IV, V, MMXIV
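The three written forms of cardinal numerals shown in the examples can be told apart by shape. The sketch below is an assumption-laden illustration with hypothetical names (`DIGITS`, `ROMAN`, `numeral_shape`); it recognizes digit strings and Roman numerals only, since numerals written as words need a lexicon.

```python
import re

# Digit strings, optionally with one decimal separator (assumed: "." or ",").
DIGITS = re.compile(r"\d+(?:[.,]\d+)?")
# Standard-form Roman numerals up to 3999.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

def numeral_shape(token: str):
    """Classify the written form of a token: 'digits', 'roman' or None."""
    if DIGITS.fullmatch(token):
        return "digits"
    # The Roman pattern also matches the empty string, so require content.
    if token and ROMAN.fullmatch(token):
        return "roman"
    return None

print(numeral_shape("3.14159265359"))  # digits
print(numeral_shape("MMXIV"))          # roman
print(numeral_shape("две"))            # None
```

A shape match is only a hint: a single capital letter such as V can also be an ordinary token, so the final NUM decision still rests on the Mc# tag.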
PART: particle
Definition
Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs). Particles may encode grammatical categories such as negation, mood, tense etc. Particles are normally not inflected, although exceptions may occur.
In the BulTreeBank tagset the following tags map to PART: Tn, Ti, Tx, Tm, Tv, Te and Tg.
(Note that Ta is considered INTJ in the universal tagset. The Tn particle не / ne “no” is also considered INTJ.)
Examples
- negative particle (Tn): нито / nito “neither”
- interrogative particles (Ti): ли / li “question particle”
- auxiliary particles (Tx): да, ще / da, shte “to, will”
- modal particles (Tm): май / may “possibly”
- verbal particles (Tv): нека / neka “let”
- emphasis particles (Te): даже / dazhe “even”
- gradable particles (Tg): най / nay “most”
PRON: pronoun
Definition
Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
Pronouns under this definition function like nouns.
The BulTreeBank annotation scheme adopted the idea that pro-nouns, pro-adjectives and pro-adverbs are labeled as Pronouns. However, for the mapping to the universal tagset, this group was split into several parts-of-speech: determiners, adverbs, pronouns.
The tags that correspond to PRON are: Pp#, Pde#, Pre#, Pce#, Pie#, Pfe#, Pfy#, Pne#, Piy#.
Examples
- personal pronouns (Pp#): аз, ти, той, тя, то, ние, те, него, го / az, ti, toy, tya, to, nie, te, nego, go “I, you, he, she, it, we, they, him.LONG.FORM, him.SHORT.FORM”
- reflexive personal pronouns (Ppx#): себе си, се / sebe si, se “myself, yourself, himself, herself, itself, ourselves, yourselves, themselves”
- demonstrative pronouns (Pde#): този, това / tozi, tova “this” as in Видях това вчера. / Vidyah tova vchera “I saw this yesterday.”
- interrogative pronouns (Pie#, Piy#): кой, какво, колцина / koy, kakvo, koltsina “who, what, how many” as in Какво мислиш? / Kakvo mislish? “What do you think?”
- relative pronouns (Pre#): който, каквото / koyto, kakvoto “who, what” as in Човекът, който дойде, е баща ми. / Chovekat, koyto doyde, e bashta mi “The person who came is my father.”
- indefinite pronouns (Pfe#, Pfy#): някой, нещо, неколцина / nyakoy, neshto, nekoltsina “somebody, something, some, anybody, anything, any”
- collective pronouns (Pce#): всеки, всичко / vseki, vsichko “everybody, everything”
- negative pronouns (Pne#): никой, нищо / nikoy, nishto “nobody, nothing”
PROPN: proper noun
Definition
A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
The tag PROPN maps to the following tags in the BulTreeBank scheme: Np# and H#.
The tag Np# refers to proper nouns, while the tag H# handles two cases: (1) family names, which are
mapped here as proper nouns, and (2) name adjectives, which are mapped to adjectives.
Examples
- Мария, Иван / Maria, Ivan “Mary, John”
- София, Холандия / Sofiya, Holandia “Sofia, Holland”
- Иванов, Петрова / Ivanov, Petrova as in господин Иванов, госпожа Петрова / gospodin Ivanov, gospozha Petrova “Mr. Ivanov, Ms. Petrova”
- Клинтън / Klintan “Clinton”
PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
The punctuation in the BulTreeBank scheme follows the same principles as in the Universal tagset. Thus, it includes: period, comma, colon, etc.
Examples
- Точка / Tochka “Period”: .
- Запетая / Zapetaya “Comma”: ,
- Скоби / Skobi “Parentheses”: ()
SCONJ: subordinating conjunction
Definition
A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other. The subordinating conjunction typically marks the incorporated constituent which has the status of a (subordinate) clause.
For coordinating conjunctions, see CONJ.
In the BulTreeBank annotation scheme, the tag Cs is mapped to the universal tag SCONJ.
In our scheme, multiword subordinating conjunctions are also labeled Cs.
Such cases are excluded here and analysed compositionally, as recommended in the universal tagset.
Examples
- ако / ako “if”
- щом / shtom “as soon as”
- докато / dokato “while”
SYM: symbol
This document is a placeholder for the language-specific documentation
for SYM.
VERB: verb
Definition
A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause. Verbs are often associated with grammatical categories like tense, mood, aspect and voice, which can either be expressed inflectionally or using auxiliary verbs or particles.
In the BulTreeBank annotation scheme, main verbs, copulas and modal verbs map to VERB.
Note that modal verbs do not have special labels in our annotation scheme.
Participles and gerunds are also considered VERB. The specific labels that map to VERB are given below.
Examples
- Vp# (finite verb): тичам / ticham “run”
- Vn# (impersonal verb): вали, трябва / vali, tryabva “It rains, must”
- Vx# (the copula to be): съм / sam “to be”
- Vy# (the copula to be): бъда / bada “to be”
- Vi# (the copula to be): бивам / bivam “to be”
- V#cv# (past passive participle): намерен / nameren “found”. It is also mapped to ADJ in its attributive usages.
- V#cam# (past imperfective participle): четял / chetyal “He was reading”
- V#cao# (past perfective participle): дошъл / doshal “He has come”. It is also mapped to ADJ in its attributive usages.
- V#g (gerund): Идвайки / idvayki “Coming”
Note that the present active participle V#car# is mapped only to ADJ.
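The split between VERB and ADJ for participles can be sketched as a small decision function. This is an illustration under stated assumptions, not the treebank's conversion procedure: the function name and the `attributive` flag are hypothetical, `#` is read as any run of characters, the sample tag strings are only illustrative, and the AUX reading of Vx#, Vy# and Vi# described elsewhere on this page is not handled.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Assumption: `#` stands for zero or more suppressed feature characters.
    return re.compile("".join(".*" if ch == "#" else re.escape(ch) for ch in pattern))

ALWAYS_ADJ = [pattern_to_regex("V#car#")]                      # present active participle
ADJ_IF_ATTRIBUTIVE = [pattern_to_regex(p) for p in ("V#cv#", "V#cao#")]

def verb_tag_to_upos(tag: str, attributive: bool = False):
    """Map a V-initial tag to VERB or ADJ; return None for other tags."""
    if not tag.startswith("V"):
        return None
    # Participle patterns are checked first, because a pattern such as Vp#
    # would otherwise also match participle tags that begin with Vp.
    if any(rx.fullmatch(tag) for rx in ALWAYS_ADJ):
        return "ADJ"
    if any(rx.fullmatch(tag) for rx in ADJ_IF_ATTRIBUTIVE):
        return "ADJ" if attributive else "VERB"
    return "VERB"

# Illustrative tag strings (assumed, not taken from this page).
print(verb_tag_to_upos("Vpitcar-smi"))                    # ADJ
print(verb_tag_to_upos("Vpitcv--smi", attributive=True))  # ADJ
print(verb_tag_to_upos("Vpitcv--smi"))                    # VERB
print(verb_tag_to_upos("Vpitf-r1s"))                      # VERB
```

Whether a participle is used attributively has to come from the syntactic annotation; here it is simply passed in by the caller.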
X: other
This document is a placeholder for the language-specific documentation
for X.