This page still pertains to UD version 1.

POS tags

Open class words    Closed class words    Other
ADJ                 ADP                   PUNCT
ADV                 AUX                   SYM
INTJ                CONJ                  X
NOUN                DET
PROPN               NUM
VERB                PART
                    PRON
                    SCONJ

ADJ: adjective

Definition

Adjectives are words that typically modify nouns and specify their properties or attributes. Adjectives in Slovenian normally agree in gender, case and number with the noun they modify (both in attributive and predicative position), e.g. velik škandal “a big scandal” (masculine nominative singular), v velikih podjetjih “in big companies” (neuter locative plural) and Ponudba je velika. “The offer is big.” (feminine nominative singular).

In accordance with the universal description of ADJ, some words that have traditionally been categorized as numerals in Slovenian are also treated as adjectives, as they display similar morphological and syntactic properties. These include ordinal written numerals (e.g. prvi “the first”, drugi “the second”, tretji “the third”) and tuples (e.g. enojen “single”, dvojen “double”, trojen “triple”).

In the same way, all adjectival participles are classified as adjectives, regardless of whether they are used as attributes (e.g. prepovedane substance “forbidden substances”), in copula constructions (e.g. kajenje je prepovedano “smoking is forbidden”) or in passive constructions (e.g. to ji je bilo prepovedano “it was forbidden to her”).

Examples

Conversion from JOS

All adjectives are converted to ADJ. In addition to that, some numerals also become ADJ, namely: numerals with Form=letter and Type=ordinal; numeral with Form=letter, Type=pronominal and lemma drug; numerals with Form=letter, Type=special and lemma ending in -en.

ADP: adposition

Definition

Adposition is a cover term for prepositions and postpositions; Slovenian, however, only has prepositions. They normally occur before a noun phrase and express its grammatical and semantic relation to another unit within the clause.

Adpositions determine the case of the complement phrase, e.g. brez časopisa “without newspaper” (genitive), k časopisu “to newspaper” (dative), za časopis “for newspaper” (accusative), v časopisu “in newspaper” (locative), s časopisom “with newspaper” (instrumental).

Examples

Conversion from JOS

All prepositions become ADP.

ADV: adverb

Definition

Adverbs are words that typically modify verbs and adjectives for such categories as time, place, direction or manner, e.g. znova začutiti “feel again” or dobro obveščen “well informed”. Adverbs deriving from adjectives can inflect for degree, e.g. zanimivo “interestingly”, zanimiveje/zanimivejše “more interestingly”, najzanimiveje/najzanimivejše “the most interestingly”.

Note that in Slovenian, transgressives (adverbial participles) are marked as adverbs, not verbs.

Examples

Conversion from JOS

All adverbs become ADV.

AUX: auxiliary verb

Definition

An auxiliary verb is a verb that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, and voice. In Slovenian, only instances of the verb biti “to be” that accompany lexical verbs are marked as AUX.

Examples

Delimitation

Note that where biti is used independently as a copula or a content verb, it is tagged as VERB.

Conversion from JOS

In ssj500k, all instances of the verb biti “to be” have been annotated as Type=auxiliary. To separate the actual auxiliary function from other functions, syntax has to be taken into account. Thus, tokens of biti bearing the dependency relation PPart to a main verb are annotated as AUX.
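
The rule above can be sketched in Python as follows. This is only an illustration: the function name and its two parameters (the token's dependency relation and the JOS Type of its head) are assumptions, not the interface of the actual conversion script.

```python
def biti_upos(deprel, head_jos_type):
    """Tag a token of biti that JOS annotates as Type=auxiliary.

    deprel        -- dependency relation of the token to its head (hypothetical field)
    head_jos_type -- JOS Type of the head if it is a verb, else None (hypothetical field)
    """
    # PPart to a main verb marks a genuine auxiliary use; any other
    # configuration (copula, content verb) is treated as a regular verb.
    if deprel == "PPart" and head_jos_type == "main":
        return "AUX"
    return "VERB"
```

For example, `biti_upos("PPart", "main")` yields AUX, while any other relation yields VERB.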

CONJ: coordinating conjunction

Definition

A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.

Examples

Conversion from JOS

All conjunctions with Type=coordinating become CONJ.

DET: determiner

Definition

Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

The traditional grammar of Slovenian does not define determiners as a separate word class. Instead, words that perform the syntactic function of determiners are either categorized as adverbs (nekaj “some”, veliko “a lot of”, dovolj “enough of” etc.) or pronouns (ta “this”, ves “all”, moj “my”, vsak “each” etc.), regardless of whether they are used as attributives (To.DET besedilo je nerazumljivo. “This text is incomprehensible.”) or substantives (To.PRON sem že slišal. “I have heard this before.”).

Conversion from JOS

Since JOS morphosyntactic specifications do not distinguish substantive and attributive pronouns or quantifying and other adverbs, the conversion is done based on syntactic information. The pronouns modifying a noun are thus marked as DET, otherwise they are marked as PRON. Similarly, the list of adverbs modifying a noun was manually validated to define a closed set of quantifying adverbs marked as DET.
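
A minimal Python sketch of this decision is given below. Several things in it are assumptions: the function name, the boolean `modifies_noun` (standing in for whatever syntactic test the conversion uses), and the contents of `QUANTIFYING_ADVERBS`, which lists only the three adverbs named on this page rather than the full validated set. It also assumes that a quantifying adverb becomes DET only when it modifies a noun.

```python
# Partial, illustrative stand-in for the manually validated closed set.
QUANTIFYING_ADVERBS = {"nekaj", "veliko", "dovolj"}


def det_or_original(jos_category, lemma, modifies_noun):
    """Return the UD tag for a JOS pronoun or adverb."""
    if jos_category == "Pronoun":
        # Attributive pronouns become determiners; substantive ones stay pronouns.
        return "DET" if modifies_noun else "PRON"
    if jos_category == "Adverb":
        # Assumption: only listed quantifying adverbs modifying a noun become DET.
        if modifies_noun and lemma in QUANTIFYING_ADVERBS:
            return "DET"
        return "ADV"
    raise ValueError(f"rule does not apply to category {jos_category!r}")
```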

Examples

INTJ: interjection

Definition

An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language. Note that words primarily belonging to another part of speech retain their original category when used in exclamations. For example, odlično “great” is an adverb even in exclamatory uses.

As a special case of interjections, the universal tagging scheme also recognizes feedback particles, such as ja “yes” and ne “no”. Given that most of the feedback signals in Slovenian can also be used as syntactically more or less dependent modal particles (to pa ja veš, ona itak nima pojma, tega ne maram, on je seveda poniknil etc.), they have not been converted to INTJ and remain annotated as particles in the current version of the Slovenian UD Treebank.

Examples

Conversion from JOS

All interjections become INTJ.

NOUN: noun

Definition

Nouns are a part of speech typically denoting a person, place, thing, animal or idea. The NOUN tag is intended for common nouns only.

Nouns deriving from verbs (gerunds) are categorized as nouns in Slovenian.

Examples

Conversion from JOS

Nouns with Type=common are converted to NOUN.

NUM: numeral

Definition

A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.

The NUM tag is used for digit numerals (3 / 3.), roman numerals (III / III.), cardinal word numerals (tri “three”) and generic numerals (troje “three sets of”).

Other word types expressing a number or a relation to a number in Slovenian are marked as adjectives (tretji “the third”, trojen “triple”, trikraten “threefold”), adverbs (trikrat “three times”, tretjič “the third time”) or nouns (tretjina “a third”, trojica “triplet”, trojka “number three”).

Examples

Conversion from JOS

The following numerals are converted to NUM: numerals with Form=digit; numerals with Form=roman; numerals with Form=letter and Type=cardinal; numeral with Form=letter, Type=pronominal and lemma en or eden; and numerals with Form=letter, Type=special and lemma not ending in -en.
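
The conditions listed above can be written as a small Python function. The function name and argument names are hypothetical. Returning ADJ for every numeral that does not match a NUM condition is an assumption based on the ADJ section of this page, which lists the numerals that become adjectives.

```python
def numeral_upos(form, type_, lemma):
    """Map a JOS numeral (Form, Type, lemma) to a UD tag."""
    # Digit and roman numerals are always NUM, whatever their Type.
    if form in ("digit", "roman"):
        return "NUM"
    if form == "letter":
        if type_ == "cardinal":
            return "NUM"
        if type_ == "pronominal" and lemma in ("en", "eden"):
            return "NUM"
        if type_ == "special" and not lemma.endswith("en"):
            return "NUM"
    # Assumption: all remaining numerals are the ones treated as adjectives.
    return "ADJ"
```

With the examples used on this page, tri and troje come out as NUM, while tretji and trojen come out as ADJ.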

PART: particle

Definition

Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs).

In Slovenian, particles such as ja “yes” and ne “no” are always tagged as particles, regardless of whether they are used as modal particles (Ne motiš se. “You are not wrong.”) or as feedback particles that are not associated with another word or phrase (Ne, motiš se. “No, you are wrong.”).

Examples

Conversion from JOS

All particles are converted to PART.

PRON: pronoun

Definition

Within the universal scheme, pronouns are words that substitute for nouns or noun phrases and whose meaning is recoverable from the linguistic or extralinguistic context. Pronouns under this definition function like nouns, which means that the term cannot be extended to words that substitute for adjectives or other POS categories, as is usually the case in Slovenian grammar. Instead, attributive pronouns are tagged as determiners.

For instance, to “this” is traditionally called a pronoun in Slovenian grammar, regardless of its syntactic context. To make the annotation parallel across languages, it is now tagged as PRON in To sem že slišal. “I have heard this before.” and as DET in To besedilo je nerazumljivo. “This text is incomprehensible.”

Examples

Conversion from JOS

All pronouns are converted to PRON, except for pronouns that function as attributes to nouns (through the Attr dependency relation), which are converted to DET.

PROPN: proper noun

Definition

A proper noun is a noun that is the name (or part of the name) of a specific individual, place, or object and is usually written with an initial uppercase letter.

Examples

Conversion from JOS

All nouns with Type=proper are converted to PROPN.

PUNCT: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text, including bullets in itemized lists.

Examples

Conversion from JOS

The list of characters in the ssj500k treebank has been manually divided into subgroups of PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as asterisks or dash-like characters that can either function as mathematical operators (SYM) or bullets in itemized lists (PUNCT). In case of such ambiguity, the more common function was chosen.
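
The division itself was manual, so the following Python sketch only illustrates the principle of choosing the more common function for an ambiguous character. The function name, its input (a list of function labels observed for one character) and the handling of ties in favour of PUNCT are all assumptions.

```python
from collections import Counter


def choose_tag(observed_functions):
    """Pick PUNCT or SYM for a character from its observed uses.

    observed_functions -- iterable of "PUNCT"/"SYM" labels, one per
    occurrence of the character (hypothetical input).
    """
    counts = Counter(observed_functions)
    # Assumption: ties are resolved in favour of PUNCT.
    return "PUNCT" if counts["PUNCT"] >= counts["SYM"] else "SYM"
```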

SCONJ: subordinating conjunction

Definition

A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other. The subordinating conjunction typically introduces a subordinate clause, e.g. Izvedel sem, da me žena vara. “I found out that my wife is cheating on me.”

Examples

Conversion from JOS

All conjunctions with Type=subordinate are converted to SCONJ.
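
Together with the CONJ rule earlier on this page, the conversion of conjunctions amounts to a lookup on the JOS Type attribute. A minimal Python sketch, with hypothetical names, could look like this:

```python
# Type values as given on this page.
CONJUNCTION_UPOS = {"coordinating": "CONJ", "subordinate": "SCONJ"}


def conjunction_upos(jos_type):
    """Return the UD tag for a JOS conjunction with the given Type."""
    try:
        return CONJUNCTION_UPOS[jos_type]
    except KeyError:
        # Fail loudly rather than guess a tag for an unknown Type.
        raise ValueError(f"unexpected conjunction Type: {jos_type!r}") from None
```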

SYM: symbol

Definition

A symbol is a word-like entity that differs from ordinary words by form, function, or both. Symbols are distinct from punctuation marks, which delimit linguistic units in printed text and do not have any semantic function.

Contrary to the universal guidelines, tokens containing alphanumeric characters, such as URL addresses, email addresses and telephone numbers, are not considered symbols in Slovenian.

Examples

Conversion from JOS

The list of characters in the ssj500k treebank has been manually divided into subgroups of PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as asterisks or dash-like characters that can either function as mathematical operators (SYM) or bullets in itemized lists (PUNCT). In case of such ambiguity, the more common function was chosen.

VERB: verb

Definition

A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause.

In Slovenian, the VERB tag covers all verbs (including content, modal and copula verbs), except for the auxiliary verb biti “to be”, which is tagged as AUX.

Word forms that etymologically derive from verbs, but have different syntactic properties, such as adjectival participles (ukraden “stolen”, pokrit “covered”), transgressives (upoštevaje “taking into account”, začenši “starting”) and gerunds (govorjenje “speaking”, zavrnitev “rejection”, gretje “heating”), are marked as adjectives, adverbs or nouns respectively.

Examples

Conversion from JOS

All verbs with Type=main have been converted to VERB. Additionally, those instances of the verb biti with Type=auxiliary that do not bear the PPart dependency relation to a main verb have also been converted to VERB.

X: other

Definition

The X tag is used for words that for some reason cannot be assigned a real part-of-speech category.

In the Slovenian UD Treebank, this tag is mostly used for cases of code-switching where it was not meaningful to analyze the intervening language, such as Europe of knowledge, La connaissance de soi, Bundesvereinigung det Deutschen Arbeitgeberverbände. In cases where foreign-language sequences include both foreign and loan words, only foreign words are assigned the X tag, as in The Life of Brian, where Life and Brian are marked as NOUN and PROPN, respectively.

Other subcategories marked with X include abbreviations with dots (dr.), URL addresses (www.radenska.si), news author abbreviations (sta) and tokens with alpha-numerical combinations (6230i).

Conversion from JOS

All tokens with the tag Residual are converted to X. Additionally, all abbreviations are converted to X.
