POS tags
| Open class words | Closed class words | Other |
|---|---|---|
| ADJ | ADP | PUNCT |
| ADV | AUX | SYM |
| INTJ | CONJ | X |
| NOUN | DET | |
| PROPN | NUM | |
| VERB | PART | |
| | PRON | |
| | SCONJ | |
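For scripts that need the grouping in the table above, it can be encoded as a small lookup. The following is a minimal Python sketch; the names `TAG_CLASSES` and `tag_class` are illustrative assumptions and not part of any UD tooling.

```python
# Grouping of the POS tags as given in the table above.
# TAG_CLASSES and tag_class are hypothetical names used only for illustration.
TAG_CLASSES = {
    "open": ["ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"],
    "closed": ["ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"],
    "other": ["PUNCT", "SYM", "X"],
}


def tag_class(tag: str) -> str:
    """Return 'open', 'closed' or 'other' for a tag listed in the table."""
    for name, tags in TAG_CLASSES.items():
        if tag in tags:
            return name
    raise ValueError(f"unknown POS tag: {tag!r}")
```

For example, `tag_class("DET")` looks the tag up in the closed-class column.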
ADJ
: adjective
Definition
Adjectives are words that typically modify nouns and specify their properties or attributes. Adjectives in Slovenian normally agree in gender, case and number with the noun they modify (both in attributive and predicative position), e.g. velik škandal “a big scandal” (masculine nominative singular), v velikih podjetjih “in big companies” (neuter locative plural) and Ponudba je velika. “The offer is big.” (feminine nominative singular).
In accordance with the universal description of ADJ, some words that have traditionally been categorized as numerals in Slovenian are also treated as adjectives, as they display similar morphological and syntactic properties. These include ordinal written numerals (e.g. prvi “the first”, drugi “the second”, tretji “the third”) and tuples (e.g. enojen “single”, dvojen “double”, trojen “triple”).
In the same way, all adjectival participles are classified as adjectives, regardless of whether they are used as attributes (e.g. prepovedane substance “forbidden substances”), in copula constructions (e.g. kajenje je prepovedano “smoking is forbidden”) or in passive constructions (e.g. to ji je bilo prepovedano “it was forbidden to her”).
Examples
- star “old”, zelen “green”, nerazumljiv “incomprehensible”
- človekov “human’s”, Nobelov “Nobel’s”, kalcijev “calcium’s”
- znan “known”, zaposlen “employed”, povezan “connected”
- prvi “first”, drugi “second”, tretji “third”
- enojen “single”, dvojen “double”, trojen “triple”
Conversion from JOS
All adjectives are converted to ADJ. In addition to that, some numerals also become ADJ, namely: numerals with Form=letter and Type=ordinal; numeral with Form=letter, Type=ordinal and lemma drug; numerals with Form=letter, Type=special and lemma ending in -en.
ADP
: adposition
Definition
Adposition is a cover term for prepositions and postpositions; Slovenian, however, only has prepositions. They normally occur before a noun phrase to express its grammatical and semantic relation to another unit within a clause.
Adpositions determine the case of the complement phrase, e.g. brez časopisa “without newspaper” (genitive), k časopisu “to newspaper” (dative), za časopis “for newspaper” (accusative), v časopisu “in newspaper” (locative), s časopisom “with newspaper” (instrumental).
Examples
- iz “from”, do “to”, zaradi “because of”
- k “to”, proti “against”, kljub “despite”
- za “for”, na “on”, v “in”
- po “after”, o “about”, pri “at”
- z “with”, med “between”, pred “before”
Conversion from JOS
All prepositions become ADP.
ADV
: adverb
Definition
Adverbs are words that typically modify verbs and adjectives for such categories as time, place, direction or manner, e.g. znova začutiti “feel again” or dobro obveščen “well informed”. Adverbs deriving from adjectives can inflect for degree, e.g. zanimivo “interestingly”, zanimiveje/zanimivejše “more interestingly”, najzanimiveje/najzanimivejše “the most interestingly”.
Note that in Slovenian transgressives (adverbial participles) are marked as adverbs, not verbs.
Examples
- vedno “always”, tako “like that”, zelo “very”
- sodeč “judging”, upoštevaje “taking into account”, molče “silently”
Conversion from JOS
All adverbs become ADV, except for the closed set of quantifying adverbs modifying a noun, which become DET.
AUX
: auxiliary verb
Definition
An auxiliary verb is a verb that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, and voice. In Slovenian, only instances of the verb biti “to be” that accompany lexical verbs are marked as AUX.
Examples
- Tistega večera sem.AUX preveč popil.VERB. “I drank too much that evening.”
- V bolnišnici bodo.AUX uvedli.VERB šolo za starše. “A parenting school will be introduced in the hospital.”
- Kam bi.AUX se lahko zatekla.VERB? “Where could she have hidden?”
Delimitation
Note that where biti is used independently as a copula or a content verb, it is marked as VERB:
- To je.VERB grozno. “This is horrible.”
- Za nami je.VERB dolga vrsta. “There is a long queue behind us.”
- Vsi smo.AUX bili.VERB zadovoljni. “We were all content.”
Conversion from JOS
In ssj500k, all instances of the verb biti “to be” have been annotated as Type=auxiliary. To separate the actual auxiliary function from other functions, syntax has to be taken into account. Thus, tokens of biti bearing the dependency relation PPart with a main verb are annotated as AUX.
CONJ
: coordinating conjunction
Definition
A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
Examples
- in, pa, ter “and”
- ali, oziroma “or”
- vendar “however”; toda, ampak “but”
- namreč “namely”
- saj “as/since”
Conversion from JOS
All conjunctions with Type=coordinating become CONJ.
DET
: determiner
Definition
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.
The traditional grammar of Slovenian does not define determiners as a separate word class. Instead, words that perform the syntactic function of determiners are categorized either as adverbs (nekaj “some”, veliko “a lot of”, dovolj “enough of” etc.) or as pronouns (ta “this”, ves “all”, moj “my”, vsak “each” etc.), regardless of whether they are used as attributives (To.DET besedilo je nerazumljivo. “This text is incomprehensible.”) or substantives (To.PRON sem že slišal. “I have heard this before.”).
Conversion from JOS
Since JOS morphosyntactic specifications do not distinguish substantive and attributive pronouns or quantifying and other adverbs, the conversion is done based on syntactic information. The pronouns modifying a noun are thus marked as DET, otherwise they are marked as PRON. Similarly, the list of adverbs modifying a noun was manually validated to define a closed set of quantifying adverbs marked as DET.
Examples
- njegov “his”, njen “her”, naš “our”, njihov “their”, moj “my”, vaš “your” etc. (JOS possessive pronouns)
- ta “this”, tisti “that”, takšen “such”, tak “such”, tolikšen “so big” etc. (JOS demonstrative pronouns)
- ves “all”, vsak “each”, oba “both”, vsakršen “any” (JOS general pronouns)
- svoj “one’s own” (JOS reflexive pronouns)
- nekateri “some”, nek “some kind”, isti “identical”, enak “same”, mnog “many” (JOS indefinite pronouns)
- kakšen “what kind”, kateri “what type”, čigav “whose” (JOS interrogative pronouns)
- noben “no one”, nikakršen “no kind”, nič “nothing” (JOS negative pronouns)
- kakršenkoli “any kind of”, katerikoli “any type of”, čigar “whose” (JOS relative pronouns)
- nekaj “some”, več “more”, veliko “a lot of”, dovolj “enough of”, pol “half of”, malo “little of” (JOS adverbs)
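The DET/PRON/ADV decision described in this section can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the category strings `"Pronoun"` and `"Adverb"`, and the `modifies_noun` flag (which the caller would derive from the syntactic annotation) are hypothetical, and the adverb set contains only the examples listed above, not necessarily the full manually validated list.

```python
# Quantifying adverbs taken from the examples above; the full validated
# closed set used in the actual conversion may contain more items (assumption).
QUANTIFYING_ADVERBS = frozenset({"nekaj", "več", "veliko", "dovolj", "pol", "malo"})


def pronoun_or_adverb_upos(jos_category: str, lemma: str, modifies_noun: bool) -> str:
    """Choose DET, PRON or ADV for a JOS pronoun or adverb.

    `modifies_noun` is assumed to be computed beforehand from the
    dependency annotation of the token.
    """
    if jos_category == "Pronoun":
        # Attributive pronouns become DET, substantive ones stay PRON.
        return "DET" if modifies_noun else "PRON"
    if jos_category == "Adverb":
        # Only quantifying adverbs that modify a noun become DET.
        if modifies_noun and lemma in QUANTIFYING_ADVERBS:
            return "DET"
        return "ADV"
    raise ValueError(f"not a pronoun or adverb: {jos_category!r}")
```

With this sketch, ta in To besedilo je nerazumljivo would be passed with `modifies_noun=True` and come out as DET, while the same lemma in To sem že slišal would come out as PRON.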
INTJ
: interjection
Definition
An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language. Note that words primarily belonging to another part of speech retain their original category when used in exclamations. For example, odlično “great” is an adverb even in exclamatory uses.
As a special case of interjections, the universal tagging scheme also recognizes feedback particles, such as ja “yes” and ne “no”. Given that most of the feedback signals in Slovenian can also be used as syntactically more or less dependent modal particles (to pa ja veš, ona itak nima pojma, tega ne maram, on je seveda poniknil etc.), they have not been converted to INTJ and remain annotated as particles in the current version of the Slovenian UD Treebank.
Examples
- ah, oh, ha
- zbogom, živijo, adijo
- jebemti, bravo, ups
Conversion from JOS
All interjections become INTJ.
NOUN
: noun
Definition
Nouns are a part of speech typically denoting a person, place, thing, animal or idea. The NOUN tag is intended for common nouns only.
Nouns deriving from verbs (gerunds) are categorized as nouns in Slovenian.
Examples
- čas “time”, dan “day”, človek “human”
- leto “year”, delo “work”, mesto “city”
- država “country”, stran “page”, ura “hour”
Conversion from JOS
Nouns with Type=common are converted to NOUN.
NUM
: numeral
Definition
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
The NUM tag is used for digit numerals (3 / 3.), roman numerals (III / III.), cardinal word numerals (tri “three”) and generic numerals (troje “three sets of”).
Other word types expressing a number or a relation to a number in Slovenian are marked as adjectives (tretji “the third”, trojen “triple”, trikraten “threefold”), adverbs (trikrat “three times”, tretjič “the third time”) or nouns (tretjina “a third”, trojica “triplet”, trojka “number three”).
Examples
- 1, 2, 3
- 1., 2., 3.
- I, II, III
- I., II., III.
- en “one”, dva “two”, tri “three”
- enoje “one set of”, dvoje “two sets of”, troje “three sets of”
Conversion from JOS
The following numerals are converted to NUM: numerals with Form=digit; numerals with Form=roman; numerals with Form=letter and Type=cardinal; numeral with Form=letter, Type=pronominal and lemma en or eden; and numerals with Form=letter, Type=special and lemma not ending in -en.
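Read together with the numeral rules given in the ADJ section, the split of JOS numerals between NUM and ADJ can be sketched as follows. This is a minimal illustration, not the actual conversion script: the function name and the plain-string arguments are assumptions, and combinations not covered by the listed rules return `None`.

```python
from typing import Optional


def numeral_upos(form: str, num_type: str, lemma: str) -> Optional[str]:
    """Map a JOS numeral to NUM or ADJ following the rules listed in the text.

    Returns None for feature combinations the listed rules do not mention.
    """
    # Digit and roman numerals are NUM regardless of their type (3, 3., III, III.).
    if form in ("digit", "roman"):
        return "NUM"
    if form != "letter":
        return None
    if num_type == "cardinal":
        return "NUM"
    if num_type == "ordinal":
        return "ADJ"
    if num_type == "pronominal" and lemma in ("en", "eden"):
        return "NUM"
    if num_type == "special":
        # Lemmas ending in -en (enojen, dvojen) are ADJ; others (dvoje, troje) are NUM.
        return "ADJ" if lemma.endswith("en") else "NUM"
    return None
```

Checking the form first reflects that digit ordinals such as 3. are listed under NUM, while written ordinals such as tretji become ADJ.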
PART
: particle
Definition
Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs).
In Slovenian, particles such as ja “yes” and ne “no” are always tagged as particles, regardless of whether they are used as modal particles (Ne motiš se. “You are not wrong.”) or as feedback particles that are not associated with another word or phrase (Ne, motiš se. “No, you are wrong.”).
Examples
- tudi “also”
- ne “no/not”
- še “yet/more”
- že “already”
- le “only/just”
- naj “should”
- samo “only”
- prav “ok/just”
- predvsem “mostly”
- sicer “else”
- celo “even”
- seveda “of course”
Conversion from JOS
All particles are converted to PART.
PRON
: pronoun
Definition
Within the universal scheme, pronouns are words that substitute for nouns or noun phrases and whose meaning is recoverable from the linguistic or extralinguistic context. Pronouns under this definition function like nouns, which means that the term cannot be extended to words that substitute for adjectives or other POS categories, as is usually the case in Slovenian grammar. Instead, attributive pronouns are tagged as determiners.
For instance, to “this” is traditionally called a pronoun in Slovenian grammar, regardless of its syntactic context. To make the annotation parallel across languages, it is now tagged as PRON in To sem že slišal. “I have heard this before.” and as DET in To besedilo je nerazumljivo. “This text is incomprehensible.”
Examples
- jaz “I”, ti “you”, on “he”
- oba “both”, ves “all”, vsak “anyone”, vsakdo “anyone”
- ta “this one”, tale “this one”, tisti “that one”
- nič “nothing”, nihče “nobody”, nobeden “no one”
- kar “which”, karkoli “anything”, kdor “who”
- se “oneself”
- moj “mine”, tvoj “yours”, njihov “theirs”
- kaj “what”, kdo “who”
- nekaj “something”, nekdo “somebody”, malokdo “not a lot of people”
Conversion from JOS
All pronouns are converted to PRON, except for pronouns that function as attributes to nouns (through Attr dependency relation), which are converted to DET.
PROPN
: proper noun
Definition
A proper noun is a noun that is the name (or part of the name) of a specific individual, place, or object and is usually written with an initial uppercase letter.
Examples
- Slovenija “Slovenia”
- Evropa “Europe”
- ZDA “USA”
- Janez (personal names)
- Olimpija (name of a sports club)
- Krka (name of a river and a company)
Conversion from JOS
All nouns with Type=proper are converted to PROPN.
PUNCT
: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text, including bullets in itemized lists.
Examples
- . ? ! …
- , : ;
- ( ) { [ ]
- ” » « ‘ “ ” ’ ‘
- / _
- • * —
Conversion from JOS
The list of characters in ssj500k treebank has been manually divided into subgroups of PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as asterisk or dash-like characters that can either function as mathematical operators (SYM) or bullets in itemized lists (PUNCT). In case of such ambiguity, the more common function was chosen.
SCONJ
: subordinating conjunction
Definition
A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other. The subordinating conjunction typically introduces a subordinate clause, e.g. Izvedel sem, da me žena vara. “I found out that my wife is cheating on me.”
Examples
- da “that”
- ki “which”
- kot “like”
- ko “when”
- če “if”
- ker “because”
- kjer “where”
- čeprav “even though”
- kakor “as”
Conversion from JOS
All conjunctions with Type=subordinate are converted to SCONJ.
SYM
: symbol
Definition
A symbol is a word-like entity that differs from ordinary words by form, function, or both. Symbols are distinct from punctuation marks, which delimit linguistic units in printed text and do not have any semantic function.
Contrary to the universal guidelines, tokens containing alphanumeric characters, such as URL addresses, email addresses and telephone numbers, are not considered symbols in Slovenian.
Examples
- $, %, °, µ
- #, @, ©, &
- +, ×, =, <, >
Conversion from JOS
The list of characters in ssj500k treebank has been manually divided into subgroups of PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as asterisk or dash-like characters that can either function as mathematical operators (SYM) or bullets in itemized lists (PUNCT). In case of such ambiguity, the more common function was chosen.
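Because the split is a manual, list-based decision rather than a rule, it can be pictured as two fixed character sets. The sketch below is an assumption-laden illustration: it contains only the characters shown in the PUNCT and SYM example lists of this document, not the full validated lists, and the names are hypothetical.

```python
from typing import Optional

# Characters from the PUNCT examples in this document (illustrative subset).
# The asterisk and the dash are placed here, matching the example lists.
PUNCT_CHARS = {
    ".", "?", "!", "…", ",", ":", ";", "(", ")", "{", "[", "]",
    "»", "«", "“", "”", "‘", "’", "/", "_", "•", "*", "\u2014",
}

# Characters from the SYM examples in this document (illustrative subset).
SYM_CHARS = {"$", "%", "°", "µ", "#", "@", "©", "&", "+", "×", "=", "<", ">"}


def character_upos(token: str) -> Optional[str]:
    """Return PUNCT or SYM for a listed character, or None if it is not listed."""
    if token in PUNCT_CHARS:
        return "PUNCT"
    if token in SYM_CHARS:
        return "SYM"
    return None
```

Keeping the two sets disjoint means that an ambiguous character such as the asterisk always receives the single tag chosen for it, independent of context.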
VERB
: verb
Definition
A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause.
In Slovenian, the VERB tag covers all verbs (including content, modal and copula verbs), except for the auxiliary verb biti “to be”, which is tagged as AUX.
Word forms that etymologically derive from verbs, but have different syntactic properties, such as adjectival participles (ukraden “stolen”, pokrit “covered”), transgressives (upoštevaje “taking into account”, začenši “starting”) and gerunds (govorjenje “speaking”, zavrnitev “rejection”, gretje “heating”), are marked as adjectives, adverbs or nouns respectively.
Examples
- imeti “to have”, vedeti “to know”, dobiti “to get”
- morati “to have to”, moči “to be able to”, postati “to become”
- začeti “to start”, iti “to go”, priti “to come”
Conversion from JOS
All verbs with Type=main have been converted to VERB. Additionally, those instances of verb biti with Type=auxiliary that do not bear the PPart dependency relation to a main verb have also been converted to VERB.
X
: other
Definition
The X tag is used for words that for some reason cannot be assigned a real part-of-speech category.
In the Slovenian UD Treebank, this tag is mostly used for cases of code-switching where it was not meaningful to analyze the intervening language, such as Europe of knowledge, La connaissance de soi, Bundesvereinigung der Deutschen Arbeitgeberverbände. In cases where foreign-language sequences include both foreign and loan words, only foreign words are assigned the X tag, as in The Life of Brian, where Life and Brian are marked as NOUN and PROPN, respectively.
Other subcategories marked with X include abbreviations with dots (dr.), URL addresses (www.radenska.si), news author abbreviations (sta) and tokens with alpha-numerical combinations (6230i).
Conversion from JOS
All tokens with tag Residual are converted to X. Additionally, all abbreviations are also converted to X.
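The conversion rules in this document that depend only on the JOS category, or on the category plus Type, can be summarized as lookup tables. This is a hedged sketch: the category strings (`"Adjective"`, `"Preposition"`, `"Abbreviation"` etc.) and all names are assumptions made for illustration. Adverbs, pronouns, numerals, verbs and punctuation are deliberately left out, since their conversion needs lemmas, syntactic context or manually validated lists.

```python
from typing import Optional

# Rules that depend on the JOS category alone (category names are assumed).
CATEGORY_TO_UPOS = {
    "Adjective": "ADJ",
    "Preposition": "ADP",
    "Interjection": "INTJ",
    "Particle": "PART",
    "Residual": "X",
    "Abbreviation": "X",
}

# Rules that depend on the category and its Type value, as named in the text.
TYPED_TO_UPOS = {
    ("Noun", "common"): "NOUN",
    ("Noun", "proper"): "PROPN",
    ("Conjunction", "coordinating"): "CONJ",
    ("Conjunction", "subordinate"): "SCONJ",
}


def simple_upos(category: str, type_: Optional[str] = None) -> Optional[str]:
    """Return the UPOS tag for context-free rules, or None if context is needed."""
    if (category, type_) in TYPED_TO_UPOS:
        return TYPED_TO_UPOS[(category, type_)]
    return CATEGORY_TO_UPOS.get(category)
```

A `None` result signals that the token has to be handled by one of the context-dependent rules instead.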