Universal Dependencies for Pomak
Tokenization and Word Segmentation
- In general, words are delimited by whitespace, and every whitespace-delimited token is segmented as a separate word, including the parts of multiword expressions. For example: za da “in order to”, ní kutrí “no one”, zablǽvom so “slow myself”.
- Numbers are analyzed as a single token when written without spaces (20000) or with an internal comma as a separator (10,434).
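The two tokenization rules can be sketched as a small regular-expression tokenizer. This is a minimal sketch, not the tool used to build the treebank: the names TOKEN_RE and tokenize are hypothetical, and splitting punctuation into single-character tokens is an assumption, since this section does not describe punctuation.

```python
import re
import unicodedata
from typing import List

# Hypothetical tokenizer pattern (assumption, not the treebank's own tool):
#   1. numbers: digit runs, optionally with internal comma groups (20000, 10,434)
#   2. words: runs of Unicode word characters (covers í, ǽ, č, ...)
#   3. any other non-space character: a one-character token (assumed for punctuation)
TOKEN_RE = re.compile(r"\d+(?:,\d+)*|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split Pomak text on whitespace, keeping numbers like 10,434 whole."""
    # NFC turns letter + combining accent into one code point where a
    # precomposed character exists, so \w matches the accented letter.
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)


print(tokenize("za da ní kutrí 20000 10,434."))
# ['za', 'da', 'ní', 'kutrí', '20000', '10,434', '.']
```

Because whitespace is the only boundary between letters, multiword expressions such as za da and ní kutrí come out as separate tokens, while 20000 and 10,434 stay whole.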
Morphology
Tags
This is an overview only. For a more detailed discussion and examples, see the list of Pomak POS tags and Pomak features.
- The Pomak treebank uses 16 of the 17 universal POS categories; it currently does not use SYM.
- Affirmative, negative, interrogative and modal particles are analyzed as PART.
- Pronouns (PRON) are distinguished from determiners (DET) as follows:
- Both the strong and the weak types of personal pronouns (ja/mí/mó, ty/tí/tó, toj/mú/go, tja/jí/jé, to/mú/go, nýje/nú/nú, výje/vú/vú, tíje/mí/gi, to/mí/gi) are assigned the PoS tag PRON.
- The weak types of possessive pronouns (mi, ti, mu, ji, nu, vu, mi) are assigned the PoS tag PRON.
- The reflexive pronouns sá/sé/só and sí are assigned the PoS tag PRON.
- The pronouns kaná “what”, kanása / kanáta / kanána “whatever here (proximal) / whatever there (medial) / whatever over there (distal)”, ní kaná “nothing” and síčko / síčkoso / síčkoto / síčkono “all / everything” are assigned the PoS tag PRON.
- The strong types of possessive pronouns (moj, tvoj, tógav, tójin, naš, vaš, tǽhan) and all other pronouns are assigned the PoS tag DET.
- The adjective adín/edín/idín is assigned the PoS tag DET when it is used as an indefinite article.
- The PoS tag AUX is assigned to the following lemmas (and, where applicable, their clitic paradigms):
- som “to be” (lemmas such as býdom are also possible)
- šom / štom, which expresses possibility, much like Greek θα
- še / ša “will, shall” and da “to”
- interrogative li / dalí, e.g., ažónen li si? “are you married?” [lit. “married you?”]
- Modal verbs are assigned the PoS tag VERB.
- The PoS tag ADJ is assigned to adjectives, ordinal numerals, and adjectives derived from family names or ethnonyms.
- The PoS tag VERB is assigned to personal and impersonal verbs, participles, infinitives and converbs.
- Note: Pomak has a triple postpositive definite article; it is attached to the word and receives no separate PoS tag.
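The PRON/DET/AUX decisions above can be illustrated with a lookup over the forms listed in this section. This is a hypothetical sketch (closed_class_upos and the form sets are illustrative names): it ignores lemmatization, context and the clitic forms of the AUX lemmas, and it leaves out adín/edín/idín because its tag depends on whether it is used as an indefinite article. The particles nǽ / nó, ní, sǽ are included as PART, as described under Other Features.

```python
import unicodedata
from typing import Optional

# Hypothetical lookup: forms copied from the lists above, no context or lemmatization.
PRON_FORMS = {
    # strong and weak personal pronouns
    "ja", "mí", "mó", "ty", "tí", "tó", "toj", "mú", "go", "tja", "jí", "jé",
    "to", "nýje", "nú", "výje", "vú", "tíje", "gi",
    # weak possessive pronouns
    "mi", "ti", "mu", "ji", "nu", "vu",
    # reflexive pronouns
    "sá", "sé", "só", "sí",
    # kaná "what" series and síčko "all / everything" series
    "kaná", "kanása", "kanáta", "kanána", "síčko", "síčkoso", "síčkoto", "síčkono",
}
# strong possessive pronouns are DET
DET_FORMS = {"moj", "tvoj", "tógav", "tójin", "naš", "vaš", "tǽhan"}
# AUX lemmas as listed (their clitic paradigms are not enumerated here)
AUX_FORMS = {"som", "šom", "štom", "še", "ša", "da", "li", "dalí"}
# particles that form indefinite, negative and universal pronouns/adverbs
PART_FORMS = {"nǽ", "nó", "ní", "sǽ"}


def closed_class_upos(form: str) -> Optional[str]:
    """Return the UPOS for a listed closed-class form, or None if unlisted."""
    key = unicodedata.normalize("NFC", form).casefold()
    for upos, forms in (("PRON", PRON_FORMS), ("DET", DET_FORMS),
                        ("AUX", AUX_FORMS), ("PART", PART_FORMS)):
        if key in forms:
            return upos
    return None


print(closed_class_upos("tvoj"), closed_class_upos("sá"), closed_class_upos("še"))
# DET PRON AUX
```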
Features
Nominal Features
- Common nouns (NOUN) and proper nouns (PROPN) have inherent gender (Gender), which takes one of the values Masc, Fem or Neut.
- Animacy is a grammatical feature of pronouns, adjectives, participles and some numerals. The opposition Human vs. Non-human is overt in the masculine plural and only rarely in the masculine singular.
- ADJ, DET, NUM, PRON and the participles (which are assigned the PoS tag VERB) inflect for case (Case), gender (Gender) and number (Number) and agree with the nouns they modify.
- Note: Certain adjectives, mainly of Turkish origin, do not inflect.
- Pomak has four cases: Nominative, Genitive, Accusative and Vocative.
- Pomak nouns, adjectives, certain numerals, passive participles and the strong types of pronouns may be marked for the feature Definite. Words with an attached article receive the value Def; otherwise they receive the value Ind.
- Pomak has a triple enclitic definite article (-s, -t, -n) that occurs with nouns, adjectives, pronouns and passive participles and expresses both deixis and definiteness. The features Deixis and DeixisRef are used to annotate deixis as follows:
- Proximity to the speaker is denoted with Deixis=Prox and DeixisRef=1 (e.g., čulǽkos, žanása, déteso).
- Proximity to the listener is denoted with Deixis=Prox and DeixisRef=2 (e.g., čulǽkot, žanáta, déteto).
- Distance from both the speaker and the listener is denoted with Deixis=Remt (e.g., čulǽkon, žanána, déteno).
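Assuming the article series of a word has already been identified (the segmentation step is not shown), the Definite, Deixis and DeixisRef values can be derived from it as in the sketch below. The key letters, the function names and the practice of combining Definite=Def with the deixis features in one FEATS string are assumptions for illustration; the sort order follows the CoNLL-U convention of listing features alphabetically by name, case-insensitively.

```python
from typing import Dict, Optional

# Hypothetical mapping from the article series (-s, -t, -n) to deixis features.
ARTICLE_DEIXIS: Dict[str, Dict[str, str]] = {
    "s": {"Deixis": "Prox", "DeixisRef": "1"},  # near the speaker: čulǽkos, žanása, déteso
    "t": {"Deixis": "Prox", "DeixisRef": "2"},  # near the listener: čulǽkot, žanáta, déteto
    "n": {"Deixis": "Remt"},                    # far from both: čulǽkon, žanána, déteno
}


def format_feats(feats: Dict[str, str]) -> str:
    """Render a CoNLL-U FEATS string, sorted case-insensitively by feature name."""
    if not feats:
        return "_"
    return "|".join(f"{name}={value}"
                    for name, value in sorted(feats.items(), key=lambda kv: kv[0].lower()))


def definiteness_feats(article: Optional[str]) -> str:
    """article: 's', 't' or 'n' for an attached article, None if there is none."""
    if article is None:
        return format_feats({"Definite": "Ind"})
    return format_feats({"Definite": "Def", **ARTICLE_DEIXIS[article]})


print(definiteness_feats("s"))   # Definite=Def|Deixis=Prox|DeixisRef=1
print(definiteness_feats("n"))   # Definite=Def|Deixis=Remt
print(definiteness_feats(None))  # Definite=Ind
```

The sketch only applies to the categories that take the article (nouns, adjectives, certain numerals, passive participles and the strong types of pronouns); other words would get no Definite value at all.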
Degree and Polarity
- The comparative and superlative degrees of adjectives and adverbs are formed with the adverbs po and naj respectively, both of which are separate words. The feature Degree encodes the positive, comparative and superlative degree of adjectives and adverbs with the values Pos, Cmp and Sup respectively. So far only the comparative and superlative values have been annotated; the positive degree is treated as the default.
- Polarity has two values, Pos and Neg, and applies primarily to the negative and affirmative particles (PART). So far only the value Neg has been used.
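A minimal sketch of the analytic degree rule, assuming the Degree feature is placed on the adjective or adverb that immediately follows po or naj (this section does not say which token carries it). The function name degree_feats and the placeholder forms are hypothetical.

```python
from typing import Dict, List, Tuple

# po forms the comparative, naj the superlative (both are separate words).
DEGREE_ADVERBS = {"po": "Cmp", "naj": "Sup"}


def degree_feats(tokens: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Return one feature dict per (form, upos) token.

    Assumption: Degree goes on the ADJ/ADV right after po or naj.
    No Degree=Pos is added, since the positive degree is the unmarked default.
    """
    feats: List[Dict[str, str]] = [{} for _ in tokens]
    for i in range(1, len(tokens)):
        prev_form = tokens[i - 1][0].casefold()
        _, upos = tokens[i]
        if upos in {"ADJ", "ADV"} and prev_form in DEGREE_ADVERBS:
            feats[i]["Degree"] = DEGREE_ADVERBS[prev_form]
    return feats


# "ADJ_FORM" is a placeholder, not a real Pomak word.
print(degree_feats([("po", "ADV"), ("ADJ_FORM", "ADJ")]))
# [{}, {'Degree': 'Cmp'}]
```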
Verbal Features
- As in other Slavic languages, Pomak verbs are marked for Aspect as a lexical (classifying) feature, which takes one of four values: Imp, Perf, Iter or Prog.
- Finite verbs are always marked for one of two values of Mood (Ind or Imp), one of four values of Number (Sing, Plur, Count or Coll) and one of three values of Person (1, 2 or 3).
- Verbs in the Ind mood are always marked for one of two values of Tense: Past or Pres. Fut is not used because the future is always formed with a special particle.
- Finite verbs in the Imp mood occur only in the 2nd person.
- The Voice feature has two values, Act and Pass. Only the passive participle has Voice=Pass; all other verb forms have Voice=Act.
- There are three types of nonfinite verb forms (VerbForm): Conv, Inf and Part.
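The verbal constraints above translate naturally into a feature checker. In this sketch, check_verb_feats and parse_feats are hypothetical names, and a verb is treated as finite whenever its VerbForm is not Conv, Inf or Part, since the section does not say whether VerbForm=Fin is annotated.

```python
from typing import Dict, List

ASPECTS = {"Imp", "Perf", "Iter", "Prog"}
MOODS = {"Ind", "Imp"}
NUMBERS = {"Sing", "Plur", "Count", "Coll"}
PERSONS = {"1", "2", "3"}
TENSES = {"Past", "Pres"}          # no Fut: the future is formed with a particle
NONFINITE = {"Conv", "Inf", "Part"}


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Mood=Ind|Person=1'."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_verb_feats(feats: str) -> List[str]:
    """Return a list of rule violations for one VERB token (empty if none)."""
    f = parse_feats(feats)
    problems: List[str] = []
    if "Aspect" in f and f["Aspect"] not in ASPECTS:
        problems.append("invalid Aspect")
    if f.get("Voice") == "Pass" and f.get("VerbForm") != "Part":
        problems.append("Voice=Pass is reserved for the passive participle")
    if f.get("VerbForm") in NONFINITE:
        return problems  # the finite-verb rules below do not apply
    # Assumption: anything not marked as nonfinite is a finite verb.
    if f.get("Mood") not in MOODS:
        problems.append("finite verb needs Mood=Ind or Mood=Imp")
    if f.get("Number") not in NUMBERS:
        problems.append("finite verb needs a Number value")
    if f.get("Person") not in PERSONS:
        problems.append("finite verb needs a Person value")
    if f.get("Mood") == "Ind" and f.get("Tense") not in TENSES:
        problems.append("Ind mood needs Tense=Past or Tense=Pres")
    if f.get("Mood") == "Imp" and f.get("Person") != "2":
        problems.append("Imp mood occurs only with Person=2")
    return problems


print(check_verb_feats("Mood=Imp|Number=Sing|Person=1"))
# ['Imp mood occurs only with Person=2']
```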
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV).
- NumType is used with numerals (NUM) and adjectives (ADJ).
- The Poss feature marks possessive personal determiners (e.g., moj “my”), possessive interrogative (indefinite or negative) determiners (e.g., číjje “whose”), possessive relative determiners (e.g., číjjeso, číjjeto, číjjeno “whose”) and possessive adjectives (e.g., májčin “mother’s”).
- Note: Indefinite, negative and universal pronouns (e.g., ní kutrí “no one”) and indefinite, negative and universal adverbs (e.g., ní kadé “nowhere”) are formed by placing one of the particles nǽ / nó, ní, sǽ before the corresponding interrogative pronoun or adverb. The interrogative word keeps both its status as a separate word and the feature PronType=Int.
- The Reflex feature is assigned to the reflexive pronouns sá, sé, só and to the possessive clitic pronoun sí.
- Person is a lexical feature of personal pronouns (PRON) and personal determiners (DET) and has three values: 1, 2 and 3.
- There is one layered feature, Number[psor]. It appears with the possessive determiners and encodes the lexical number of the possessor. The extra layer distinguishes this lexical feature from the inflectional Number feature that marks agreement with the modified (possessed) noun.
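The difference between the two number features can be shown on a hypothetical FEATS value for naš “our” modifying a singular noun. Only the features discussed in this subsection are included (Case, Gender and the other agreement features are omitted), and format_feats is an illustrative helper that sorts features by name, case-insensitively, as CoNLL-U expects.

```python
from typing import Dict


def format_feats(feats: Dict[str, str]) -> str:
    """CoNLL-U FEATS string: name=value pairs sorted case-insensitively by name."""
    if not feats:
        return "_"
    return "|".join(f"{name}={value}"
                    for name, value in sorted(feats.items(), key=lambda kv: kv[0].lower()))


# Hypothetical token of naš "our" modifying a singular noun:
# the possessor ("we") is plural, while the agreement number is singular.
nas_feats = {
    "Poss": "Yes",
    "Person": "1",
    "Number[psor]": "Plur",  # lexical: number of the possessor
    "Number": "Sing",        # inflectional: agreement with the possessed noun
}
print(format_feats(nas_feats))
# Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes
```

Without the [psor] layer, the two values would collide under a single Number key, which is exactly the ambiguity the layered feature avoids.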
Other Features
Language-specific features of Pomak:
- Diminutive and augmentative forms of nouns, adjectives, adverbs and certain passive participles are assigned the feature qpm-DegreeMod with the value Dim or Mag respectively.
- The feature Variant with the value Short is assigned to the weak types of personal and possessive pronouns to set them apart from the corresponding strong types.
- The particles nǽ / nó, ní and sǽ, which are used to form the indefinite, negative and universal pronouns and adverbs, are assigned the PoS tag PART and the feature qpm-PartType with the value Ind, Neg or Tot respectively.
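The particle-plus-interrogative pattern can be sketched as a function returning (form, UPOS, FEATS) rows. It shows only the features discussed here, assumes the reading that the interrogative word keeps PronType=Int, and uses the hypothetical name annotate_particle_phrase.

```python
import unicodedata
from typing import List, Tuple

# qpm-PartType values for the particles, as listed above.
PART_TYPE = {"nǽ": "Ind", "nó": "Ind", "ní": "Neg", "sǽ": "Tot"}


def annotate_particle_phrase(particle: str, wh_word: str,
                             wh_upos: str) -> List[Tuple[str, str, str]]:
    """Return (form, UPOS, FEATS) rows for a pair such as ní kutrí (sketch)."""
    if wh_upos not in {"PRON", "ADV"}:
        raise ValueError("expected an interrogative pronoun (PRON) or adverb (ADV)")
    part_type = PART_TYPE[unicodedata.normalize("NFC", particle)]
    return [
        (particle, "PART", f"qpm-PartType={part_type}"),
        (wh_word, wh_upos, "PronType=Int"),  # assumed: the interrogative keeps PronType=Int
    ]


for row in annotate_particle_phrase("ní", "kutrí", "PRON"):
    print(*row, sep="\t")
# ní	PART	qpm-PartType=Neg
# kutrí	PRON	PronType=Int
```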
Syntax
Only the morphological annotation of the treebank has been studied in detail so far. The syntactic annotation was obtained automatically with the UDify parser. An updated version of the treebank with fully revised syntactic annotation is expected to be released at the end of 2022.
Core Arguments, Oblique Arguments and Adjuncts
Other relations:
Unused relations:
Treebanks
There is 1 Pomak UD treebank: