UD Veps VWT
Language: Veps (code: vep)
Family: Uralic
This treebank has been part of Universal Dependencies since the UD v2.13 release.
The following people have contributed to making this treebank part of UD: Käbi Laan.
Repository: UD_Veps-VWT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Veps-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [kaebi • laan (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
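The annotation layers in the table correspond to columns of the CoNLL-U files in which UD treebanks are distributed. The ten columns, in order, are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. Because this treebank has no XPOS, that column is always `_`. The sketch below reads one word line into named fields. It is a minimal illustration, not the official UD tooling. The example line is a toy written for this sketch and is not copied from the treebank.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """The ten CoNLL-U columns of one word line, in file order."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def parse_token_line(line: str) -> Token:
    """Split one tab-separated CoNLL-U word line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return Token(*fields)


# Toy line written for this sketch (not taken from the treebank).
# XPOS is "_" because the treebank provides no language-specific tags.
line = "\t".join(["3", "küläs", "küla", "NOUN", "_", "Case=Ine|Number=Sing", "2", "obl", "_", "_"])
tok = parse_token_line(line)
print(tok.lemma, tok.upos, tok.feats, tok.deprel)  # küla NOUN Case=Ine|Number=Sing obl
```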
Description
UD Veps-VWT is a manually annotated corpus of Veps in the Universal Dependencies annotation scheme. The data are collected from the VepKar corpora and consist mostly of modern news texts written in the Central Veps dialect. The morphological annotation and grammatical decisions are based on the language studies by Riho Grünthal and on various Veps dictionaries (by Nina Zaitseva). Many syntactic decisions are based on pre-existing Finnish, Estonian, Karelian and Russian treebanks.
Acknowledgments
This work was developed as part of a master's thesis written by Käbi Laan at the University of Tartu under the supervision of Kadri Muischnek and Eva Saar.
References
- Grünthal, Riho 2015. Vepsän kielioppi [Veps grammar]. Apuneuvoja suomalais-ugrilaisten kielten opintoja varten XVII. Helsinki: Suomalais-Ugrilainen Seura.
- Zaitseva, N. G., E. E. Kharitonova, O. Yu. Zhukova (Зайцева, Н. Г., Е. Е. Харитонова, О. Ю. Жукова) 2012. Орфографический словарь вепсского языка (Vepsän kelen orfografine vajehnik) [Orthographic dictionary of the Veps language]. Петрозаводск: Карельский научный центр [Petrozavodsk: Karelian Research Centre].
Statistics of UD Veps VWT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
AdpType – Case – Clitic – Connegative – Degree – Mood – Number – NumForm – NumType – Person – Polarity – PronType – Reflex – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – conj – cop – csubj – csubj:cop – det – flat – mark – nmod – nsubj – nsubj:cop – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 103 sentences and 1303 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 49 types of words that contain both letters and punctuation. Examples: Kalag', nügüd', kel't, el'geta, kul'tursebran, der'oun, el'genzin, jäl'ghe, midä-se, sid', ved', Kod'arven, Muštat-ik, Oli-ik, Pagištihe-ik, Päžar', Päžar'laižed, Sil-žo, Tal'vel, der'onas, der'ounadme, der'ounan, el'gendaižiba, el'genzi, elo-oza, kaks', kaks'keline, kel', kodikel't, kul'turad, kuna-se, kut-se, kül'mäiži, laps'aigaspäi, oiktuz'tedon, pert', pidab-ik, pit'kha, počt-ki, pämez', radod-ki, sel'ged, sur', sügüz'kud, toine_tošt, venän-ki, voin-ik, Äjak-se, üks'
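Counts like the ones above can be recomputed from the CoNLL-U text. The sketch below returns five figures:

- sentences;
- word lines;
- tokens marked `SpaceAfter=No` in MISC;
- forms that contain a space;
- forms that mix Unicode letters with Unicode punctuation.

It uses Unicode categories, so the apostrophe in *nügüd'* and the underscore in *toine_tošt* both count as punctuation. It counts syntactic word lines and skips multiword-token ranges. That figure equals the token count only if the file has no multiword tokens, which is an assumption here. The toy sentence is written for illustration and is not taken from the treebank.

```python
import unicodedata


def has_letters_and_punct(form: str) -> bool:
    """True if a form mixes Unicode letters (L*) with punctuation (P*)."""
    cats = [unicodedata.category(ch) for ch in form]
    return any(c.startswith("L") for c in cats) and any(c.startswith("P") for c in cats)


def tokenization_stats(conllu: str) -> dict:
    """Count sentences and word lines and collect the forms discussed above."""
    sentences = words = no_space_after = 0
    forms_with_space, letters_and_punct = set(), set()
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():              # a blank line closes a sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            continue
        if line.startswith("#"):          # sentence-level comments
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():         # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        in_sentence = True
        words += 1
        form, misc = cols[1], cols[9]
        if "SpaceAfter=No" in misc.split("|"):
            no_space_after += 1
        if " " in form:
            forms_with_space.add(form)
        if has_letters_and_punct(form):
            letters_and_punct.add(form)
    if in_sentence:                       # the text may not end with a blank line
        sentences += 1
    return {
        "sentences": sentences,
        "words": words,
        "no_space_after": no_space_after,
        "forms_with_space": forms_with_space,
        "letters_and_punct": letters_and_punct,
    }


# Toy sentence written for this sketch; only ID and FORM are filled in.
ROWS = [("1", "Hö"), ("2", "eläba"), ("3", "küläs"), ("4", "nügüd'"), ("5", ".")]
SAMPLE = "\n".join("\t".join(r + ("_",) * 8) for r in ROWS) + "\n\n"
print(tokenization_stats(SAMPLE * 2))
```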
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: DET, INTJ, SYM, X
- This corpus contains 1 word type tagged as a particle (PART): ved'
- This corpus contains 23 lemmas tagged as pronouns (PRON): hän, hö, iče, ičein, ičetoi, ičeze, kaik, kaik-se, ken, kudamb, mi, mi-se, minä, mitte, mugoine, mö, ne, nece, se, sinä, toine, toine_toise, tö
- This corpus contains 0 lemmas tagged as determiners (DET).
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): ei, olda, pidada, sada, voida
- Out of the above, 3 lemmas occurred sometimes as AUX and sometimes as VERB: olda, pidada, sada
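The lemma inventories above, including the set of lemmas that occur as both AUX and VERB, amount to a map from UPOS to lemma sets plus a set intersection. Below is a minimal sketch that assumes well-formed CoNLL-U input. The rows are hypothetical and are not corpus sentences: their forms and lemmas are borrowed from the lists above, and every other column is `_`.

```python
from collections import defaultdict


def lemmas_by_upos(conllu: str) -> dict:
    """Map each UPOS tag to the set of lemmas annotated with it."""
    index = defaultdict(set)
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():             # word lines only
            index[cols[3]].add(cols[2])
    return index


# Hypothetical rows: only FORM, LEMMA and UPOS matter here.
ROWS = [
    ("1", "om", "olda", "AUX"),
    ("2", "ei", "ei", "AUX"),
    ("3", "oli", "olda", "VERB"),
    ("4", "sain", "sada", "VERB"),
]
SAMPLE = "\n".join("\t".join(r + ("_",) * 6) for r in ROWS) + "\n"

inventory = lemmas_by_upos(SAMPLE)
both = inventory["AUX"] & inventory["VERB"]   # lemmas used both as AUX and as VERB
print(sorted(inventory["AUX"]), sorted(both))  # ['ei', 'olda'] ['olda']
```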
- There are 5 (de)verbal forms:
- Conv
- VERB: Išttes, Tuldes
- Fin
- AUX: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB: eläba, radoin, ajoin, seižub, abutab, el'genzin, meletan, muštan, pätin, sain
- Inf
- VERB: tehta, eläda, el'geta, kaita, pagišta, panda, rata, vajehtada, vastatas, abutada
- Part
- VERB: sündnu, omištadud, peittud, tehtud
- Sup
- VERB: radmaha, elämaha, kacmaha, opendamhas, rata, valitihe
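The form lists above group word forms by their `VerbForm` value (Conv, Fin, Inf, Part, Sup) and then by UPOS. The sketch below reproduces that kind of grouping from the FEATS column. It keeps each form once, in first-seen order. This is an assumption about ordering, since the lists on this page may be ordered differently, for example by frequency. The rows are hypothetical: their forms come from the lists above, but they are not a corpus sentence.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Ine|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_verbform(conllu: str) -> dict:
    """Group word forms by VerbForm value, then by UPOS, without duplicates."""
    table = defaultdict(lambda: defaultdict(list))
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        verbform = parse_feats(cols[5]).get("VerbForm")
        if verbform is None:              # nouns, pronouns etc. have no VerbForm
            continue
        forms = table[verbform][cols[3]]
        if cols[1] not in forms:
            forms.append(cols[1])
    return table


FIN = "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act"
ROWS = [
    ("1", "Tuldes", "VERB", "VerbForm=Conv"),
    ("2", "om", "AUX", FIN),
    ("3", "om", "AUX", FIN),
    ("4", "tehtud", "VERB", "Tense=Past|VerbForm=Part|Voice=Pass"),
    ("5", "küläs", "NOUN", "Case=Ine|Number=Sing"),
]
SAMPLE = "\n".join(
    "\t".join((i, form, "_", upos, "_", feats, "_", "_", "_", "_"))
    for i, form, upos, feats in ROWS
) + "\n"

table = forms_by_verbform(SAMPLE)
for verbform, by_upos in table.items():
    print(verbform, dict(by_upos))
```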
Nominal Features
- Number
- Plur
- ADJ: erazvuiččed, enččiden, erazvuiččiš, lämembad, melekahad, melentartuižid, sel'ged, tägälaižed, vepsläižid, vägevad
- AUX: ei, oma, voiba, voim, oliba
- AUX-Fin: oma, voiba, voim, oliba
- NOUN: lapsed, ristitud, jurid, adivoid, eläjiden, projektoid, ristituid, tedoid, vanhembad, vepsläižed
- PRON: meiden, mö, heiden, Tö, kudambad, hö, Ned, Niiš, Teid, Teiden
- VERB-Fin: eläba, tegem, Muštat-ik, Toivotam, abutiba, ajelkoiš, ajoim, el'gendaižiba, eliba, eläškandeb
- Sing
- ADJ: vepsän, hüvä, surel, čomal, jüged, kebn, tärged, äi, čoma, aktivižen
- AUX: om, ei, oli, en, pidab, olen, Olin, ole, pidab-ik, pidi
- AUX-Fin: om, oli, pidab, olen, Olin, ole, pidab-ik, pidi, voin-ik
- NOUN: kelel, külän, rad, elo, kel't, kanzan, kanzas, kul'tursebran, küläs, mal
- PRON: minä, ičeze, nece, minun, kaik, minei, hän, kaiken, se, ičein
- PROPN: Kalag', Natalja, Päžarvehe, Kaskez, Kaskezaspäi, Piterin, Silakova, Vepsän, Änižjärven, Himjogi
- VERB-Fin: radoin, ajoin, seižub, abutab, el'genzin, meletan, muštan, pätin, sain, tahtoin
- Case
- Abl
- NOUN: laps'aigaspäi, posadaspäi
- PROPN: Kaskezaspäi
- Ade
- ADJ: surel, čomal, armhal
- NOUN: kelel, mal, randal, aigal, Homendesel, Kezal, Tal'vel, avtobusal, der'ounadme, kodikelel
- NUM: 23.
- PRON: Teil, kudambal, necil
- PROPN: Venämal
- All
- ADJ: korktale, nügüdläižele
- NOUN: azjale, kodimale, lebupäivile, radsijale, ristituile
- PRON: minei, Teile, heile, hänele, kaikile, meile
- Com
- NOUN: vepsläižidenke, kaluidenke, kiviradnikoidenke, pertidenno, sündundpäivänke, vanhembidenke, vellenke
- PROPN: Silakovanke
- Ela
- NOUN: kanzaspäi, külišpäi, polespäi
- PRON: heišpäi
- PROPN: Kaskezaspäi, Tožegespäi
- Ess
- ADJ: aktivižen
- NOUN: omblijan, opendajan, paštajan, ühtnikan
- Gen
- ADJ: vepsän, enččen, enččiden, suren, toižen, vepsläižen, vägevan
- NOUN: külän, kanzan, kul'tursebran, aigan, elon, eläjiden, vepsläižiden, školan, Tatan, Valičusiden
- PRON: meiden, minun, kaiken, heiden, ičein, necen, sen, hänen, Teiden, ičeze
- PROPN: Piterin, Vepsän, Änižjärven, Karjalan, Kod'arven, Natalja, Päžarven, Vologdan, Änižen
- Ill
- ADJ: verhaze
- NOUN: školha, agjaha, eloho, kivikarjeroihe, kodihe, kodimaha, konkursoihe, külähä, küzundoihe, maha
- PRON: neche
- PROPN: Päžarvehe, Kaskezaha
- VERB-Sup: radmaha, elämaha, kacmaha, opendamhas, rata, valitihe
- Ine
- ADJ: erazvuiččiš, hüväs, vepsläižes
- NOUN: kanzas, küläs, elos, internatas, mirus, muzejas, posadas, südäimes, školas, agjas
- NUM: ühtes
- PRON: kaikes, neciš, Niiš, sinus, toižiš
- PROPN: Kalages, Kaskezas, ORDas, Päžarves
- Nom
- ADJ: hüvä, erazvuiččed, jüged, kebn, tärged, äi, čoma, bohat, kaks'keline, kulu
- NOUN: lapsed, ristitud, rad, elo, aig, der'oun, eläjad, pagin, praznik, rahvaz
- NUM: 40, 15, 2017, kahesa, kaks', koume, üks'
- PRON: minä, ičeze, nece, mö, kaik, hän, se, Tö, kudambad, hö
- PROPN: Kalag', Natalja, Kaskez, Silakova, Himjogi, Jevgenjevna, Kalarand, Päžar'
- Par
- ADJ: ezmäižid, hüväd, korktad, kovad, melentartuižid, rahvahališt, sijališt, sotovijad, ut, vepsläšt
- NOUN: kel't, jurid, rahvast, vot, adivoid, elod, projektoid, ristituid, tedoid, väged
- NUM: kaht
- PRON: midä-se, mindai, necidä, Mittušt, Teid, ičtaze, ked, meid, midä, sidä
- PROPN: Jevgenjevnad
- Ter
- NOUN: lophusai
- PROPN: Toižegehesai
- Tra
- NOUN: pämeheks, Ozutesikš
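Each value in these feature lists is paired with labels such as NOUN, PRON, VERB-Fin or VERB-Sup, that is, the UPOS tag optionally followed by the `VerbForm` value. The sketch below shows one way to derive such a table for any feature. The labelling rule is an assumption: the Number lists above contain both an AUX row and an AUX-Fin row, so the UD statistics generator evidently groups somewhat differently. The rows are hypothetical and use forms and features listed in this section.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Ine|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_feature(conllu: str, feature: str) -> dict:
    """Map value -> label -> forms, where label is UPOS or UPOS-VerbForm."""
    table = defaultdict(lambda: defaultdict(list))
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        feats = parse_feats(cols[5])
        value = feats.get(feature)
        if value is None:
            continue
        label = cols[3]
        if "VerbForm" in feats:           # e.g. VERB-Fin, VERB-Sup
            label += "-" + feats["VerbForm"]
        forms = table[value][label]
        if cols[1] not in forms:
            forms.append(cols[1])
    return table


# Hypothetical rows (not a corpus sentence); unused columns are "_".
ROWS = [
    ("1", "Hö", "PRON", "Case=Nom|Number=Plur|Person=3|PronType=Prs"),
    ("2", "eläba", "VERB", "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act"),
    ("3", "küläs", "NOUN", "Case=Ine|Number=Sing"),
    ("4", "radmaha", "VERB", "Case=Ill|VerbForm=Sup|Voice=Act"),
]
SAMPLE = "\n".join(
    "\t".join((i, form, "_", upos, "_", feats, "_", "_", "_", "_"))
    for i, form, upos, feats in ROWS
) + "\n"

by_number = forms_by_feature(SAMPLE, "Number")
by_case = forms_by_feature(SAMPLE, "Case")
print({value: dict(labels) for value, labels in by_case.items()})
```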
Degree and Polarity
- Degree
- Cmp
- ADJ: lämembad
- Pos
- ADJ: hüvä, surel, čomal, erazvuiččed, jüged, kebn, tärged, äi, čoma, aktivižen
- Polarity
- Neg
- AUX: ei, en
Verbal Features
- Mood
- Cnd
- AUX-Fin: Oliži
- VERB-Fin: el'gendaižiba, eläiži, kül'mäiži, muštaižiba, pagižižiba, tahtoižin, tehtas
- Ind
- AUX-Fin: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB-Fin: eläba, radoin, ajoin, seižub, abutab, el'genzin, meletan, muštan, pätin, sain
- Tense
- Past
- AUX-Fin: oli, olend, Olin, oliba, pidi, sand, voind
- VERB-Fin: radoin, ajoin, el'genzin, sain, tahtoin, tuli, Oli-ik, Pagištihe-ik, abutiba, ajoim
- VERB-Part: sündnu, omištadud, peittud, tehtud
- Pres
- AUX-Fin: om, oma, pidab, ole, voi, voiba, voigoi, olen, voim, Oliži
- VERB-Fin: eläba, seižub, abutab, meletan, muštan, tegem, Muštat-ik, Om, Toivotam, ajelese
- Voice
- Act
- AUX-Fin: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB-Fin: eläba, radoin, ajoin, seižub, abutab, el'genzin, meletan, muštan, pätin, sain
- VERB-Part: sündnu, peittud
- VERB-Sup: radmaha, elämaha, kacmaha, opendamhas, rata
- Pass
- VERB-Fin: Pagištihe-ik, nittas, pagištihe, pandas, pidätas
- VERB-Part: omištadud, tehtud
- VERB-Sup: valitihe
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- PRON: nece, se, necen, sen, neche, necidä, neciš, Ned, Niiš, necil
- Int
- PRON: kudambad, mitte, Ken, Mi, Mittušt, ked, kudambal, midä, mugoine
- Prs
- PRON: minä, ičeze, meiden, minun, mö, minei, hän, heiden, Tö, ičein
- Tot
- PRON: kaik, kaiken, kaikes, kaiked, kaikile, kaikse
- NumType
- Card
- NUM: 40, 15, 2017, kahesa, kaht, kaks', koume, ühtes, üks'
- Ord
- ADJ: ezmäižid
- NUM: 23.
- Reflex
- Yes
- PRON: ičeze, ičein, ičetoi, ičtaze
- Person
- 1
- AUX: en, olen, voim, Olin, voin-ik
- AUX-Fin: olen, voim, Olin, voin-ik
- PRON: minä, meiden, minun, mö, minei, mindai, meid, meile
- VERB-Fin: radoin, ajoin, el'genzin, meletan, muštan, pätin, sain, tahtoin, tegem, Toivotam
- 2
- PRON: Tö, Teid, Teiden, Teil, Teile, sinus
- VERB-Fin: Muštat-ik, valičit, zavodit
- 3
- AUX: om, ei, oli, oma, pidab, voiba, oliba, pidab-ik, pidi
- AUX-Fin: om, oli, oma, pidab, voiba, oliba, pidab-ik, pidi
- PRON: hän, heiden, hänen, hö, heile, heišpäi, hänele
- VERB-Fin: eläba, seižub, abutab, tuli, Oli-ik, Om, abutiba, ajelese, andoi, el'gendaižiba
Other Features
- AdpType
- Post
- ADP: täht, polhe, taga, abul, edel, jäl'ghe, keskes, möto, päle
- Prep
- ADP: Kacmata, ümbri
- Clitic
- Ik
- AUX-Fin: pidab-ik, voin-ik
- VERB-Fin: Muštat-ik, Oli-ik, Pagištihe-ik
- Ki
- NOUN: venän-ki
- Se
- ADV: kuna-se, kut-se, Äjak-se
- PRON: midä-se, kaikse
- Connegative
- Yes
- AUX-Fin: ole, olend, voi, voigoi, pida, sand, voind
- VERB-Fin: ajelkoiš, azotade, koskend, kül'mäiži, navedind, tekoi
- NumForm
- Digit
- NUM: 40, 15, 2017, 23.
- Word
- ADJ: ezmäižid
- NUM: kahesa, kaht, kaks', koume, ühtes, üks'
- Typo
- Yes
- VERB-Fin: terverhtoitaba
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: olda.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: ei, voida, pidada, olda, sada.
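The copula and auxiliary inventories are the lemmas of words attached by the `cop` and `aux` relations. The sketch below collects them. It treats any subtype (for example a hypothetical `aux:pass`) as its base relation, which is an assumption: this treebank lists no such subtypes. The two toy sentences, roughly "I am a teacher" and "He does not come", were written for this sketch. Their annotation has not been checked against the treebank.

```python
def lemmas_by_deprel(conllu: str, relations=("cop", "aux")) -> dict:
    """Collect the lemmas attached by each of the given dependency relations."""
    found = {rel: set() for rel in relations}
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        base = cols[7].split(":")[0]      # fold subtypes into the base relation
        if base in found:
            found[base].add(cols[2])
    return found


def row(i, form, lemma, upos, head, rel):
    """Build a toy CoNLL-U word line; unused columns are '_'."""
    return "\t".join((i, form, lemma, upos, "_", "_", head, rel, "_", "_"))


SAMPLE = "\n".join([
    row("1", "Minä", "minä", "PRON", "3", "nsubj"),
    row("2", "olen", "olda", "AUX", "3", "cop"),
    row("3", "opendai", "opendai", "NOUN", "0", "root"),
    "",
    row("1", "Hän", "hän", "PRON", "3", "nsubj"),
    row("2", "ei", "ei", "AUX", "3", "aux"),
    row("3", "tule", "tulda", "VERB", "0", "root"),
]) + "\n"

print(lemmas_by_deprel(SAMPLE))
```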
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN-Nom (37)
- VERB-Fin--NOUN-Par (3)
- VERB-Fin--PRON-Nom (39)
- VERB-Fin--PRON-Par (1)
- VERB-Inf--NOUN-Nom (3)
- VERB-Inf--PRON-Nom (5)
- VERB-Part--NOUN-Nom (1)
- VERB-Part--PRON-Nom (2)
- obj
- VERB-Fin--NOUN-Gen (3)
- VERB-Fin--NOUN-Nom (1)
- VERB-Fin--NOUN-Par (21)
- VERB-Fin--PRON-Gen (1)
- VERB-Fin--PRON-Par (4)
- VERB-Inf--NOUN-Ade (1)
- VERB-Inf--NOUN-Nom (4)
- VERB-Inf--NOUN-Par (17)
- VERB-Inf--PRON-Gen (2)
- VERB-Inf--PRON-Par (4)
- VERB-Part--NOUN-Nom (2)
- VERB-Sup--NOUN-Par (1)
- VERB-Sup--PRON-Par (1)
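Each pattern above, such as VERB-Fin--NOUN-Par, pairs a verbal head (UPOS plus `VerbForm`) with a nominal dependent (UPOS plus `Case`). The sketch below counts such pairs for `nsubj` and `obj` within each sentence. It matches relation names exactly, so `nsubj:cop` is not folded into `nsubj`; this mirrors the separate relations listed earlier on this page but is an assumption about how the figures were produced. The two toy sentences were written for this sketch and are not corpus data.

```python
import re
from collections import Counter


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Ine|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def argument_patterns(conllu: str, relations=("nsubj", "obj")) -> Counter:
    """Count (relation, 'VERB-<VerbForm>--<UPOS>-<Case>') for NOUN/PRON dependents of verbs."""
    counts = Counter()
    for block in re.split(r"\n\s*\n", conllu.strip()):   # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if not line.startswith("#") and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            head = words.get(cols[6])                     # None for the root (HEAD 0)
            if cols[7] not in relations or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            head_label = "VERB-" + parse_feats(head[5]).get("VerbForm", "_")
            dep_label = cols[3] + "-" + parse_feats(cols[5]).get("Case", "_")
            counts[(cols[7], f"{head_label}--{dep_label}")] += 1
    return counts


def row(i, form, upos, feats, head, rel):
    """Build a toy CoNLL-U word line; unused columns are '_'."""
    return "\t".join((i, form, "_", upos, "_", feats, head, rel, "_", "_"))


SAMPLE = "\n".join([
    row("1", "Hö", "PRON", "Case=Nom|Number=Plur|Person=3|PronType=Prs", "2", "nsubj"),
    row("2", "eläba", "VERB", "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act", "0", "root"),
    row("3", "küläs", "NOUN", "Case=Ine|Number=Sing", "2", "obl"),
    "",
    row("1", "Mö", "PRON", "Case=Nom|Number=Plur|Person=1|PronType=Prs", "2", "nsubj"),
    row("2", "tegem", "VERB", "Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act", "0", "root"),
    row("3", "radod", "NOUN", "Case=Par|Number=Sing", "2", "obj"),
]) + "\n"

for (rel, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(rel, pattern, n)
```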
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: löuta ičtaze
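A reflexive core object is an `obj` or `iobj` dependent whose FEATS contain `Reflex=Yes`. The statistic reports the lemma of its head together with the object form. Below is a minimal sketch of that lookup. The rows are hypothetical, loosely modelled on the *löuta ičtaze* example. The pronoun's LEMMA is left as `_` because only its form and features are used. The second sentence adds a non-reflexive object to show the filter.

```python
import re
from collections import defaultdict


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Par|Reflex=Yes' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def reflexive_objects(conllu: str) -> dict:
    """Map head lemmas to the Reflex=Yes forms attached to them as obj or iobj."""
    found = defaultdict(set)
    for block in re.split(r"\n\s*\n", conllu.strip()):   # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if not line.startswith("#") and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            if cols[7] not in ("obj", "iobj"):
                continue
            if parse_feats(cols[5]).get("Reflex") != "Yes":
                continue
            head = words.get(cols[6])
            if head is not None:
                found[head[2]].add(cols[1])               # head LEMMA -> reflexive FORM
    return found


def row(i, form, lemma, upos, feats, head, rel):
    """Build a toy CoNLL-U word line; unused columns are '_'."""
    return "\t".join((i, form, lemma, upos, "_", feats, head, rel, "_", "_"))


SAMPLE = "\n".join([
    row("1", "löuta", "löuta", "VERB", "VerbForm=Inf", "0", "root"),
    row("2", "ičtaze", "_", "PRON", "Case=Par|Reflex=Yes", "1", "obj"),
    "",
    row("1", "Mö", "mö", "PRON", "Case=Nom|Number=Plur|Person=1|PronType=Prs", "2", "nsubj"),
    row("2", "tegem", "tehta", "VERB", "Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act", "0", "root"),
    row("3", "radod", "rad", "NOUN", "Case=Par|Number=Sing", "2", "obj"),
]) + "\n"

print(dict(reflexive_objects(SAMPLE)))
```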