UD Veps VWT
Language: Veps (code: vep)
Family: Uralic
This treebank has been part of Universal Dependencies since the UD v2.13 release.
The following people have contributed to making this treebank part of UD: Käbi Laan.
Repository: UD_Veps-VWT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Veps-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [kaebi • laan (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD Veps-VWT is a manually annotated corpus of Veps that follows the Universal Dependencies annotation scheme. The data were collected from the VepKar corpora and consist mostly of modern news texts written in the Central Veps dialect. The morphological annotation and grammatical decisions are based on Riho Grünthal's studies of the language and on various Veps dictionaries by Nina Zaitseva. Many syntactic decisions follow the existing Finnish, Estonian, Karelian and Russian treebanks.
Acknowledgments
This work was developed as part of the master's thesis written by Käbi Laan at the University of Tartu with the help of supervisors Kadri Muischnek and Eva Saar.
References
- Grünthal, Riho 2015. Vepsän kielioppi. Apunevoja suomalais-ugrilaisten kielten opintoja varten XVII. Helsinki: Suomalais-Ugrilainen Seura.
- Zaitseva = Зайцева, Н.Г., Е.Е. Харитонова, О.Ю. Жукова 2012. Орфографический словарь вепсского языка (Vepsän kelen orfografine vajehnik). Петрозаводск: Карельский научный центр.
Statistics of UD Veps VWT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
AdpType – Case – Clitic – Connegative – Degree – Mood – Number – NumForm – NumType – Person – Polarity – PronType – Reflex – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – conj – cop – csubj – csubj:cop – det – flat – mark – nmod – nsubj – nsubj:cop – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 103 sentences and 1303 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 15 types of words that contain both letters and punctuation. Examples: midä-se, Muštat-ik, Oli-ik, Pagištihe-ik, Sil-žo, elo-oza, kuna-se, kut-se, pidab-ik, počt-ki, radod-ki, toine_tošt, venän-ki, voin-ik, Äjak-se
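The figures above can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, assuming the standard 10-column CoNLL-U layout. The function names and the toy rows in `SAMPLE` are hypothetical and not copied from the treebank, and the exact rules behind the published counts may differ. It counts rows with plain integer IDs, which equals the token count only when the file has no multiword tokens.

```python
import unicodedata

def read_sentences(text):
    """Split CoNLL-U text into sentences, each a list of 10-column rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():            # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # skip comment lines
            current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences

def mixes_letters_and_punct(form):
    """True if the form has at least one letter and one punctuation mark."""
    has_letter = any(ch.isalpha() for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct

def tokenization_stats(text):
    sentences = read_sentences(text)
    # Keep plain integer IDs only: no multiword ranges (1-2), no empty nodes (1.1).
    rows = [r for s in sentences for r in s if r[0].isdigit()]
    return {
        "sentences": len(sentences),
        "tokens": len(rows),
        "no_space_after": sum("SpaceAfter=No" in r[9].split("|") for r in rows),
        "forms_with_space": sum(" " in r[1] for r in rows),
        "mixed_types": sorted({r[1] for r in rows if mixes_letters_and_punct(r[1])}),
    }

def row(idx, form, upos, feats="_", head="0", deprel="root"):
    """Build one CoNLL-U row; lemma, XPOS, DEPS and MISC are left as '_'."""
    return "\t".join([idx, form, "_", upos, "_", feats, head, deprel, "_", "_"])

# Toy sample with hypothetical annotation, for illustration only.
SAMPLE = "\n".join([
    "# text = Oli-ik hän küläs ?",
    row("1", "Oli-ik", "VERB", "Clitic=Ik|Tense=Past|VerbForm=Fin"),
    row("2", "hän", "PRON", "Case=Nom|PronType=Prs", "1", "nsubj"),
    row("3", "küläs", "NOUN", "Case=Ine|Number=Sing", "1", "obl"),
    row("4", "?", "PUNCT", "_", "1", "punct"),
    "",
])

print(tokenization_stats(SAMPLE))
```

On the toy sample only `Oli-ik` is reported as a mixed type, because the hyphen is punctuation while `?` contains no letter.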
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: DET, INTJ, SYM, X
- This corpus contains 1 word type tagged as a particle (PART): vedʼ
- This corpus contains 17 lemmas tagged as pronouns (PRON): hän, iče, kaik, kaik-se, ken, kudamb, mi, mi-se, minä, mitte, mugoine, ne, nece, se, sinä, toine, toine_toise
- This corpus contains no lemmas tagged as determiners (DET).
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): ei, olda, pidada, sada, voida
- Out of the above, 3 lemmas occurred sometimes as AUX and sometimes as VERB: olda, pidada, sada
- There are 5 (de)verbal forms:
- Conv
- VERB: Išttes, Tuldes
- Fin
- AUX: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB: eläba, radoin, ajoin, seižub, abutab, elʼgenzin, meletan, muštan, pätin, sain
- Inf
- VERB: tehta, eläda, elʼgeta, kaita, pagišta, panda, rata, vajehtada, vastatas, abutada
- Part
- VERB: sündnu, omištadud, peittud, tehtud
- Sup
- VERB: radmaha, elämaha, kacmaha, opendamhas, rata, valitihe
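The tag inventory above, including the lemmas that occur both as AUX and as VERB, can be derived with a short script. This is a minimal sketch, assuming 10-column CoNLL-U input. `upos_report`, `word_rows` and the toy rows in `SAMPLE` are hypothetical names and data used for illustration, not the tooling that produced this page.

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

def word_rows(text):
    """Return the 10-column rows of a CoNLL-U text, skipping comments,
    blank lines, multiword ranges (1-2) and empty nodes (1.1)."""
    rows = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
    return rows

def upos_report(text):
    rows = word_rows(text)
    used = {r[3] for r in rows}
    tags_by_lemma = defaultdict(set)
    for r in rows:
        tags_by_lemma[r[2]].add(r[3])
    return {
        "used": sorted(used),
        "unused": sorted(UPOS_TAGS - used),
        # Lemmas seen at least once as AUX and at least once as VERB.
        "aux_and_verb": sorted(
            lemma for lemma, tags in tags_by_lemma.items()
            if {"AUX", "VERB"} <= tags
        ),
    }

def row(idx, form, lemma, upos, head, deprel):
    """Build one CoNLL-U row; XPOS, FEATS, DEPS and MISC are left as '_'."""
    return "\t".join([idx, form, lemma, upos, "_", "_", head, deprel, "_", "_"])

# Toy sample with hypothetical annotation, for illustration only.
SAMPLE = "\n".join([
    "# text = Nece om kaik .",
    row("1", "Nece", "nece", "PRON", "3", "nsubj:cop"),
    row("2", "om", "olda", "AUX", "3", "cop"),
    row("3", "kaik", "kaik", "PRON", "0", "root"),
    row("4", ".", ".", "PUNCT", "3", "punct"),
    "",
    "# text = Oli-ik hän ?",
    row("1", "Oli-ik", "olda", "VERB", "0", "root"),
    row("2", "hän", "hän", "PRON", "1", "nsubj"),
    row("3", "?", "?", "PUNCT", "1", "punct"),
    "",
])

report = upos_report(SAMPLE)
print(report["used"])
print(report["aux_and_verb"])
```

In the toy sample `olda` appears once as a copula (AUX) and once as a main verb, so it is the only lemma reported under `aux_and_verb`.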
Nominal Features
- Plur
- ADJ: erazvuiččed, enččiden, erazvuiččiš, lämembad, melekahad, melentartuižid, selʼged, tägälaižed, vepsläižid, vägevad
- AUX: ei, oma, voiba, voim, oliba
- AUX-Fin: oma, voiba, voim, oliba
- NOUN: lapsed, ristitud, jurid, adivoid, eläjiden, projektoid, ristituid, tedoid, vanhembad, vepsläižed
- PRON: meiden, mö, heiden, ičeze, Tö, kudambad, hö, ičetoi, Ned, Niiš
- VERB-Fin: eläba, tegem, Muštat-ik, Toivotam, abutiba, ajelkoiš, ajoim, eliba, eläškandeb, elʼgendaižiba
- Sing
- ADJ: vepsän, hüvä, surel, čomal, jüged, kebn, tärged, äi, čoma, aktivižen
- AUX: om, ei, oli, en, pidab, olen, Olin, ole, pidab-ik, pidi
- AUX-Fin: om, oli, pidab, olen, Olin, ole, pidab-ik, pidi, voin-ik
- NOUN: kelel, külän, rad, elo, kelʼt, kanzan, kanzas, kulʼtursebran, küläs, mal
- PRON: minä, nece, ičeze, minun, kaik, minei, hän, kaiken, se, ičein
- PROPN: Kalagʼ, Natalja, Päžarvehe, Kaskez, Kaskezaspäi, Piterin, Silakova, Vepsän, Änižjärven, Himjogi
- VERB-Fin: radoin, ajoin, seižub, abutab, elʼgenzin, meletan, muštan, pätin, sain, tahtoin
- Abl
- NOUN: lapsʼaigaspäi, posadaspäi
- PROPN: Kaskezaspäi
- Ade
- ADJ: surel, čomal, armhal
- NOUN: kelel, mal, randal, aigal, Homendesel, Kezal, Talʼvel, avtobusal, derʼounadme, kodikelel
- NUM: 23.
- PRON: Teil, kudambal, necil
- PROPN: Venämal
- All
- ADJ: korktale, nügüdläižele
- NOUN: azjale, kodimale, lebupäivile, radsijale, ristituile
- PRON: minei, Teile, heile, hänele, kaikile, meile
- Com
- NOUN: vepsläižidenke, kaluidenke, kiviradnikoidenke, pertidenno, sündundpäivänke, vanhembidenke, vellenke
- PROPN: Silakovanke
- Ela
- NOUN: kanzaspäi, külišpäi, polespäi
- PRON: heišpäi
- PROPN: Kaskezaspäi, Tožegespäi
- Ess
- ADJ: aktivižen
- NOUN: omblijan, opendajan, paštajan, ühtnikan
- Gen
- ADJ: vepsän, enččen, enččiden, suren, toižen, vepsläižen, vägevan
- NOUN: külän, kanzan, kulʼtursebran, aigan, elon, eläjiden, vepsläižiden, školan, Tatan, Valičusiden
- PRON: ičeze, meiden, minun, kaiken, heiden, ičein, necen, sen, hänen, ičetoi
- PROPN: Piterin, Vepsän, Änižjärven, Karjalan, Kodʼarven, Natalja, Päžarven, Vologdan, Änižen
- Ill
- ADJ: verhaze
- NOUN: školha, agjaha, eloho, kivikarjeroihe, kodihe, kodimaha, konkursoihe, külähä, küzundoihe, maha
- PRON: neche
- PROPN: Päžarvehe, Kaskezaha
- VERB-Sup: radmaha, elämaha, kacmaha, opendamhas, rata, valitihe
- Ine
- ADJ: erazvuiččiš, hüväs, vepsläižes
- NOUN: kanzas, küläs, elos, internatas, mirus, muzejas, posadas, südäimes, školas, agjas
- NUM: ühtes
- PRON: kaikes, neciš, Niiš, sinus, toižiš
- PROPN: Kalages, Kaskezas, ORDas, Päžarves
- Nom
- ADJ: hüvä, erazvuiččed, jüged, kebn, tärged, äi, čoma, bohat, kaksʼkeline, kulu
- NOUN: lapsed, ristitud, rad, elo, aig, derʼoun, eläjad, pagin, praznik, rahvaz
- NUM: 40, 15, 2017, kahesa, kaksʼ, koume, üksʼ
- PRON: minä, nece, mö, kaik, hän, se, Tö, kudambad, hö, mitte
- PROPN: Kalagʼ, Natalja, Kaskez, Silakova, Himjogi, Jevgenjevna, Kalarand, Päžarʼ
- Par
- ADJ: ezmäižid, hüväd, korktad, kovad, melentartuižid, rahvahališt, sijališt, sotovijad, ut, vepsläšt
- NOUN: kelʼt, jurid, rahvast, vot, adivoid, elod, projektoid, ristituid, tedoid, väged
- NUM: kaht
- PRON: midä-se, mindai, necidä, Mittušt, Teid, ičtaze, ked, meid, midä, sidä
- PROPN: Jevgenjevnad
- Ter
- NOUN: lophusai
- PROPN: Toižegehesai
- Tra
- NOUN: pämeheks, Ozutesikš
Degree and Polarity
- Cmp
- ADJ: lämembad
- Pos
- ADJ: hüvä, surel, čomal, erazvuiččed, jüged, kebn, tärged, äi, čoma, aktivižen
- Neg
- AUX: ei, en
Verbal Features
- Cnd
- AUX-Fin: Oliži
- VERB-Fin: eläiži, elʼgendaižiba, külʼmäiži, muštaižiba, pagižižiba, tahtoižin, tehtas
- Ind
- AUX-Fin: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB-Fin: eläba, radoin, ajoin, seižub, abutab, elʼgenzin, meletan, muštan, pätin, sain
- Past
- AUX-Fin: oli, olend, Olin, oliba, pidi, sand, voind
- VERB-Fin: radoin, ajoin, elʼgenzin, sain, tahtoin, tuli, Oli-ik, Pagištihe-ik, abutiba, ajoim
- VERB-Part: sündnu, omištadud, peittud, tehtud
- Pres
- AUX-Fin: om, oma, pidab, ole, voi, voiba, voigoi, olen, voim, Oliži
- VERB-Fin: eläba, seižub, abutab, meletan, muštan, tegem, Muštat-ik, Om, Toivotam, ajelese
- Act
- AUX-Fin: om, oli, oma, pidab, ole, olend, voi, voiba, voigoi, olen
- VERB-Fin: eläba, radoin, ajoin, seižub, abutab, elʼgenzin, meletan, muštan, pätin, sain
- VERB-Part: sündnu, peittud
- VERB-Sup: radmaha, elämaha, kacmaha, opendamhas, rata
- Pass
- VERB-Fin: Pagištihe-ik, nittas, pagištihe, pandas, pidätas
- VERB-Part: omištadud, tehtud
- VERB-Sup: valitihe
Pronouns, Determiners, Quantifiers
- Dem
- PRON: nece, se, necen, sen, neche, necidä, neciš, Ned, Niiš, necil
- Ind
- PRON: midä-se
- Int
- PRON: kudambad, mitte, Ken, Mi, Mittušt, ked, kudambal, midä, mugoine
- Prs
- PRON: minä, ičeze, meiden, minun, mö, minei, hän, heiden, Tö, ičein
- Tot
- PRON: kaik, kaiken, kaikes, kaiked, kaikile, kaikse
- Card
- NUM: 40, 15, 2017, kahesa, kaht, kaksʼ, koume, ühtes, üksʼ
- Ord
- ADJ: ezmäižid
- NUM: 23.
- Yes
- PRON: ičeze, ičein, ičetoi, ičtaze
- 1
- AUX: en, olen, voim, Olin, voin-ik
- AUX-Fin: olen, voim, Olin, voin-ik
- PRON: minä, meiden, minun, mö, minei, ičein, mindai, meid, meile
- VERB-Fin: radoin, ajoin, elʼgenzin, meletan, muštan, pätin, sain, tahtoin, tegem, Toivotam
- 2
- PRON: Tö, ičetoi, Teid, Teiden, Teil, Teile, sinus
- VERB-Fin: Muštat-ik, valičit, zavodit
- 3
- AUX: om, ei, oli, oma, pidab, voiba, oliba, pidab-ik, pidi
- AUX-Fin: om, oli, oma, pidab, voiba, oliba, pidab-ik, pidi
- PRON: ičeze, hän, heiden, hänen, hö, heile, heišpäi, hänele, ičtaze
- VERB-Fin: eläba, seižub, abutab, tuli, Oli-ik, Om, abutiba, ajelese, andoi, eliba
Other Features
- AdpType
- Post
- ADP: täht, polhe, taga, abul, edel, jälʼghe, keskes, möto, päle
- Prep
- ADP: Kacmata, ümbri
- Clitic
- Ik
- AUX-Fin: pidab-ik, voin-ik
- VERB-Fin: Muštat-ik, Oli-ik, Pagištihe-ik
- Ki
- NOUN: venän-ki
- Se
- ADV: kuna-se, kut-se, Äjak-se
- PRON: midä-se, kaikse
- Connegative
- Yes
- AUX-Fin: ole, olend, voi, voigoi, pida, sand, voind
- VERB-Fin: ajelkoiš, azotade, koskend, külʼmäiži, navedind, tekoi
- NumForm
- Digit
- NUM: 40, 15, 2017, 23.
- Word
- ADJ: ezmäižid
- NUM: kahesa, kaht, kaksʼ, koume, ühtes, üksʼ
- Typo
- Yes
- VERB-Fin: terverhtoitaba
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): olda.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: ei, voida, pidada, olda, sada.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN-Nom (37)
- VERB-Fin--NOUN-Par (3)
- VERB-Fin--PRON-Nom (39)
- VERB-Fin--PRON-Par (1)
- VERB-Inf--NOUN-Nom (3)
- VERB-Inf--PRON-Nom (5)
- VERB-Part--NOUN-Nom (1)
- VERB-Part--PRON-Nom (2)
- obj
- VERB-Fin--NOUN-Gen (3)
- VERB-Fin--NOUN-Nom (1)
- VERB-Fin--NOUN-Par (21)
- VERB-Fin--PRON-Gen (1)
- VERB-Fin--PRON-Par (4)
- VERB-Inf--NOUN-Ade (1)
- VERB-Inf--NOUN-Nom (4)
- VERB-Inf--NOUN-Par (17)
- VERB-Inf--PRON-Gen (2)
- VERB-Inf--PRON-Par (4)
- VERB-Part--NOUN-Nom (2)
- VERB-Sup--NOUN-Par (1)
- VERB-Sup--PRON-Par (1)
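The patterns listed above pair a verbal parent (UPOS plus VerbForm) with a nominal or pronominal child (UPOS plus Case). A minimal sketch of how such counts could be produced is given below, assuming 10-column CoNLL-U input. The function names and the toy sentence are hypothetical, and the published counts may rest on additional rules not reproduced here.

```python
from collections import Counter

def parse_feats(column):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))

def label(row, feature):
    """Build labels such as 'VERB-Fin' or 'NOUN-Par' (UPOS plus one feature)."""
    value = parse_feats(row[5]).get(feature)
    return f"{row[3]}-{value}" if value else row[3]

def count_sentence(rows, counts):
    """Add the nsubj/obj patterns of one sentence to the counter."""
    by_id = {r[0]: r for r in rows}
    for child in rows:
        head = by_id.get(child[6])      # None for the root (head "0")
        if (head is not None and head[3] == "VERB"
                and child[3] in ("NOUN", "PRON")
                and child[7] in ("nsubj", "obj")):
            pattern = f"{label(head, 'VerbForm')}--{label(child, 'Case')}"
            counts[(child[7], pattern)] += 1

def core_argument_patterns(text):
    counts, rows = Counter(), []
    for line in text.splitlines() + [""]:   # trailing "" flushes the last sentence
        if not line.strip():
            count_sentence(rows, counts)
            rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():           # skip multiword ranges and empty nodes
                rows.append(cols)
    return counts

def row(idx, form, upos, feats, head, deprel):
    """Build one CoNLL-U row; lemma, XPOS, DEPS and MISC are left as '_'."""
    return "\t".join([idx, form, "_", upos, "_", feats, head, deprel, "_", "_"])

# Toy sentence with hypothetical annotation, for illustration only.
SAMPLE = "\n".join([
    "# text = Minä muštan necidä .",
    row("1", "Minä", "PRON", "Case=Nom|Number=Sing|Person=1|PronType=Prs", "2", "nsubj"),
    row("2", "muštan", "VERB", "Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act", "0", "root"),
    row("3", "necidä", "PRON", "Case=Par|Number=Sing|PronType=Dem", "2", "obj"),
    row("4", ".", "PUNCT", "_", "2", "punct"),
    "",
])

for (relation, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
# nsubj VERB-Fin--PRON-Nom 1
# obj VERB-Fin--PRON-Par 1
```

Subtyped relations such as `nsubj:cop` are deliberately not matched here, since the list above reports only plain `nsubj` and `obj`.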
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: löuta ičtaze