
This page pertains to UD version 2.

UD Komi Zyrian IKDP

Language: Komi Zyrian (code: kpv)
Family: Uralic, Permic

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Niko Partanen, Rogier Blokland, Michael Rießler.

Repository: UD_Komi_Zyrian-IKDP
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Komi Zyrian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [nikotapiopartanen (æt) gmail • com]. The treebank is developed directly in the UD repository, so bug fixes can be submitted as pull requests against the dev branch.

Annotation: Source
Lemmas: assigned by a program, with some manual corrections, but not a full manual verification
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: assigned by a program, with some manual corrections, but not a full manual verification
Features: assigned by a program, with some manual corrections, but not a full manual verification
Relations: annotated manually, natively in UD style


This treebank consists of dialectal transcriptions of spoken Komi Zyrian. The current texts are short recorded segments from different areas where the Iźva dialect of the Komi language is spoken.

The materials were collected within the Iźva Komi Documentation Project (IKDP), funded by the Kone Foundation in 2014-2016, and are archived in The Language Archive. The transcriptions were made by native speakers. The orthographic transcription system matches the Komi orthography where applicable, but is primarily phonemic. The data in this treebank currently represents only the northern Iźva dialect of Komi, but materials from other dialects will also be included in the future. The sent_id values match those in the archived IKDP corpus, and the + character is used to mark sentence IDs that span multiple annotations.
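As a minimal sketch of how the + convention could be handled when reading the treebank's CoNLL-U files, the snippet below collects the `# sent_id = ...` comment lines and splits each ID on the + character. The sample IDs and the helper names `split_sent_id` and `read_sent_ids` are hypothetical, chosen for illustration only; the actual ID format used in the IKDP corpus may differ.

```python
def split_sent_id(sent_id):
    """Split a sentence ID on '+' into the annotation IDs it spans."""
    return [part.strip() for part in sent_id.split("+") if part.strip()]


def read_sent_ids(conllu_text):
    """Yield the value of every '# sent_id = ...' comment in CoNLL-U text."""
    for line in conllu_text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "sent_id":
                yield value.strip()


# Hypothetical sample; real IDs in the treebank may look different.
sample = (
    "# sent_id = rec01.005\n"
    "# text = ...\n"
    "\n"
    "# sent_id = rec01.006+rec01.007\n"
)

for sid in read_sent_ids(sample):
    print(sid, split_sent_id(sid))
```

An ID without a + yields a one-element list, so downstream code can treat both cases uniformly when looking up the corresponding annotations in the archived corpus.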

The corpus contains portions of recordings made between 1959 and 2016. The parts that have been published earlier by Erik Vászolyi in the Specimina Sibirica series are reproduced here with written permission.

The IKDP corpus uses the treebank as one of its annotation schemes. At the end of 2018, the entire audio-visual language documentation corpus will be transferred from The Language Archive (TLA) into a new repository. In this process, the linking of the treebank to the multimedia files will be revisited, and clear conventions for it will be developed and documented. This work will be completed by release 2.4.


The work was done as a collaboration between the Kone Foundation-funded research project Language Documentation meets Language Technology: The Next Step in the Description of Komi and the LAKME project, which is funded by a grant from Paris Sciences et Lettres (IDEX PSL reference ANR-10-IDEX-0001-02).

If you use this treebank in your work, please cite:

Sources used

Statistics of UD Komi Zyrian IKDP

POS Tags






Tokenization and Word Segmentation



Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features


Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
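The restriction above can be illustrated with a short sketch that counts dependency relations whose parent is a verb and whose child is a noun or pronoun. It is not the script used to produce the official statistics. The function name `count_verb_nominal_relations`, the choice of UPOS tags in `NOMINAL_UPOS` (proper nouns could be added if they should count as nouns), and the toy English sentence are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: "nouns or pronouns" is read literally as these two UPOS tags.
NOMINAL_UPOS = {"NOUN", "PRON"}


def count_verb_nominal_relations(conllu_text):
    """Count deprels where the head is a VERB and the child is nominal."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            rows[cols[0]] = cols
        for cols in rows.values():
            head = rows.get(cols[6])  # None for the root (HEAD = 0)
            if head and head[3] == "VERB" and cols[3] in NOMINAL_UPOS:
                counts[cols[7]] += 1
    return counts


# Toy sentence, not taken from the treebank.
sample = "\n".join([
    "# sent_id = toy.1",
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\t_",
])

print(count_verb_nominal_relations(sample))
```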

Relations Overview