
This page pertains to UD version 2.

UD Skolt Sami Giellagas

Language: Skolt Sami (code: sms)
Family: Uralic, Sami

This treebank has been part of Universal Dependencies since the UD v2.5 release.

The following people have contributed to making this treebank part of UD: Jack Rueter, Markus Juutinen, Francis Tyers, Tommi A Pirinen, Mika Hämäläinen.

Repository: UD_Skolt_Sami-Giellagas
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news, spoken

Questions, comments? General annotation questions (either Skolt Sami-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [rueter • jack (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD
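The annotation layers listed above correspond to columns of the CoNLL-U format in which the treebank is distributed: LEMMA, UPOS, XPOS, FEATS, and HEAD/DEPREL for relations. As a minimal sketch (not part of the treebank's own tooling), the Python below splits one token line into its ten tab-separated columns and expands FEATS into a dictionary. The names `Token`, `parse_feats` and `parse_token_line` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# The ten tab-separated columns of a CoNLL-U token line, in order.
COLUMNS = ("id", "form", "lemma", "upos", "xpos", "feats",
           "head", "deprel", "deps", "misc")


@dataclass
class Token:
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: dict
    head: str
    deprel: str
    deps: str
    misc: str


def parse_feats(field: str) -> dict:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_token_line(line: str) -> Token:
    """Split one CoNLL-U token line into named fields, with FEATS parsed into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    values = dict(zip(COLUMNS, fields))
    values["feats"] = parse_feats(values["feats"])
    return Token(**values)
```

For example, an invented line (not taken from the treebank) with UPOS `NOUN` and FEATS `Case=Nom|Number=Sing` yields a token whose `feats` is `{'Case': 'Nom', 'Number': 'Sing'}`; HEAD and DEPREL are kept as strings.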


The UD Skolt Sami Giellagas treebank is based almost entirely on spoken Skolt Sami corpora.

UD Skolt Sami is the original annotation (CoNLL-U) for texts in the Skolt Sami language. It originally consisted of twenty sentences translated from Finnish texts by Hilkka Fofonoff; these are available with UD v1 dependencies at http://ilazki.thinkgeek.co.uk/brat/#/uralic/sms. Subsequent sentences come from the Giellagas Corpus of Spoken Saami Languages of the University of Oulu, Finland, which in part includes research materials transferred from the Institute for the Languages of Finland (Kotimaisten kielten keskus, «Kotus»).

Treebank sentences whose text id begins with kotus-skak2010 originate from the publication Sääʹmǩiõll, äʹrbbǩiõll. The publisher, the Institute for the Languages of Finland (Kotimaisten kielten keskus), has granted written permission to include this material in the treebank. The original publication should be cited whenever the treebank is used (see the References section below).
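To find the sentences covered by this citation requirement, one can filter sentences by identifier prefix. The sketch below is a hypothetical helper, not an official tool. It assumes the identifier appears in a sentence-level comment of the form `# sent_id = <id>` or `# text_id = <id>`; check which comment key the treebank files actually use before relying on it.

```python
from __future__ import annotations

import re


def split_sentences(conllu_text: str) -> list[list[str]]:
    """Group CoNLL-U lines into sentences; sentences are separated by blank lines."""
    sentences: list[list[str]] = []
    current: list[str] = []
    for line in conllu_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def comment_value(sentence: list[str], key: str) -> str | None:
    """Return the value of a '# key = value' comment, or None if the sentence has none."""
    pattern = re.compile(r"^#\s*" + re.escape(key) + r"\s*=\s*(.*)$")
    for line in sentence:
        match = pattern.match(line)
        if match:
            return match.group(1).strip()
    return None


def sentences_with_id_prefix(conllu_text: str,
                             id_prefix: str = "kotus-skak2010",
                             keys: tuple[str, ...] = ("sent_id", "text_id")) -> list[list[str]]:
    """Keep sentences where any of the given comment keys has a value starting with id_prefix."""
    selected = []
    for sentence in split_sentences(conllu_text):
        for key in keys:
            value = comment_value(sentence, key)
            if value is not None and value.startswith(id_prefix):
                selected.append(sentence)
                break
    return selected
```

The prefix and comment keys are parameters so that the same helper can be pointed at a different key if the files label the identifier differently.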



The original annotations were made by Jack Rueter at the University of Helsinki and Markus Juutinen at the Giellagas Institute (University of Oulu, Finland), with linguistic consultation from Merja Fofonoff and Eino Koponen. They used morphological tools developed in the Kone Foundation «Language Programme» project «Skolt Sami Revitalization through Intelligent Computer-assisted Language Learning means and the development of guidelines for transferring these methods to other threatened languages» (2015–2018). These tools are made available through the open-source Giella infrastructure at the Norwegian Arctic University in Tromsø.

Work on the Skolt Sami treebank builds upon previous experience with the UD_Erzya-JR treebank, as well as ongoing discussions with Francis Tyers, Tommi Pirinen, Jonathan Washington, Mika Hämäläinen and Niko Partanen. Without the Skolt Sami speakers and writers themselves, however, we would be nowhere…


Statistics of UD Skolt Sami Giellagas

POS Tags






Tokenization and Word Segmentation



Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features


Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
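The restriction above (verb parents, noun or pronoun children) can be approximated directly from the CoNLL-U files. This is a minimal sketch under stated assumptions: the hypothetical function `verb_nominal_relations` counts the DEPREL of every NOUN or PRON word whose head has UPOS VERB, skips multiword-token ranges and empty nodes, does not count PROPN, and keeps relation subtypes as they appear. The scripts behind the official statistics may make different choices.

```python
import re
from collections import Counter

# Dependent UPOS values counted as "nouns or pronouns" (assumption: PROPN is not included).
NOMINAL_UPOS = {"NOUN", "PRON"}


def verb_nominal_relations(conllu_text: str) -> Counter:
    """Count DEPREL labels on NOUN/PRON words whose syntactic head is a VERB."""
    counts: Counter = Counter()
    # Sentences are separated by blank lines (possibly containing stray whitespace).
    for block in re.split(r"\n\s*\n", conllu_text.strip()):
        rows = [line.split("\t") for line in block.splitlines()
                if line.strip() and not line.startswith("#")]
        # Keep ordinary word lines only: IDs like "3-4" (multiword tokens)
        # and "3.1" (empty nodes) are not plain integers.
        words = [row for row in rows if len(row) == 10 and row[0].isdigit()]
        upos_by_id = {row[0]: row[3] for row in words}
        for row in words:
            upos, head, deprel = row[3], row[6], row[7]
            if upos in NOMINAL_UPOS and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
    return counts
```

Calling `verb_nominal_relations(text).most_common()` lists the relations from most to least frequent; a root noun (HEAD 0) is never counted, since no word has ID 0.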

Relations Overview