This page pertains to UD version 2.

UD Nenets Tundra

Language: Nenets (code: yrk)
Family: Uralic

This treebank has been part of Universal Dependencies since the UD v2.16 release.

The following people have contributed to making this treebank part of UD: Bruno Guillaume, Sylvain Kahane, Nikolett Mus, Daniel Zeman.

Repository: UD_Nenets-Tundra
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Nenets-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [mus • nikolett (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: annotated manually
UPOS: annotated manually, natively in UD style
XPOS: not available
Features: annotated manually, natively in UD style
Relations: annotated manually, natively in UD style

Description

The Tundra Nenets UD treebank is converted from the Tundra Nenets mSUD treebank. The conversion from mSUD to UD is performed automatically and is followed by a comprehensive manual revision to ensure compliance with the UD annotation standards.

The treebank currently consists of 93 manually annotated sentences (5.6758783 seconds of recorded speech). The data originates from a fieldwork session conducted in Moscow in 2017 with a native speaker of Tundra Nenets, representing the Yamal dialect. The session elicited semi-spontaneous speech through visual-stimulus tasks based on a modified version of the HCRC Map Task.

The morphological and syntactic annotation of the original mSUD treebank was created manually. The conversion from mSUD to UD was designed and implemented by Bruno Guillaume.

The transcription of the spoken data was carried out by the speaker and follows the standard orthographic conventions of Tundra Nenets, rather than a phonetic or IPA-based system.

To further support the analysis of prosodic and discourse-related phenomena, the recordings were aligned phonetically using Praat, and relevant features of spoken language were incorporated into the annotation.

The original transcription in Cyrillic script was transliterated into Latin script, taking into account certain linguistic particularities of Tundra Nenets.

Acknowledgments

The development of this treebank was supported by two research projects: Autogramm: Induction of Descriptive Grammar from Annotated Corpora (ANR-21-CE38-0017), and ThEA: Theoretical and Experimental Approaches to Dialectal Variation and Contact-Induced Change – A Case Study of Tundra Nenets (NKFIH FK 129235). These projects contributed to both the data collection and the creation of the treebank.

Statistics of UD Nenets Tundra

POS Tags

ADJ, ADP, ADV, AUX, DET, INTJ, NOUN, NUM, PRON, PUNCT, VERB, X

Features

Number, Person, PronType, VerbForm

Relations

acl, advcl, advmod, amod, aux, case, ccomp, cop, csubj, dep, det, discourse, mark, nmod, nmod:poss, nsubj, nsubj:outer, nummod, obj, obl:mod, parataxis, punct, reparandum, root, vocative, xcomp
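
The inventories of POS tags, features, and relations listed above can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, assuming the standard 10-column, tab-separated CoNLL-U layout. The function name `tally` is hypothetical, and the sample sentence uses invented placeholder forms (`w1`, `w2`), not actual Tundra Nenets data.

```python
from collections import Counter

# Placeholder sample in CoNLL-U layout. The word forms are invented for
# illustration and are NOT taken from the Tundra Nenets treebank.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tw1\tw1\tNOUN\t_\tNumber=Sing\t2\tnsubj\t_\t_\n"
    "2\tw2\tw2\tVERB\t_\tNumber=Sing|Person=3\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def tally(conllu_text):
    """Count UPOS tags, feature names, and relations over syntactic words."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        upos[cols[3]] += 1   # column 4: UPOS
        rels[cols[7]] += 1   # column 8: DEPREL
        if cols[5] != "_":   # column 6: FEATS, "_" when empty
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1
    return upos, feats, rels


upos, feats, rels = tally(SAMPLE)
print(sorted(upos))
print(sorted(feats))
print(sorted(rels))
```

To apply this to the real data, pass the text of a `.conllu` file from the UD_Nenets-Tundra repository to `tally` instead of `SAMPLE`.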

Tokenization and Word Segmentation