
This page pertains to UD version 2.

UD Moksha JR

Language: Moksha (code: mdf)
Family: Uralic, Mordvin

This treebank has been part of Universal Dependencies since the UD v2.5 release.

The following people have contributed to making this treebank part of UD: Jack Rueter, Maria Levina, Nadezhda Kabaeva, Judit Molnár, Khalid Alnajjar.

Repository: UD_Moksha-JR
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Moksha-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [rueter • jack (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

The Erme Universal Dependencies annotated texts for Moksha are the origin of UD_Moksha-JR, which provides CoNLL-U annotation for texts in the Moksha language. The collection originally consisted of a sample from a number of fiction authors writing originals in Moksha.

This is a collection of sentences from almost entirely original Moksha-language literary sources dating back to the 1880s with Universal Dependencies (UD) annotations. It has been constructed in alignment with parallel work on Erzya language Universal Dependencies.

There are also about 20 parallel sentences translated by Marina Levina from the Erzya and Russian texts: http://ilazki.thinkgeek.co.uk/brat/#/uralic/myv and http://ilazki.thinkgeek.co.uk/brat/#/uralic/rus

The sent_id attribute value is not randomized in works published earlier than 1938. UD documentation under development for Erzya can be found at https://github.com/UniversalDependencies/docs.

https://github.com/rueter/erme-ud-moksha

Acknowledgments

The original annotation was performed by Jack Rueter at the University of Helsinki with the help of Marina Levina at the Mordovian State University im. P.N. Ogariova, Mordvin Languages Department. It used morphological tools that were originally built with funding from a Kone Foundation «Language Programme» project, «Creation of Morphological Parsers for Minority Finno-Ugrian Languages» (2013–2014), with the linguistic work of Merja Salo, and facilitated at the Norwegian Arctic University in Tromsø. Work with the Moksha treebank builds upon previous experience with the UD_Erzya-JR treebank and continued consultations and discussions with Francis Tyers, Tommi Pirinen and Jonathan Washington. Without the Moksha writers themselves, however, we would be nowhere…

Annotation work proceeds in parallel with finite-state transducer development by Nadjezhda Kabaeva, Marina Levina and Jack Rueter in the GiellaLT infrastructure, which also uses Constraint Grammar to disambiguate the morphological analysis.

References

If you use this data set in an academic publication, I would be ever so grateful if you cited it as follows:

Jack Rueter. (2018, January 20). Erme UD Moksha (Version v1.0) http://doi.org/10.5281/zenodo.1156112


About the authors

In release 2.7, additional example sentences used in the Moksha-language grammar Мокшень кяль, синтаксис: учебник (2008) were included. These sentences are marked with sent_id values that contain the components MKS:2008:page:n-th sentence:original author. It is hoped that the inclusion of these sentences will help cover various grammatical phenomena in Moksha syntax. When referring to these sentences, we advise you to also cite the original source named above.
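
The sent_id layout for the grammar examples can be unpacked with a few lines of Python. This is only a sketch: it assumes the five components are separated by plain colons exactly as written in the pattern MKS:2008:page:n-th sentence:original author, and the function name, the field names and the sample identifier are hypothetical, not taken from the treebank.

```python
from typing import NamedTuple, Optional


class GrammarSentId(NamedTuple):
    """Components of a grammar-example sent_id (field names are illustrative)."""
    source: str    # expected to be "MKS"
    year: int      # publication year of the grammar, e.g. 2008
    page: str      # page in the grammar
    sentence: str  # n-th sentence on that page
    author: str    # original author of the example


def parse_grammar_sent_id(sent_id: str) -> Optional[GrammarSentId]:
    """Return the parsed components, or None if the id does not fit the pattern."""
    # Split on at most four colons so the last field keeps any colons it contains.
    parts = sent_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "MKS":
        return None
    source, year, page, sentence, author = parts
    if not year.isdigit():
        return None
    return GrammarSentId(source, int(year), page, sentence, author)


# Hypothetical identifier, made up for illustration only.
example = parse_grammar_sent_id("MKS:2008:12:3:ExampleAuthor")
```

Identifiers that do not start with MKS, such as those of the literary sentences, simply yield None, so the function can be used as a filter when collecting the grammar examples that need the additional citation.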

Statistics of UD Moksha JR

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Abbr, AdpType, AdvType, Animacy, Aspect, Case, Clitic, Connegative, Definite, Degree, Derivation, Gender, Mood, NameType, NounType, Number, Number[obj], Number[psor], Number[subj], NumForm, NumType, PartForm, Person, Person[obj], Person[psor], Person[subj], Polarity, PronType, PunctSide, Reflex, Style, Tense, Typo, Variant, VerbForm, VerbType

Relations

acl, acl:relcl, advcl, advcl:cau, advcl:eval, advcl:tcl, advmod, advmod:cau, advmod:cmp, advmod:deg, advmod:eval, advmod:foc, advmod:freq, advmod:lmod, advmod:mmod, advmod:tmod, amod, appos, aux, aux:cnd, aux:nec, aux:neg, aux:opt, aux:q, case, cc, cc:preconj, ccomp, compound, conj, cop, csubj, csubj:cop, dep, det, discourse, dislocated, expl, fixed, flat, flat:name, list, mark, nmod, nmod:appos, nmod:bahuv, nmod:gobj, nmod:lmod, nmod:poss, nmod:tmod, nsubj, nsubj:cop, nsubj:pass, nummod, obj, obl, obl:agent, obl:cau, obl:cmp, obl:freq, obl:inst, obl:lmod, obl:tmod, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
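
A restriction of this kind can be reproduced on any CoNLL-U file with a short script. The sketch below is an assumption about how such a count might be done, not the procedure used to generate the UD statistics: the function name is hypothetical, "nouns or pronouns" is read as the UPOS tags NOUN and PRON (PROPN can be added through the parameter), and the sample sentence uses English placeholder tokens rather than treebank data.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_verb_nominal_relations(
    conllu_lines: Iterable[str],
    child_upos: Tuple[str, ...] = ("NOUN", "PRON"),
) -> Counter:
    """Count deprels on edges whose parent is a VERB and whose child is nominal."""
    counts: Counter = Counter()
    sentence = []  # rows of the current sentence, each a list of 10 columns

    def flush() -> None:
        # Map token ID -> UPOS so that each child's head can be looked up.
        upos_by_id = {row[0]: row[3] for row in sentence}
        for row in sentence:
            upos, head, deprel = row[3], row[6], row[7]
            if upos in child_upos and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
        sentence.clear()

    for line in conllu_lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            flush()  # a blank line ends the sentence
        elif line.startswith("#"):
            continue  # comment lines such as sent_id and text
        else:
            cols = line.split("\t")
            # Keep plain word lines; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
    flush()  # handle a final sentence without a trailing blank line
    return counts


# Toy sentence with placeholder tokens, for illustration only.
sample = [
    "# sent_id = toy-1",
    "1\tshe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
]
counts = count_verb_nominal_relations(sample)
```

To run it on a real file, pass an open file handle, for example `count_verb_nominal_relations(open(path, encoding="utf-8"))`, where `path` points to a CoNLL-U file from the repository.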

Relations Overview