
This page pertains to UD version 2.

UD Old East Slavic Birchbark

Language: Old East Slavic (code: orv)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v2.10 release.

The following people have contributed to making this treebank part of UD: Olga Lyashevskaya.

Repository: UD_Old_East_Slavic-Birchbark
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.14

License: CC BY-SA 4.0

Genre: nonfiction

Questions, comments? General annotation questions (either Old East Slavic-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [olesar (æt) yandex • ru]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: not available
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually, natively in UD style

Description

UD Old_East_Slavic-Birchbark is based on the RNC Corpus of Birchbark Letters and includes documents written in 1025-1500 in an East Slavic vernacular (letters, household and business records, records for church services, spells against diseases, and other short inscriptions). The syntactic annotation was done manually in the UD 2.0 scheme; the morphological and lexical annotation is a conversion of the original RNC annotation.

The treebank is based on the historical Corpus of Birchbark Letters written in 1025-1500. The corpus is part of the Russian National Corpus and includes documents written in an East Slavic vernacular (letters, household and business records, records for church services, spells against diseases, and other short inscriptions). The birchbark letters were found during archaeological excavations in Novgorod the Great, Staraya Russa, and other East Slavic cities and provide rare evidence of everyday writing in pre-Mongol East Slavic dialects. Many documents are damaged or fragmentary, and they represent diverse spelling traditions, which makes them an interesting case for linguistic analysis and historical NLP research.

Digital copies of the letters are available online in the database at http://gramoty.ru, which also provides hypotheses on tokenisation. The morphological analysis was originally done under the Russian National Corpus scheme for the Corpus of Birchbark Letters, which is mostly compatible with the RNC Old Russian scheme. Lemmatization reflects the later Old Russian spelling tradition. The morphological and lexical annotation of the UD Old_East_Slavic-Birchbark treebank is a conversion of this annotation that follows the mapping in (Lyashevskaya 2019). The sentence segmentation and the dependency annotation were originally done in the UD 2.0 scheme.

Acknowledgments

Tokenisation and lexical and morphological analysis are primarily based on the study Old Novgorod Dialect by Andrei Zalizniak, as well as on other studies by Valentin Yanin, Andrei Zalizniak, Alexey Gippius, Dmitri Sitchinava, and other researchers of the East Slavic vernacular. We thank the developers and annotators of the RNC Corpus of Birchbark Letters.

References

Statistics of UD Old East Slavic Birchbark

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Analyt, Animacy, Case, Clitic, Degree, Gender, Mood, NameType, Number, NumForm, NumType, Person, Poss, PronType, Reflex, Tense, Typo, Variant, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, conj, cop, csubj, dep, det, dislocated, expl, fixed, flat, flat:name, goeswith, iobj, list, mark, nmod, nsubj, nsubj:pass, nummod, nummod:gov, obj, obl, obl:agent, orphan, parataxis, punct, reparandum, root, vocative, xcomp
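
The inventories of POS tags, features, and relations listed above can be tallied directly from the treebank files. UD treebanks are released in the CoNLL-U format (ten tab-separated columns per token), so a minimal sketch might look like the following. The function name tally and the sample rows are assumptions made for illustration; the tokens w1-w3 are invented placeholders, not text from the birchbark letters.

```python
from collections import Counter

# Invented placeholder sentence in CoNLL-U layout; NOT taken from the treebank.
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "1\tw1\tl1\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\tw2\tl2\tVERB\t_\tMood=Ind|Tense=Past\t0\troot\t_\t_",
    "3\tw3\tl3\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\t_",
    "",
])


def tally(conllu_text):
    """Count UPOS tags, feature names, and relations in CoNLL-U text."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos[cols[3]] += 1
        rels[cols[7]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1  # keep the feature name only
    return upos, feats, rels


upos, feats, rels = tally(SAMPLE)
print(sorted(upos), sorted(feats), sorted(rels))
```

To run the same tally on the real data, pass the contents of a downloaded .conllu file (read with encoding="utf-8") instead of SAMPLE.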

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
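
The restriction stated above (verb parent, noun or pronoun child) can be sketched as a filter over the rows of one CoNLL-U sentence. This is an illustration under assumptions, not the procedure used to produce the statistics: the function name verb_nominal_relations is hypothetical, the rows are invented placeholders, and the sketch reads "nouns or pronouns" literally as the UPOS tags NOUN and PRON (whether PROPN is also counted is not stated on this page).

```python
# Assumption: "nouns or pronouns" is read literally as these two UPOS tags.
NOMINAL = {"NOUN", "PRON"}

# Invented placeholder rows (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL,
# DEPS, MISC); NOT taken from the treebank.
ROWS = [
    ["1", "w1", "l1", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "w2", "l2", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "w3", "l3", "ADJ", "_", "_", "4", "amod", "_", "_"],
    ["4", "w4", "l4", "PRON", "_", "_", "2", "obj", "_", "_"],
]


def verb_nominal_relations(sentence_rows):
    """Return (deprel, child_id, head_id) for nominal children of verbs."""
    upos_by_id = {row[0]: row[3] for row in sentence_rows}
    pairs = []
    for row in sentence_rows:
        child_id, child_upos, head_id, deprel = row[0], row[3], row[6], row[7]
        # The root has head "0", which is not a token, so the lookup yields None.
        if child_upos in NOMINAL and upos_by_id.get(head_id) == "VERB":
            pairs.append((deprel, child_id, head_id))
    return pairs


print(verb_nominal_relations(ROWS))
```

In the placeholder sentence, the adjective attached to the pronoun is dropped because its parent is not a verb, while the two nominal dependents of the verb are kept.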

Verbs with Reflexive Core Objects

Relations Overview