
This page pertains to UD version 2.

UD Yiddish YiTB

Language: Yiddish (code: yi)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.17 release.

The following people have contributed to making this treebank part of UD: Kirk Andrews.

Repository: UD_Yiddish-YiTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: grammar-examples, learner-essays, bible, wiki, fiction, nonfiction, spoken, web

Questions, comments? General annotation questions (either Yiddish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [m • kirkandrews (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas assigned by a program, not checked manually
UPOS annotated manually, natively in UD style
XPOS not available
Features not available
Relations annotated manually, natively in UD style

Description

YiTB is a treebank of linguistically annotated Yiddish data in the Universal Dependencies framework, created via a bootstrapping machine learning method. The treebank currently contains 27,872 tokens drawn from a variety of sources and textual genres.

Yiddish is classified as a West Germanic language, although it includes many elements from Semitic and Slavic languages as well. It is written in a modified Hebrew alphabet. Yiddish is structurally similar to German, but it also has many interesting structures not found in other Germanic languages, such as periphrastic verbs.

Roughly 60% of the tokens stem from the Tatoeba source and consist of short sentences provided by both native and non-native speakers of Yiddish. These sentences contain occasional grammatical errors. One is the use of the auxiliary zayn ‘be’ instead of hobn ‘have’ in past tense constructions of periphrastic verbs formed with the verb zayn. Another is incorrect syntax of periphrastic verbs which have an underlying complement-head (OV) order and do not follow the typical order expected of an SVO language like Yiddish. Such errors appear to be common among intermediate L2 Yiddish speakers. The remaining 40% of tokens stem from a variety of native speaker texts and genres. The various source texts and genres are shown below.
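The token count given in this description could be checked against the released files with something like the following minimal sketch. It assumes only the standard CoNLL-U layout (ten tab-separated columns, `#` comment lines, blank lines between sentences). The function name `count_tokens` and the sample sentence are invented for illustration and are not taken from the treebank. Since the source does not say whether the quoted figure counts surface tokens or syntactic words, the sketch reports words and multiword-token ranges separately.

```python
def count_tokens(conllu_text):
    """Count sentences, syntactic words and multiword-token ranges in CoNLL-U text."""
    sentences = words = multiword_tokens = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False      # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue                 # sentence-level comment (sent_id, text, ...)
        cols = line.split("\t")
        if len(cols) != 10:
            continue                 # not a well-formed token line; skip it
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = cols[0]
        if "-" in token_id:
            multiword_tokens += 1    # range line such as "1-2"
        elif "." in token_id:
            continue                 # empty node such as "3.1"
        else:
            words += 1
    return {"sentences": sentences, "words": words,
            "multiword_tokens": multiword_tokens}


# Invented example sentence (hypothetical, not from YiTB): "I see a cat."
_ROWS = [
    ("1", "איך", "איך", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "זע", "זען", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "אַ", "אַ", "DET", "_", "_", "4", "det", "_", "_"),
    ("4", "קאַץ", "קאַץ", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"),
    ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n\n"

print(count_tokens(SAMPLE))
```

To apply it to the real data, read each `.conllu` file from the repository as UTF-8 text and sum the per-file counts.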

Lemmas and transliterations into Latin script are also provided by self-made models, but they are not 100% accurate. The transliteration model, which can be accessed here, was trained on Wiktionary and transliterated Bible data. The lemmatization model was trained on Wiktionary data and can be found here. Translations are not provided at this time, but a model trained on Tatoeba sentences and parallel Bible verses is accessible here. Morphological features are also not included at this time.

| Source | Author | Genre | Added | Split |
|---|---|---|---|---|
| tatoeba.org | Various | grammar/learner | 2.17 | all |
| Book of Exodus | Yehoyesh translation | bible | 2.17 | all |
| Beethoven’s Moonlight Sonata | Shloyme Bastomski | fiction | 2.17 | train |
| Yiddish proverbs | Various | proverb | 2.17 | all |
| Haggadahs and Elijah the Prophet | Proste Yiddish | web | 2.17 | test |
| Bulletin No. 3: At the Border | Various | nonfiction | 2.17 | test |
| A Story with a Cat and Yiddish Dialects | Proste Yiddish | web | 2.17 | dev |
| Sholem Aleichem | Proste Yiddish | web | 2.17 | train |
| Hirshke Glik | Shmerke Kaczerginski | nonfiction | 2.17 | dev |
| Book of Proverbs | Yehoyesh translation | bible | 2.17 | test |
| Shavuot and an Old Joke | Proste Yiddish | web | 2.17 | test |
| Bankrupt | Katie Brown | fiction | 2.17 | train |
| Jews and Yiddish | Nokhem Shtif | nonfiction | 2.17 | train |
| Fathers and Children | Chaim Malitz | nonfiction | 2.17 | train |
| Wikipedia | Various | nonfiction | 2.17 | train |
| A Foolish Child | Jacob Dinezon | fiction | 2.17 | test |
| From the Land of Consumption | Shloyme Gilbert | fiction | 2.17 | dev |
| The Four Questions | Traditional | liturgical | 2.17 | test |
| A Bit of Clarity and Simplicity Regarding the Language Question | Hillel Zeitlin | nonfiction | 2.17 | train |
| Song of Songs | Yehoyesh translation | bible | 2.17 | train |

Acknowledgments

To the best of our knowledge, the source texts used for the creation of this treebank are either in the public domain or are orphan works for which no copyright holder can be found. If you hold the copyright to any of the texts used in this treebank and would like them removed, please contact us at the email address listed on this page.

Statistics of UD Yiddish YiTB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

ExtPos, Typo

Relations

acl, acl:relcl, advcl, advcl:relcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, compound:lvc, compound:prt, compound:redup, conj, cop, csubj, dep, det, det:poss, discourse, dislocated, expl, expl:pv, fixed, flat, flat:foreign, flat:name, goeswith, iobj, mark, nmod, nmod:poss, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:agent, obl:arg, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
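One way such a verb-to-nominal tally might be computed is sketched below. It assumes the standard CoNLL-U column order (UPOS in column 4, HEAD in column 7, DEPREL in column 8) and reads "nouns or pronouns" as the UPOS tags NOUN and PRON only, which is an assumption about how the statistics are built. The function names and the sample sentence are hypothetical and not taken from the treebank.

```python
from collections import Counter


def iter_sentences(conllu_text):
    """Yield each sentence as a list of 10-column rows for regular words only."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
                rows = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep plain integer IDs; skip multiword ranges ("1-2") and empty nodes ("3.1").
        if len(cols) == 10 and cols[0].isdigit():
            rows.append(cols)
    if rows:
        yield rows


def verb_argument_relations(conllu_text):
    """Count DEPREL labels on NOUN/PRON children whose parent is a VERB."""
    counts = Counter()
    for rows in iter_sentences(conllu_text):
        by_id = {cols[0]: cols for cols in rows}
        for cols in rows:
            parent = by_id.get(cols[6])   # HEAD "0" (root) has no parent row
            if parent is None:
                continue
            if parent[3] == "VERB" and cols[3] in {"NOUN", "PRON"}:
                counts[cols[7]] += 1
    return counts


# Invented example sentence (hypothetical, not from YiTB): "I see a cat."
_ROWS = [
    ("1", "איך", "איך", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "זע", "זען", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "אַ", "אַ", "DET", "_", "_", "4", "det", "_", "_"),
    ("4", "קאַץ", "קאַץ", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"),
    ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n\n"

print(verb_argument_relations(SAMPLE).most_common())
```

Heads are resolved within each sentence, so identical word IDs in different sentences do not collide.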

Reflexive Verbs

Relations Overview