
This page pertains to UD version 2.

UD German GSD

Language: German (code: de)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Slav Petrov, Wolfgang Seeker, Ryan McDonald, Joakim Nivre, Daniel Zeman, Adriane Boyd.

Repository: UD_German-GSD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.9

License: CC BY-SA 4.0

Genre: news, reviews, wiki

Questions, comments? General annotation questions (either German-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: assigned by a program, not checked manually
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: assigned by a program, not checked manually
Features: assigned by a program, not checked manually
Relations: annotated manually in non-UD style, automatically converted to UD

Description

The German UD treebank is converted from the content-head version of the universal dependency treebank v2.0 (legacy).

The ACL 2013 paper by McDonald et al. (https://github.com/ryanmcd/uni-dep-tb/blob/master/ACL2013.pdf) describes version 1.0 of the corpus, which contains 2200 train/800 dev/1000 test sentences in German. According to the paper, these come from the Reviews and News genres (the news data from the TIGER Treebank, the reviews presumably from Google).

The subsequent 2.0 release has more data: 14118 train/799 dev/977 test sentences. Some of the sentences in 1.0 turned out to be duplicated across splits, which was fixed for 2.0. There is no indication in the READMEs of where the new German sentences came from.
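The duplication across splits mentioned above can be checked mechanically. The following is only a minimal sketch, not the procedure the maintainers used: it assumes the splits are available as CoNLL-U text (blank-line-separated sentences, ten tab-separated columns, word form in column 2), and the helper names `sentence_texts` and `cross_split_duplicates` are hypothetical.

```python
from typing import Dict, List, Set


def sentence_texts(conllu: str) -> List[str]:
    """Return the space-joined word forms of each sentence in a CoNLL-U string."""
    texts: List[str] = []
    forms: List[str] = []
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if forms:
                texts.append(" ".join(forms))
                forms = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        forms.append(cols[1])
    if forms:
        texts.append(" ".join(forms))
    return texts


def cross_split_duplicates(splits: Dict[str, str]) -> Set[str]:
    """Return sentence texts that occur in more than one split.

    `splits` maps a split name (assumed: 'train', 'dev', 'test') to the
    CoNLL-U content of that split.
    """
    first_seen: Dict[str, str] = {}
    duplicates: Set[str] = set()
    for name, data in splits.items():
        for text in set(sentence_texts(data)):
            if text in first_seen and first_seen[text] != name:
                duplicates.add(text)
            first_seen.setdefault(text, name)
    return duplicates
```

Comparing space-joined word forms is a deliberately crude notion of "duplicate"; near-duplicates that differ in tokenization or punctuation would not be caught.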

Based on the above and the mappings in not-to-release/ud-tiger-mapping.txt, it appears that the genres are:

train: Reviews=s1-s1500, News=s1501-s2200, Web=s2201-s14118
dev: Reviews=s1-s500, News=s501-s799
test: Reviews=s1-s301, News=s302-s977

A search for a selection of sentences in the s2201-s14118 range of the training data, i.e. the ones new in version 2.0, suggests that they are from Wikipedia and other websites.
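The sentence ranges above can be turned into a small lookup table. This is a sketch under assumptions: the ranges are copied from the description above (which itself calls them an inference), the function name `genre_of` is hypothetical, and sentence IDs are assumed to have the form `<split>-s<number>`, such as `train-s1501`.

```python
from typing import Dict, List, Tuple

# (first sentence, last sentence, presumed genre), inclusive on both ends.
GENRE_RANGES: Dict[str, List[Tuple[int, int, str]]] = {
    "train": [(1, 1500, "Reviews"), (1501, 2200, "News"), (2201, 14118, "Web")],
    "dev": [(1, 500, "Reviews"), (501, 799, "News")],
    "test": [(1, 301, "Reviews"), (302, 977, "News")],
}


def genre_of(sent_id: str) -> str:
    """Map an ID such as 'train-s1501' (assumed format) to its presumed genre."""
    split, _, number = sent_id.partition("-s")
    index = int(number)
    for first, last, genre in GENRE_RANGES[split]:
        if first <= index <= last:
            return genre
    raise ValueError(f"{sent_id!r} is outside the known ranges for {split!r}")
```

For example, `genre_of("train-s1501")` returns `"News"`, and an ID beyond the listed ranges raises `ValueError` rather than guessing a genre.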

Acknowledgments

Statistics of UD German GSD

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, Case, Definite, Foreign, Gender, Gender[psor], Mood, Number, Number[psor], NumType, Person, Polarity, Poss, PronType, Reflex, Tense, Typo, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, compound:prt, conj, cop, csubj, csubj:pass, dep, det, det:poss, discourse, expl, expl:pv, fixed, flat, goeswith, iobj, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, obl:arg, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview