This page pertains to UD version 2.

UD German GSD

Language: German (code: de)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Slav Petrov, Wolfgang Seeker, Ryan McDonald, Joakim Nivre, Daniel Zeman, Adriane Boyd.

License: CC BY-SA 4.0

Genre: news, reviews, wiki

Annotation Source
Lemmas assigned by a program, not checked manually
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS assigned by a program, not checked manually
Features assigned by a program, not checked manually
Relations annotated manually in non-UD style, automatically converted to UD


The German UD treebank is converted from the content-head version of the Universal Dependency Treebank v2.0 (legacy).

The ACL 2013 paper (https://github.com/ryanmcd/uni-dep-tb/blob/master/ACL2013.pdf, McDonald et al.) describes version 1.0 of the corpus, whose German portion has 2200 train / 800 dev / 1000 test sentences. According to the paper, these come from the Reviews and News genres (the news data being from the TIGER Treebank, the reviews presumably from Google).

The subsequent 2.0 release has more data: 14118 train / 799 dev / 977 test sentences. Some sentences in 1.0 turned out to be duplicated across splits; this was fixed in 2.0. The READMEs give no indication of where the new German sentences came from.

Based on the above and the mappings in not-to-release/ud-tiger-mapping.txt, it appears that the genres are:

train: Reviews=s1-s1500, News=s1501-s2200, Web=s2201-s14118
  (Searching for a sample of sentences in the s2201-s14118 range, i.e. those new in version 2.0, suggests that they come from Wikipedia and other websites.)
dev: Reviews=s1-s500, News=s501-s799
test: Reviews=s1-s301, News=s302-s977
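The genre ranges above can be expressed as a small lookup table, which is handy for filtering sentences by genre. The following is a minimal Python sketch, not part of the treebank distribution: the names GENRE_RANGES and genre_of are hypothetical, the ranges are the inferred ones listed above rather than official metadata, and the assumption that sentence IDs end in the form sN (optionally with a prefix such as "train-") should be checked against the actual files.

```python
# Hypothetical helper, not shipped with the treebank.
# Ranges are copied from the inferred genre list on this page (not official metadata).
GENRE_RANGES = {
    "train": [("Reviews", 1, 1500), ("News", 1501, 2200), ("Web", 2201, 14118)],
    "dev": [("Reviews", 1, 500), ("News", 501, 799)],
    "test": [("Reviews", 1, 301), ("News", 302, 977)],
}


def genre_of(split: str, sent_id: str) -> str:
    """Return the inferred genre for a sentence ID such as 's1742'.

    Assumption: the numeric part is the last '-'-separated field, so both
    's1742' and a prefixed form like 'train-s1742' are accepted.
    """
    tail = sent_id.rsplit("-", 1)[-1]
    if not tail.startswith("s") or not tail[1:].isdigit():
        raise ValueError(f"unexpected sentence id: {sent_id!r}")
    n = int(tail[1:])
    for genre, lo, hi in GENRE_RANGES[split]:
        if lo <= n <= hi:
            return genre
    raise ValueError(f"{sent_id!r} is outside the known range for {split}")


def split_size(split: str) -> int:
    """Total number of sentences covered by the ranges of one split."""
    return sum(hi - lo + 1 for _, lo, hi in GENRE_RANGES[split])
```

As a sanity check, split_size gives 14118, 799 and 977 for train, dev and test, matching the 2.0 sentence counts quoted above, so the ranges cover each split without gaps.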


