This page pertains to UD version 2.

UD German HDT

Language: German (code: de)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v2.4 release.

The following people have contributed to making this treebank part of UD: Emanuel Borges Völker, Felix Hennig, Arne Köhn, Maximilian Wendt.

License: CC BY-SA 4.0

Genre: news, nonfiction, web

Questions, comments? General annotation questions (either German-specific or cross-linguistic) can be raised in the main UD issue tracker.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: assigned by a program, with some manual corrections, but not a full manual verification
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion


UD German-HDT is a conversion of the Hamburg Dependency Treebank (HDT), which was created at the University of Hamburg. The HDT was annotated manually, alongside the development of a standard for morphological and syntactic annotation and of a constraint-based parser.

The Hamburg Dependency Treebank consists of 261,821 sentences (4.8M tokens). All sentences were sourced from the German news site heise.de, from articles published between 1996 and 2001. The articles range from formulaic periodic updates on new BIOS revisions and processor models or quarterly earnings of tech companies, through features on general trends in the hardware and software market, to general coverage of social, legal and political issues in cyberspace, sometimes in the form of extensive weekly editorial comments. The manual annotation of the treebank was largely interleaved with the creation of the annotation standard and of the constraint-based parser.

For UD_German-HDT, 206,794 sentences (3.8M tokens) from the original HDT were converted with TrUDucer, a treebank conversion tool created by Felix Hennig and extended by Maximilian Wendt and Emanuel Borges Völker. The conversion reaches an accuracy of 97%, measured on a manually converted subset of the treebank. Information that UD requires but the original annotation does not capture was supplied from an external data source (Wiktionary) and by manual input from annotators.
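The sentence and token counts given above can be checked against a local copy of the data. The following is a minimal sketch, assuming the treebank files are in the standard CoNLL-U format; the function name count_conllu and the inline sample are illustrative only and are not part of the treebank or of TrUDucer. It counts syntactic words, so the result may differ from a count of surface tokens if a file contains multiword tokens.

```python
import io


def count_conllu(stream):
    """Count sentences and syntactic words in a CoNLL-U text stream."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in token_id or "." in token_id:
            continue
        words += 1
    if in_sentence:
        sentences += 1  # the last sentence may lack a trailing blank line
    return sentences, words


# Hypothetical two-sentence sample; only the ID and FORM columns are filled.
SAMPLE = "\n".join([
    "# text = Das ist gut.",
    "1\tDas\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tist\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\tgut\t_\t_\t_\t_\t_\t_\t_\t_",
    "4\t.\t_\t_\t_\t_\t_\t_\t_\t_",
    "",
    "# text = zum Test",
    "1-2\tzum\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tzu\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tdem\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\tTest\t_\t_\t_\t_\t_\t_\t_\t_",
])

print(count_conllu(io.StringIO(SAMPLE)))
```

For a real file, the same function can be applied to a handle opened with open(path, encoding="utf-8"), where path points to one of the downloaded treebank files.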


The following people worked on the conversion:


If you use this treebank, please cite the following paper, describing the conversion of the HDT to UD:

Borges Völker, Emanuel and Wendt, Maximilian and Hennig, Felix and Köhn, Arne (2019). HDT-UD: A very large Universal Dependencies Treebank for German. In Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019) (pp. 46–57). Paris, France: Association for Computational Linguistics. url: https://www.aclweb.org/anthology/W19-8006

The TrUDucer paper describing the formalism behind the conversion:

Hennig, Felix, & Köhn, Arne (2017). Dependency tree transformation with tree transducers. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017) (pp. 58–66). Gothenburg, Sweden: Association for Computational Linguistics. url: http://www.aclweb.org/anthology/W17-0407

The paper describing the HDT:

Foth, K. A., Köhn, A., Beuck, N., & Menzel, W. (2014). Because Size Does Matter: The Hamburg Dependency Treebank. In Proceedings of the Language Resources and Evaluation Conference 2014 (pp. 2326–2333). Reykjavik, Iceland: European Language Resources Association (ELRA). url: http://nbn-resolving.de/urn:nbn:de:gbv:18-228-7-2013

The annotation guidelines of the original HDT:

Foth, K. A. (2006). Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. url: http://nbn-resolving.de/urn:nbn:de:gbv:18-228-7-2048


TrUDucer, the software used to convert the HDT. It comes with a pipeline to replicate the conversion of the HDT.

jwcdg, the successor of the parser used for initial automatic annotation of the HDT. It contains the lexicon with the relevant morpho-syntactic features annotated.

DECCA, a tool to detect and correct errors in annotated corpora

