This page pertains to UD version 2.

UD Bulgarian BTB

Language: Bulgarian (code: bg)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v1.1 release.

The following people have contributed to making this treebank part of UD: Kiril Simov, Petya Osenova, Martin Popel.

Repository: UD_Bulgarian-BTB
License: CC BY-NC-SA 3.0

Genre: news, legal, fiction

Questions, comments? General annotation questions (either Bulgarian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github.

Annotation sources
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: annotated manually
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually in non-UD style, automatically converted to UD


UD_Bulgarian-BTB is based on the HPSG-based BulTreeBank, created at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The original consists of 215,000 tokens (over 15,000 sentences).

All the texts were processed automatically at the tokenization, morphological and chunking levels. The full syntactic analysis was then performed manually by trained annotators.

UD_Bulgarian-BTB consists of 156,149 tokens (11,138 sentences). This subset of BulTreeBank excludes sentences with ellipsis and some rare phenomena. The conversion was done semi-automatically by Kiril Simov, applying a set of rules and constraints to ensure the consistency of the result.
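Because the treebank is distributed in the CoNLL-U format, the sentence and token figures above can be checked against the released files. The sketch below is a minimal illustration, not the procedure used to produce the official statistics. It assumes that "tokens" means basic syntactic words (lines whose ID is a plain integer) and skips multiword-token ranges such as "3-4" and empty nodes such as "5.1"; the function name `count_conllu` is hypothetical.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, tokens) for CoNLL-U input.

    Assumption: tokens are counted as basic syntactic words, i.e. lines
    whose ID column is a plain integer. Multiword-token ranges ("3-4")
    and empty nodes ("5.1") are skipped.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line terminates the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            tokens += 1
            in_sentence = True
    # Count a final sentence that lacks a trailing blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


sample = """# sent_id = 1
# text = Пример.
1\tПример\tпример\tNOUN\t_\t_\t0\troot\t_\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_

"""
print(count_conllu(sample.splitlines(keepends=True)))  # (1, 2)
```

To check a release, one would pass an open file handle for each split to `count_conllu` and sum the results; if the official figures count surface tokens instead of syntactic words, the ID test would need to be adjusted accordingly.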

The remaining sentences will be converted in future releases. The original version is freely available for research upon request.


The original treebank was developed in a project (2001-2004) funded by the Volkswagen Stiftung, Federal Republic of Germany, under the Programme “Cooperation with Natural and Engineering Scientists in Central and Eastern Europe”. The project was carried out mainly at IICT-BAS in close cooperation with researchers at the Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany. Link: http://bultreebank.org/

The conversion of BulTreeBank into the Universal Dependencies format was supported by the EU project QTLeap. Link: http://qtleap.eu/

We would like to thank all our colleagues who contributed to the annotation of the original treebank: Elisaveta Balabanova, Dimitar Dojkov, Maggie Ivanchukova, Sia Kolkovska, Milena Slavcheva, Petya Osenova. We would also like to thank the annotator and validator of the UD version of the treebank: Stanislava Kancheva.

Statistics of UD Bulgarian BTB

