
This page pertains to UD version 2.

UD Turkish TueCL

Language: Turkish (code: tr)
Family: Turkic

This treebank has been part of Universal Dependencies since the UD v2.16 release.

The following people have contributed to making this treebank part of UD: Furkan Akkurt, Çağrı Çöltekin.

Repository: UD_Turkish-TueCL
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: grammar-examples

Questions, comments? General annotation questions (either Turkish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [furkanakkurt7242 (æt) icloud • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: annotated manually
UPOS: annotated manually, natively in UD style
XPOS: not available
Features: annotated manually, natively in UD style
Relations: annotated manually, natively in UD style

Description

The Turkish-TueCL treebank is part of a parallel Universal Dependencies corpus containing 148 sentences across four Turkic languages (Turkish, Azerbaijani, Kyrgyz, and Uzbek), designed to facilitate cross-linguistic research on these related languages.

The Turkish-TueCL treebank consists of 148 carefully selected sentences (904 tokens) compiled from multiple sources, including the Cairo corpus (20 sentences), the UDTW23 corpus (20 sentences), and 97 additional examples illustrating specific grammatical constructions of interest. It serves as the source treebank for a parallel corpus spanning four Turkic languages from distinct branches of the family: Turkish and Azerbaijani (Oghuz), Kyrgyz (Kipchak), and Uzbek (Karluk).

The treebank includes various syntactic phenomena relevant to Turkic languages, such as pro-drop constructions, auxiliary chains, postverbal structures, and non-canonical word orders. Each sentence has been manually annotated following UD guidelines, with particular attention to morphosyntactic features that highlight both shared typological characteristics and language-specific traits. English translations are provided as metadata to support comparative research.
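As an illustration of how these annotations and the translation metadata might be read, here is a minimal sketch of a CoNLL-U reader in plain Python. It makes several assumptions: the sample sentence is invented for this example and is not taken from the treebank, the metadata key `text_en` is a guess at how the English translations are stored, and `read_conllu` is a hypothetical helper, not part of any UD tool.

```python
from collections import Counter

# The ten tab-separated columns of the CoNLL-U format.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Invented sample sentence (NOT from the treebank); `text_en` is an assumed key.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "# text = Ben kitabı okudum.",
    "# text_en = I read the book.",
    "\t".join(["1", "Ben", "ben", "PRON", "_",
               "Case=Nom|Number=Sing|Person=1|PronType=Prs", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "kitabı", "kitap", "NOUN", "_",
               "Case=Acc|Number=Sing", "3", "obj", "_", "_"]),
    "\t".join(["3", "okudum", "oku", "VERB", "_",
               "Number=Sing|Person=1|Tense=Past", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])


def read_conllu(text):
    """Parse CoNLL-U text into a list of {'meta': dict, 'tokens': list} sentences."""
    sentences, meta, tokens = [], {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"meta": meta, "tokens": tokens})
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Metadata comments have the form "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            # Keep only regular words; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                tokens.append(dict(zip(FIELDS, cols)))
    if tokens:
        # The final sentence may not be followed by a blank line.
        sentences.append({"meta": meta, "tokens": tokens})
    return sentences


sentences = read_conllu(SAMPLE)
upos_counts = Counter(tok["upos"] for s in sentences for tok in s["tokens"])
print(len(sentences), sum(upos_counts.values()), sentences[0]["meta"].get("text_en"))
```

To run this over the real treebank, the text of the downloaded `.conllu` file would be passed to `read_conllu` in place of `SAMPLE`; the actual file name and metadata keys should be checked in the repository.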

This resource is significant because it belongs to the first set of fully aligned parallel UD treebanks for these Turkic languages, enabling systematic cross-linguistic comparisons that were previously hindered by the lack of parallel resources. The treebank supports research in comparative Turkic syntax, cross-lingual parsing, and language education.

References

Please cite the following paper if you use the Turkish-TueCL UD treebank:

@inproceedings{akhundjanova-etal-2025-parallel,
title = "Parallel {U}niversal {D}ependencies Treebanks for {T}urkic Languages",
author = "Akhundjanova, Arofat and
Akkurt, Furkan and
Chontaeva, Bermet and
Eslami, Soudabeh and
Coltekin, Cagri",
editor = {Bouma, Gosse and
{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i}},
booktitle = "Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)",
month = aug,
year = "2025",
address = "Ljubljana, Slovenia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.udw-1.14/",
pages = "129--136",
ISBN = "979-8-89176-292-3",
abstract = "We introduce the first fully aligned and manually annotated parallel Universal Dependencies (UD) treebanks for four Turkic languages: Azerbaijani, Kyrgyz, Turkish, and Uzbek. These resources currently consist of 148 strategically selected sentences that illustrate typologically significant morphosyntactic phenomena across these related yet distinct languages. These parallel treebanks enable systematic comparative studies of Turkic syntax and may be instrumental in cross-lingual NLP applications. All treebanks are available as part of UD v2.16."
}

Acknowledgments

This work was supported by COST Action CA21167 - Universality, diversity and idiosyncrasy in language technology (UniDive). We thank the Turkic UD working group for fruitful discussions of linguistic issues and annotation approaches.

Statistics of UD Turkish TueCL

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB

Features

Aspect, Case, Definite, Evident, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, PronType, Reflex, Tense, VerbForm, Voice

Relations

acl, advcl, advmod, advmod:emph, amod, aux, aux:q, case, cc, ccomp, compound, compound:lvc, compound:redup, conj, cop, csubj, det, discourse, fixed, flat, mark, nmod, nmod:poss, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:agent, obl:tmod, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
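The restriction above (verb parent, noun or pronoun child) can be sketched as a simple filter over a dependency tree. This is only an illustration under stated assumptions: the sentence is invented and not from the treebank, `verb_nominal_relations` is a hypothetical helper, and treating "nouns or pronouns" as the UPOS tags NOUN and PRON (excluding PROPN) is a guess about how the statistics are computed.

```python
from collections import Counter

# Assumption: "nouns or pronouns" is read as these two UPOS tags only.
NOMINAL_UPOS = {"NOUN", "PRON"}

# Invented sentence "Ben kitabı okudum." (NOT from the treebank),
# reduced to the columns needed here.
TOKENS = [
    {"id": 1, "form": "Ben", "upos": "PRON", "head": 3, "deprel": "nsubj"},
    {"id": 2, "form": "kitabı", "upos": "NOUN", "head": 3, "deprel": "obj"},
    {"id": 3, "form": "okudum", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 4, "form": ".", "upos": "PUNCT", "head": 3, "deprel": "punct"},
]


def verb_nominal_relations(tokens):
    """Return (parent form, relation, child form) for VERB parents with NOUN/PRON children."""
    by_id = {tok["id"]: tok for tok in tokens}
    found = []
    for tok in tokens:
        parent = by_id.get(tok["head"])  # None for the root, whose head is 0
        if parent is None:
            continue
        if parent["upos"] == "VERB" and tok["upos"] in NOMINAL_UPOS:
            found.append((parent["form"], tok["deprel"], tok["form"]))
    return found


relations = verb_nominal_relations(TOKENS)
relation_counts = Counter(rel for _, rel, _ in relations)
print(relations)
print(relation_counts)
```

The punctuation dependent is dropped because its UPOS is outside the assumed nominal set, and the root is dropped because it has no parent token.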

Relations Overview