This page pertains to UD version 2.

UD Turkish BOUN

Language: Turkish (code: tr)
Family: Turkic, Southwestern

This treebank has been part of Universal Dependencies since the UD v2.7 release.

The following people have contributed to making this treebank part of UD: Büşra Marşan, Salih Furkan Akkurt, Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Gözde Berk, Seyyit Talha Bedir, Abdullatif Köksal, Balkız Öztürk Başaran, Tunga Güngör, Arzucan Özgür.

Repository: UD_Turkish-BOUN
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Turkish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [busra • marsan (æt) boun • edu • tr or saziye • bilgin (æt) boun • edu • tr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS annotated manually
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

A Turkish dependency treebank annotated in UD style. Created by the members of TABILAB from Boğaziçi University.

This is a Turkish dependency treebank in the Universal Dependencies (UD) annotation style. The BOUN Treebank was created by TABILAB with support from the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 117E971.

The BOUN Treebank includes a total of 9,761 manually annotated sentences drawn from a range of text types, including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. The texts are taken from the Turkish National Corpus (TNC).

The dependency relations in the BOUN Treebank are manually annotated in the UD framework. The morphological features and UPOS information are first retrieved from the morphological parser of Sak et al. (2011) and converted to UD morphology automatically using our script. The morphological features, UPOS tags, XPOS tags, and lemma forms are then manually corrected in a systematic way.
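The annotation layers described above (lemmas, UPOS, XPOS, features, relations) are distributed in the CoNLL-U format used by UD releases. The following is a minimal sketch of reading such data with the Python standard library and tallying UPOS tags. The sample sentence is hand-written and its features are simplified; it is not copied from the treebank, and the names `read_conllu`, `parse_feats`, and `SAMPLE` are illustrative assumptions rather than part of any treebank tooling.

```python
from collections import Counter

# The ten tab-separated columns of a CoNLL-U token line.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats):
    """Turn 'Case=Acc|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_conllu(text):
    """Yield each sentence as a list of token dicts, skipping comment lines."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        token = dict(zip(FIELDS, cols))
        token["feats"] = parse_feats(token["feats"])
        sentence.append(token)
    if sentence:
        yield sentence


# Hand-written, simplified example sentence (assumption, not treebank data).
ROWS = [
    ["1", "Ali", "Ali", "PROPN", "_", "Case=Nom|Number=Sing|Person=3", "3", "nsubj", "_", "_"],
    ["2", "kitabı", "kitap", "NOUN", "_", "Case=Acc|Number=Sing|Person=3", "3", "obj", "_", "_"],
    ["3", "okudu", "oku", "VERB", "_", "Number=Sing|Person=3|Tense=Past", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = "# text = Ali kitabı okudu.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n\n"

sentences = list(read_conllu(SAMPLE))
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
print(upos_counts)
```

To run the same tally over a real file, the text of a `.conllu` file from the repository could be passed to `read_conllu` in place of `SAMPLE`.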

Acknowledgments

We are immensely grateful to Prof. Yeşim Aksan and the other members of the Turkish National Corpus Team for their tremendous help in providing us with sentences from the Turkish National Corpus.

References

You can use the following arXiv reference for v2.11:

@article{marcsan2022enhancements,
title={Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish},
author={Mar{\c{s}}an, B{\"u}{\c{s}}ra and Akkurt, Salih Furkan and {\c{S}}en, Muhammet and G{\"u}rb{\"u}z, Merve and G{\"u}ng{\"o}r, Onur and {\"O}zate{\c{s}}, {\c{S}}aziye Bet{\"u}l and {\"U}sk{\"u}darl{\i}, Suzan and {\"O}zg{\"u}r, Arzucan and G{\"u}ng{\"o}r, Tunga and {\"O}zt{\"u}rk, Balk{\i}z},
journal={arXiv preprint arXiv:2207.11782},
year={2022}
}

You can use the following arXiv reference for the previous versions of this treebank:

@article{TurkEtAl2022,
title = {Resources for {{Turkish}} Dependency Parsing: Introducing the {{BOUN Treebank}} and the {{BoAT}} Annotation Tool},
author = {T{\"u}rk, Utku and Atmaca, Furkan and {\"O}zate{\c s}, {\c S}aziye Bet{\"u}l and Berk, G{\"o}zde and Bedir, Seyyit Talha and K{\"o}ksal, Abdullatif and Ba{\c s}aran, Balk{\i}z {\"O}zt{\"u}rk and G{\"u}ng{\"o}r, Tunga and {\"O}zg{\"u}r, Arzucan},
year = {2022},
month = mar,
journal = {Language Resources and Evaluation},
volume = {56},
number = {1},
pages = {259--307},
issn = {1574-0218},
doi = {10.1007/s10579-021-09558-0}
}

Statistics of UD Turkish BOUN

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Abbr, Aspect, Case, Echo, Evident, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, Polite, PronType, Reflex, Tense, Typo, VerbForm, Voice

Relations

acl, advcl, advmod, advmod:emph, amod, appos, aux, aux:q, case, cc, cc:preconj, ccomp, clf, compound, compound:lvc, compound:redup, conj, cop, csubj, csubj:outer, dep, dep:der, det, discourse, discourse:q, dislocated, fixed, flat, iobj, list, mark, nmod, nmod:part, nmod:poss, nsubj, nsubj:outer, nummod, obj, obl, obl:tmod, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
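As a rough sketch of this restriction, the snippet below counts dependency relations only where the parent is a VERB and the child is a noun or pronoun. Treating NOUN, PROPN, and PRON as "nouns or pronouns" is an assumption made here, as are the function name `verb_nominal_relations` and the hand-written example sentence; this is not the script used to generate the statistics on this page.

```python
from collections import Counter

# Hand-written example, one tuple per token: (id, form, upos, head, deprel).
# It is illustrative only and not taken from the treebank.
SENTENCE = [
    (1, "Ali", "PROPN", 3, "nsubj"),
    (2, "kitabı", "NOUN", 3, "obj"),
    (3, "okudu", "VERB", 0, "root"),
    (4, ".", "PUNCT", 3, "punct"),
]

# Assumption: these UPOS tags count as "nouns or pronouns".
NOMINAL = frozenset({"NOUN", "PROPN", "PRON"})


def verb_nominal_relations(sentence, child_upos=NOMINAL):
    """Count deprels whose parent is a VERB and whose child is nominal."""
    upos_by_id = {tok_id: upos for tok_id, _, upos, _, _ in sentence}
    counts = Counter()
    for _, _, upos, head, deprel in sentence:
        # head 0 is the artificial root and has no UPOS, so .get returns None.
        if upos in child_upos and upos_by_id.get(head) == "VERB":
            counts[deprel] += 1
    return counts


relation_counts = verb_nominal_relations(SENTENCE)
print(relation_counts)
```

Passing a different `child_upos` set (for example, one without PROPN) changes which children are counted without touching the rest of the logic.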

Verbs with Reflexive Core Objects

Relations Overview