home edit page issue tracker

This page pertains to UD version 2.

UD Turkish English BUTR

Language: Turkish English (code: qti)
Family: Code switching

This treebank has been part of Universal Dependencies since the UD v2.16 release.

The following people have contributed to making this treebank part of UD: Furkan Akkurt, Nursena Teker, Helin Binici, Ahmet Demir, Konstantinos Sampanis.

Repository: UD_Turkish_English-BUTR
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Turkish English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [furkanakkurt7242 (æt) icloud • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

UD_Turkish_English-BUTR is a treebank of Turkish-English code-switched sentences collected from Boğaziçi University students, annotated in the Universal Dependencies framework to provide a standardized resource for analyzing syntactic patterns in Turkish-English code-switching.

The UD_Turkish_English-BUTR treebank contains annotated Turkish-English code-switched sentences collected from Boğaziçi University students. The term “Boğaziçi Turkish” refers to the variety of Turkish influenced by English commonly spoken by Boğaziçi University students and characterized by frequent code-switching. This linguistic phenomenon, sometimes referred to as “Boğaziçi Tarzancası” (Boğaziçi Tarzan-speak) in informal settings, represents a distinct sociolinguistic practice that has remained largely unexamined in the linguistic literature.

The treebank was developed using a semi-automated annotation pipeline within the Universal Dependencies framework. The process began with preliminary annotation using the language model Claude 3.5 Sonnet, followed by manual verification and correction by four annotators using ArboratorGrew. The annotation scheme aligns with existing Turkish UD treebanks while incorporating necessary adjustments for code-switching phenomena, particularly in head assignment within mixed-language constructions.

Qualitative analysis of the treebank reveals distinctive code-switching patterns, including English verbs with Turkish auxiliaries, academic terminology, and pragmatic expressions. A notable pattern is the morphological integration of English verbs into Turkish syntax, exemplified by constructions like “drop-bylayacağım” (“I will drop by”), where English phrasal verbs receive Turkish morphological markers.

The Universal Dependencies analysis demonstrates three key syntactic patterns in Boğaziçi Turkish: preservation of Turkish syntactic structure with English lexical insertions, morphological adaptation of English verbs, and code-switching at specific syntactic boundaries.

This treebank provides a standardized resource for analyzing syntactic patterns in Turkish-English code-switching, facilitating further research in computational linguistics. While the initial release contains a modest number of representative sentences, the resource will hopefully be expanded in future releases.

Acknowledgments

We would like to express our gratitude to all the Boğaziçi University students who participated in our survey and provided examples of code-switched sentences for this treebank. Their contributions were essential for capturing authentic instances of Turkish-English code-switching patterns.

We thank the Universal Dependencies community for their guidelines and support during the annotation process. Special thanks to the ArboratorGrew team for providing the annotation platform that facilitated our collaborative work.

We also acknowledge the contributions of Claude 3.5 Sonnet in the preliminary annotation phase, which helped streamline our workflow and allowed the annotation team to focus on refining and validating the dependency structures.

This work was conducted as part of a research project at Boğaziçi University, with support from the Departments of Linguistics and Computer Engineering. We appreciate the academic environment that encouraged this interdisciplinary collaboration between computational and sociolinguistic approaches to the study of code-switching.

References

Statistics of UD Turkish English BUTR

POS Tags

ADJADPADVAUXCCONJDETNOUNNUMPARTPRONPROPNPUNCTSCONJVERB

Features

AspectCaseEvidentExtPosGenderMoodNumberNumber[psor]NumTypePersonPerson[psor]PolarityPronTypeTenseTypoVerbFormVoice

Relations

acladvcladvmodamodauxcaseccccompcompoundconjdetdiscoursefixedflatmarknmodnsubjnummodobjoblparataxispunctrootxcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Relations Overview