UD Turkish English BUTR
Language: Turkish English (code: qti)
Family: Code switching
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Furkan Akkurt, Nursena Teker, Helin Binici, Ahmet Demir, Konstantinos Sampanis.
Repository: UD_Turkish_English-BUTR
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: spoken, social
Questions, comments? General annotation questions (either Turkish English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [furkan • akkurt (æt) bogazici • edu • tr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD_Turkish_English-BUTR is a treebank of Turkish-English code-switched sentences collected from Boğaziçi University students, annotated in the Universal Dependencies framework to provide a standardized resource for analyzing syntactic patterns in Turkish-English code-switching.
The term “Boğaziçi Turkish” refers to the English-influenced variety of Turkish commonly spoken by Boğaziçi University students, characterized by frequent code-switching. Informally called “Boğaziçi Tarzancası” (Boğaziçi Tarzan-speak), it is a distinct sociolinguistic practice that has remained largely unexamined in the linguistic literature.
The treebank was developed using a semi-automated annotation pipeline within the Universal Dependencies framework. The process began with preliminary annotation using the language model Claude 3.5 Sonnet, followed by manual verification and correction by four annotators using ArboratorGrew. The annotation scheme aligns with existing Turkish UD treebanks while incorporating necessary adjustments for code-switching phenomena, particularly in head assignment within mixed-language constructions.
Qualitative analysis of the treebank reveals distinctive code-switching patterns, including English verbs with Turkish auxiliaries, academic terminology, and pragmatic expressions. A notable pattern is the morphological integration of English verbs into Turkish syntax, exemplified by constructions like “drop-bylayacağım” (“I will drop by”), where English phrasal verbs receive Turkish morphological markers.
The Universal Dependencies analysis demonstrates three key syntactic patterns in Boğaziçi Turkish: preservation of Turkish syntactic structure with English lexical insertions, morphological adaptation of English verbs, and code-switching at specific syntactic boundaries.
The treebank captures three main contact phenomena:
- Code-switching (CS): Full English phrases or clauses embedded in Turkish discourse, both intrasentential and intersentential.
- Lexical adaptation (LA): English words integrated with Turkish morphology, such as “drop-bylayacağım” (“I will drop by”) or “overthinkledim” (“I overthought”).
- Loan translation (LT): Calques of English expressions using Turkish lexemes, such as “toplantı almak” (loan translation of “to get a meeting”).
Each sentence is annotated with the following comment-level metadata:
- `# type`: primary contact phenomenon (CS, LA, or LT)
- `# text_en`: English translation
- `# medium`: communication medium (Written or Spoken), where known
Token-level language identification is provided via Lang=tr / Lang=en in the MISC column. Morpheme-level code-switching boundaries are marked with CSID=MIXED and CSPoint features.
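The metadata and MISC conventions described above can be read with a few lines of standard-library Python. The sketch below is illustrative and is not part of the treebank's own tooling. It parses CoNLL-U text, collects the sentence-level comments (`type`, `text_en`, `medium`, assuming the usual `# key = value` comment syntax), counts `Lang` values per sentence, and lists tokens marked `CSID=MIXED`. It assumes that `CSID` sits in the MISC column alongside `Lang`, and it does not interpret `CSPoint`, whose value format is not described here. The function names and the toy sentence are hypothetical, and the toy annotation is simplified.

```python
from collections import Counter

# Hypothetical toy sentence for illustration; NOT taken from the treebank.
SAMPLE = """\
# sent_id = toy-1
# type = LA
# text = Yarın drop-bylayacağım.
# text_en = I will drop by tomorrow.
# medium = Spoken
1\tYarın\t_\tADV\t_\t_\t2\tadvmod\t_\tLang=tr
2\tdrop-bylayacağım\t_\tVERB\t_\t_\t0\troot\t_\tCSID=MIXED|SpaceAfter=No
3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

def parse_misc(field):
    """Turn 'Lang=tr|SpaceAfter=No' into a dict; '_' means an empty MISC column."""
    if field == "_":
        return {}
    pairs = (item.partition("=") for item in field.split("|"))
    return {key: value for key, _, value in pairs}

def parse_conllu(text):
    """Return a list of (metadata, tokens) pairs, one per sentence."""
    sentences, meta, tokens = [], {}, []
    for line in text.splitlines():
        if not line.strip():                  # a blank line ends a sentence
            if tokens:
                sentences.append((meta, tokens))
            meta, tokens = {}, []
        elif line.startswith("#"):           # "# key = value" comment line
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if cols[0].isdigit():            # skip "1-2" ranges and "1.1" empty nodes
                tokens.append({"form": cols[1], "upos": cols[3],
                               "misc": parse_misc(cols[9])})
    if tokens:                                # input may not end with a blank line
        sentences.append((meta, tokens))
    return sentences

def summarize(text):
    """Per sentence: (type, medium, Lang counts, forms marked CSID=MIXED)."""
    rows = []
    for meta, tokens in parse_conllu(text):
        langs = Counter(t["misc"].get("Lang", "none") for t in tokens)
        mixed = [t["form"] for t in tokens if t["misc"].get("CSID") == "MIXED"]
        rows.append((meta.get("type"), meta.get("medium"), dict(langs), mixed))
    return rows

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Tokens without a `Lang` value are counted under `"none"` here, so mixed tokens and punctuation stay visible in the per-sentence tally rather than being silently dropped.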
This treebank provides a standardized resource for analyzing syntactic patterns in Turkish-English code-switching, facilitating further research in computational linguistics.
Acknowledgments
We would like to express our gratitude to all the Boğaziçi University students who participated in our survey and provided examples of code-switched sentences for this treebank. Their contributions were essential for capturing authentic instances of Turkish-English code-switching patterns.
We thank the Universal Dependencies community for their guidelines and support during the annotation process. Special thanks to the ArboratorGrew team for providing the annotation platform that facilitated our collaborative work.
We also acknowledge the contributions of Claude 3.5 Sonnet in the preliminary annotation phase, which helped streamline our workflow and allowed the annotation team to focus on refining and validating the dependency structures.
This work was conducted as part of a research project at Boğaziçi University, with support from the Departments of Linguistics and Computer Engineering. We appreciate the academic environment that encouraged this interdisciplinary collaboration between computational and sociolinguistic approaches to the study of code-switching.
References
- Nivre, J., et al. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association.
- Guillaume, B., et al. (2021). Grew-match: An online tool for comparative corpus queries and quantitative analyses of UD treebanks. In Proceedings of the Fourth Workshop on Universal Dependencies.
- Anthropic (2024). Claude 3.5 Sonnet [Large Language Model]. https://www.anthropic.com/claude
Statistics of UD Turkish English BUTR
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Evident – ExtPos – Gender – Mood – Number – Number[psor] – NumType – Person – Person[psor] – Polarity – Poss – PronType – Tense – Typo – VerbForm – Voice
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – compound – conj – det – discourse – fixed – flat – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 58 sentences and 441 tokens.
- This corpus contains 69 tokens (16%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 9 types of words that contain both letters and punctuation. Examples: De-Google, Hoca'nın, KK'ya, Let's, That's, You're, doesn't, drop-bylayacağım, turn-offluyor
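The counts above (sentences, tokens, tokens not followed by a space, and forms that mix letters with punctuation) can be approximated directly from the CoNLL-U files. The following is a minimal sketch under stated assumptions, not the script that produces the official UD statistics. It treats a multiword-token range line as a single surface token, reads `SpaceAfter=No` from MISC, and uses Unicode punctuation categories (`P*`) to detect punctuation. The function name and the toy sentence are hypothetical.

```python
import unicodedata

# Hypothetical toy sentence; NOT taken from the treebank.
SAMPLE = """\
# sent_id = toy-2
# text = Let's go, Kanka!
1\tLet's\t_\tVERB\t_\t_\t_\t_\t_\t_
2\tgo\t_\tVERB\t_\t_\t_\t_\t_\tSpaceAfter=No
3\t,\t_\tPUNCT\t_\t_\t_\t_\t_\t_
4\tKanka\t_\tNOUN\t_\t_\t_\t_\t_\tSpaceAfter=No
5\t!\t_\tPUNCT\t_\t_\t_\t_\t_\t_
"""

def tokenization_stats(text):
    """Return (sentences, tokens, tokens with SpaceAfter=No,
    sorted token forms containing both letters and punctuation)."""
    n_sent = n_tok = n_nospace = 0
    mixed = set()
    covered_until = 0          # last word ID inside the current multiword token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):
            continue
        if not in_sentence:    # first token line of a new sentence
            n_sent, in_sentence = n_sent + 1, True
        cols = line.split("\t")
        tid = cols[0]
        if "." in tid:
            continue           # empty node: not a surface token
        if "-" in tid:
            covered_until = int(tid.split("-")[1])  # range line = one surface token
        elif int(tid) <= covered_until:
            continue           # word already counted via its range line
        n_tok += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            n_nospace += 1
        form = cols[1]
        if any(ch.isalpha() for ch in form) and any(
            unicodedata.category(ch).startswith("P") for ch in form
        ):
            mixed.add(form)
    return n_sent, n_tok, n_nospace, sorted(mixed)

if __name__ == "__main__":
    n_sent, n_tok, n_nospace, mixed = tokenization_stats(SAMPLE)
    print(f"{n_sent} sentences, {n_tok} tokens")
    print(f"{n_nospace} tokens ({100 * n_nospace / n_tok:.0f}%) not followed by a space")
    print("letters + punctuation:", ", ".join(mixed))
```

Applied to the figures reported above, 69 of 441 tokens is about 15.6%, which the listing rounds to 16%.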
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: INTJ, SYM, X
- This corpus contains 3 word types tagged as particles (PART): not, to, ya
- This corpus contains 16 lemmas tagged as pronouns (PRON): I, ben, bir, biri, bu, bura, he, it, ne, nere, o, sen, siz, that, this, you
- This corpus contains 10 lemmas tagged as determiners (DET): a, all, bir, bu, hangi, hiçbir, my, o, other, the
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: bir, bu, o
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): be, değil, do, mi, would
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: do
- There are 5 (de)verbal forms:
  - Conv
    - VERB: düşününce, takılıp, vermeden, çıkmadan
  - Fin
    - AUX: were
    - VERB: Go, Let's, makes, own, thought
  - Inf
    - VERB: go
  - Part
    - VERB: alan, bitching, coming, dediklerini, ettiğine, giden, olacağı, olduğunuz, spinning, supposed
  - Vnoun
    - VERB: etmeye, flörtleşmek, görüşmek, sormaya, yapmak
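The lemma counts per tag and the overlaps reported above (for example, bir, bu and o occurring as both PRON and DET) amount to set operations over the LEMMA and UPOS columns. A sketch, assuming plain CoNLL-U input and counting only syntactic words (integer IDs); the helper names and the two toy sentences are hypothetical.

```python
from collections import defaultdict

# Two hypothetical toy sentences (heads and relations omitted); NOT from the treebank.
SAMPLE = """\
1\tBu\tbu\tDET\t_\t_\t_\t_\t_\t_
2\tşeyi\tşey\tNOUN\t_\t_\t_\t_\t_\t_
3\to\to\tPRON\t_\t_\t_\t_\t_\t_
4\tyaptı\tyap\tVERB\t_\t_\t_\t_\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_

1\tBunu\tbu\tPRON\t_\t_\t_\t_\t_\t_
2\to\to\tDET\t_\t_\t_\t_\t_\t_
3\tkız\tkız\tNOUN\t_\t_\t_\t_\t_\t_
4\tyaptı\tyap\tVERB\t_\t_\t_\t_\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_
"""

def lemmas_by_upos(text):
    """Map each UPOS tag to the set of lemmas it occurs with."""
    by_upos = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():          # syntactic words only (no "1-2", no "1.1")
            by_upos[cols[3]].add(cols[2])
    return by_upos

def shared_lemmas(by_upos, tag_a, tag_b):
    """Sorted lemmas attested with both tags, e.g. PRON and DET."""
    return sorted(by_upos.get(tag_a, set()) & by_upos.get(tag_b, set()))

if __name__ == "__main__":
    by_upos = lemmas_by_upos(SAMPLE)
    print(len(by_upos["PRON"]), "lemmas tagged as PRON:", ", ".join(sorted(by_upos["PRON"])))
    print("PRON and DET:", ", ".join(shared_lemmas(by_upos, "PRON", "DET")))
```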
Nominal Features
- Gender
  - Neut
    - PRON: it
- Number
  - Plur
    - AUX: değiliz
    - NOUN: Guys, decorations, aspectlere, detaylara, machinelerden, terimlerde, şeyler
    - PRON: Sizce
    - VERB: alabiliriz, dediklerini, değinemiyoruz, duyduk, eyleyelim, gideceğiz, görüşürüz, olduğunuz, olmayız, uyuyacağız
    - VERB-Part: dediklerini, olduğunuz
  - Sing
    - ADJ: gayriintentionaldı, karşıyayım, üzgünüm
    - AUX: değil, misin
    - NOUN: şey, Bro, distinction, hat, head, kaşar, lord, mercy, page, price
    - PRON: bu, i, Ben, bana, bence, beni, it, Birinin, Bunda, Bunu
    - PROPN: barbie, Allah, Cem, Cumartesi, Daktilo, Didar, Erzincan, KK'ya, Kazım, Koyuncu
    - VERB: ettim, geldi, Depends, Gel, Seems, Soggyleşmiş, Yemişsin, attın, başlasam, bilmiyorum
    - VERB-Fin: makes, own
    - VERB-Part: ettiğine, olacağı
- Case
  - Abl
    - ADV: önceden
    - NOUN: machinelerden
    - VERB-Conv: vermeden, çıkmadan
  - Acc
    - NOUN: boynumu, dersini, hocayı, sitesini, yolu, Şarkıyı
    - PRON: beni, it, Bunu, onu, seni
    - VERB-Part: dediklerini
  - Dat
    - NOUN: Aklıma, Derse, aspectlere, detaylara, meşgule, üstüne, üzerine
    - PRON: bana
    - PROPN: KK'ya
    - VERB-Part: ettiğine
    - VERB-Vnoun: etmeye, sormaya
  - Equ
    - PRON: bence, Sizce
  - Gen
    - NOUN: Dünyanın, Hoca'nın, Okulun
    - PRON: Birinin, Bunun
  - Ins
    - NOUN: ihtimalle, itemıyla, metroyla, sevgilimle
  - Loc
    - NOUN: Kafamda, arada, gözlemede, tanımımda, terimlerde
    - PRON: Bunda, burada, nerede
  - Nom
    - NOUN: şey, kaşar, Canım, Hoca, Kanka, aile, akşam, ayağın, cümle, diziydi
    - PRON: bu, i, Ben, u, Ne, biri, o, you
    - PROPN: Allah, Cem, Cumartesi, Daktilo, Didar, Erzincan, Kazım, Koyuncu, Yılmaz, ceren
    - VERB: yapmak, etmek, flörtleşmek, görüşmek, olacağı, yürüyecek
    - VERB-Part: olacağı, yürüyecek
    - VERB-Vnoun: flörtleşmek, görüşmek, yapmak
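Each entry in these feature listings groups word forms by a feature value and then by tag, with converbs, participles and verbal nouns shown as UPOS-VerbForm (for example VERB-Vnoun under Dat). The sketch below builds a comparable index from the FEATS column. It is only an approximation: the official listing sometimes also shows the same form under the plain UPOS, and it appears to cap each list at ten examples. The helper names are hypothetical, and so are the toy token lines (forms borrowed from the listing, features simplified).

```python
from collections import defaultdict

# Hypothetical toy token lines; forms borrowed from the listing, features simplified.
SAMPLE = """\
1\tetmeye\tet\tVERB\t_\tCase=Dat|Polarity=Pos|VerbForm=Vnoun\t_\t_\t_\t_
2\tbana\tben\tPRON\t_\tCase=Dat|Number=Sing|Person=1|PronType=Prs\t_\t_\t_\t_
3\tKK'ya\tKK\tPROPN\t_\tCase=Dat|Number=Sing|Person=3\t_\t_\t_\t_
4\tBunu\tbu\tPRON\t_\tCase=Acc|Number=Sing|Person=3|PronType=Dem\t_\t_\t_\t_
"""

def parse_feats(field):
    """Turn 'Case=Dat|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))

def feature_index(text, feature):
    """Map each value of `feature` to {label: set of forms}, where the label is
    the UPOS tag, extended to UPOS-VerbForm when the word has a VerbForm."""
    index = defaultdict(lambda: defaultdict(set))
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue                   # skip multiword ranges and empty nodes
        feats = parse_feats(cols[5])
        if feature in feats:
            label = cols[3]
            if "VerbForm" in feats:
                label += "-" + feats["VerbForm"]
            index[feats[feature]][label].add(cols[1])
    return index

if __name__ == "__main__":
    for value, groups in sorted(feature_index(SAMPLE, "Case").items()):
        print(value)
        for label, forms in sorted(groups.items()):
            print(f"  {label}: {', '.join(sorted(forms))}")
```

Layered features such as Number[psor] work the same way, because the bracketed name is simply the key in the FEATS column.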
Degree and Polarity
- Polarity
  - Neg
    - ADJ: yok
    - AUX: değil, değiliz
    - PART: not
    - VERB: bilmiyorum, değinemiyoruz, düşünmezsen, düşürmedim, eyleyemiyorum, istemiyormuş, istemiyorum, olmayız, vermeden
    - VERB-Conv: vermeden
  - Pos
    - ADJ: var
    - VERB: ettim, geldi, yapmak, Gel, Soggyleşmiş, Yemişsin, alabiliriz, alan, attın, başlasam
    - VERB-Conv: düşününce, takılıp, çıkmadan
    - VERB-Part: alan, dediklerini, ettiğine, giden, olacağı, olduğunuz, yaşayan, yürüyecek
    - VERB-Vnoun: etmeye, flörtleşmek, görüşmek, sormaya, yapmak
Verbal Features
- Aspect
  - Hab
    - VERB: düşünmezsen, odaklandırır
  - Imp
    - VERB: alabiliriz, görüşürüz, istemiyormuş, olmayız
  - Perf
    - VERB: ettim, geldi, Soggyleşmiş, Yemişsin, attın, dedi, diyeceğim, drop-bylayacağım, duyduk, duyuldu
  - Prog
    - VERB: bilmiyorum, değinemiyoruz, duyuluyo, ediyorum, eyleyemiyorum, hissediyor, istemiyorum, turn-offluyor, çalışıyor, çıkıyoruz
  - Prosp
    - VERB: edecek
- Mood
  - Cnd
    - VERB: başlasam, düşünmezsen, söyleseydin, yapsa
  - Imp
    - VERB: Gel, Go, Let's, eyle, yapsana
    - VERB-Fin: Go, Let's
  - Ind
    - AUX-Fin: were
    - VERB-Fin: makes, own, thought
  - Opt
    - VERB: eyleyelim
  - Pot
    - VERB: alabiliriz, eyleyemiyorum
- Tense
  - Fut
    - VERB: diyeceğim, drop-bylayacağım, edecek, gideceğiz, olacağı, uyuyacağız, yürüyecek
    - VERB-Part: olacağı, yürüyecek
  - Past
    - AUX-Fin: were
    - VERB: ettim, geldi, Soggyleşmiş, Yemişsin, attın, dedi, dediklerini, duyduk, duyuldu, düşündüm
    - VERB-Fin: thought
    - VERB-Part: dediklerini, ettiğine, olduğunuz, supposed
  - Pres
    - AUX: değil, değiliz
    - VERB: Depends, Seems, alabiliriz, alan, bilmiyorum, bitching, coming, değinemiyoruz, duyuluyo, düşünmezsen
    - VERB-Fin: makes, own
    - VERB-Part: alan, bitching, coming, giden, spinning, texting, yaşayan
- Voice
  - Pass
    - VERB: duyuldu, duyuluyo
- Evident
  - Fh
    - VERB: duyuldu
  - Nfh
    - VERB: Soggyleşmiş, Yemişsin, istemiyormuş, katletmişsin
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: bir, a, the
  - Dem
    - DET: bu, o
    - PRON: this, bu, That's, Bunda, Bunu, Bunun, burada, o
  - Ind
    - DET: Other
    - PRON: Birinin, biri
  - Int
    - DET: hangi
    - PRON: Ne, nerede
  - Neg
    - DET: hiçbir
  - Prs
    - DET: my
    - PRON: i, Ben, Me, You're, bana, bence, beni, he, it, u
  - Tot
    - DET: All
- NumType
  - Card
    - NUM: iki, bir
- Poss
  - Yes
    - DET: my
- Person
  - 1
    - ADJ: karşıyayım, üzgünüm
    - AUX: değiliz
    - NOUN: zorundayım
    - PRON: i, Ben, bana, bence, beni
    - VERB: ettim, Let's, alabiliriz, başlasam, bilmiyorum, değinemiyoruz, diyeceğim, drop-bylayacağım, duyduk, düşündüm
    - VERB-Fin: Let's, own
  - 2
    - AUX: misin
    - PRON: u, Sizce, seni, you
    - VERB: Gel, Go, Yemişsin, attın, düşünmezsen, eyle, katletmişsin, olduğunuz, söyleseydin, yapsana
    - VERB-Fin: Go
    - VERB-Part: olduğunuz
  - 3
    - ADJ: gayriintentionaldı
    - AUX: değil
    - NOUN: şey, kaşar, Aklıma, Canım, Derse, Dünyanın, Hoca, Hoca'nın, Kafamda, Kanka
    - PRON: bu, it, Birinin, Bunda, Bunu, Bunun, biri, burada, onu
    - PROPN: Allah, Cem, Cumartesi, Daktilo, Didar, Erzincan, KK'ya, Kazım, Koyuncu, Yılmaz
    - VERB: geldi, Depends, Seems, Soggyleşmiş, dedi, dediklerini, duyuldu, duyuluyo, edecek, ettiğine
    - VERB-Fin: makes
    - VERB-Part: dediklerini, ettiğine, olacağı
- Number[psor]
  - Sing
    - NOUN: Aklıma, Canım, Kafamda, ayağın, boynumu, dersini, gerçeği, sevgilimle, sitesini, tanımımda
    - VERB-Part: ettiğine, olacağı
Other Features
- ExtPos
  - ADJ
    - ADJ: karşı
  - ADV
    - ADP: Of
    - DET: All
  - ADJ
- Person[psor]
  - 1
    - NOUN: Aklıma, Canım, Kafamda, boynumu, sevgilimle, tanımımda, zorundayım
  - 2
    - NOUN: ayağın
  - 3
    - NOUN: dersini, gerçeği, sitesini, tokası, üslubu, üstüne, üzerine, şarkısı
    - VERB-Part: ettiğine, olacağı
  - 1
- Typo
  - Yes
    - ADV: know
  - Yes
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: değil, mi, be, do, would.
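The statement that the corpus has no copulas and five auxiliary lemmas can be checked by tallying the lemmas attached with aux and with cop. A sketch, assuming that relation subtypes (such as aux:pass, if any were present) should be folded into their base relation; the function name and the two toy sentences are hypothetical.

```python
from collections import Counter

# Two hypothetical toy sentences; NOT taken from the treebank.
SAMPLE = """\
1\tBu\tbu\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tdoğru\tdoğru\tADJ\t_\t_\t0\troot\t_\t_
3\tdeğil\tdeğil\tAUX\t_\t_\t2\taux\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

1\tI\tI\tPRON\t_\t_\t3\tnsubj\t_\t_
2\twould\twould\tAUX\t_\t_\t3\taux\t_\t_
3\tgo\tgo\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

def aux_and_cop(text):
    """Count lemmas attached with aux and with cop (subtypes folded in)."""
    aux, cop = Counter(), Counter()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue                       # skip multiword ranges and empty nodes
        base = cols[7].split(":")[0]       # e.g. aux:pass -> aux
        if base == "aux":
            aux[cols[2]] += 1
        elif base == "cop":
            cop[cols[2]] += 1
    return aux, cop

if __name__ == "__main__":
    aux, cop = aux_and_cop(SAMPLE)
    print(len(aux), "auxiliary lemmas:", ", ".join(sorted(aux)))
    print("copulas:", dict(cop) or "none")
```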
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (3)
  - VERB--NOUN-Nom (6)
  - VERB--PRON (3)
  - VERB--PRON-Nom (4)
  - VERB-Fin--PRON-Nom (2)
  - VERB-Part--PRON (1)
  - VERB-Part--PRON-Gen (1)
  - VERB-Part--PRON-Nom (2)
  - VERB-Vnoun--PRON-Nom (1)
- obj
  - VERB--NOUN (1)
  - VERB--NOUN-Abl (1)
  - VERB--NOUN-Acc (3)
  - VERB--NOUN-Dat (1)
  - VERB--NOUN-Nom (1)
  - VERB--PRON (2)
  - VERB--PRON-Acc (3)
  - VERB-Fin--NOUN (1)
  - VERB-Fin--NOUN-Nom (1)
  - VERB-Part--NOUN-Acc (1)
  - VERB-Part--PRON-Nom (1)
  - VERB-Vnoun--NOUN-Nom (1)
  - VERB-Vnoun--PRON-Acc (1)
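The patterns above label each edge as parent--child, appending the verb's VerbForm and the dependent's Case when present (so VERB-Part--PRON-Gen is a participle with a genitive pronoun as its subject). The sketch below reproduces that labelling for nsubj and obj edges from a VERB to a NOUN or PRON, as the section describes. It is an illustration rather than the official statistics code, and its helper names and toy sentences are hypothetical.

```python
from collections import Counter

# Two hypothetical toy sentences; NOT taken from the treebank.
SAMPLE = """\
1\tBen\tben\tPRON\t_\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t3\tnsubj\t_\t_
2\tşarkıyı\tşarkı\tNOUN\t_\tCase=Acc|Number=Sing|Person=3\t3\tobj\t_\t_
3\tduydum\tduy\tVERB\t_\tAspect=Perf|Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Past\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

1\tIt\tit\tPRON\t_\tCase=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs\t2\tnsubj\t_\t_
2\tmakes\tmake\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_
3\tsense\tsense\tNOUN\t_\tNumber=Sing\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

def parse_feats(field):
    """'Case=Nom|Number=Sing' -> {'Case': 'Nom', 'Number': 'Sing'}; '_' -> {}."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))

def sentences(text):
    """Return each sentence as {word_id: (upos, feats, head, deprel)}."""
    result, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                result.append(current)
            current = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():          # skip multiword ranges and empty nodes
            current[int(cols[0])] = (cols[3], parse_feats(cols[5]), int(cols[6]), cols[7])
    if current:
        result.append(current)
    return result

def label(upos, feats, key):
    """UPOS, suffixed with the value of `key` (VerbForm or Case) if present."""
    return f"{upos}-{feats[key]}" if key in feats else upos

def argument_patterns(text, relations=("nsubj", "obj")):
    """Count VERB -> NOUN/PRON edges as 'VERB[-VerbForm]--CHILD[-Case]' per relation."""
    counts = {rel: Counter() for rel in relations}
    for sent in sentences(text):
        for upos, feats, head, deprel in sent.values():
            if deprel not in counts or head == 0 or upos not in ("NOUN", "PRON"):
                continue
            p_upos, p_feats, _, _ = sent[head]
            if p_upos != "VERB":
                continue
            key = label(p_upos, p_feats, "VerbForm") + "--" + label(upos, feats, "Case")
            counts[deprel][key] += 1
    return counts

if __name__ == "__main__":
    for rel, counter in argument_patterns(SAMPLE).items():
        print(rel)
        for key, n in sorted(counter.items()):
            print(f"  {key} ({n})")
```

Matching deprels exactly (rather than folding subtypes) keeps nsubj and any nsubj:* variants apart, which mirrors the plain relation names used in the listing.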