UD Turkish English BUTR
Language: Turkish English (code: qti)
Family: Code switching
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Furkan Akkurt, Nursena Teker, Helin Binici, Ahmet Demir, Konstantinos Sampanis.
Repository: UD_Turkish_English-BUTR
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Turkish English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [furkanakkurt7242 (æt) icloud • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD_Turkish_English-BUTR is a treebank of Turkish-English code-switched sentences collected from Boğaziçi University students, annotated in the Universal Dependencies framework to provide a standardized resource for analyzing syntactic patterns in Turkish-English code-switching.
The term “Boğaziçi Turkish” refers to the English-influenced variety of Turkish commonly spoken by Boğaziçi University students, which is characterized by frequent code-switching. Informally called “Boğaziçi Tarzancası” (Boğaziçi Tarzan-speak), this practice is a distinct sociolinguistic phenomenon that has remained largely unexamined in the linguistic literature.
The treebank was developed using a semi-automated annotation pipeline within the Universal Dependencies framework. The process began with preliminary annotation using the language model Claude 3.5 Sonnet, followed by manual verification and correction by four annotators using ArboratorGrew. The annotation scheme aligns with existing Turkish UD treebanks while incorporating necessary adjustments for code-switching phenomena, particularly in head assignment within mixed-language constructions.
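The description does not report how much the manual pass changed the Claude 3.5 Sonnet pre-annotation. One common way to quantify such a correction pass, shown here only as a hedged sketch, is to score the pre-annotated CoNLL-U file against the corrected one with unlabeled and labeled attachment scores (UAS/LAS). The sketch assumes both files share identical tokenization; the function names and the toy sentence are hypothetical and not taken from the treebank.

```python
def read_arcs(conllu_text: str) -> list[tuple[str, str, str]]:
    """Collect (ID, HEAD, DEPREL) for every syntactic word in a CoNLL-U string."""
    arcs = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if cols[0].isdigit():  # skip multiword-token ranges (3-4) and empty nodes (3.1)
            arcs.append((cols[0], cols[6], cols[7]))
    return arcs


def attachment_scores(pre_annotated: str, corrected: str) -> tuple[float, float]:
    """Return (UAS, LAS) of the pre-annotation, treating the corrected file as gold."""
    pred, gold = read_arcs(pre_annotated), read_arcs(corrected)
    if len(pred) != len(gold) or any(p[0] != g[0] for p, g in zip(pred, gold)):
        raise ValueError("both files must have identical tokenization")
    if not gold:
        raise ValueError("no words found")
    uas = sum(p[1] == g[1] for p, g in zip(pred, gold)) / len(gold)   # same HEAD
    las = sum(p[1:] == g[1:] for p, g in zip(pred, gold)) / len(gold)  # same HEAD and DEPREL
    return uas, las


# Hypothetical toy data: the corrector changed one relation label (obl -> obj).
PRE = (
    "1\tBen\t_\t_\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tşarkıyı\t_\t_\t_\t_\t3\tobl\t_\t_\n"
    "3\tbilmiyorum\t_\t_\t_\t_\t0\troot\t_\t_\n"
)
GOLD = PRE.replace("\tobl\t", "\tobj\t")
```

On the toy data every head agrees but one label differs, so `attachment_scores(PRE, GOLD)` reports a UAS of 1.0 and a LAS of 2/3.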
Qualitative analysis of the treebank reveals distinctive code-switching patterns, including English verbs with Turkish auxiliaries, academic terminology, and pragmatic expressions. A notable pattern is the morphological integration of English verbs into Turkish, exemplified by “drop-bylayacağım” (“I will drop by”), in which an English phrasal verb receives Turkish verbal morphology.
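In drop-bylayacağım, the material after the English base can be segmented as the verbalizing suffix -la, the future suffix -yacak (whose final k becomes ğ before a vowel) and the first-person singular ending -ım, which matches the Tense=Fut and Person=1 values listed for this form in the statistics. The hypothetical helper below only separates an English base from the Turkish suffix string; it assumes a small, hand-made lexicon of English bases and performs no real morphological analysis.

```python
from typing import Optional


def split_hybrid(token: str, english_bases: set[str]) -> tuple[Optional[str], str]:
    """Split a code-switched token into (English base, Turkish suffix string).

    Tries the longest base first so that "drop-by" wins over "drop".
    Matching is case-sensitive; returns (None, token) if no base applies.
    """
    for base in sorted(english_bases, key=len, reverse=True):
        if token.startswith(base) and len(token) > len(base):
            return base, token[len(base):]
    return None, token


# Hypothetical mini-lexicon of English verb bases; not part of the treebank.
BASES = {"drop", "drop-by", "turn-off"}
```

Longest-match ordering matters here: with drop also in the lexicon, a shortest-first search would wrongly return drop plus -bylayacağım.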
The Universal Dependencies analysis demonstrates three key syntactic patterns in Boğaziçi Turkish: preservation of Turkish syntactic structure with English lexical insertions, morphological adaptation of English verbs, and code-switching at specific syntactic boundaries.
This treebank provides a standardized resource for analyzing syntactic patterns in Turkish-English code-switching and for further research in computational linguistics. The initial release is small (51 sentences, 393 tokens), and we hope to expand it in future releases.
Acknowledgments
We would like to express our gratitude to all the Boğaziçi University students who participated in our survey and provided examples of code-switched sentences for this treebank. Their contributions were essential for capturing authentic instances of Turkish-English code-switching patterns.
We thank the Universal Dependencies community for their guidelines and support during the annotation process. Special thanks to the ArboratorGrew team for providing the annotation platform that facilitated our collaborative work.
We also acknowledge the contributions of Claude 3.5 Sonnet in the preliminary annotation phase, which helped streamline our workflow and allowed the annotation team to focus on refining and validating the dependency structures.
This work was conducted as part of a research project at Boğaziçi University, with support from the Departments of Linguistics and Computer Engineering. We appreciate the academic environment that encouraged this interdisciplinary collaboration between computational and sociolinguistic approaches to the study of code-switching.
References
- Nivre, J., et al. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association.
- Guillaume, B., et al. (2021). Grew-match: An online tool for comparative corpus queries and quantitative analyses of UD treebanks. In Proceedings of the Fourth Workshop on Universal Dependencies.
- Anthropic (2024). Claude 3.5 Sonnet [Large Language Model]. https://www.anthropic.com/claude
Statistics of UD Turkish English BUTR
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Evident – ExtPos – Gender – Mood – Number – Number[psor] – NumType – Person – Person[psor] – Polarity – PronType – Tense – Typo – VerbForm – Voice
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – compound – conj – det – discourse – fixed – flat – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 51 sentences and 393 tokens.
- This corpus contains 62 tokens (16%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 8 types of words that contain both letters and punctuation. Examples: Hoca'nın, KK'ya, Let's, That's, You're, doesn't, drop-bylayacağım, turn-offluyor
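The counts above can be approximated directly from the CoNLL-U files. The sketch assumes standard CoNLL-U input (10 tab-separated columns, comment lines starting with #, a blank line after each sentence, SpaceAfter=No in the MISC column). The toy sentences are illustrative, not copied from the treebank, and the letter-plus-punctuation test is only an approximation of whatever rule the official statistics script applies.

```python
import re


def tokenization_stats(conllu_text: str) -> dict:
    """Approximate sentence, token, SpaceAfter=No and mixed-form counts."""
    sentences = tokens = no_space = 0
    mixed = set()
    in_sentence = False
    covered_until = 0  # last word ID covered by a multiword-token range
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")  # assumes well-formed 10-column lines
        if not in_sentence:
            sentences += 1
            in_sentence = True
            covered_until = 0
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty nodes are not surface tokens
        if "-" in tok_id:
            covered_until = int(tok_id.split("-")[1])  # range line such as "3-4"
        elif int(tok_id) <= covered_until:
            continue  # word already counted through its multiword token
        tokens += 1
        if "SpaceAfter=No" in misc.split("|"):
            no_space += 1
        # A letter ([^\W\d_]) together with a non-word, non-space character
        if re.search(r"[^\W\d_]", form) and re.search(r"[^\w\s]", form):
            mixed.add(form)
    return {"sentences": sentences, "tokens": tokens,
            "no_space": no_space, "mixed_types": sorted(mixed)}


# Toy input (hypothetical), reusing two forms from the list above.
SAMPLE = (
    "# text = Let's go.\n"
    "1\tLet's\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tgo\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "3\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# text = Ben drop-bylayacağım.\n"
    "1\tBen\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tdrop-bylayacağım\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "3\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
)
```

For the toy input this yields 2 sentences, 6 tokens, 2 tokens without a following space, and the mixed forms Let's and drop-bylayacağım.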
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: INTJ, SYM, X
- This corpus contains 3 word types tagged as particles (PART): not, to, ya
- This corpus contains 15 lemmas tagged as pronouns (PRON): I, ben, bir, biri, bu, bura, he, it, nere, o, sen, siz, that, this, you
- This corpus contains 10 lemmas tagged as determiners (DET): a, all, bir, bu, hangi, hiçbir, my, o, other, the
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: bir, bu, o
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): be, değil, do, mi, would
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: do
- There are 5 (de)verbal forms:
  - Conv
    - VERB: vermeden, çıkmadan
  - Fin
    - AUX: were
    - VERB: Go, Let's, own, thought
  - Inf
    - VERB: go
  - Part
    - VERB: alan, bitching, coming, ettiğine, giden, olacağı, olduğunuz, spinning, supposed, texting
  - Vnoun
    - VERB: etmeye, flörtleşmek, görüşmek, sormaya, yapmak
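Several figures in this list (tags used out of the 17 UPOS categories, lemmas shared between two tags such as PRON/DET or AUX/VERB) are simple set computations over the LEMMA and UPOS columns. A minimal sketch, assuming standard CoNLL-U input; the function names and the toy token lines are hypothetical:

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD v2.
UD_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}


def read_lemma_upos(conllu_text: str):
    """Yield (LEMMA, UPOS) for each syntactic word in a CoNLL-U string."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():  # skip multiword-token ranges and empty nodes
            yield cols[2], cols[3]


def upos_coverage(conllu_text: str) -> tuple[list[str], list[str]]:
    """Return (tags used, UD tags never used), both sorted."""
    used = {upos for _lemma, upos in read_lemma_upos(conllu_text)}
    return sorted(used), sorted(UD_UPOS - used)


def lemmas_with_both(conllu_text: str, upos_a: str, upos_b: str) -> list[str]:
    """Lemmas that occur at least once as upos_a and at least once as upos_b."""
    tags = defaultdict(set)  # lemma -> set of UPOS tags seen with it
    for lemma, upos in read_lemma_upos(conllu_text):
        tags[lemma].add(upos)
    return sorted(lemma for lemma, seen in tags.items() if {upos_a, upos_b} <= seen)


# Toy token lines with hypothetical annotation, not sentences from the treebank.
SAMPLE = (
    "1\tbu\tbu\tDET\t_\t_\t_\t_\t_\t_\n"
    "2\tBunu\tbu\tPRON\t_\t_\t_\t_\t_\t_\n"
    "3\tdo\tdo\tAUX\t_\t_\t_\t_\t_\t_\n"
    "4\tdid\tdo\tVERB\t_\t_\t_\t_\t_\t_\n"
    "5\tben\tben\tPRON\t_\t_\t_\t_\t_\t_\n"
)
```

Run over the same files the statistics were computed from, `upos_coverage` should correspond to the used and unused tag lists above.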
Nominal Features
- Gender
  - Neut
    - PRON: it
- Number
  - Plur
    - AUX: değiliz
    - NOUN: Guys, decorations, aspectlere, machinelerden, terimlerde, şeyler
    - PRON: Sizce
    - VERB: alabiliriz, değinemiyoruz, eyleyelim, gideceğiz, görüşürüz, olduğunuz, olmayız, uyuyacağız, çıkıyoruz
    - VERB-Part: olduğunuz
  - Sing
    - ADJ: karşıyayım
    - AUX: değil, misin
    - NOUN: şey, Bro, distinction, hat, head, lord, mercy, price, taste, Aklıma
    - PRON: bu, i, Ben, bana, bence, beni, it, Birinin, Bunda, Bunu
    - PROPN: barbie, Allah, Cem, Cumartesi, Daktilo, Didar, KK'ya, Kazım, Koyuncu, Yılmaz
    - VERB: ettim, geldi, Depends, Gel, Seems, Soggyleşmiş, Yemişsin, attın, başlasam, bilmiyorum
    - VERB-Fin: own
    - VERB-Part: ettiğine, olacağı
- Case
  - Abl
    - ADV: önceden
    - NOUN: machinelerden
    - VERB-Conv: vermeden, çıkmadan
  - Acc
    - NOUN: boynumu, dersini, hocayı, sitesini, yolu, Şarkıyı
    - PRON: beni, it, Bunu, onu, seni
  - Dat
    - NOUN: Aklıma, Derse, aspectlere, meşgule, üstüne, üzerine
    - PRON: bana
    - PROPN: KK'ya
    - VERB-Part: ettiğine
    - VERB-Vnoun: etmeye, sormaya
  - Equ
    - PRON: bence
  - Gen
    - NOUN: Dünyanın, Hoca'nın, Okulun
    - PRON: Birinin, Bunun
  - Ins
    - NOUN: itemıyla, metroyla, sevgilimle
  - Loc
    - NOUN: Kafamda, arada, gözlemede, tanımımda, terimlerde
    - PRON: nerede, Bunda, burada
  - Nom
    - NOUN: şey, Canım, Hoca, Kanka, aile, akşam, ayağın, cümle, diziydi, gece
    - PRON: bu, i, Ben, u, Sizce, biri, o, you
    - PROPN: Allah, Cem, Cumartesi, Daktilo, Didar, Kazım, Koyuncu, Yılmaz, ceren
    - VERB: yapmak, etmek, flörtleşmek, görüşmek, olacağı, yürüyecek
    - VERB-Part: olacağı, yürüyecek
    - VERB-Vnoun: flörtleşmek, görüşmek, yapmak
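The feature lists above group word forms by feature value and then by a label such as NOUN or VERB-Part. The sketch below assumes that the label is the UPOS tag, extended with the VerbForm value whenever one is annotated; that reading fits entries such as VERB-Conv: vermeden under Abl, but it is inferred from the lists rather than documented in the source. The function names are hypothetical, and the toy lines reuse forms and feature values that appear in the lists.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Gen|Number=Sing' into {'Case': 'Gen', 'Number': 'Sing'}."""
    return {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))


def forms_by_feature(conllu_text: str, feature: str) -> dict[str, dict[str, list[str]]]:
    """Map each value of `feature` to {label: [distinct forms]}.

    Assumed label scheme: UPOS, plus "-<VerbForm>" when VerbForm is annotated
    (e.g. VERB-Part, VERB-Conv). Layered names such as Person[psor] also work.
    """
    table = defaultdict(lambda: defaultdict(list))
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        feats = parse_feats(cols[5])
        value = feats.get(feature)
        if value is None:
            continue  # word does not carry the requested feature
        label = cols[3] + ("-" + feats["VerbForm"] if "VerbForm" in feats else "")
        if cols[1] not in table[value][label]:
            table[value][label].append(cols[1])
    return {value: dict(labels) for value, labels in table.items()}


# Toy token lines (hypothetical sentence; feature values as listed above).
SAMPLE = (
    "1\tHoca'nın\tHoca\tNOUN\t_\tCase=Gen\t_\t_\t_\t_\n"
    "2\tBunun\tbu\tPRON\t_\tCase=Gen\t_\t_\t_\t_\n"
    "3\tvermeden\tver\tVERB\t_\tCase=Abl|Polarity=Neg|VerbForm=Conv\t_\t_\t_\t_\n"
    "4\tmetroyla\tmetro\tNOUN\t_\tCase=Ins\t_\t_\t_\t_\n"
)
```

For example, `forms_by_feature(SAMPLE, "Case")` places vermeden under Abl with the label VERB-Conv, mirroring the Abl entry above.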
Degree and Polarity
- Polarity
  - Neg
    - AUX: değil, değiliz
    - PART: not
    - VERB: bilmiyorum, değinemiyoruz, düşünmezsen, düşürmedim, istemiyormuş, istemiyorum, olmayız, vermeden
    - VERB-Conv: vermeden
  - Pos
    - VERB: ettim, geldi, yapmak, Gel, Soggyleşmiş, Yemişsin, alabiliriz, alan, attın, başlasam
    - VERB-Conv: çıkmadan
    - VERB-Part: alan, ettiğine, giden, olacağı, olduğunuz, yaşayan, yürüyecek
    - VERB-Vnoun: etmeye, flörtleşmek, görüşmek, sormaya, yapmak
Verbal Features
- Aspect
  - Hab
    - VERB: düşünmezsen
  - Imp
    - VERB: alabiliriz, görüşürüz, istemiyormuş, olmayız
  - Perf
    - VERB: ettim, geldi, Soggyleşmiş, Yemişsin, attın, dedi, diyeceğim, drop-bylayacağım, duyuldu, düşürmedim
  - Prog
    - VERB: bilmiyorum, değinemiyoruz, ediyorum, hissediyor, istemiyorum, turn-offluyor, çalışıyor, çıkıyoruz
  - Prosp
    - VERB: edecek
- Mood
  - Cnd
    - VERB: başlasam, düşünmezsen, söyleseydin, yapsa
  - Imp
    - VERB: Gel, Go, Let's, eyle, yapsana
    - VERB-Fin: Go, Let's
  - Ind
    - AUX-Fin: were
  - Opt
    - VERB: eyleyelim
  - Pot
    - VERB: alabiliriz
- Tense
  - Fut
    - VERB: diyeceğim, drop-bylayacağım, edecek, gideceğiz, olacağı, uyuyacağız, yürüyecek
    - VERB-Part: olacağı, yürüyecek
  - Past
    - AUX-Fin: were
    - VERB: ettim, geldi, Soggyleşmiş, Yemişsin, attın, dedi, duyuldu, düşürmedim, ettiğine, gönderdi
    - VERB-Fin: thought
    - VERB-Part: ettiğine, olduğunuz, supposed
  - Pres
    - AUX: değil, değiliz
    - VERB: Depends, Seems, alabiliriz, alan, bilmiyorum, bitching, coming, değinemiyoruz, düşünmezsen, ediyorum
    - VERB-Fin: own
    - VERB-Part: alan, bitching, coming, giden, spinning, texting, yaşayan
- Voice
  - Pass
    - VERB: duyuldu
- Evident
  - Fh
    - VERB: duyuldu
  - Nfh
    - VERB: Soggyleşmiş, Yemişsin, istemiyormuş, katletmişsin
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - PRON: o
- NumType
  - Card
    - NUM: iki, bir
- Person
  - 1
    - ADJ: karşıyayım
    - AUX: değiliz
    - NOUN: zorundayım
    - PRON: i, Ben, bana, bence, beni
    - VERB: ettim, Let's, alabiliriz, başlasam, bilmiyorum, değinemiyoruz, diyeceğim, drop-bylayacağım, düşürmedim, ediyorum
    - VERB-Fin: Let's, own
  - 2
    - AUX: misin
    - PRON: u, Sizce, seni, you
    - VERB: Gel, Go, Yemişsin, attın, düşünmezsen, eyle, katletmişsin, olduğunuz, söyleseydin, yapsana
    - VERB-Fin: Go
    - VERB-Part: olduğunuz
  - 3
    - AUX: değil
    - NOUN: şey, Aklıma, Canım, Derse, Dünyanın, Hoca, Hoca'nın, Kafamda, Kanka, Okulun
    - PRON: bu, it, Birinin, Bunda, Bunu, Bunun, biri, burada, onu
    - PROPN: Allah, Cem, Cumartesi, Daktilo, Didar, KK'ya, Kazım, Koyuncu, Yılmaz, ceren
    - VERB: geldi, Depends, Seems, Soggyleşmiş, dedi, duyuldu, edecek, ettiğine, gönderdi, hissediyor
    - VERB-Part: ettiğine, olacağı
- Number[psor]
  - Sing
    - NOUN: Aklıma, Canım, Kafamda, ayağın, boynumu, dersini, gerçeği, sevgilimle, sitesini, tanımımda
    - VERB-Part: ettiğine, olacağı
Other Features
- ExtPos
  - ADJ
    - ADJ: karşı
  - ADV
    - ADP: Of
    - DET: All
- Person[psor]
  - 1
    - NOUN: Aklıma, Canım, Kafamda, boynumu, sevgilimle, tanımımda, zorundayım
  - 2
    - NOUN: ayağın
  - 3
    - NOUN: dersini, gerçeği, sitesini, tokası, üslubu, üstüne, üzerine, şarkısı
    - VERB-Part: ettiğine, olacağı
- Typo
  - Yes
    - ADV: know
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: değil, mi, be, do, would.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (3)
  - VERB--NOUN-Nom (5)
  - VERB--PRON (3)
  - VERB--PRON-Nom (4)
  - VERB-Fin--PRON-Nom (2)
  - VERB-Part--PRON (1)
  - VERB-Part--PRON-Gen (1)
  - VERB-Part--PRON-Nom (2)
  - VERB-Vnoun--PRON-Nom (1)
- obj
  - VERB--NOUN (1)
  - VERB--NOUN-Abl (1)
  - VERB--NOUN-Acc (3)
  - VERB--NOUN-Dat (1)
  - VERB--NOUN-Nom (1)
  - VERB--PRON (2)
  - VERB--PRON-Acc (3)
  - VERB-Fin--NOUN-Nom (2)
  - VERB-Part--NOUN-Acc (1)
  - VERB-Vnoun--NOUN-Nom (1)
  - VERB-Vnoun--PRON-Acc (1)
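The pattern labels above appear to combine the head's UPOS and VerbForm with the dependent's UPOS and Case (for example VERB-Part--PRON-Gen). Under that assumption, which is inferred from the labels rather than stated in the source, the counts can be sketched as follows; the function names and the two toy sentences are hypothetical and not taken from the treebank.

```python
from collections import Counter


def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    return {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))


def read_sentences(conllu_text: str):
    """Yield each sentence as a dict mapping word ID to its 10 CoNLL-U columns."""
    words = {}
    for line in conllu_text.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if words:
                yield words
            words = {}
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword-token ranges and empty nodes
                words[cols[0]] = cols


def node_label(cols: list[str], feature: str) -> str:
    """UPOS, suffixed with the given feature's value when it is annotated."""
    value = parse_feats(cols[5]).get(feature)
    return cols[3] if value is None else f"{cols[3]}-{value}"


def core_argument_patterns(conllu_text: str, relations=("nsubj", "obj")) -> Counter:
    """Count VERB -> NOUN/PRON arcs as (relation, 'HEAD--CHILD') pairs."""
    counts = Counter()
    for words in read_sentences(conllu_text):
        for cols in words.values():
            relation = cols[7].split(":")[0]  # ignore subtypes such as nsubj:pass
            head = words.get(cols[6])  # None for the root, whose HEAD is 0
            if (relation in relations and head is not None
                    and head[3] == "VERB" and cols[3] in ("NOUN", "PRON")):
                pattern = f"{node_label(head, 'VerbForm')}--{node_label(cols, 'Case')}"
                counts[(relation, pattern)] += 1
    return counts


# Two toy sentences with hypothetical annotation.
SAMPLE = (
    "# text = Ben şarkıyı bilmiyorum\n"
    "1\tBen\tben\tPRON\t_\tCase=Nom\t3\tnsubj\t_\t_\n"
    "2\tşarkıyı\tşarkı\tNOUN\t_\tCase=Acc\t3\tobj\t_\t_\n"
    "3\tbilmiyorum\tbil\tVERB\t_\tPolarity=Neg\t0\troot\t_\t_\n"
    "\n"
    "# text = you own it\n"
    "1\tyou\tyou\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_\n"
    "2\town\town\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_\n"
    "3\tit\tit\tPRON\t_\tCase=Acc\t2\tobj\t_\t_\n"
)
```

On the toy input, the Turkish sentence yields VERB--PRON-Nom (nsubj) and VERB--NOUN-Acc (obj), while the English one yields VERB-Fin--PRON-Nom and VERB-Fin--PRON-Acc, because only own carries a VerbForm value.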