
This page pertains to UD version 2.

UD Turkish German SAGT

Language: Turkish German (code: qtd)
Family: Code switching

This treebank has been part of Universal Dependencies since the UD v2.7 release.

The following people have contributed to making this treebank part of UD: Özlem Çetinoğlu, Çağrı Çöltekin.

Repository: UD_Turkish_German-SAGT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.7

License: CC BY-NC-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Turkish German-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ozlem (æt) ims • uni-stuttgart • de]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

UD Turkish-German SAGT is a Turkish-German code-switching treebank developed as part of the SAGT project.

The treebank consists of bilingual conversation transcriptions annotated with several layers: language IDs, lemmas, POS tags, morphological features, and dependency relations. Language IDs use the tag set of Çetinoğlu (2017). The remaining annotations follow the Universal Dependencies annotation scheme and the conventions used in the monolingual Turkish and German treebanks.
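As a rough illustration of how these layers can be read together, the sketch below parses CoNLL-U text (the standard ten-column, tab-separated UD format) and counts tokens per language ID. It assumes that the language ID is stored in the MISC column under a key named `LangID`; that key name, the values `TR` and `DE`, and the two-token sample are assumptions made up for this example and should be checked against the treebank files.

```python
from collections import Counter

# Hypothetical two-token CoNLL-U fragment, invented for illustration only.
# The MISC key "LangID" and the values "TR"/"DE" are assumptions, not
# taken from the treebank documentation.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tben\tben\tPRON\t_\t_\t2\tnsubj\t_\tLangID=TR\n"
    "2\tkomme\tkommen\tVERB\t_\t_\t0\troot\t_\tLangID=DE\n"
    "\n"
)


def parse_misc(misc):
    """Turn a MISC cell such as 'A=1|B=2' into a dict; '_' means empty."""
    if misc == "_":
        return {}
    pairs = (item.split("=", 1) for item in misc.split("|"))
    return {p[0]: (p[1] if len(p) > 1 else "") for p in pairs}


def count_misc_key(conllu_text, key="LangID"):
    """Count the values of one MISC key over all regular word lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        value = parse_misc(cols[9]).get(key)
        if value is not None:
            counts[value] += 1
    return counts


lang_counts = count_misc_key(SAMPLE)
print(dict(lang_counts))
```

To run this on real data, read one of the treebank's `.conllu` files into a string and pass it to `count_misc_key`, adjusting `key` to whatever the files actually use.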

There are 48 distinct conversations from 17 participants. The majority of the speakers are university students, so the most frequent age range is 18–25. Common conversation themes include studies, work, travel, future plans, and free-time activities such as sports, books, and TV.

The audio recordings that accompany the transcriptions are also available as a speech corpus under a separate licence. Please contact ozlem@ims.uni-stuttgart.de for further information.

Acknowledgments

Treebank development is funded by the DFG via project CE 326/1-1 “Computational Structural Analysis of German-Turkish Code-Switching”. We thank Cansu Turgut, Reha Sakızlı, Semanur Ceylan, and Sevde Ceylan for data collection and annotation.

References

For the treebank:

https://www.aclweb.org/anthology/W19-7809.pdf

For the speech collection (note that the paper describes a separate speech corpus, but the methodology is parallel):

https://www.aclweb.org/anthology/W17-0804.pdf

Statistics of UD Turkish German SAGT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Aspect, Case, Definite, Evident, Foreign, Gender, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, Poss, PronType, Reflex, Tense, Typo, VerbForm, Voice

Relations

acl, advcl, advmod, advmod:emph, amod, appos, appos:trans, aux, aux:pass, aux:q, case, cc, ccomp, compound, compound:lvc, compound:prt, compound:redup, conj, cop, csubj, det, discourse, dislocated, expl, expl:pv, fixed, flat, iobj, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, orphan, parataxis, parataxis:discourse, parataxis:trans, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview