This page pertains to UD version 2.

UD Turkish Penn

Language: Turkish (code: tr)
Family: Turkic, Southwestern

This treebank has been part of Universal Dependencies since the UD v2.8 release.

The following people have contributed to making this treebank part of UD: Neslihan Cesur, Aslı Kuzgun, Olcay Taner Yıldız, Büşra Marşan, Neslihan Kara, Bilge Nas Arıcan, Merve Özçelik, Deniz Baran Aslan.

Repository: UD_Turkish-Penn
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Turkish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [neslihancesur16 (æt) gmail • com; olcay • yildiz (æt) ozyegin • edu • tr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: annotated manually in non-UD style, automatically converted to UD
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually in non-UD style, automatically converted to UD

Description

Turkish version of the Penn Treebank. It consists of a total of 9,560 manually annotated sentences and 87,367 tokens. (It only includes sentences up to 15 words long.)

The treebank was built by translating sentences from the Penn Treebank into Turkish; only sentences of up to 15 words are included. After translation, the word tokens were morphologically annotated with a semi-automatic morphological analyzer, and the dependency annotation was done manually. During dependency annotation, annotators could see the original Penn Treebank sentences, so they could check and correct the sentences against the original data.
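The sentence and token counts and the 15-word limit can be checked against a CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the maintainers' tooling: the function name `conllu_stats` and the two-sentence sample are made up for illustration and are not taken from the treebank. It counts syntactic words only, so its totals may differ from figures that count surface tokens.

```python
def conllu_stats(text):
    """Return (sentences, words, longest sentence) for CoNLL-U text.

    Multiword-token ranges (IDs such as "1-2") and empty nodes
    (IDs such as "1.1") are skipped, so only syntactic words count.
    """
    sentences = words = longest = current = 0
    # The extra empty string flushes a final sentence that has no
    # trailing blank line.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if not line:
            if current:
                sentences += 1
                words += current
                longest = max(longest, current)
                current = 0
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        token_id = line.split("\t")[0]
        if "-" in token_id or "." in token_id:
            continue
        current += 1
    return sentences, words, longest


# Hypothetical sample, invented for this sketch.
SAMPLE = (
    "# text = Ali geldi.\n"
    "1\tAli\tAli\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgeldi\tgel\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Kitabı okudu.\n"
    "1\tKitabı\tkitap\tNOUN\t_\tCase=Acc\t2\tobj\t_\t_\n"
    "2\tokudu\toku\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

print(conllu_stats(SAMPLE))
```

To apply it to the real data, read one of the treebank's `.conllu` files into a string and pass it to `conllu_stats`. Whether punctuation counts toward the 15-word limit is not stated on this page, so the reported longest sentence should be interpreted with that in mind.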

Acknowledgments

We wish to thank Starlang Software for funding and supporting this work.

Statistics of UD Turkish Penn

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Aspect – Case – Definite – Degree – Mood – Number – Number[psor] – NumType – Person – Person[psor] – Polarity – PronType – Reflex – Tense – Typo – VerbForm – Voice

Relations

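acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – clf – compound – conj – csubj – dep – det – discourse – dislocated – fixed – flat – goeswith – iobj – list – mark – nmod – nmod:tmod – nsubj – nsubj:outer – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp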

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
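A restriction of this kind can be reproduced on CoNLL-U data by keeping only dependents whose head is tagged VERB and whose own tag is nominal. The sketch below is one possible reading, not the script that generated these statistics: the function name `verb_nominal_relations`, the default child tags (NOUN and PRON, excluding PROPN) and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def verb_nominal_relations(text, child_tags=("NOUN", "PRON")):
    """Count relation labels on edges from a VERB head to a nominal child."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        rows = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Keep plain word lines only (skip ranges and empty nodes).
            if cols[0].isdigit():
                rows[cols[0]] = cols
        for cols in rows.values():
            head = rows.get(cols[6])  # None for the root (HEAD = 0)
            if head and head[3] == "VERB" and cols[3] in child_tags:
                counts[cols[7]] += 1
    return counts


# Hypothetical sample, invented for this sketch.
SAMPLE = (
    "# text = Ali kitabı okudu.\n"
    "1\tAli\tAli\tPROPN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tkitabı\tkitap\tNOUN\t_\tCase=Acc\t3\tobj\t_\t_\n"
    "3\tokudu\toku\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# text = O eve gitti.\n"
    "1\tO\to\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\teve\tev\tNOUN\t_\tCase=Dat\t3\tobl\t_\t_\n"
    "3\tgitti\tgit\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

print(verb_nominal_relations(SAMPLE))
```

With the default tags, the proper noun subject in the first sample sentence is not counted; passing `child_tags=("NOUN", "PROPN", "PRON")` includes it.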

Verbs with Reflexive Core Objects

Relations Overview