
This page pertains to UD version 2.

UD Sinhala Appuwa

Language: Sinhala (code: si)
Family: Indo-European

This treebank has been part of Universal Dependencies since the UD v2.18 release.

The following people have contributed to making this treebank part of UD: Warangana Sammani, Luigi Talamo, Annemarie Verkerk.

Repository: UD_Sinhala-Appuwa
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: fiction

Questions, comments? General annotation questions (either Sinhala-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [luigi • talamo (æt) uni-saarland • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

This treebank contains a manually annotated Sinhala narrative: the folk story “Appuwa”, a traditional narrative from Sri Lankan culture. The data was provided as text and annotated within the Universal Dependencies (UD) framework as part of a project on treebank development for understudied languages.

The treebank was developed within the context of the project “Treebanks for the cross-linguistic study of coercive discourse in understudied languages”. The goal of this effort is to contribute linguistic resources for Sinhala, a relatively under-resourced language in computational linguistics.

All annotation was carried out manually using ArboratorGrew, following UD guidelines. The annotation includes tokenization, morphological features, and syntactic dependency relations. Validation was performed using the official UD validation tools.
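UD treebanks are distributed in the CoNLL-U format, so the annotation layers listed above (tokens, morphological features, dependency relations) can be read with a few lines of code. The following is a minimal sketch, not the project's own tooling. The sample sentence is a hypothetical English placeholder and is not taken from the treebank; the names `SAMPLE`, `read_conllu` and `parse_feats` are introduced here for illustration only.

```python
from collections import Counter

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Hypothetical placeholder sentence (English, NOT from the treebank).
# XPOS is "_" throughout, mirroring "XPOS not available".
SAMPLE = "\n".join([
    "# text = She reads books .",
    "\t".join(["1", "She", "she", "PRON", "_",
               "Case=Nom|Number=Sing", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "reads", "read", "VERB", "_",
               "Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["3", "books", "book", "NOUN", "_",
               "Number=Plur", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])


def parse_feats(value):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def read_conllu(text):
    """Yield sentences as lists of token dicts.

    Comment lines, multiword-token ranges (IDs like '1-2') and
    empty nodes (IDs like '1.1') are skipped.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        token = dict(zip(FIELDS, cols))
        token["feats"] = parse_feats(token["feats"])
        sentence.append(token)
    if sentence:                      # file may lack a trailing blank line
        yield sentence


sentences = list(read_conllu(SAMPLE))
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
deprel_counts = Counter(tok["deprel"] for sent in sentences for tok in sent)
print(upos_counts)
print(deprel_counts)
```

To run this on the actual data, the text of a `.conllu` file from the repository could be passed to `read_conllu` in place of `SAMPLE`; the exact file names are not stated on this page.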

Acknowledgments

This work was carried out as part of an academic collaboration project. The annotation was completed by Sammani Warangana, a BA student in Computational Linguistics at the University of Tübingen, with support from the project team.

References

The text is based on a traditional Sinhala folk narrative (“Appuwa”). The exact original published source of the story is not specified.

Background reference: Chandralal, Dileep (2010). Sinhala. John Benjamins Publishing.

Statistics of UD Sinhala Appuwa

POS Tags

ADJ, ADP, ADV, AUX, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB

Features

Animacy, Aspect, Case, Definite, Degree, Gender, Mood, Number, NumType, Person, Polarity, Poss, PronType, Tense, VerbForm, Voice

Relations

acl, advcl, advmod, amod, appos, aux, case, ccomp, clf, compound, compound:lvc, compound:svc, conj, det, discourse, flat:name, iobj, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:tmod, parataxis, punct, root, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
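The restriction above can be sketched as a filter over dependency edges: keep an edge only when the parent is tagged VERB and the child is a noun or pronoun. This is an illustrative sketch, not the script used to generate the statistics. The token list is a hypothetical English placeholder, and treating "nouns or pronouns" as exactly the tags NOUN and PRON (excluding PROPN) is an assumption.

```python
from collections import Counter

# Hypothetical placeholder tokens (English, NOT from the treebank):
# (id, form, upos, head, deprel)
TOKENS = [
    (1, "She", "PRON", 2, "nsubj"),
    (2, "reads", "VERB", 0, "root"),
    (3, "books", "NOUN", 2, "obj"),
    (4, "in", "ADP", 6, "case"),
    (5, "the", "DET", 6, "det"),
    (6, "garden", "NOUN", 2, "obl"),
    (7, ".", "PUNCT", 2, "punct"),
]

# Assumption: "nouns or pronouns" means these two tags; add "PROPN" if
# proper nouns should count as well.
NOMINAL_UPOS = {"NOUN", "PRON"}


def verb_nominal_relations(tokens):
    """Return (deprel, child form) for edges with a VERB parent
    and a noun or pronoun child."""
    upos_by_id = {tid: upos for tid, _, upos, _, _ in tokens}
    return [
        (deprel, form)
        for _, form, upos, head, deprel in tokens
        if upos in NOMINAL_UPOS and upos_by_id.get(head) == "VERB"
    ]


pairs = verb_nominal_relations(TOKENS)
relation_counts = Counter(deprel for deprel, _ in pairs)
print(pairs)
print(relation_counts)
```

In this placeholder sentence the `case`, `det` and `punct` edges are dropped because their children are not nominal, and the root edge is dropped because its parent (ID 0) is not a token.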

Relations Overview