
This page pertains to UD version 2.

UD Punjabi Rang

Language: Punjabi (code: pa)
Family: Indo-European (IE)

This treebank has been part of Universal Dependencies since the UD v2.18 release.

The following people have contributed to making this treebank part of UD: Rimsha Abid, Luigi Talamo, Helena Vaz, Andrew Dyer, Annemarie Verkerk.

Repository: UD_Punjabi-Rang
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: fiction, news

Questions, comments? General annotation questions (either Punjabi-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [annemarie • verkerk (æt) uni-saarland • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

The Punjabi-Rang treebank is a manually annotated corpus in Punjabi (Shahmukhi script).

It contains 100 sentences from the first two chapters of The Little Prince (Le Petit Prince) translated into Punjabi and 37 sentences from a blog post containing a discourse on national Punjabi day.

The corpus is split contiguously into training, development, and test sets as follows:

Split   Sentences (Little Prince)   Sentences (discourse)
Train   67                          14
Dev     17                          13
Test    16                          10
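
The split sizes in the table above can be reproduced with a short sketch. It assumes that each source was cut contiguously on its own and that the pieces were then concatenated per split; the order of concatenation, the helper name `contiguous_split`, and the sentence IDs are illustrative placeholders, not taken from the treebank.

```python
def contiguous_split(sentences, sizes):
    """Cut `sentences` into consecutive chunks whose lengths are given by `sizes`."""
    if sum(sizes) != len(sentences):
        raise ValueError("split sizes must add up to the number of sentences")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(sentences[start:start + size])
        start += size
    return chunks


# Placeholder sentence IDs (hypothetical); the real files carry their own sent_id values.
little_prince = [f"lp-{i}" for i in range(1, 101)]   # 100 sentences
discourse = [f"blog-{i}" for i in range(1, 38)]      # 37 sentences

lp_train, lp_dev, lp_test = contiguous_split(little_prince, [67, 17, 16])
bl_train, bl_dev, bl_test = contiguous_split(discourse, [14, 13, 10])

# Assumed order: Little Prince sentences first, then the blog sentences.
train = lp_train + bl_train
dev = lp_dev + bl_dev
test = lp_test + bl_test

print(len(train), len(dev), len(test))
```

Under these assumptions the three sets hold 81, 30, and 26 sentences, which together account for all 137 sentences.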

Annotation follows the Universal Dependencies v2 guidelines for tokenization, part-of-speech tags, and dependency relations.

The data were collected manually from the first two chapters of The Little Prince (Punjabi translation, Shahmukhi script) and from the following blog post: https://www.express.pk/story/2020057/kya-pnjaby-sqaft-madwm-hwrhy-he-2020057

Acknowledgments

The treebank was annotated by Rimsha Abid. Supervision and revision were provided by Luigi Talamo, Helena Vaz, Andrew Dyer, and Annemarie Verkerk.

References

In preparation

Statistics of UD Punjabi Rang

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB

Features

Aspect, Case, Degree, Gender, Mood, Number, NumType, Person, Poss, PronType, Reflex, Tense, VerbForm

Relations

acl, acl:relcl, advcl, advmod, advmod:emph, amod, aux, case, cc, cc:preconj, ccomp, compound, conj, cop, det, det:poss, discourse, iobj, mark, nmod, nmod:poss, nmod:tmod, nsubj, nummod, obj, obl, obl:agent, obl:arg, obl:tmod, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
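
The restriction above can be illustrated with a minimal sketch that tallies the relations whose parent is a VERB and whose child is a NOUN or PRON in CoNLL-U text. The function name `verb_argument_relations` and the English sample sentence are hypothetical placeholders, not treebank data; whether PROPN should also count as a noun is left open here.

```python
from collections import Counter


def verb_argument_relations(conllu_text):
    """Count DEPREL values for VERB parents with NOUN or PRON children."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep plain word lines only: skip multiword-token ranges (1-2)
        # and empty nodes (1.1), whose IDs are not plain integers.
        words = {row[0]: row for row in rows if row[0].isdigit()}
        for row in words.values():
            head = words.get(row[6])  # column 7 is HEAD; "0" (root) has no entry
            if head is None:
                continue
            # Column 4 is UPOS, column 8 is DEPREL.
            if head[3] == "VERB" and row[3] in {"NOUN", "PRON"}:
                counts[row[7]] += 1
    return counts


# Hypothetical toy sentence in CoNLL-U layout (ten tab-separated columns).
SAMPLE = "\n".join("\t".join(cols) for cols in [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "reads", "read", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "books", "book", "NOUN", "_", "_", "2", "obj", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
])

print(verb_argument_relations(SAMPLE))
```

For the toy sentence, only `nsubj` and `obj` are counted: the root has no parent, and the punctuation child is neither a noun nor a pronoun.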

Verbs with Reflexive Core Objects

Relations Overview