UD Pashto Prince
Language: Pashto (code: ps)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.18 release.
The following people have contributed to making this treebank part of UD: Salwan Aziz, Luigi Talamo, Annemarie Verkerk.
Repository: UD_Pashto-Prince
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: fiction, government
Questions, comments? General annotation questions (either Pashto-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [annemarie • verkerk (æt) uni-saarland • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
The UD Pashto-Prince treebank contains manually annotated Pashto sentences from two textual sources: 50 sentences from a Pashto version of Le Petit Prince, which were adapted into Northern Pashto before annotation, and 14 sentences from a Pashto prose text on Pashtun leadership. All sentences are annotated natively according to Universal Dependencies guidelines.
The Pashto-Prince treebank is a manually annotated Universal Dependencies (UD) treebank for Pashto. It consists of 64 sentences drawn from two sources:
- 50 sentences from Le Petit Prince, originally sourced from an online Pashto version. The original text reflects Afghan (Southern) Pashto; therefore, the sentences were manually rewritten and adapted into Northern Pashto to reflect dialectal differences in morphology, lexicon, and syntax before annotation.
- 14 sentences from a Pashto prose text titled Silent Pashtun Leadership, sourced from an online PDF publication.
All sentences were manually annotated for lemmas, universal part-of-speech tags (UPOS), morphological features, and dependency relations following the Universal Dependencies v2 guidelines. The annotations were performed directly in Pashto without automatic pre-annotation.
Acknowledgments
We thank Salwan Aziz for the manual translation and adaptation of the Le Petit Prince sentences into Northern Pashto and for carrying out the complete manual annotation of all sentences in the treebank. We also thank the course instructors and supervisors for their guidance and feedback on Universal Dependencies annotation standards.
References
de Saint-Exupéry, A. (1943). Le Petit Prince. Pashto version available at: https://pashtogaheez.com/books/637
Silent Pashtun Leadership. Source: https://www.pashtoonkhwa.com/?cnt=3037&page=pashtoonkhwa
Universal Dependencies Consortium. (2024). Universal Dependencies v2. https://universaldependencies.org
Statistics of UD Pashto Prince
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Deixis – Mood – Number – NumType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – compound – compound:lvc – compound:prt – conj – cop – det – det:poss – discourse – expl – fixed – iobj – mark – nmod – nsubj – nsubj:pass – nummod – obj – obl – obl:agent – obl:arg – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 64 sentences and 1180 tokens.
- This corpus contains 43 tokens (4%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: "ویره, دې., ۔پښتون
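Counts like the ones above can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: the function name `tokenization_stats` and the toy input are hypothetical, and it assumes that "tokens not followed by a space" corresponds to `SpaceAfter=No` in the MISC column. It counts only lines with a plain integer ID, so multiword token ranges and empty nodes are skipped.

```python
def tokenization_stats(conllu_text: str) -> dict:
    """Count sentences, tokens, and tokens marked SpaceAfter=No in CoNLL-U text."""
    sentences = tokens = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, ranges like 1-2, empty nodes like 1.1
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    percent = round(100 * no_space / tokens) if tokens else 0
    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space_after": no_space,
        "percent_no_space": percent,
    }


# Toy input with placeholder word forms (NOT taken from the treebank).
TOY = "\n".join([
    "# sent_id = toy-1",
    "\t".join(["1", "w1", "w1", "NOUN", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "w2", "w2", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
    "# sent_id = toy-2",
    "\t".join(["1", "w3", "w3", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
])

stats = tokenization_stats(TOY)
print(stats)
```

The percentage reported above follows the same rounding: 43 of 1180 tokens is about 3.6%, which rounds to 4%.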
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: INTJ, SYM, X
- This corpus contains 12 word types tagged as particles (PART): او, بس, به, خو, را, مه, نه, نو, هم, و, يا, یا
- This corpus contains 20 lemmas tagged as pronouns (PRON): _, تاسو, ته, خپل, دا, دی, زه, سه, مې, هغه, هغوی, هغې, ورباندې, ورته, ویې, ټول, ځان, څه, څوک, یې
- This corpus contains 13 lemmas tagged as determiners (DET): _, خپل, دا, داسې, داگھہ, دغه, هر, هغه, هېڅ, يو, ټول, څو, یو
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: _, خپل, دا, هغه, ټول
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): بۀ, ول, کول, کېدل
- Out of the above, 2 lemmas occurred sometimes as AUX and sometimes as VERB: کول, کېدل
- There are 3 (de)verbal forms:
  - Fin
    - AUX: وو, دی, دي, وي, ده, شي, وم, کېږي, دې, دې.
    - VERB: شي, شو, وکړو, شم, ولیدو, وکتل, ویل, ښکارېدو, کوي, کړه
  - Inf
    - VERB: کولای, ایښودل, بایلل, تيرولو, جوړول, جوړيدو, راپاڅول, هضم, هېرولای, ورکول
  - Part
    - AUX: شوی
    - VERB: شوي, شوې, لواړ, لیدلي, ويلی, ښودلې, کړي, کړی, ایښی, لوبولی
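The PRON/DET overlap reported in the tag statistics above can be checked directly from the two lemma lists. The sketch below is illustrative only: the helper `lemmas_with_both_tags` is a hypothetical name, and the (lemma, UPOS) pairs are rebuilt from the lists on this page rather than read from the treebank files.

```python
from collections import defaultdict

# Lemma lists copied from the PRON and DET statistics on this page.
PRON_LEMMAS = [
    "_", "تاسو", "ته", "خپل", "دا", "دی", "زه", "سه", "مې", "هغه",
    "هغوی", "هغې", "ورباندې", "ورته", "ویې", "ټول", "ځان", "څه", "څوک", "یې",
]
DET_LEMMAS = [
    "_", "خپل", "دا", "داسې", "داگھہ", "دغه", "هر", "هغه", "هېڅ", "يو",
    "ټول", "څو", "یو",
]


def lemmas_with_both_tags(pairs, tag_a, tag_b):
    """Return the lemmas that occur at least once with each of the two tags."""
    tags_by_lemma = defaultdict(set)
    for lemma, upos in pairs:
        tags_by_lemma[lemma].add(upos)
    return {
        lemma for lemma, tags in tags_by_lemma.items()
        if tag_a in tags and tag_b in tags
    }


# Stand-in for the (lemma, UPOS) pairs one would extract from the CoNLL-U file.
pairs = [(lemma, "PRON") for lemma in PRON_LEMMAS]
pairs += [(lemma, "DET") for lemma in DET_LEMMAS]

shared = lemmas_with_both_tags(pairs, "PRON", "DET")
print(len(PRON_LEMMAS), len(DET_LEMMAS), len(shared))
```

The same helper would apply to the AUX/VERB overlap, given the VERB lemma list, which this page does not print in full.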
Nominal Features
- Number
  - Plur
    - AUX-Fin: دي, وې, یو
    - DET: خپلو
    - NOUN: خبرې, مشرانو, حکمرانانو, خلک, لفظونو, مشران, پښتنو, کالو, ګزیانو, ادارې
    - PRON: مو, دوی, هغوی, تاسو, ستاسو, هغوې, ټولو
    - VERB-Fin: اوویل, اړوو, محسوسوو, وګورئ, کوو
  - Sing
    - AUX-Fin: وو, دی, شي, وي, کېږي, دې, دې., شم, شو, وم
    - DET: خپله, خپل, داگھہ, هغه, دا
    - NOUN: کتاب, تصویر, حال, قوم, ماضي, مشرتابه, مار, تصوير, خاوره, خوب
    - PRON: ما, زما, دا, زه, یې, مې, هغه, يي, څوک, ترې
    - VERB-Fin: شي, شو, ولیدو, وکتل, وکړو, ښکارېدو, اخلي, تيروي, تېرولو, ختمېږي
- Case
  - Acc
    - NOUN: هاتي
    - PRON: ما, راباندې, زما, ماته, يې
  - Nom
    - NOUN: تصویر, مار, ټوپئ
    - PRON: زه, هغوې
Degree and Polarity
- Polarity
  - Neg
    - PART: نه, مه
Verbal Features
- Aspect
  - Perf
    - AUX-Part: شوی
    - VERB-Part: کړی, ایښی, لوبولی, لگولې, لیکلي, نيولې, وهلی
- Mood
  - Cnd
    - AUX: به
    - VERB-Fin: کولی
  - Imp
    - VERB: کړه, وګورئ
    - VERB-Fin: وګورئ, کړه
  - Ind
    - AUX-Fin: وو, دی, دي, ده, کېږي, دې, دې., شم, شو, وم
    - VERB-Fin: شو, وکړو, ولیدو, وکتل, ویل, ښکارېدو, کوي, اخلي, اوویل, اوګورم
  - Sub
    - AUX-Fin: وي, شي
    - VERB: شي, شئ, وکړې, شم, ورکم, پرېږدم, کړم, کړي
    - VERB-Fin: شي, شم, ورکم, پرېږدم, کړم, کړي
- Tense
  - Past
    - AUX: وو, وم, شوم, شو, شوه, وه, وې, کړو
    - AUX-Fin: وو, وم, شو, شوه, وه, وې, کړو
    - VERB: کړو, شو, شوم, وکړو, راوخوت, ولیدو, وويل, وکتل, ویل, ښکارېدو
    - VERB-Fin: شو, وکړو, ولیدو, وکتل, ویل, ښکارېدو, کړې, اوویل, تېرولو, رپولې
  - Pres
    - AUX-Fin: دی, دي, ده, شي, وي, کېږي, دې, دې., شم, یو
    - VERB-Fin: شي, کوي, اخلي, اوګورم, اوګوري, اړوو, تيروي, ختمېږي, درېږي, راځي
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - DET: دا, هغه, داسې, داگھہ, دغه
    - PRON: دا, دې, هغه, هغې
  - Ind
    - DET: يو, یو, يوه, یوه
    - PRON: څوک, څه
  - Int
    - PRON: څه, څوک
  - Neg
    - DET: هېڅ
  - Prs
    - DET: خپله, خپل, خپلو
    - PRON: ما, زما, زه, یې, مو, مې, خپل, دوی, هغه, ورته
  - Rel
    - DET: څو
    - PRON: چا, کوم
  - Tot
    - DET: هر, ټول
    - PRON: ټولو
- NumType
  - Card
    - NUM: یو, شپږ, شپږو, دواړو, يوې
- Poss
  - Yes
    - DET: خپله, خپل, خپلو
    - PRON: زما, خپل, خپله, ستاسو, مو, هغوی, هغې, يي
- Reflex
  - Yes
    - DET: خپله, خپل, خپلو
    - PRON: خپل, ځان, خپله
- Person
  - 1
    - AUX-Fin: شم, وم, کړو, یو
    - PRON: ما, زما, زه, مو, مې, ماته, راباندې
    - VERB-Fin: ولیدو, وکړو, اړوو, تېرولو, رپولې, شوم, محسوسوو, ووهلم, وښودلو, وکتل
  - 2
    - PRON: تاسو, درته, ستاسو
    - VERB-Fin: وګورئ
  - 3
    - AUX-Fin: وو, دی, دي, شي, وي, کېږي, دې, دې., شو, وه
    - PRON: یې, هغه, دوی, هغوی, يي, ترې, هغوې, ورباندې, ورته, ویې
    - VERB-Fin: شي, شو, ښکارېدو, اخلي, اوویل, تيروي, ختمېږي, درېږي, راځي, لري
Other Features
- Deixis
  - Prox
    - DET: داگھہ, دا
    - PRON: دا
  - Remt
    - DET: هغه
    - PRON: هغه, هغې
- Prox
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): ول.
- This corpus uses 3 lemmas as auxiliaries (aux). Examples: ول, بۀ, کېدل.
- This corpus uses 1 lemma as a passive auxiliary (aux:pass): کېدل.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (2)
  - VERB--PRON (1)
  - VERB--PRON-Acc (1)
  - VERB-Fin--NOUN (11)
  - VERB-Fin--PRON (14)
  - VERB-Fin--PRON-Acc (15)
  - VERB-Fin--PRON-Nom (1)
  - VERB-Inf--NOUN (1)
  - VERB-Inf--PRON (2)
  - VERB-Inf--PRON-Acc (1)
  - VERB-Part--NOUN (4)
  - VERB-Part--PRON (3)
  - VERB-Part--PRON-Acc (1)
- obj
  - VERB--NOUN (3)
  - VERB-Fin--NOUN (29)
  - VERB-Fin--NOUN-ADP(د) (1)
  - VERB-Fin--PRON (2)
  - VERB-Inf--NOUN (3)
  - VERB-Inf--NOUN-Acc (1)
  - VERB-Inf--PRON (1)
  - VERB-Part--NOUN (8)
- iobj
  - VERB--PRON (1)
  - VERB-Fin--NOUN-ADP(ته) (1)
  - VERB-Fin--PRON-Acc (1)
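The pattern labels in the list above combine the parent verb's `VerbForm` with the child's UPOS and `Case`. One way to reproduce such counts is sketched below. This is an assumption about how the labels are built, not the generator's actual code: the names `core_argument_patterns` and `label` are hypothetical, the toy sentence uses placeholder forms, and the adposition suffix seen in labels such as `NOUN-ADP(ته)` is deliberately left out.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def label(upos, feats, feature):
    """Append the value of one feature to the UPOS tag when it is present."""
    value = parse_feats(feats).get(feature)
    return f"{upos}-{value}" if value else upos


def core_argument_patterns(rows):
    """Count (relation, 'PARENT--CHILD') patterns for one sentence.

    Each row is a list of the 10 CoNLL-U columns. Only VERB parents and
    NOUN/PRON children are considered, as stated in the text above.
    """
    by_id = {row[0]: row for row in rows}
    counts = Counter()
    for row in rows:
        upos, feats, head, deprel = row[3], row[5], row[6], row[7]
        parent = by_id.get(head)
        if deprel not in CORE_RELATIONS or parent is None:
            continue
        if parent[3] != "VERB" or upos not in {"NOUN", "PRON"}:
            continue
        parent_label = label("VERB", parent[5], "VerbForm")
        child_label = label(upos, feats, "Case")
        counts[(deprel, f"{parent_label}--{child_label}")] += 1
    return counts


# Toy sentence with placeholder forms (NOT taken from the treebank).
TOY_SENTENCE = [
    ["1", "w1", "w1", "PRON", "_", "Case=Nom|Number=Sing", "3", "nsubj", "_", "_"],
    ["2", "w2", "w2", "NOUN", "_", "Number=Sing", "3", "obj", "_", "_"],
    ["3", "w3", "w3", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"],
]

patterns = core_argument_patterns(TOY_SENTENCE)
for (deprel, pattern), n in sorted(patterns.items()):
    print(f"{deprel}: {pattern} ({n})")
```

Summing the counters over all sentences would give per-relation totals comparable to the ones listed above.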
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Examples: پوهه ځان
Relations Overview
- This corpus uses 8 relation subtypes: acl:relcl, aux:pass, compound:lvc, compound:prt, det:poss, nsubj:pass, obl:agent, obl:arg
- The following 10 relation types are not used in this corpus at all: csubj, vocative, dislocated, clf, flat, list, orphan, goeswith, reparandum, dep
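Both figures in this overview follow from the relation inventory listed under Relations. As a small sketch (the variable names are illustrative), the subtypes are the labels containing a colon, and the unused types are the 37 universal relations of UD v2 minus the base types that occur in the corpus.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels reported for this treebank (copied from the Relations list).
USED_RELATIONS = """
acl acl:relcl advcl advmod amod appos aux aux:pass case cc ccomp compound
compound:lvc compound:prt conj cop det det:poss discourse expl fixed iobj
mark nmod nsubj nsubj:pass nummod obj obl obl:agent obl:arg parataxis punct
root xcomp
""".split()

# A subtype is any label of the form base:subtype.
subtypes = sorted(rel for rel in USED_RELATIONS if ":" in rel)

# A universal relation counts as used if it appears bare or as the base of a subtype.
used_base = {rel.split(":")[0] for rel in USED_RELATIONS}
unused = sorted(UNIVERSAL_RELATIONS - used_base)

print(len(subtypes), subtypes)
print(len(unused), unused)
```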