UD for Pashto

For the principles of transliteration of Pashto used in UD see the Transliteration page.
Tokenization and Word Segmentation
- Words are delimited by whitespace and punctuation.
- Multiword tokens are used in these cases:
- forms of separable light verbs in which the two parts are joined, e.g. بندوم bandawë́m “I close” → بند band + کوم kawë́m
- contractions of pronouns and adpositions, e.g. پرې pre “on it/him/her/them” → پر یې për ye (see ADP for details)
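In CoNLL-U, a multiword token is stored as a range line (ID such as `1-2`) followed by one line per syntactic word. The sketch below builds these lines for the پرې → پر + یې split. The helper `mwt_lines` is hypothetical. The UPOS values ADP and PRON follow the adposition/pronoun split described here, and every other column is left as `_` for brevity, so the output is a fragment rather than a complete, valid sentence.

```python
def mwt_lines(surface: str, words: list[tuple[str, str]], start_id: int = 1) -> list[str]:
    """Return CoNLL-U lines for one multiword token and its syntactic words.

    `words` is a list of (form, upos) pairs; all other columns are left as "_".
    """
    end_id = start_id + len(words) - 1
    # Range line: ID is "start-end" and only FORM is filled in (10 columns in total).
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + ["_"] * 8)]
    for offset, (form, upos) in enumerate(words):
        # Word line: ID, FORM, LEMMA, UPOS, then XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
        cols = [str(start_id + offset), form, "_", upos] + ["_"] * 6
        lines.append("\t".join(cols))
    return lines


for line in mwt_lines("پرې", [("پر", "ADP"), ("یې", "PRON")]):
    print(line)
```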
Morphology
Lemmatization
- The direct (nominative) singular masculine form (where applicable) is used as the lemma for nominals.
- The infinitive (in the direct case) is usually used as the lemma for verbs.
- The existential word شته šta “there is / there are”, tagged as VERB, has only one form, so it is used as the lemma.
Tags
- The overview of part-of-speech tags used in Pashto can be found here.
- Pashto uses all 17 universal tags.
- Several words are tagged as PART:
- Negative particles نه në and مه ma “no, not” and the affirmative particle هو ho “yes”.
- Modal particles: باید bấyad (necessity “must, have to, should”) and دې de (order or desire “let, should”).
- These verbs may be tagged as AUX:
- the verb ول wël “to be”, used as the copula and as the auxiliary for perfect tenses
- the verb کېدل kedë́l “to become”, used for the passive voice and for potential forms
- the verb کول kawë́l “to do”, used for longer potential forms
- the word بۀ bë, used for the future tense, for habitual past forms and in conditional sentences
- the word ونۀ wënë́, which joins the perfective prefix (attached to the verb in affirmative forms) with the negative particle; this may also hold for some other prefixes
- Modal verbs, such as غوښتل ġox̌të́l “to want to”, are tagged as VERB (in addition, some modal meanings are expressed by modal particles or by the potential verb forms mentioned above). Light verbs are also tagged as VERB, with the nominal part attached to them by the compound:lvc relation.
- Pronouns that depend on nouns and behave like their attributes (some of them even agree with the noun in number, case and gender) are tagged as DET; possessive pronouns are mostly treated this way. Pronouns used independently (often as arguments of a verb) are tagged as PRON; this includes, e.g., relative and non-possessive personal pronouns. Various interrogative, demonstrative and indefinite pronouns can be tagged either way depending on the context. Enclitic weak pronouns, used as unstressed core arguments or as alternative possessive pronouns, are always tagged as PRON, even when they mark possession, because they are not in an attributive relation to the noun: they follow the noun, while all pronouns tagged as DET precede it. Directional pronouns that are merged with several prepositions are split off from them and tagged as PRON.
- Deverbal forms such as infinitives and participles (which sometimes behave like verbal nouns and verbal adjectives) are usually tagged as VERB. Only nouns and adjectives that were originally derived from infinitives or participles but are now clearly perceived as nouns and adjectives are tagged as NOUN and ADJ.
- Adjectives and the adverbs derived from them often have the same form. Whether they are tagged as ADJ or ADV depends on the context.
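The DET/PRON rule for pronouns described above amounts to a small decision procedure. The sketch below is a simplification that rests on stated assumptions. The helper `pronoun_upos` and its two boolean inputs are hypothetical. In real annotation, deciding whether a pronoun is attributive requires the syntactic context, especially for the interrogative, demonstrative and indefinite pronouns that can go either way.

```python
def pronoun_upos(is_enclitic: bool, precedes_noun_as_attribute: bool) -> str:
    """Simplified DET/PRON choice for a Pashto pronoun (hypothetical helper)."""
    if is_enclitic:
        # Enclitic weak pronouns are always PRON, even when they mark possession.
        return "PRON"
    if precedes_noun_as_attribute:
        # Attributive pronouns that precede and modify a noun (e.g. most possessives).
        return "DET"
    # Pronouns used on their own, e.g. as arguments of a verb.
    return "PRON"
```

The enclitic check comes first because it overrides the possessive reading: an enclitic that marks possession still gets PRON.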
Features
- The overview of all features used in Pashto can be found here.
- There are three VerbForm values used in Pashto: finite (Fin), infinitive (Inf) and participle (Part).
- An important feature of Pashto verbs is Aspect, which strictly divides verb forms into imperfective (Imp) and perfective (Perf).
- Finite verb forms inflect for Mood, with the values indicative (Ind), imperative (Imp), subjunctive (Sub) and potential (Pot).
- Finite verb forms inflect for Tense, taking the value present (Pres) or past (Past); the future (Fut) is marked by means of an auxiliary.
- Finite verb forms also inflect for Person, with the usual three values; Person is also an inherent feature of many personal pronouns.
- In general, all inflecting parts of speech inflect for Number, taking the value singular (Sing) or plural (Plur). A few collective nouns have the collective (Coll) number. Infinitives always behave as plurals, so Number is not annotated on them. Non-past finite verb forms have no Number feature in the third person, since the singular and plural forms are always identical.
- Nominals, participles and infinitives inflect for Case. Five cases are annotated in UD for Pashto: direct (annotated as nominative, Nom), oblique (annotated as accusative, Acc), locative (Loc), ablative (Abl) and vocative (Voc).
- Nouns and some pronouns have an inherent Gender feature with two possible values: masculine (Masc) and feminine (Fem). Adjectives, other pronouns and participles inflect for gender to agree with nouns. Finite verb forms inflect for gender only in the third person of past forms (both singular and plural).
- Other important features of Pashto include PronType, Animacy, Deixis and Variant.
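The Number rules above (no Number on infinitives, and none on third-person non-past finite forms) can be checked mechanically. The sketch below keeps features in a plain dict. `normalize_feats` and `feats_column` are hypothetical helpers. The first one assumes that "non-past" means any finite form whose Tense is not Past. The second follows the CoNLL-U convention of sorting Name=Value pairs alphabetically by feature name and joining them with `|`.

```python
def normalize_feats(feats: dict[str, str]) -> dict[str, str]:
    """Drop Number where the Pashto guidelines say it is not annotated (sketch)."""
    out = dict(feats)
    is_infinitive = out.get("VerbForm") == "Inf"
    # Assumption: any finite form whose Tense is not Past counts as non-past.
    is_3rd_nonpast_finite = (
        out.get("VerbForm") == "Fin"
        and out.get("Person") == "3"
        and out.get("Tense") != "Past"
    )
    if is_infinitive or is_3rd_nonpast_finite:
        out.pop("Number", None)
    return out


def feats_column(feats: dict[str, str]) -> str:
    """Format a FEATS column: Name=Value pairs sorted by name, joined with '|'."""
    if not feats:
        return "_"
    pairs = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in pairs)


verb = {"VerbForm": "Fin", "Mood": "Ind", "Tense": "Pres",
        "Aspect": "Imp", "Person": "3", "Number": "Sing"}
print(feats_column(normalize_feats(verb)))
# Aspect=Imp|Mood=Ind|Person=3|Tense=Pres|VerbForm=Fin
```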
Syntax
- The overview of all dependency relations used in Pashto can be found here.
Core Arguments
- Core arguments (subjects and objects) in Pashto are mostly nouns, pronouns or infinitives in the bare direct case (Nom) or the bare oblique case (Acc). Which case is used depends on the inherent transitivity of the verb and on the voice and tense (a phenomenon called split ergativity, which also occurs in other Indo-Iranian languages).
- The only argument (i.e., the subject) of an intransitive verb, or of a transitive verb in the passive voice, is always in the direct case (Nom).
- For transitive verbs in the active voice, the following holds:
- The subject in non-past tenses is always in the direct case (Nom).
- The subject in past tenses is always in the oblique case (Acc).
- The object in all tenses is almost always in the direct case (Nom).
- The only exceptions are the first- and second-person singular personal pronouns in non-past tenses, where the oblique forms ما mâ “me” and تا tâ “you” are used instead of the direct forms زۀ zë “I” and تۀ të “you”.
- The subject usually comes before the object regardless of the case.
- The verb agrees in Person and Number (and also in Gender in the third person of past tenses) with the subject in non-past tenses and with the object in past tenses.
| tense | intransitive: nsubj | transitive: nsubj | transitive: obj |
|---|---|---|---|
| non-past | direct | direct | direct (*) |
| past | direct | oblique | direct |

(*) Exceptions: ما mâ, تا tâ (see above).
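The case and agreement rules above can be written as a small decision table. The sketch below encodes them under several assumptions. The `Clause` record and all function names are hypothetical. UD Case values stand in for the traditional cases (Nom = direct, Acc = oblique). The ما/تا exception is passed in as a flag rather than detected from the word form. Agreement with the single argument of intransitive and passive clauses is treated as the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    transitive: bool  # inherent transitivity of the verb
    passive: bool     # passive voice
    past: bool        # past tense (False = non-past)


def subject_relation(clause: Clause) -> str:
    # Subjects of passive verbs use the nsubj:pass subtype.
    return "nsubj:pass" if clause.passive else "nsubj"


def core_cases(clause: Clause, obj_is_1_2_sg_pronoun: bool = False) -> dict[str, str]:
    """Expected Case of each core argument (Nom = direct, Acc = oblique)."""
    if not clause.transitive or clause.passive:
        # The only argument of intransitive or passive verbs is always direct.
        return {subject_relation(clause): "Nom"}
    if clause.past:
        # Split ergativity: oblique subject and direct object in past tenses.
        return {"nsubj": "Acc", "obj": "Nom"}
    # Non-past: direct subject; the object is direct except for ما mâ / تا tâ.
    return {"nsubj": "Nom", "obj": "Acc" if obj_is_1_2_sg_pronoun else "Nom"}


def agreement_target(clause: Clause) -> str:
    """Core argument whose Person/Number (and Gender) the finite verb agrees with."""
    if clause.transitive and not clause.passive and clause.past:
        return "obj"
    return subject_relation(clause)


print(core_cases(Clause(transitive=True, passive=False, past=True)))
# {'nsubj': 'Acc', 'obj': 'Nom'}
```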
Non-verbal Clauses
- The copula verb ول wël “to be” is used in most non-verbal clauses.
- The nominal part of the predicate is usually in the direct case (Nom).
- In existential clauses the word شته šta “there is / there are” is used; unlike the copula, it is tagged as VERB.
Relations Overview
The following relation subtypes are used in Pashto:
Auxiliaries (see aux):
- aux:pass for passive voice
- aux:pot for potential mood
- aux:perf for perfect tenses (do not confuse with the perfective aspect)
- aux:fut for future tense
- aux:hab for habitual past tense
- aux:cnd for conditional mood
Orphan constructions (see orphan):
- orphan:nsubjobj for orphan objects dependent on orphan subjects
- orphan:nsubjobl for orphan obliques dependent on orphan subjects
- orphan:objobl for orphan obliques dependent on orphan objects
Other:
- nsubj:pass for nominal subjects of passive verbs
- compound:lvc for the nominal part of a light verb construction
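When annotating auxiliaries, the choice of aux subtype depends on the grammatical function the auxiliary expresses. The sketch below is a hypothetical lookup table; `AUX_SUBTYPES` and `aux_deprel` are illustrative names. The word noted in each comment is inferred from the AUX list in the Tags section and is not an explicit pairing stated by the guidelines. The fallback to plain `aux` for unlisted functions is also an assumption.

```python
# Hypothetical lookup: grammatical function of an auxiliary -> UD relation subtype.
# The words in the comments are inferred from the AUX list, not stated pairings.
AUX_SUBTYPES = {
    "passive": "aux:pass",     # کېدل kedë́l
    "potential": "aux:pot",    # کېدل kedë́l, or کول kawë́l for longer potential forms
    "perfect": "aux:perf",     # ول wël (perfect tenses, not the perfective aspect)
    "future": "aux:fut",       # بۀ bë
    "habitual": "aux:hab",     # بۀ bë (habitual past)
    "conditional": "aux:cnd",  # بۀ bë, presumably in conditional sentences
}


def aux_deprel(function: str) -> str:
    """Return the aux subtype for an auxiliary function, or plain 'aux' if unlisted."""
    return AUX_SUBTYPES.get(function, "aux")
```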
Treebanks
There is currently one Pashto UD treebank: