
This page pertains to UD version 2.

UD for Polish

Tokenization and Word Segmentation

Morphology

Tags

    • Polish in principle uses all 17 universal POS categories: SYM is only used in the PDB treebank to mark symbols, e.g. % (percent), ° (degree), + (plus), - (minus), $ (dollar), or emojis, e.g. :-), and X is only used in the PDB treebank (to mark abbreviations and digits).
    • The NOUN tag is used not only for prototypical nouns, but also – somewhat arbitrarily – for gerunds (the so-called -nie/-cie forms), which have both nominal and verbal properties.
    • Pronouns (PRON) are here understood as personal pronouns, so-called reflexive pronouns (also in their non-reflexive and – generally – non-pronominal uses), and such nominal pronouns as kto “who”, nic “nothing” and wszyscy “everybody”.
    • As Polish grammars do not include a separate part of speech determiner, the DET class is based on a word list and includes words treated by standard Polish tagsets as adjectives, numerals or even nouns:
        • determiners treated elsewhere as adjectives include possessive pronouns, as well as words such as ten “this”, każdy “each”, taki “such”, którykolwiek “whichever”, etc.,
        • determiners treated elsewhere as numerals include indefinite numerals (e.g., wiele “many”, niedużo “not much, not many”, kilka “several”), as well as fractional numerals such as pół “half”,
        • one determiner treated elsewhere as a noun is mnóstwo “a lot”.
    • The main auxiliary verb (AUX) in Polish is być (“to be”), with the aspectual variant bywać “to be (habitual)”. This auxiliary verb is used in several types of constructions:
        • the copula with predicative phrases,
        • periphrastic future tense (future form of być + infinitive or so-called l-participle form of the main verb),
        • periphrastic conditional (any form of być + the conditional mood marker by + l-participle of the main verb),
        • (imperfective) periphrastic passive (any form of być, including periphrastic forms, + passive participle of the main verb).
    • Another auxiliary, zostać “become” (and its habitual version zostawać), is used for the perfective periphrastic passive (any form of zostać + passive participle of the main verb). Additionally, mood markers by (conditional) and niech (imperative, also its variants niechaj, niechże, niechby) are marked as AUX, as are “mobile inflections” and the copular uses of to (usually, but inappropriately in this context, translated as “this”).
    • The words być, bywać, zostać and zostawać may also occur as normal VERB if they are used in purely existential sentences (i.e., ones that do not even indicate location because if they do, then they should be treated as copulas).
    • Verbs with modal meaning are not considered auxiliary in Polish.
    • There are five main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
        • Infinitive Inf, tagged VERB or AUX.
        • Finite verb Fin, tagged VERB or AUX.
        • Converb Conv (an adverbial participle), tagged VERB or (in principle, but not in release 2.2) AUX.
        • Participle Part (an adjectival participle), tagged ADJ.
        • Verbal noun Vnoun (a gerund), tagged NOUN.
    • Inherently impersonal forms ending in -no/-to (a specialty of Polish and Ukrainian) are marked as finite verbs with Person=0 (and Tense=Past).
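The pairings of UPOS tag and VerbForm value listed above can be checked mechanically. The following is a minimal sketch, not an official UD validator: the names `ALLOWED_UPOS`, `parse_feats` and `check_token` are hypothetical, the input is assumed to be the UPOS and FEATS columns of a CoNLL-U token, and the Person=0 rule is one reading of the statement about -no/-to forms.

```python
# Hypothetical consistency check for the Polish tag/feature pairings above.
# This is an illustrative sketch, not part of any UD tool.

# VerbForm value -> UPOS tags that the guidelines above allow for it.
ALLOWED_UPOS = {
    "Inf": {"VERB", "AUX"},
    "Fin": {"VERB", "AUX"},
    "Conv": {"VERB", "AUX"},  # AUX only in principle (not in release 2.2)
    "Part": {"ADJ"},
    "Vnoun": {"NOUN"},
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string like 'Person=0|Tense=Past' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def check_token(upos, feats):
    """Return a list of problems found; an empty list means consistent."""
    f = parse_feats(feats)
    problems = []
    verb_form = f.get("VerbForm")
    if verb_form in ALLOWED_UPOS and upos not in ALLOWED_UPOS[verb_form]:
        problems.append(f"VerbForm={verb_form} is not expected with UPOS {upos}")
    # Assumed reading: impersonal -no/-to forms are finite and past tense.
    if f.get("Person") == "0":
        if verb_form != "Fin" or f.get("Tense") != "Past":
            problems.append("Person=0 expects VerbForm=Fin and Tense=Past")
    return problems


if __name__ == "__main__":
    # Feature strings here are shortened illustrations, not treebank excerpts.
    print(check_token("VERB", "Person=0|Tense=Past|VerbForm=Fin"))
    print(check_token("VERB", "VerbForm=Part"))
```

Tokens without a VerbForm feature pass unchanged, so the check can be run over every token of a sentence without filtering first.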

Nominal Features

Pronouns, Determiners, Numerals

Degree and Polarity

Verbal Features

Other Features

Syntax

Core and Oblique Dependents

Non-verbal (Predicative) Clauses

Relations Overview

This is an overview only. For more detailed discussion and examples, see the list of Polish relations.

Treebanks

There are three Polish UD treebanks: