
This page pertains to UD version 2.

UD Polish PDB

Language: Polish (code: pl)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v1.2 release.

The following people have contributed to making this treebank part of UD: Alina Wróblewska, Daniel Zeman, Jan Mašek, Rudolf Rosa.

Repository: UD_Polish-PDB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-NC-SA 4.0

Genre: fiction, nonfiction, news

Questions, comments? General annotation questions (either Polish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

The Polish PDB-UD treebank is based on the Polish Dependency Bank 2.0 (PDB 2.0), created at the Institute of Computer Science, Polish Academy of Sciences, in Warsaw. The PDB-UD treebank is an extended and corrected version of the Polish SZ-UD treebank (release 2.3).

The PDB-UD treebank consists of 22,208 sentences (351K tokens). It contains all 8K sentences of the Polish SZ-UD treebank and an additional 14K unique sentences. The additional sentences cover linguistic phenomena (e.g. relative clauses, reported speech) that either did not occur in the SZ-UD trees or were not annotated there. The PDB-UD treebank also contains enhanced graphs, i.e. trees extended with enhanced edges that encode the shared dependents and the shared governors of coordinated conjuncts (9,167 PDB-UD trees contain enhanced edges).
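In CoNLL-U files, enhanced edges are stored in the DEPS column (the ninth column) as head:relation pairs separated by "|". The Python sketch below, assuming standard tab-separated CoNLL-U input, lists the tokens that have more than one enhanced head, which is one way a shared governor of coordinated conjuncts becomes visible. The function names and the toy sentence are hypothetical illustrations, not taken from PDB-UD or its conversion tools.

```python
from collections import defaultdict


def enhanced_heads(conllu_sentence: str) -> dict:
    """Map each token ID to its (head, relation) pairs from the DEPS column."""
    heads = defaultdict(list)
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        token_id, deps = cols[0], cols[8]  # DEPS is the 9th CoNLL-U column
        if "-" in token_id or deps == "_":
            continue  # multiword-token ranges have no DEPS value
        for pair in deps.split("|"):
            # Split on the first colon only: relation labels may contain colons.
            head, _, rel = pair.partition(":")
            heads[token_id].append((head, rel))
    return dict(heads)


def multi_head_tokens(conllu_sentence: str) -> dict:
    """Tokens with more than one enhanced head, e.g. a second conjunct that
    also attaches to the governor it shares with the first conjunct."""
    return {tok: hs for tok, hs in enhanced_heads(conllu_sentence).items() if len(hs) > 1}


# Hypothetical toy sentence "Jan i Maria śpiewają" ("Jan and Maria sing").
rows = [
    ["1", "Jan", "Jan", "PROPN", "_", "_", "4", "nsubj", "4:nsubj", "_"],
    ["2", "i", "i", "CCONJ", "_", "_", "3", "cc", "3:cc", "_"],
    ["3", "Maria", "Maria", "PROPN", "_", "_", "1", "conj", "1:conj|4:nsubj", "_"],
    ["4", "śpiewają", "śpiewać", "VERB", "_", "_", "0", "root", "0:root", "_"],
]
toy = "\n".join("\t".join(r) for r in rows)
print(multi_head_tokens(toy))  # {'3': [('1', 'conj'), ('4', 'nsubj')]}
```

Here "Maria" keeps its basic conj edge and additionally receives the nsubj edge from the shared verb.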

The morphological, syntactic and semantic annotation of the PDB-UD treebank was created by converting the PDB 2.0 data. The conversion procedure was designed and implemented by Alina Wróblewska, partly based on the conversion of the SZ-UD trees.

Acknowledgments

We would like to thank all the contributors to the original Polish Dependency Bank 2.0. The development of the PDB-UD treebank was funded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.

Statistics of UD Polish PDB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
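Statistics such as the tag inventory above can be recomputed from the CoNLL-U files. The following Python sketch is an illustration under stated assumptions, not the script used to produce this page: it counts sentences and UPOS tags and flags any tag outside the 17-tag inventory. It counts syntactic-word lines (skipping multiword-token ranges and empty nodes), so on data with multiword tokens its total may differ from a surface-token count. The helper row() and the sample sentences are hypothetical.

```python
from collections import Counter

# The 17 UPOS tags listed above for UD Polish PDB.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def corpus_stats(conllu_text: str):
    """Return (sentence count, word count, UPOS Counter) for CoNLL-U text."""
    sentences, in_sentence = 0, False
    upos = Counter()
    for line in conllu_text.splitlines():
        if not line.strip():
            if in_sentence:
                sentences += 1  # a blank line closes a sentence
            in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        in_sentence = True
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        upos[cols[3]] += 1  # UPOS is the 4th column
    if in_sentence:
        sentences += 1  # the text may not end with a blank line
    return sentences, sum(upos.values()), upos


def row(tid, form, upos, head, deprel):
    """Hypothetical helper: build a minimal CoNLL-U line; unused columns are '_'."""
    return "\t".join([tid, form, "_", upos, "_", "_", head, deprel, "_", "_"])


sample = "\n".join([
    "# text = Ala ma kota.",
    row("1", "Ala", "PROPN", "2", "nsubj"),
    row("2", "ma", "VERB", "0", "root"),
    row("3", "kota", "NOUN", "2", "obj"),
    row("4", ".", "PUNCT", "2", "punct"),
    "",
    row("1", "Dziękuję", "VERB", "0", "root"),
    row("2", "!", "PUNCT", "1", "punct"),
])
n_sent, n_words, counts = corpus_stats(sample)
print(n_sent, n_words)  # 2 6
print(set(counts) - UPOS_TAGS)  # tags outside the inventory; empty here
```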

Features

Abbr, AdpType, Animacy, Aspect, Case, Clitic, ConjType, Degree, Emphatic, Foreign, Gender, Hyph, Mood, NounForm, Number, Number[psor], NumForm, NumType, PartType, Person, Polarity, Poss, PrepCase, PronType, Pun, PunctSide, PunctType, Reflex, Tense, Variant, VerbForm, VerbType, Voice
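The features above appear in the FEATS column of CoNLL-U as Name=Value pairs separated by "|", and a layered feature such as Number[psor] carries its layer in square brackets. A minimal Python sketch of reading them follows; the function names are hypothetical and the FEATS string is illustrative, not copied from PDB-UD.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS value into {feature name: [values]}."""
    if feats == "_":
        return {}  # "_" means the word has no features
    parsed = {}
    for item in feats.split("|"):  # features are separated by '|'
        name, _, values = item.partition("=")
        parsed[name] = values.split(",")  # multiple values are comma-separated
    return parsed


def split_layer(name: str):
    """Split a layered feature name such as 'Number[psor]' into ('Number', 'psor')."""
    base, _, layer = name.partition("[")
    return base, (layer.rstrip("]") or None)


# Illustrative FEATS string; not copied from PDB-UD.
feats = parse_feats("Case=Nom|Number[psor]=Plur|Poss=Yes|PronType=Prs")
for name, values in feats.items():
    print(split_layer(name), values)
```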

Relations

acl, acl:relcl, advcl, advcl:relcl, advmod, advmod:arg, advmod:neg, amod, amod:flat, appos, aux, aux:clitic, aux:cnd, aux:imp, aux:pass, case, cc, cc:preconj, ccomp, ccomp:obj, conj, cop, csubj, det, det:numgov, det:nummod, det:poss, discourse:emo, discourse:intj, expl:pv, fixed, flat, iobj, list, mark, nmod, nmod:arg, nmod:flat, nmod:pred, nsubj, nsubj:pass, nummod, nummod:gov, obj, obl, obl:agent, obl:arg, obl:cmpr, orphan, parataxis, parataxis:insert, parataxis:obj, punct, root, vocative, xcomp, xcomp:pred, xcomp:subj
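Many of the relations above are language-specific subtypes written as universal:subtype (e.g. acl:relcl). The Python sketch below groups the labels from this list by their universal part, which gives a quick overview of which universal relations are refined in this treebank. The function names are hypothetical.

```python
# The relation labels listed above for UD Polish PDB.
RELATIONS = """
acl acl:relcl advcl advcl:relcl advmod advmod:arg advmod:neg amod amod:flat
appos aux aux:clitic aux:cnd aux:imp aux:pass case cc cc:preconj ccomp
ccomp:obj conj cop csubj det det:numgov det:nummod det:poss discourse:emo
discourse:intj expl:pv fixed flat iobj list mark nmod nmod:arg nmod:flat
nmod:pred nsubj nsubj:pass nummod nummod:gov obj obl obl:agent obl:arg
obl:cmpr orphan parataxis parataxis:insert parataxis:obj punct root vocative
xcomp xcomp:pred xcomp:subj
""".split()


def split_deprel(deprel: str):
    """Split 'acl:relcl' into ('acl', 'relcl'); plain labels get subtype None."""
    universal, _, subtype = deprel.partition(":")
    return universal, (subtype or None)


def group_by_universal(labels):
    """Map each universal relation to the language-specific subtypes used with it."""
    groups = {}
    for label in labels:
        universal, subtype = split_deprel(label)
        subtypes = groups.setdefault(universal, [])
        if subtype:
            subtypes.append(subtype)
    return groups


groups = group_by_universal(RELATIONS)
print(len(RELATIONS), len(groups))  # 58 31
print(groups["nmod"])  # ['arg', 'flat', 'pred']
```

Note that discourse and expl occur here only with subtypes, so their lists are non-empty even though the bare labels are absent.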

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
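The restriction stated above can be expressed as a filter over the basic tree (the HEAD and DEPREL columns). The following Python sketch counts relation labels on edges whose parent is a VERB and whose child is nominal. It assumes that "nouns or pronouns" means the UPOS tags NOUN, PROPN and PRON; whether proper nouns are included in the statistics on this page is an assumption. The function names and the annotated example sentence are hypothetical and were not checked against the treebank.

```python
from collections import Counter

# Assumption: "nouns or pronouns" is read as these UPOS tags.
NOMINAL = {"NOUN", "PROPN", "PRON"}


def verb_nominal_relations(conllu_sentence: str) -> Counter:
    """Count DEPREL labels on basic edges from a VERB parent to a nominal child."""
    tokens = {}
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # basic tree only: skip ranges and empty nodes
        tokens[cols[0]] = (cols[3], cols[6], cols[7])  # UPOS, HEAD, DEPREL
    counts = Counter()
    for upos, head, deprel in tokens.values():
        # HEAD 0 (the artificial root) is not a token, so it has no UPOS.
        head_upos = tokens[head][0] if head in tokens else None
        if head_upos == "VERB" and upos in NOMINAL:
            counts[deprel] += 1
    return counts


def row(tid, form, upos, head, deprel):
    """Hypothetical helper: build a minimal CoNLL-U line; unused columns are '_'."""
    return "\t".join([tid, form, "_", upos, "_", "_", head, deprel, "_", "_"])


# Illustrative annotation of "Ala dała Markowi książkę w domu." (not from PDB-UD).
sentence = "\n".join([
    row("1", "Ala", "PROPN", "2", "nsubj"),
    row("2", "dała", "VERB", "0", "root"),
    row("3", "Markowi", "PROPN", "2", "iobj"),
    row("4", "książkę", "NOUN", "2", "obj"),
    row("5", "w", "ADP", "6", "case"),
    row("6", "domu", "NOUN", "2", "obl"),
    row("7", ".", "PUNCT", "2", "punct"),
])
print(verb_nominal_relations(sentence))
```

The case edge is excluded because its parent is a noun, and the punct edge because its child is not nominal.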

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview