
This page pertains to UD version 2.

UD Polish PDB

Language: Polish (code: pl)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v1.2 release.

The following people have contributed to making this treebank part of UD: Alina Wróblewska, Daniel Zeman, Jan Mašek, Rudolf Rosa.

Repository: UD_Polish-PDB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.14

License: CC BY-NC-SA 4.0

Genre: fiction, nonfiction, news

Questions, comments? General annotation questions (either Polish-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [alina (æt) ipipan • waw • pl]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

The Polish PDB-UD treebank is automatically converted from the Polish Dependency Bank 2.0 (PDB 2.0). Both treebanks were created at the Institute of Computer Science, Polish Academy of Sciences in Warsaw (Poland).

The PDB-UD treebank consists of 22,152 sentences (350K tokens) drawn from the Polish National Corpus, Europarl, DGT-Translation Memory, OPUS, Pelcra Parallel Corpus, CDSCorpus and literature. PDB-UD is an extended and corrected version of the Polish SZ-UD treebank (releases 1.2 to 2.3).

The morphological, syntactic and semantic annotation of the PDB-UD treebank is converted from PDB 2.0 data by a rule-based procedure.

The PDB-UD treebank contains enhanced graphs, i.e., trees with additional enhanced edges that encode the shared dependents and the shared governors of coordinated conjuncts (9,141 PDB-UD trees contain enhanced edges).
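As a rough illustration of what an enhanced edge looks like in the data, the sketch below reads CoNLL-U text and lists the edges in the DEPS column that are absent from the basic tree (HEAD and DEPREL). The function names and the toy sentence are hypothetical and not taken from the treebank. The definition of "contains enhanced edges" used here (at least one DEPS edge differing from the basic edge) is an assumption, so the count it yields may not match the figure quoted above.

```python
def read_sentences(lines):
    """Yield each sentence as a list of 10-column token rows."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep ordinary tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows


def extra_edges(rows):
    """Return (token_id, head, deprel) triples found in DEPS but not in the basic tree."""
    extra = []
    for cols in rows:
        basic = (cols[6], cols[7])  # HEAD, DEPREL
        if cols[8] == "_":
            continue
        for item in cols[8].split("|"):
            # Split on the first colon only, so subtyped labels such as acl:relcl stay whole.
            head, _, rel = item.partition(":")
            if (head, rel) != basic:
                extra.append((cols[0], head, rel))
    return extra


# Hypothetical toy sentence ("Jan śpi i chrapie"), not taken from PDB-UD:
# the subject "Jan" is shared by both coordinated verbs.
SAMPLE = "\n".join("\t".join(r) for r in [
    ["1", "Jan", "Jan", "PROPN", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj", "_"],
    ["2", "śpi", "spać", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "i", "i", "CCONJ", "_", "_", "4", "cc", "4:cc", "_"],
    ["4", "chrapie", "chrapać", "VERB", "_", "_", "2", "conj", "2:conj", "_"],
]) + "\n\n"

sentences = list(read_sentences(SAMPLE.splitlines()))
with_enhanced = sum(1 for rows in sentences if extra_edges(rows))
print(with_enhanced, extra_edges(sentences[0]))
```

To apply the same check to a real file, pass an open file handle to `read_sentences` instead of `SAMPLE.splitlines()`.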

Acknowledgments

We want to thank all of the original Polish Dependency Bank 2.0 contributors. The development of the PDB-UD treebank (v2.5) was funded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The development of the PDB-UD treebank (v2.13) was funded by the Digital Research Infrastructure for the Arts and Humanities DARIAH-PL (project no. POIR.04.02.00-00-D006/20-00).

Statistics of UD Polish PDB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, AdpType, Animacy, Aspect, Case, ConjType, Degree, Foreign, Gender, Hyph, Mood, Number, Number[psor], NumForm, NumType, PartType, Person, Polarity, Polite, Poss, PrepCase, PronType, PunctSide, PunctType, Reflex, Tense, Variant, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advcl:cmpr, advcl:relcl, advmod, advmod:arg, advmod:emph, advmod:neg, amod, amod:flat, appos, aux, aux:clitic, aux:cnd, aux:imp, aux:pass, case, cc, cc:preconj, ccomp, ccomp:cleft, ccomp:obj, conj, cop, csubj, csubj:pass, dep, det, det:numgov, det:nummod, det:poss, discourse, discourse:intj, expl:pv, fixed, flat, flat:foreign, iobj, list, mark, nmod, nmod:arg, nmod:flat, nmod:poss, nmod:pred, nsubj, nsubj:pass, nummod, nummod:flat, nummod:gov, obj, obl, obl:agent, obl:arg, obl:cmpr, obl:orphan, orphan, parataxis:insert, parataxis:obj, punct, root, vocative, xcomp, xcomp:cleft, xcomp:pred, xcomp:subj

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
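The restriction to verb parents and noun or pronoun children can be sketched as a simple filter over the basic tree. The function name `verb_nominal_relations` and the toy sentence are hypothetical. Treating PROPN as a noun alongside NOUN and PRON is an assumption; pass a different `child_upos` tuple to change it.

```python
from collections import Counter


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PROPN", "PRON")):
    """Count DEPREL labels on edges whose parent is a VERB and whose child is nominal."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Skip comments, malformed lines, multiword ranges and empty nodes.
            if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
                continue
            tokens[cols[0]] = cols
        for cols in tokens.values():
            parent = tokens.get(cols[6])  # None for the root (HEAD = 0)
            if parent is not None and parent[3] == "VERB" and cols[3] in child_upos:
                counts[cols[7]] += 1
    return counts


# Hypothetical toy sentence ("Jan czyta dziś książkę"), not taken from PDB-UD.
SAMPLE = "\n".join("\t".join(r) for r in [
    ["1", "Jan", "Jan", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "czyta", "czytać", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "dziś", "dziś", "ADV", "_", "_", "2", "advmod", "_", "_"],
    ["4", "książkę", "książka", "NOUN", "_", "_", "2", "obj", "_", "_"],
])

counts = verb_nominal_relations(SAMPLE)
print(dict(counts))
```

The adverb is attached to the verb as well, but it is left out of the counts because its part of speech is not in `child_upos`.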

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview