This page pertains to UD version 2.

UD Polish PUD

Language: Polish (code: pl)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v2.4 release.

The following people have contributed to making this treebank part of UD: Alina Wróblewska.

Repository: UD_Polish-PUD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Polish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [alina (æt) ipipan • waw • pl]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

This is the Polish portion of the Parallel Universal Dependencies (PUD) treebanks, created at the Institute of Computer Science, Polish Academy of Sciences in Warsaw (Poland).

PUD-PL consists of 1,000 Polish sentences (18,384 tokens) in the same order as in the PUD treebanks for other languages. Morphosyntactic annotations were predicted automatically by COMBO trained on Polish Dependency Bank 2.0 and then corrected manually. Finally, the trees were converted into UD trees using the same conversion procedure as for the PDB-UD treebank. The annotation schema of PUD-PL is thus the same as that of the Polish PDB-UD treebank. 459 PUD-PL trees contain enhanced edges.
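The counts above can be checked against a CoNLL-U file with a short script. The following is a minimal sketch, not the procedure used by the treebank authors. It assumes that a tree "contains enhanced edges" when some word's DEPS column differs from its basic HEAD:DEPREL pair or when the sentence has an empty node, and it counts syntactic words (integer IDs), which may or may not be the unit behind the 18,384 figure. The function names and the inline sample are hypothetical.

```python
import io


def read_conllu(stream):
    """Yield each sentence as a list of tab-split CoNLL-U rows."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            rows.append(line.split("\t"))
    if rows:
        yield rows


def has_enhanced_edges(rows):
    """Assumed definition: DEPS differs from basic HEAD:DEPREL, or an empty node exists."""
    for cols in rows:
        word_id, head, deprel, deps = cols[0], cols[6], cols[7], cols[8]
        if "-" in word_id:
            continue  # multiword token range, carries no syntax
        if "." in word_id:
            return True  # empty nodes occur only in the enhanced graph
        if deps != "_" and set(deps.split("|")) != {f"{head}:{deprel}"}:
            return True
    return False


def summarize(stream):
    """Count sentences, syntactic words, and trees with enhanced edges."""
    stats = {"sentences": 0, "words": 0, "enhanced_trees": 0}
    for rows in read_conllu(stream):
        stats["sentences"] += 1
        stats["words"] += sum(1 for cols in rows if cols[0].isdigit())
        stats["enhanced_trees"] += int(has_enhanced_edges(rows))
    return stats


# Synthetic two-sentence sample (not taken from the treebank)
SAMPLE = (
    "# text = Ala ma kota.\n"
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t2:nsubj\t_\n"
    "2\tma\tmieć\tVERB\t_\t_\t0\troot\t0:root\t_\n"
    "3\tkota\tkot\tNOUN\t_\t_\t2\tobj\t2:obj\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t2:punct\t_\n"
    "\n"
    "# text = Ala śpi i je.\n"
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t2:nsubj|4:nsubj\t_\n"
    "2\tśpi\tspać\tVERB\t_\t_\t0\troot\t0:root\t_\n"
    "3\ti\ti\tCCONJ\t_\t_\t4\tcc\t4:cc\t_\n"
    "4\tje\tjeść\tVERB\t_\t_\t2\tconj\t2:conj\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t2:punct\t_\n"
)

print(summarize(io.StringIO(SAMPLE)))
# {'sentences': 2, 'words': 9, 'enhanced_trees': 1}
```

To run it on the treebank itself, pass an open file handle for the downloaded CoNLL-U file to `summarize` (opened with `encoding="utf-8"`). If the resulting numbers differ from those stated above, the assumed definition of an enhanced edge or of a token is the first thing to revisit.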

Acknowledgments

The development of the PDB-UD treebank was funded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The development of the PUD-PL treebank (v2.13) was funded by the Digital Research Infrastructure for the Arts and Humanities DARIAH-PL (project no. POIR.04.02.00-00-D006/20-00).

References

If you use the Polish PUD treebank, you are encouraged to cite this paper:

@inproceedings{pl,
author = {Wr{\'o}blewska, Alina},
title = {Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format},
booktitle = {Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)},
editor = {de Marneffe, Marie-Catherine and Lynn, Teresa and Schuster, Sebastian},
pages = {173--182},
publisher = {Association for Computational Linguistics},
year = {2018}
}

Statistics of UD Polish PUD

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Abbr, AdpType, Animacy, Aspect, Case, ConjType, Degree, Foreign, Gender, Hyph, Mood, Number, Number[psor], NumForm, NumType, PartType, Person, Polarity, Poss, PrepCase, PronType, PunctSide, PunctType, Reflex, Tense, Variant, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advcl:relcl, advmod, advmod:arg, advmod:emph, advmod:neg, amod, amod:flat, appos, aux, aux:clitic, aux:cnd, aux:pass, case, cc, cc:preconj, ccomp, ccomp:cleft, ccomp:obj, conj, cop, csubj, csubj:pass, dep, det, det:numgov, det:nummod, det:poss, expl:pv, fixed, flat, flat:foreign, iobj, mark, nmod, nmod:arg, nmod:flat, nmod:poss, nmod:pred, nsubj, nsubj:pass, nummod, nummod:gov, obj, obl, obl:agent, obl:arg, obl:cmpr, obl:orphan, orphan, parataxis:insert, parataxis:obj, punct, root, vocative, xcomp, xcomp:pred, xcomp:subj

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
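The restriction to verb parents and nominal children can be expressed as a filter over the basic dependency columns. The sketch below is illustrative only and is not the script that generates the UD statistics pages. It assumes "nouns or pronouns" maps to the UPOS tags NOUN, PROPN, and PRON (including PROPN is an assumption; adjust the set if needed), and the function name and sample rows are hypothetical.

```python
from collections import Counter

# Assumed mapping of "nouns or pronouns" to UPOS tags
NOMINAL_UPOS = frozenset({"NOUN", "PROPN", "PRON"})


def verb_nominal_relations(sentence_rows, nominal_upos=NOMINAL_UPOS):
    """Count relation labels on edges with a VERB parent and a nominal child.

    sentence_rows is one sentence as a list of 10-column CoNLL-U rows.
    """
    # Map each word ID to its UPOS so a child's parent can be looked up
    upos_by_id = {cols[0]: cols[3] for cols in sentence_rows if cols[0].isdigit()}
    counts = Counter()
    for cols in sentence_rows:
        if not cols[0].isdigit():
            continue  # skip multiword token ranges and empty nodes
        child_upos, head, deprel = cols[3], cols[6], cols[7]
        if child_upos in nominal_upos and upos_by_id.get(head) == "VERB":
            counts[deprel] += 1
    return counts


# Synthetic sentence "Ala ma kota." (not taken from the treebank)
ROWS = [
    ["1", "Ala", "Ala", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "ma", "mieć", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "kota", "kot", "NOUN", "_", "_", "2", "obj", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]

print(dict(verb_nominal_relations(ROWS)))
# {'nsubj': 1, 'obj': 1}
```

The punctuation edge is dropped because its child is not nominal, and the root edge is dropped because its head ID 0 has no UPOS.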

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview