
This page pertains to UD version 2.

UD Polish MPDT

Language: Polish (code: pl)
Family: Indo-European

This treebank has been part of Universal Dependencies since the UD v2.17 release.

The following people have contributed to making this treebank part of UD: Kamil Tomaszek, Alina Wróblewska, Aleksandra Wieczorek.

Repository: UD_Polish-MPDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: nonfiction, bible, legal, fiction

Questions, comments? General annotation questions (either Polish-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [kt • tomaszek (æt) student • uw • edu • pl]. The treebank is developed directly in the UD repository, so bug fixes can be submitted as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS assigned by a program, with some manual corrections, but not a full manual verification
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

UD_Polish-MPDT is a treebank of Middle Polish (17th–18th centuries). It is a rule-based conversion of the Middle Polish Dependency Treebank (Wieczorek, 2025) from its original annotation to the Universal Dependencies format. The MPDT sentences are sourced from the KorBa corpus (Gruszczyński et al., 2022).

KorBa (The Electronic Corpus of 17th- and 18th-century Polish Texts) is a large and diverse collection of Polish literature, scientific texts, official documents, press releases, and more, dated 1601–1772.

The syntactic annotations originate from the Middle Polish Dependency Treebank, a project led by Aleksandra Wieczorek, which adds a dependency layer to a selected part of KorBa. The original MPDT annotation follows the conventions of the Polish Dependency Bank (PDB).

This initial UD release contains 2,018 sentences and approximately 47K tokens, with plans for further expansion in future versions.
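Figures like these can be checked against the released CoNLL-U files. The following is a minimal sketch, not the official UD statistics script: the function name `count_conllu` and the toy sentences are invented for illustration and do not come from the treebank. It counts sentences, surface tokens (a multiword token counts once) and syntactic words. Which of the two counts the "47K tokens" figure refers to is not stated here, so the sketch reports both.

```python
def count_conllu(text):
    """Return (sentences, surface_tokens, syntactic_words) for CoNLL-U text."""
    sentences = tokens = words = 0
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multiword token
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t")[0]
        if "-" in token_id:
            # Multiword token range such as "1-2": one surface token.
            tokens += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:
            continue  # empty node, counted neither as token nor as word
        else:
            words += 1
            if int(token_id) > covered_until:
                tokens += 1  # word that is not part of a multiword token
    return sentences, tokens, words


# Toy input (invented, modern Polish; not taken from UD_Polish-MPDT).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "# text = Ala ma kota.",
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tma\tmieć\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tkota\tkot\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# sent_id = toy-2",
    "# text = Spałem.",
    "1-2\tSpałem\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tSpał\tspać\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tem\tbyć\tAUX\t_\t_\t1\taux:clitic\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])

print(count_conllu(SAMPLE))
```

To apply it to the treebank, read a `.conllu` file from the UD_Polish-MPDT repository with `open(path, encoding="utf-8").read()` and pass the contents to `count_conllu`.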

Acknowledgments

Statistics of UD Polish MPDT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, AdpType, Animacy, Aspect, Case, ConjType, Degree, ExtPos, Foreign, Gender, Hyph, Mood, Number, Number[psor], NumForm, NumType, PartType, Person, Polarity, Poss, PrepCase, PronType, PunctSide, PunctType, Reflex, Tense, Variant, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advcl:cmpr, advcl:relcl, advmod, advmod:arg, advmod:emph, advmod:neg, amod, amod:flat, appos, aux, aux:clitic, aux:cnd, aux:imp, aux:pass, case, cc, cc:preconj, ccomp, ccomp:cleft, ccomp:obj, conj, cop, csubj, dep, det, det:poss, discourse:intj, expl:pv, fixed, flat, iobj, list, mark, nmod, nmod:arg, nmod:flat, nmod:poss, nsubj, nsubj:outer, nsubj:pass, nummod, nummod:flat, obj, obl, obl:agent, obl:arg, obl:cmpr, orphan, parataxis, parataxis:insert, parataxis:obj, punct, root, vocative, xcomp, xcomp:cleft, xcomp:pred, xcomp:subj

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
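The restriction above can be reproduced on a CoNLL-U file with a short script. This is a hedged sketch rather than the script used to generate the UD statistics: the function name `verb_nominal_relations` is hypothetical, the toy sentences are invented, and treating NOUN, PROPN and PRON as "nouns or pronouns" is an assumption.

```python
from collections import Counter

# Assumption: "nouns or pronouns" is read as these three UPOS tags.
NOMINAL = {"NOUN", "PROPN", "PRON"}


def verb_nominal_relations(text):
    """Count deprels on edges whose parent is a VERB and whose child is nominal."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        rows = {}  # word ID -> (UPOS, HEAD, DEPREL)
        for line in block.splitlines():
            if line.startswith("#") or not line.strip():
                continue
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword token ranges and empty nodes
            rows[cols[0]] = (cols[3], cols[6], cols[7])
        for upos, head, deprel in rows.values():
            # HEAD "0" marks the root and never matches a word ID.
            if upos in NOMINAL and head in rows and rows[head][0] == "VERB":
                counts[deprel] += 1
    return counts


# Toy input (invented, modern Polish; not taken from UD_Polish-MPDT).
SAMPLE = "\n".join([
    "# text = Ala ma kota.",
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tma\tmieć\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tkota\tkot\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# text = Ona śpi.",
    "1\tOna\tona\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tśpi\tspać\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

result = verb_nominal_relations(SAMPLE)
print(result.most_common())
```

Punctuation and other non-nominal children are ignored even when they attach to a verb, so the counts cover only the verb-to-nominal edges this section describes.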

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview