
This page pertains to UD version 2.

UD Polish LFG

Language: Polish (code: pl)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Agnieszka Patejuk, Adam Przepiórkowski.

Repository: UD_Polish-LFG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: GNU GPL 3.0

Genre: fiction, nonfiction, news, spoken, social

Questions, comments? General annotation questions (either Polish-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [aep (æt) ipipan • waw • pl, adamp (æt) ipipan • waw • pl]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

The LFG Enhanced UD treebank of Polish is based on a corpus of LFG (Lexical Functional Grammar) syntactic structures generated by an LFG grammar of Polish, POLFIE, and manually disambiguated by human annotators.

The treebank consists of around 17,200 sentences (see the Data Split section for precise numbers). Thanks to the richness of the original LFG representations, it makes heavy use of enhanced dependencies. Secondary edges are used not only in representations of coordination (for shared dependents and shared governors), but also for various control-like constructions.
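Secondary edges of this kind live in the DEPS column (the ninth column) of a CoNLL-U file, where each token lists its enhanced heads as head:relation pairs separated by "|". The following is a minimal sketch of how such edges could be pulled out. The sample sentence is a hand-made toy example of a control construction, not a sentence from the treebank, and the labels in it are assumptions chosen for illustration; the function name secondary_edges is likewise hypothetical.

```python
from typing import List, Tuple

# Hand-made toy sentence ("Jan chce spać" = "Jan wants to sleep").
# NOT taken from the treebank; the labels are illustrative assumptions.
SAMPLE = "\n".join([
    "# text = Jan chce spać",
    "1\tJan\tJan\tPROPN\t_\t_\t2\tnsubj\t2:nsubj|3:nsubj\t_",
    "2\tchce\tchcieć\tVERB\t_\t_\t0\troot\t0:root\t_",
    "3\tspać\tspać\tVERB\t_\t_\t2\txcomp\t2:xcomp\t_",
])


def secondary_edges(conllu: str) -> List[Tuple[str, str, str]]:
    """Return (dependent_id, head_id, relation) for every enhanced edge
    that is not identical to the token's basic HEAD/DEPREL pair."""
    extra = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or cols[8] == "_":
            continue  # malformed line or no enhanced annotation
        tok_id, head, deprel, deps = cols[0], cols[6], cols[7], cols[8]
        for item in deps.split("|"):
            # Split on the first colon only: relation subtypes contain colons.
            e_head, e_rel = item.split(":", 1)
            if (e_head, e_rel) != (head, deprel):
                extra.append((tok_id, e_head, e_rel))
    return extra


print(secondary_edges(SAMPLE))
```

In the toy sentence the only edge reported is the one linking "Jan" to the controlled verb "spać", which is the kind of shared-subject link a control construction introduces.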

The annotation also differs from the SZ UD treebank of Polish (release 2.1) in other respects, including the following:

Acknowledgments

The original LFG corpus was developed under the supervision of Agnieszka Patejuk (many thanks to the annotators!) and was converted to UD by Adam Przepiórkowski, in collaboration with Agnieszka Patejuk. Both the creation of the original LFG corpus and the conversion into UD were partially supported by the Polish Ministry of Science and Higher Education within the CLARIN ERIC programme 2015–2018 (http://clarin.eu/). The data, lemmata and original morphosyntactic tags come from the manually annotated subcorpus of the National Corpus of Polish (led by Adam Przepiórkowski; http://nkjp.pl/), whose development was financed by the Polish Ministry of Science and Higher Education in 2007–2011, and, to a lesser extent, from the corpus "Polish language of the 1960s" (http://clip.ipipan.waw.pl/PL196x). Many thanks to Joakim Nivre and Dan Zeman for their infinite patience in answering a myriad of diverse UD-related questions during the development of this treebank.

References

If you use this treebank, you are encouraged to cite this book:

Agnieszka Patejuk and Adam Przepiórkowski. “From Lexical Functional Grammar to Enhanced Universal Dependencies: Linguistically informed treebanks of Polish.” Institute of Computer Science, Polish Academy of Sciences, Warsaw, 2018. Downloadable from http://nlp.ipipan.waw.pl/Bib/pat:prz:18:book.pdf.

@Book{pat:prz:18:book,
  author    = {Agnieszka Patejuk and Adam Przepiórkowski},
  title     = {From {L}exical {F}unctional {G}rammar to Enhanced {U}niversal {D}ependencies: Linguistically informed treebanks of {P}olish},
  publisher = {Institute of Computer Science, Polish Academy of Sciences},
  year      = 2018,
  address   = {Warsaw},
  url       = {http://nlp.ipipan.waw.pl/Bib/pat:prz:18:book.pdf}
}

Statistics of UD Polish LFG

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB

Features

AdpType, Agglutination, Aspect, Case, Degree, Emphatic, Gender, Hyph, Mood, Number, Number[psor], NumType, PartType, Person, Polarity, Polite, Poss, PrepCase, PronType, PunctSide, PunctType, Reflex, SubGender, Tense, Variant, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:clitic, aux:cnd, aux:imp, aux:pass, case, cc, cc:preconj, ccomp, ccomp:obj, conj, cop, cop:locat, csubj, det, discourse, expl:impers, expl:pv, fixed, flat, iobj, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, punct, root, vocative, xcomp, xcomp:obj

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
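The restriction just described can be expressed as a simple filter over CoNLL-U data. The sketch below is only an illustration under stated assumptions: it treats "nouns or pronouns" as the UPOS tags NOUN and PRON (whether PROPN should also count is not stated here, so it is left as a parameter), the function name verb_nominal_relations is hypothetical, and the sample sentence is a hand-made toy example rather than treebank data.

```python
from collections import Counter

# Hand-made toy sentence ("On czyta książkę" = "He reads a book").
# NOT taken from the treebank; used only to exercise the filter.
SAMPLE = "\n".join([
    "# text = On czyta książkę",
    "1\tOn\ton\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tczyta\tczytać\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tksiążkę\tksiążka\tNOUN\t_\t_\t2\tobj\t_\t_",
])


def verb_nominal_relations(conllu, child_upos=("NOUN", "PRON")):
    """Count basic relations whose parent is a VERB and whose child
    has one of the UPOS tags in child_upos (an assumed default)."""
    counts = Counter()
    # Sentences are separated by blank lines; token IDs restart in each one.
    for block in conllu.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary tokens only (skip multiword ranges and empty nodes).
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        upos = {r[0]: r[3] for r in rows}
        for r in rows:
            if r[3] in child_upos and upos.get(r[6]) == "VERB":
                counts[r[7]] += 1
    return counts


print(verb_nominal_relations(SAMPLE))
```

Passing child_upos=("NOUN", "PROPN", "PRON") would include proper nouns as children, should that reading be preferred.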

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview