
This page pertains to UD version 2.

UD Italian Old

Language: Italian (code: it)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.13 release.

The following people have contributed to making this treebank part of UD: Claudia Corbetta, Marco Passarotti, Flavio Massimiliano Cecchini, Giovanni Moretti.

Repository: UD_Italian-Old
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17

License: CC BY-SA 4.0

Genre: poetry

Questions, comments? General annotation questions (either Italian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [claudia • corbetta (æt) unibg • it]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: assigned by a program, not checked manually
Features: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Relations: annotated manually, natively in UD style

Description

Italian-Old is a treebank containing Dante Alighieri’s Comedy (composed between approximately 1306 and 1321), based on the 1994 Petrocchi edition and taken from the DanteSearch corpus, originally created at the University of Pisa, Italy. It is a treebank of Old Italian, specifically Florentine.

The treebank comprises 3 419 sentences (122 038 syntactic words) of literary text (poetry). It is divided into three sections, known as Cantiche: Inferno, Purgatorio, and Paradiso. Inferno contains 1 228 sentences and 41 368 syntactic words; Purgatorio contains 1 174 sentences and 41 277 syntactic words; Paradiso contains 1 017 sentences and 39 393 syntactic words.
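As a quick consistency check, the per-Cantica figures can be summed and compared with the stated totals. The snippet below is only an arithmetic sketch over the numbers quoted in this description; the dictionary layout and variable names are illustrative.

```python
# Per-Cantica figures as stated in the description of the treebank.
cantiche = {
    "Inferno": {"sentences": 1228, "words": 41368},
    "Purgatorio": {"sentences": 1174, "words": 41277},
    "Paradiso": {"sentences": 1017, "words": 39393},
}

# Sum the parts to compare with the stated totals (3 419 and 122 038).
total_sentences = sum(c["sentences"] for c in cantiche.values())
total_words = sum(c["words"] for c in cantiche.values())
print(total_sentences, total_words)

# Share of syntactic words contributed by each Cantica.
for name, c in cantiche.items():
    share = 100 * c["words"] / total_words
    print(f"{name}: {share:.1f}% of syntactic words")
```

The three Cantiche are of similar size, each contributing roughly a third of the syntactic words.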

Each Cantica (Inferno, Purgatorio, and Paradiso) is split into three subsets, dev, test and train, with approximate ratios of 10%, 10%, and 80%, respectively. The per-Cantica subsets are then merged into single dev, test and train sets.
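The split-then-merge procedure can be sketched as follows. This is a hypothetical illustration, not the procedure actually used by the treebank maintainers: it assumes a sentence-level split into contiguous slices taken in dev, test, train order, whereas the description only states the approximate ratios. The function names and the placeholder sentence IDs are invented for the example.

```python
def split_cantica(sentences, dev_ratio=0.10, test_ratio=0.10):
    """Split one Cantica into dev/test/train.

    Assumption: contiguous slices in dev, test, train order;
    the description only gives the approximate ratios.
    """
    n_dev = round(len(sentences) * dev_ratio)
    n_test = round(len(sentences) * test_ratio)
    return {
        "dev": sentences[:n_dev],
        "test": sentences[n_dev:n_dev + n_test],
        "train": sentences[n_dev + n_test:],
    }


def merge_splits(cantiche):
    """Split every Cantica separately, then merge the parts per subset."""
    merged = {"dev": [], "test": [], "train": []}
    for sentences in cantiche.values():
        for subset, part in split_cantica(sentences).items():
            merged[subset].extend(part)
    return merged


# Placeholder sentence IDs, sized like the three Cantiche.
sizes = {"Inferno": 1228, "Purgatorio": 1174, "Paradiso": 1017}
cantiche = {
    name: [f"{name}-{i}" for i in range(1, n + 1)] for name, n in sizes.items()
}

merged = merge_splits(cantiche)
total = sum(len(part) for part in merged.values())
for subset, part in merged.items():
    print(subset, len(part), f"{100 * len(part) / total:.1f}%")
```

Splitting each Cantica separately before merging keeps all three sections represented in every subset in roughly the same proportions.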

The distribution of Inferno (tokens: 41 368) with respect to the subsets is as follows:

The distribution of Purgatorio (tokens: 41 277) with respect to the subsets is as follows:

The distribution of Paradiso (tokens: 39 393) with respect to the subsets is as follows:

The treebank also includes enhanced dependencies annotation, with a specific focus on the orphan dependency relation. For the criteria adopted in the enhanced annotation, please refer to the paper “«Are you Afraid of Ghosts?» A Proposal for Busting Predicate Ellipsis in Universal Dependencies” (Corbetta et al., 2025).
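To see how orphan relations and enhanced dependencies interact in a CoNLL-U file, the sketch below lists the tokens attached as orphan in the basic tree, the empty nodes (IDs such as 5.1) and any orphan relations left in the enhanced graph (DEPS column). The sample is the English gapping example "Mary won gold and Peter bronze", written here for illustration; it is not taken from this treebank, and it does not reproduce the criteria of the cited paper. Columns in the sample are separated by spaces for readability, whereas real CoNLL-U files use tabs.

```python
# Illustrative toy sentence (not from the treebank). Columns are space-separated
# here for readability; real CoNLL-U files use tab-separated columns.
SAMPLE = """\
# text = Mary won gold and Peter bronze
1 Mary Mary PROPN _ _ 2 nsubj 2:nsubj _
2 won win VERB _ _ 0 root 0:root _
3 gold gold NOUN _ _ 2 obj 2:obj _
4 and and CCONJ _ _ 5 cc 5.1:cc _
5 Peter Peter PROPN _ _ 2 conj 5.1:nsubj _
5.1 won win VERB _ _ _ _ 2:conj _
6 bronze bronze NOUN _ _ 5 orphan 5.1:obj _
"""


def parse_rows(text):
    """Return the 10-column token rows, skipping comments and blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()  # for a real file, split on tabs instead
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        rows.append(cols)
    return rows


def enhanced_relations(deps):
    """Extract relation labels from a DEPS value such as '2:conj|5.1:nsubj'."""
    if deps == "_":
        return []
    # Split on the first colon only: heads may be '5.1', labels may contain ':'.
    return [item.split(":", 1)[1] for item in deps.split("|")]


def orphan_report(rows):
    return {
        # Tokens attached as orphan in the basic tree (DEPREL column).
        "basic_orphans": [r[0] for r in rows if r[7] == "orphan"],
        # Empty nodes have decimal IDs such as 5.1.
        "empty_nodes": [r[0] for r in rows if "." in r[0]],
        # Tokens that still carry an orphan relation in the enhanced graph.
        "enhanced_orphans": [
            r[0] for r in rows if "orphan" in enhanced_relations(r[8])
        ],
    }


report = orphan_report(parse_rows(SAMPLE))
print(report)
```

In this toy sentence the basic tree has one orphan (token 6), while the enhanced graph replaces it with an ordinary obj relation attached to the empty node 5.1.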

Note: the Italian-Old treebank is still under revision to check for mistakes and inconsistencies throughout the annotation of the Cantiche; therefore, its structure is subject to change. If you use the resource and find any problems, please do not hesitate to contact the author to suggest a correction or improvement.

Acknowledgments

This work has been carried out in collaboration with the research center CIRCSE (Università Cattolica del Sacro Cuore di Milano) and the University of Pavia-Bergamo (Università degli Studi di Pavia; Università degli Studi di Bergamo). We extend our gratitude to all the individuals who made this work possible. The annotation of the sonnet by Arnaut Daniel (Purgatorio, XXVI, vv. 140-147) was carried out by Michele Tron.

For any doubts, suggestions, or reports, please do not hesitate to contact the person in charge: claudia.corbetta@unibg.it.

References

To cite the treebank please refer to:

Other:

For information on the enhancement process, refer to:

Statistics of UD Italian Old

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Aspect, Clitic, Definite, Degree, Foreign, Gender, Mood, Number, NumType, Person, Poss, PronType, Reflex, Tense, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advcl:cmp, advcl:pred, advcl:relcl, advmod, advmod:lmod, advmod:neg, advmod:tmod, amod, appos, aux, aux:pass, case, cc, ccomp, ccomp:reported, conj, cop, csubj, csubj:pass, det, det:poss, det:predet, discourse, dislocated, expl, expl:impers, expl:pass, expl:pv, fixed, flat, flat:foreign, flat:name, flat:redup, iobj, mark, nmod, nmod:lmod, nmod:poss, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:agent, obl:arg, obl:lmod, obl:tmod, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
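A restriction of this kind can be sketched in a few lines: keep only the dependencies whose child is tagged NOUN or PRON and whose parent is tagged VERB, then count the relation labels. The code below is a minimal illustration, not the script that generates these statistics; the function name, the choice of UPOS tags (PROPN is left out by default) and the toy sentence are assumptions made for the example. Columns in the sample are space-separated for readability, whereas real CoNLL-U files use tabs.

```python
from collections import Counter

# Illustrative toy sentence (not from the treebank), space-separated columns.
SAMPLE = """\
# text = She gave him books
1 She she PRON _ _ 2 nsubj _ _
2 gave give VERB _ _ 0 root _ _
3 him he PRON _ _ 2 iobj _ _
4 books book NOUN _ _ 2 obj _ _
"""


def verb_nominal_relations(text, child_upos=("NOUN", "PRON")):
    """Count relation labels for VERB parents with nominal/pronominal children."""
    counts = Counter()
    # Sentences are separated by blank lines.
    for block in text.strip().split("\n\n"):
        rows = [
            line.split()  # for a real file, split on tabs instead
            for line in block.splitlines()
            if line and not line.startswith("#")
        ]
        # Keep plain word lines; skip multiword ranges (1-2) and empty nodes (5.1).
        rows = [r for r in rows if r[0].isdigit()]
        upos_by_id = {r[0]: r[3] for r in rows}
        for r in rows:
            # r[3] = UPOS of the child, r[6] = HEAD id, r[7] = DEPREL.
            if r[3] in child_upos and upos_by_id.get(r[6]) == "VERB":
                counts[r[7]] += 1
    return counts


counts = verb_nominal_relations(SAMPLE)
print(dict(counts))
```

Passing `child_upos=("NOUN", "PRON", "PROPN")` would also count proper nouns as children.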

Reflexive Verbs

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview