Introduction
UD currently contains five partly related, but not yet completely homogeneous, treebanks for Latin (more details are given in each treebank's own documentation):
-
The Perseus Latin UD treebank (from v2.0) is based on the Latin Dependency Treebank 2.0 (LDT), currently maintained at Leipzig University (Humboldt Chair in DH, Prof. Gregory Crane). The treebank consists of literary texts of different genres, mainly from the Classical age. The morphological and syntactic annotation of the Latin UD treebank is the result of a conversion of part of the LDT 2.0 data, carried out by Giuseppe G. A. Celano.
-
The Latin PROIEL UD treebank (from v2.0) is based on the Latin data from the PROIEL treebank release 20170214, which is maintained at the Department of Philosophy, Classics, History of Arts and Ideas at the University of Oslo. The treebank contains most of the Vulgate New Testament translation plus selections from Caesar’s Gallic War and Cicero’s Letters to Atticus. The data have been automatically converted to the UD scheme by Dag Haug. The original annotation guidelines are available at http://folk.uio.no/daghaug/syntactic_guidelines.pdf and the conversion code is released as part of the PROIEL command-line interface.
-
The Latin ITTB UD treebank (from v2.0) is based on the Index Thomisticus Treebank (IT-TB), currently maintained at the Università Cattolica del Sacro Cuore, Milan, Italy. The treebank consists of texts by Thomas Aquinas: Books 1, 2, 3 and 4 of the Summa contra Gentiles, and concordances of the lemma forma found in several other works by Thomas. The morphological and syntactic annotation of the Latin-ITTB UD treebank is the result of a conversion carried out by Marco C. Passarotti (Milan), Flavio M. Cecchini (Milan) and Dan Zeman (Prague).
-
The LLCT (Late Latin Charter Treebank) (from v2.6) consists of an automated conversion of the LLCT2 treebank from the Latin Dependency Treebank (LDT) format into the Universal Dependencies standard. LLCT2 is the second of three LLCT treebanks; the first part (LLCT1) is available in LDT format, and the third part was still under construction as of April 2020. LLCT2 contains 521 Early Medieval Latin original documents (charters) written in Tuscia (Tuscany), Italy, between AD 774 and 897. They all represent the legal (documentary) genre. Their language is a non-standard variety of Latin which differs from Classical as well as from Medieval Latin in terms of spelling, morphology, and syntax. The original annotation mainly follows the Guidelines for the Syntactic Annotation of Latin Treebanks (v. 1.3). However, an additional set of rules is needed to annotate the non-standard features.
-
The UDante treebank (from v2.8) is based on the Latin texts of Dante Alighieri, taken from the DanteSearch corpus, originally created at the University of Pisa, Italy. Specifically, it contains the following works (mostly) by Dante Alighieri, or disputedly attributed to him: De vulgari eloquentia ‘On eloquence in the vernacular’, Monarchia ‘About Monarchy’, Letters, Questio de aqua et terra ‘Discourse about water and earth’, Eclogues. The syntactic annotation was created manually by a team of annotators in the context of a collaboration between the University of Pisa (responsible: Mirko Tavoni) and the LiLa: Linking Latin project at the Università Cattolica del Sacro Cuore, Milan, Italy (PI: Marco Passarotti). The annotation process was co-ordinated by Flavio Massimiliano Cecchini, Giovanni Moretti and Rachele Sprugnoli.
Corrections, updates and adjustments are continually made to the three treebanks that are currently (v2.11) active (IT-TB, LLCT and UDante) by the CIRCSE research group at the Università Cattolica del Sacro Cuore of Milan, Italy.