This page pertains to UD version 2.

UD Old French SRCMF

Language: Old French (code: fro)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Sophie Prévost, Aurélie Collomb, Kim Gerdes, Isabelle Tellier, Marine Courtin, Alexei Lavrentiev, Céline Guillot-Barbance, Loïc Grobol.

Repository: UD_Old_French-SRCMF
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.6

License: CC BY-NC-SA 3.0

Genre: nonfiction, legal, poetry

Questions, comments? General annotation questions (either Old French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [sophie • prevost (æt) ens • fr]. Development of the treebank happens in the UD repository but not directly in the final CoNLL-U files. You may submit bug fixes as pull requests against the dev branch, but you must make them in the source files located in the folder called not-to-release. Contact the treebank maintainers if in doubt.

Annotation Source
Lemmas not available
UPOS annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS annotated manually
Features assigned by a program, not checked manually
Relations assigned by a program, with some manual corrections, but not a full manual verification

Description

UD_Old_French-SRCMF is a conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org).

UD_Old_French-SRCMF consists of 10 texts spanning the 9th to the 13th century. It includes 17,678 sentences and 170,741 tokens.
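The sentence and token counts above can in principle be recomputed from the released CoNLL-U files. The following is a minimal sketch, assuming the standard CoNLL-U layout (sentences separated by blank lines, comment lines starting with `#`, tab-separated columns with the ID first). The function name `count_conllu` and the two-sentence sample are made up for illustration and are not part of the treebank or its tooling. The sketch counts syntactic words and skips multiword-token ranges and empty nodes, which may or may not match how the figures on this page were obtained.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for an iterable of CoNLL-U lines."""
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the sentence currently being read.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level metadata (comments) is not counted.
            continue
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in token_id or "." in token_id:
            continue
        words += 1
        in_sentence = True
    if in_sentence:
        # The input ended without a trailing blank line.
        sentences += 1
    return sentences, words


# Hypothetical sample, not taken from the treebank; lemmas are "_" here.
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tCarles\t_\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tvient\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tOliver\t_\tPROPN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

print(count_conllu(SAMPLE.splitlines()))  # prints (2, 3)
```

To run this on a real file, pass an open file handle instead of the sample, for example `count_conllu(open(path, encoding="utf-8"))`, where `path` points to one of the treebank's `.conllu` files.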

Sentences are annotated with the following metadata:

The following table lists the texts used in this treebank:

ID | Author | Name of the text | Number of tokens
Strasbourg_842_prose | anonymous | Serments de Strasbourg | 115
StLegier_1000_verse | anonymous | Vie de saint Léger | 1,388
StAlexis_1050_verse | anonymous | Vie de saint Alexis | 4,750
Roland_1100_verse | anonymous | Chanson de Roland | 28,752
Lapidaire_mid12_prose | anonymous | Lapidaire en prose | 4,708
QuatreLivresReis_late12_prose | anonymous | Quatre livres des reis | 12,949
BeroulTristan_late12_verse | Beroul, Tristan | Tristan de Beroul | 26,766
TroyesYvain_1180_verse | Chrestien de Troyes, Yvain | Yvain de Chretien de Troyes | 41,256
Aucassin_early13_verse-prose | anonymous | Aucassin et Nicolet | 9,838
Graal_1225_prose | anonymous | Queste del Saint Graal | 40,219
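As a quick consistency check, the per-text token counts in the table can be summed and compared with the overall total stated in the description (170,741 tokens across 10 texts). The figures below are copied from the table; the name `TOKENS_PER_TEXT` is only illustrative.

```python
# Token counts per text, copied from the table above.
TOKENS_PER_TEXT = {
    "Strasbourg_842_prose": 115,
    "StLegier_1000_verse": 1_388,
    "StAlexis_1050_verse": 4_750,
    "Roland_1100_verse": 28_752,
    "Lapidaire_mid12_prose": 4_708,
    "QuatreLivresReis_late12_prose": 12_949,
    "BeroulTristan_late12_verse": 26_766,
    "TroyesYvain_1180_verse": 41_256,
    "Aucassin_early13_verse-prose": 9_838,
    "Graal_1225_prose": 40_219,
}

total_tokens = sum(TOKENS_PER_TEXT.values())
print(len(TOKENS_PER_TEXT), total_tokens)  # prints 10 170741
```

The sum agrees with the total given in the description.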

Acknowledgments

UD_Old_French-SRCMF results from the conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org).

The SRCMF corpus results from the SRCMF project which took place in 2008-2012, funded by the ANR (France) and the DFG (Germany), and supervised by Sophie Prévost and Achim Stein.

The SRCMF project consisted of the manual syntactic annotation of 15 texts (251,000 tokens) from the 9th to the 13th century. For most of the texts, part-of-speech tags were retrieved from the existing tagging of the texts, drawn from the Base de Français Médiéval (Lyon, ENS de Lyon, IHRIM Laboratory, http://txm.bfm-corpus.org) and the Nouveau Corpus d’Amsterdam (http://www.uni-stuttgart.de/lingrom/stein/corpus#nca).

The contributors to the SRCMF project were: Stein, Achim; Prévost, Sophie; Rainsford, Tom; Mazziotta, Nicolas; Bischoff, Béatrice; Glikman, Julie; Lavrentiev, Alexei; Heiden, Serge; Guillot-Barbance, Céline; Marchello-Nizia, Christiane.

The whole SRCMF corpus (251,000 tokens) was converted into UD dependencies, but only 172,000 tokens have so far undergone significant checking; the remaining 80,000 tokens will be added to UD_Old_French-SRCMF in a future release.

The conversion from the original SRCMF annotation to the SRCMF-UD annotation was done automatically for both the POS tags and the syntactic relations, using a set of elaborate rules. Some 1,200 syntactic relations left unlabelled were then manually annotated (Sophie Prévost), and significant spot-checking was carried out, focusing on potential difficulties (e.g. the conj relation).

This conversion was carried out by Aurélie Collomb as part of an internship funded by the Lattice lab (Paris, CNRS, ENS & Université Sorbonne Nouvelle Paris 3, PSL & USPC) and supervised by Sophie Prévost, Isabelle Tellier and Kim Gerdes. Marine Courtin deposited the files and, in particular, took charge of validating the corpus through the successive steps of the process.

A significant review of this initial release was carried out for the UD 2.6 release by Loïc Grobol and Sophie Prévost as part of the ANR PROFITEROLE project, in order to improve the compliance of the corpus with UD guidelines. This included both automatic corrections using the graph rewriting system GREW (Bonfante et al., 2018) and extensive manual corrections.

References

Statistics of UD Old French SRCMF

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, SCONJ, VERB

Features

Definite, Morph, NumType, Polarity, Poss, PronType, Tense, VerbForm

Relations

acl, acl:relcl, advcl, advmod, advmod:obl, amod, appos, aux, aux:pass, case, case:det, cc, cc:nc, ccomp, compound, conj, cop, csubj, det, discourse, dislocated, expl, fixed, flat, iobj, mark, mark:advmod, mark:obl, nmod, nmod:appos, nsubj, nsubj:advmod, nsubj:obj, nummod, obj, obj:advmod, obj:advneg, obj:obl, obl, obl:advmod, obl:mod, parataxis, root, vocative, xcomp
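UD relation labels follow the pattern `universal:subtype`, so the inventory above can be grouped by universal relation to see where this treebank uses subtypes. The sketch below is illustrative only: the helper name `group_by_universal` is an assumption, and the label list is transcribed from this page rather than read from the treebank files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Relation labels as listed on this page.
RELATIONS = [
    "acl", "acl:relcl", "advcl", "advmod", "advmod:obl", "amod", "appos",
    "aux", "aux:pass", "case", "case:det", "cc", "cc:nc", "ccomp",
    "compound", "conj", "cop", "csubj", "det", "discourse", "dislocated",
    "expl", "fixed", "flat", "iobj", "mark", "mark:advmod", "mark:obl",
    "nmod", "nmod:appos", "nsubj", "nsubj:advmod", "nsubj:obj", "nummod",
    "obj", "obj:advmod", "obj:advneg", "obj:obl", "obl", "obl:advmod",
    "obl:mod", "parataxis", "root", "vocative", "xcomp",
]


def group_by_universal(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Map each universal relation to the subtypes attested for it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        base, _, subtype = label.partition(":")
        subtypes = groups[base]  # creates the entry even without subtypes
        if subtype:
            subtypes.append(subtype)
    return dict(groups)


groups = group_by_universal(RELATIONS)
print(len(RELATIONS), len(groups))  # prints 45 30
print(groups["obj"])  # prints ['advmod', 'advneg', 'obl']
```

Read this way, the 45 labels reduce to 30 universal relations, with the remaining 15 labels being subtypes.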

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Relations Overview