UD French ALTS
Language: French (code: fr)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Natalia Romanova, Rayan Ziane, Khensa Daoudi, Théo Brillet.
Repository: UD_French-ALTS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: legal
Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [natalia • romanova (æt) unicaen • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
ALTS (AUTOMATED Sixteenth-century corpus) is a treebank of sixteenth-century legal French from Normandy and the Channel Islands.
Currently it contains two texts: 1) trial accounts from the Guernsey Greffe (register Crime I), transcribed directly from the manuscript (**1563-1569_Guern**), and 2) an extract from Book 9 of Guillaume Terrien’s _Commentaires du droict civil tant public que privé observé au pays et duché de Normandie_, digitised from the original printed book (**1578_Terrien**). The text of 1563-1569_Guern presents many dialectal Norman features and forms. The text of 1578_Terrien has some Latin words and expressions.
1563-1569_Guern
This text contains accounts of fifteen court cases on the island of Guernsey from 1563 to 1569 (witchcraft, piracy, infanticide, etc.). The text was transcribed in full from the original manuscript, Guernsey Greffe Crime I, and abbreviations were expanded. In the treebank, sentences from this text have the prefix 1563-1569_Guern.
1578_Terrien
This text contains passages authored by Guillaume Terrien himself (and not quotations from earlier legal texts) from Book 9, “Style de procédure”, of the sixteenth-century printed book: Guillaume Terrien (1578). Commentaires du droict civil tant public que privé observé au pays et duché de Normandie, 2nd edition, Paris: Jacques du Puy, pp. 339-402. The spelling and word segmentation of the original, including abbreviated words (e.g. “glo.” for “glose”), have been retained. Only the abbreviations for “m” and “n” (e.g. “o with a tilde” for “om” or “on”) and “&” for “et” have been expanded. In the treebank, sentences from this text have the prefix 1578_Terrien.
Sentences written completely in Latin were excluded. If Latin words occur in French sentences, the token contains the tag Lang=la and is lemmatised with a Latin lemma.
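The Latin tokens described above can be pulled out of a CoNLL-U file with a few lines of standard-library Python. This is a minimal sketch, not the project's own tooling: it assumes the `Lang=la` tag sits in either the FEATS or the MISC column (the text above does not say which), and the sample sentence and the function names are invented for illustration.

```python
def parse_kv(field):
    """Parse a CoNLL-U FEATS or MISC field ("A=1|B=2" or "_") into a dict."""
    if field in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|") if "=" in pair)


def latin_tokens(conllu_text):
    """Return (sent_id, form, lemma) for every word tagged Lang=la."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, blank lines, multi-word token ranges
        feats, misc = parse_kv(cols[5]), parse_kv(cols[9])
        # Assumption: the tag may live in FEATS or in MISC, so check both.
        if feats.get("Lang") == "la" or misc.get("Lang") == "la":
            hits.append((sent_id, cols[1], cols[2]))
    return hits


# Invented two-word sample, only to show the expected input shape.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "vt", "ut", "SCONJ", "_", "_", "2", "mark", "_", "Lang=la"]),
    "\t".join(["2", "dit", "dire", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"]),
])

print(latin_tokens(SAMPLE))
```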
Sentence and token number per text
| Text | Sentences | Tokens |
|---|---|---|
| 1563-1569_Guern | 1,269 | 45,101 |
| 1578_Terrien | 757 | 25,113 |
| Total | 2,026 | 70,114 |
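Per-text counts of this kind can be recomputed from the sentence ID prefixes. The sketch below is one possible way to do it, under two assumptions: each sentence has a `# sent_id` comment that starts with one of the two prefixes named above, and "tokens" is taken to mean syntactic words (integer IDs), which may not be the counting convention used for the table. The function name and the sample IDs are hypothetical.

```python
from collections import Counter

# Prefixes as given in the text descriptions above.
PREFIXES = ("1563-1569_Guern", "1578_Terrien")


def counts_per_text(conllu_text, prefixes=PREFIXES):
    """Count sentences and syntactic words per text, keyed by sent_id prefix."""
    sentences, words = Counter(), Counter()
    current = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            # Sentences with an unknown prefix are grouped under "other".
            current = next((p for p in prefixes if sent_id.startswith(p)), "other")
            sentences[current] += 1
        elif line and not line.startswith("#"):
            if line.split("\t", 1)[0].isdigit():  # plain word line, not "3-4"
                words[current] += 1
    return sentences, words


# Invented sample with one sentence per text.
SAMPLE = "\n".join([
    "# sent_id = 1563-1569_Guern-demo",
    "\t".join(["1", "il", "il", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "dit", "dire", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
    "# sent_id = 1578_Terrien-demo",
    "\t".join(["1", "Notez", "noter", "VERB", "_", "_", "0", "root", "_", "_"]),
])

sentences, words = counts_per_text(SAMPLE)
print(dict(sentences), dict(words))
```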
Annotation
Verbs and auxiliaries are annotated for verb form (VerbForm): Inf (infinitive), Fin (conjugated) and Part (participle). In 1578_Terrien, conjugated verbs and auxiliaries are also annotated for Person and Number.
Pronouns are annotated for type (PronType: Dem for demonstrative, Ind for indefinite, Int for interrogative, Prs for personal and Rel for relative). Reflexive and possessive pronouns are also tagged (Reflexive=Yes and Poss=Yes).
Determiners are annotated using the PronType feature (Art for articles, Dem for demonstratives, Ind for indefinites). Possessive determiners are annotated Poss=Yes.
The treebank is lemmatised using modern French lemmata and, wherever appropriate, lemmata from the Dictionnaire du Moyen Français.
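Since Person and Number are described as annotated on conjugated verbs and auxiliaries in one text only, it can be useful to list the finite forms that lack them. The following is a rough sketch using only the standard library; the function names and the sample sentence are invented, and the check itself is an illustration rather than an official validation rule.

```python
def parse_feats(field):
    """Turn a FEATS string such as "Number=Sing|Person=3" into a dict."""
    if field in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|") if "=" in pair)


def finite_without_agreement(conllu_text, required=("Person", "Number")):
    """List finite VERB/AUX words that lack any of the required features."""
    missing = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, multi-word token ranges
        feats = parse_feats(cols[5])
        if cols[3] in ("VERB", "AUX") and feats.get("VerbForm") == "Fin":
            absent = [name for name in required if name not in feats]
            if absent:
                missing.append((sent_id, cols[1], absent))
    return missing


# Invented sample: "peut" is fully annotated, "dit" only has VerbForm.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "peut", "pouvoir", "VERB", "_",
               "Number=Sing|Person=3|VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["2", "dit", "dire", "VERB", "_", "VerbForm=Fin", "1", "conj", "_", "_"]),
])

print(finite_without_agreement(SAMPLE))
```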
Train/Dev/Test split
| Set | Sentences | Tokens |
|---|---|---|
| Train | 1,202 | 43,389 |
| Dev | 154 | 6,024 |
| Test | 670 | 20,701 |
| Total | 2,026 | 70,114 |
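The split table can be cross-checked with a few lines of arithmetic. The sketch below copies the figures from the table, confirms that the three sets add up to the stated totals, and prints each set's share; the variable names are illustrative only.

```python
# (sentences, tokens) per set, copied from the split table.
SPLIT = {
    "Train": (1202, 43389),
    "Dev": (154, 6024),
    "Test": (670, 20701),
}

total_sentences = sum(sentences for sentences, _ in SPLIT.values())
total_tokens = sum(tokens for _, tokens in SPLIT.values())

# The sums should match the "Total" row of the table.
assert total_sentences == 2026
assert total_tokens == 70114

for name, (sentences, tokens) in SPLIT.items():
    print(f"{name}: {sentences / total_sentences:.1%} of sentences, "
          f"{tokens / total_tokens:.1%} of tokens")
```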
Earlier versions of the texts, annotated with the HT-CRISCO workflow incorporating the HOPS parser, can be consulted on the CRISCO Lab’s TXM server and via the website.
Please note that the French-ALTS treebank is still under development and will undergo further correction campaigns. The annotation will be revised and expanded. Please do not hesitate to contact us if you have any questions, suggestions or comments.
Acknowledgments
This work was made possible thanks to the generous support of the ANR-DFG Franco-German scheme (MICLE project (2021-2024)) and of the Normandy region AUTOMATED project (2023-2025). The projects were led by Professor Pierre Larrivée at the University of Caen.
1563-1569_Guern
We thank the staff at the Guernsey Greffe archives and the Guernsey Museum & Art Gallery for giving us access to the original manuscript and digital images in 2021 and 2023. We are also grateful to former island archivist Daryl Ogier for his assistance and advice when working with the original source. We are grateful to the team of student transcribers (Agathe Aubert, Lucie Marie-Leblanc, Marie Picart and Valentin Simenel) who helped with the transcription in 2022. We thank Patrice Lajoye and Stéphane Laîné for their assistance with lemmatisation and dialectal features of the text, and Mattis Le Squer, who helped elucidate the historical context of the document. The annotation of 1563-1569_Guern has not been revised since the UD 2.16 release. Annotation was performed by Natasha Romanova and Rayan Ziane, with technical assistance from Khensa Daoudi.
1578_Terrien
The digitisation of Guillaume Terrien’s _Commentaires du droict civil tant public que privé observé au pays et duché de Normandie_ was originally performed by Morgane Pica and Mathieu Goux as part of the ConDE project funded by the Normandy region. PoS annotation and lemmatisation were performed by Natasha Romanova. Annotation of syntactic functions was done by Théo Brillet and Natasha Romanova. Théo Brillet annotated all the sentences with Latin tokens. Khensa Daoudi and Rayan Ziane provided technical assistance.
References
- Ziane, Rayan & Romanova, Natasha, 2024. « Pistes pour l’optimisation de modèles de parsing syntaxique » Proceedings of LIFT 2 - 2024 : Journées de lancement. 14-15 Nov 2024, Orléans, France. https://lift2-2024.sciencesconf.org/590561/document (7 pp.)
See also:
- Daoudi, Khensa, Dehouck, Mathieu, Romanova, Natasha & Ziane, Rayan, 2025. « Explicit Edge Length Coding to Improve Long Sentence Parsing Performance ». Proceedings of the First Workshops on Advancing NLP for Low-Resource Languages. 13 September 2025, Varna, Bulgaria. URL: https://acl-bg.org/proceedings/2025/LowResNLP%202025/index.html (pp. 102-110)
Statistics of UD French ALTS
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Definite – ExtPos – Number – NumType – Person – Polarity – Poss – PronType – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – conj – cop – csubj – csubj:outer – det – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 2026 sentences, 66817 tokens and 68088 syntactic words.
- This corpus contains 7073 tokens (11%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 268 types of words that contain both letters and punctuation. Examples: l', d', qu', n', s', c', .iii., .vi., .iiii., .viii., l., .ii., .v., c., .i., .vcclxvi., .xx., .vii., ff., .xv., .xxx., .iiiixx., .ix., .vcclxiii., .vcclxv., .xxv., .vcclxvii., .xxxv., .vcclxix., .xvi., .xxiiii., d'auantage, m', .ixe., .l., .x., .xvii., .xxviiie., glo., ordon., tit., .vcclxviii., .viie., .xiiii., .xve., .xxi., .xxixe., .xxvi., .xxvie., .xxxe.
- This corpus contains 1271 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 38 types of multi-word tokens. Examples: du, au, dudit, des, audit, aux, desditz, es, auquell, aulx, és, auquel, desdits, duquel, desditez, esquels, desdites, desquels, desquelz, aulxditz, dez, aulxquelz, ausdits, ausquels, duquell, esditez, esdits, Esdites, aulxditez, auquelz, ausdites, dedits, desdicts, desquelles, dezditz, esquelles, ipsumque, ès.
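The distinction between tokens, syntactic words and multi-word tokens follows from the CoNLL-U ID column: a range such as `1-2` is one surface token covering two words. The sketch below shows one way to recompute figures of this kind. It assumes the standard `SpaceAfter=No` convention in the MISC column; the function name and the sample sentence are invented.

```python
def segmentation_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space_after": 0}
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            covered_until = 0  # comments and blank lines mark sentence boundaries
            if line.startswith("# sent_id"):
                stats["sentences"] += 1
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        tok_id, misc = cols[0], cols[9]
        if "-" in tok_id:  # multi-word token range, e.g. "1-2" for "du"
            covered_until = int(tok_id.split("-")[1])
            stats["mwt"] += 1
            stats["tokens"] += 1
            is_surface = True
        elif tok_id.isdigit():
            stats["words"] += 1
            # A word inside a range is not a surface token of its own.
            is_surface = int(tok_id) > covered_until
            if is_surface:
                stats["tokens"] += 1
        else:
            continue  # empty nodes such as "3.1"
        if is_surface and "SpaceAfter=No" in misc.split("|"):
            stats["no_space_after"] += 1
    return stats


# Invented sample: "du droict." with "du" split into "de" + "le".
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1-2", "du", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["1", "de", "de", "ADP", "_", "_", "3", "case", "_", "_"]),
    "\t".join(["2", "le", "le", "DET", "_", "_", "3", "det", "_", "_"]),
    "\t".join(["3", "droict", "droit", "NOUN", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
])

print(segmentation_stats(SAMPLE))
```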
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PART, SYM, X
- This corpus contains 50 lemmas tagged as pronouns (PRON): aucun, aultre, autre, autrui, ce, ceci, cela, celui, chacun, cil, dont, en, hic, icelui, il, is, je, le, lequel, meus, nous, nul, néant, omnis, on, où, personne, plusieurs, que, qui, quiconque, quicumque, quilibet, quis, quivis, quoi, rien, se, si, sien, soi, te, tel, tout, tu, tuus, un, ung, vous, y
- This corpus contains 45 lemmas tagged as determiners (DET): aucun, autre, ce, cedit, certain, chacun, cil, cist, de, du, icelui, idem, il, ille, ipse, is, ladit, le, ledit, lequel, les, lesdit, leur, leurdit, maint, mon, noster, notre, notredit, nul, plusieurs, quant, quel, quelconque, quelque, qui, son, sondit, suus, tel, ton, tout, un, votre, votredit
- Out of the above, 16 lemmas occurred sometimes as PRON and sometimes as DET: aucun, autre, ce, chacun, cil, icelui, il, is, le, lequel, nul, plusieurs, qui, tel, tout, un
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): avoir, faire, sum, être
- Out of the above, 4 lemmas occurred sometimes as AUX and sometimes as VERB: avoir, faire, sum, être
- There are 3 (de)verbal forms:
  - Fin
    - AUX: a, est, avoet, fut, estoet, sont, ont, estoient, avoient, furent
    - VERB: dit, raporte, a, dyst, avoet, estoet, vynt, vyndrent, use, confesse
  - Inf
    - AUX: estre, avoir, auoir, faire, etre, a, avoer, est, este, esté
    - VERB: dire, faire, croire, cuyder, aller, prouuer, avoir, bailler, demander, venir
  - Part
    - ADJ: accoustumee, assemblees, denié, escrits, examinez, faite, iurez, perie, prouué, signee
    - AUX: estey, esté, faict, ayans, ayant, estant, faisant, fait, estoit, estans
    - VERB: ouy, passey, pryntz, veu, desrobey, eu, dit, faict, prestey, faicte
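The overlap between two tag inventories (for example, lemmas seen both as PRON and as DET) can be recomputed by collecting the set of UPOS tags per lemma. This is a minimal standard-library sketch with an invented sample and a hypothetical function name; it is not the script that produced the statistics above.

```python
from collections import defaultdict


def lemmas_with_both_tags(conllu_text, tag_a="PRON", tag_b="DET"):
    """Return the sorted lemmas that occur with both UPOS tags."""
    tags_by_lemma = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            tags_by_lemma[cols[2]].add(cols[3])  # LEMMA -> set of UPOS
    return sorted(
        lemma for lemma, tags in tags_by_lemma.items() if {tag_a, tag_b} <= tags
    )


# Invented sample: "le" appears as DET and as PRON, "il" only as PRON.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "il", "il", "PRON", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "le", "le", "PRON", "_", "_", "3", "obj", "_", "_"]),
    "\t".join(["3", "dit", "dire", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", "le", "le", "DET", "_", "_", "5", "det", "_", "_"]),
    "\t".join(["5", "iour", "jour", "NOUN", "_", "_", "3", "obl", "_", "_"]),
])

print(lemmas_with_both_tags(SAMPLE))
```

The same function covers the AUX/VERB overlap by passing `tag_a="AUX", tag_b="VERB"`.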
Nominal Features
- Number
  - Plur
    - ADJ: peremptoires, tenans
    - AUX-Fin: sont, soyent, ont, seront, seroyent, auoient, estoyent, auront, ayent, estoient
    - NOUN: ans, parties, tesmoins, faicts, jurés, jour, jours, officiers, lettres, sepmainnes
    - PRON: siens
    - PROPN: Collas, Johan, le, Thomas, de, Martin, Nicollas, du, Port, Bequet
    - VERB: peuuent, voyez, doiuent, doyuent, notez, demeurent, font, ont, sont, veulent
    - VERB-Fin: peuuent, voyez, doiuent, doyuent, notez, demeurent, font, ont, sont, veulent
  - Sing
    - ADJ: nouueau, present, accessoire, commune, escrit, resseant, vile
    - ADP: iouxte
    - ADV: apertement, bien, depuis, mal
    - AUX-Fin: est, a, seroit, soit, sera, estoit, fut, auoit, ait, fust
    - NOUN: cause, partie, iuge, droict, preuue, iour, defaut, demandeur, defendeur, cas
    - PRON: neant
    - PROPN: Papon, Paris, Imbert, France, du, Normandie, Heulte, Iean, Mesnil, Noyer
    - VERB: peut, a, doit, est, faut, dit, pourroit, fait, veut, appelle
    - VERB-Fin: peut, a, doit, est, faut, dit, pourroit, fait, veut, appelle
- Definite
  - Def
    - DET: le, la, l', les, ledit, ladite, lesdits, lesdites, lez, ung
  - Ind
    - ADJ: tel, telle, certain, pareil, telles
    - DET: ledit, ung, ladite, une, du, des, lesditz, plusieurs, de, lesditez
    - PRON: tel
Degree and Polarity
- Polarity
  - Neg
    - ADV: ne, n', Non
Verbal Features
- Tense
  - Past
    - ADJ-Part: accoustumee, assemblees, denié, escrits, examinez, faite, iurez, perie, prouué, signee
    - AUX-Part: esté, fait
    - VERB-Part: dit, receu, fait, tenu, iugé, faite, donné, examinez, mis, adiourné
  - Pres
    - AUX-Part: ayans, ayant, estant, faisant, estans
    - VERB-Part: faisant, appelant, ayant, disant, parlant, affermant, contenant, defendant, demandant, donnant
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: le, les, la, l', ledit, vne, vn, un, ladite, lesdits
  - Dem
    - DET: ce, ceste, ces, cest, iceluy, icelle, iceux, ses, cedit
    - PRON: ce, c', cela, celuy, ceux, ceulx, iceluy, icelle, yceulx, cecy
  - Ind
    - ADJ: autre, autres
    - DET: plusieurs, certain, chacun, quelconque, tout, toutes
    - PRON: tout, ung, aultre, autre, une, rien, un, vne, aultres, autres
  - Int
    - DET: quel, quelle, quels
    - PRON: que, qu', qui, quy
  - Prs
    - PRON: il, luy, elle, en, ilz, y, se, on, s', le
  - Rel
    - DET: lequel, laquelle, lesquels, lesquelles
    - PRON: qui, que, ou, lequell, laquelle, qu', lequel, dont, don, lesquelz
- NumType
  - Card
    - ADJ: 13., 16., 20., 21., 8.
    - NUM: deulx, mille, troys, .iii., .vi., deux, quatre, chinq, .iiii., .viii.
  - Ord
    - ADJ: premier, premiere, second, .ixe., .xxviiie., .viie., .xve., .xxixe., .xxxe., .ve.
- Poss
  - Yes
    - ADJ: seing
    - DET: sa, son, ses, leurs, leur, nostre, vostre, leursdits, nos
    - PRON: siens
- Person
  - 1
    - AUX-Fin: ay, aye, suis
    - VERB-Fin: defens, adaptons, afferme, allegue, appellons, bannissons, croy, denie, deuo[m]s, disons
  - 2
    - VERB-Fin: voyez, notez, Entendez, ioignez, ouez
  - 3
    - AUX-Fin: est, sont, a, seroit, soit, sera, estoit, fut, soyent, ont
    - VERB-Fin: peut, a, doit, est, faut, dit, pourroit, fait, veut, appelle
Other Features
- ExtPos
  - ADJ
    - ADP: de, a, d'
  - ADP
    - ADP: juscque, quant, a, iusques, par, afin, Qua[n]t, auant, d', usque
    - ADV: hors, quant, afin, auprés, pres, auant, fors, lors, affin, ainsy
    - PRON: il, yl
    - SCONJ: Quant
  - ADV
    - ADP: En, IVsques, de, pour, à
    - ADV: ainsy, fors, non, tant, à
    - DET: ung
    - PRON: c', id, hoc, Qui
    - SCONJ: que, vt
    - VERB: scaver, sçavoir, peut, sçaver
    - VERB-Inf: scaver, sçavoir, sçaver
  - CCONJ
    - CCONJ: ou, nec, verum
  - DET
    - DET: de, l', la
  - PRON
    - PRON: ce
  - PROPN
    - PROPN: Charles, sainct
  - SCONJ
    - ADP: pour, en, iusques, juscque, a, par, de, apres, sans, d'
    - ADV: combien, aprés, encores, dempuys, ainsy, alhors, ainsi, tellement, auant, afin
    - PRON: quoy
    - SCONJ: pourueu, sinon, comme, parce, pourquoy, que
    - VERB: veu, considerant, consideré, entendu
    - VERB-Part: veu, consideré, entendu
  - VERB
    - VERB-Part: voyant
- ADJ
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: être, sum.
- This corpus uses 4 lemmas as auxiliaries (aux). Examples: avoir, être, faire, sum.
- This corpus uses 3 lemmas as passive auxiliaries (aux:pass). Examples: être, avoir, sum.
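Lemma inventories per relation, like the ones listed above for cop, aux and aux:pass, can be gathered from the LEMMA and DEPREL columns. The sketch below is a hedged illustration with an invented sample and a hypothetical function name.

```python
def lemmas_by_relation(conllu_text, relations=("cop", "aux", "aux:pass")):
    """Collect the set of lemmas attached with each of the given relations."""
    found = {relation: set() for relation in relations}
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, multi-word token ranges
        lemma, deprel = cols[2], cols[7]
        if deprel in found:
            found[deprel].add(lemma)
    return found


# Invented sample: a passive with "être" and a perfect with "avoir".
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "il", "il", "PRON", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "fut", "être", "AUX", "_", "_", "3", "aux:pass", "_", "_"]),
    "\t".join(["3", "examiné", "examiner", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
    "# sent_id = demo-2",
    "\t".join(["1", "il", "il", "PRON", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "a", "avoir", "AUX", "_", "_", "3", "aux", "_", "_"]),
    "\t".join(["3", "dit", "dire", "VERB", "_", "_", "0", "root", "_", "_"]),
])

print(lemmas_by_relation(SAMPLE))
```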
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (43)
  - VERB--PRON (24)
  - VERB-Fin--NOUN (597)
  - VERB-Fin--NOUN-ADP(après) (1)
  - VERB-Fin--PRON (1666)
  - VERB-Fin--PRON-ADP(contre) (1)
  - VERB-Fin--PRON-ADP(à) (1)
  - VERB-Inf--NOUN (4)
  - VERB-Inf--PRON (30)
  - VERB-Part--NOUN (443)
  - VERB-Part--NOUN-ADP(après) (1)
  - VERB-Part--PRON (683)
- obj
  - VERB--NOUN (89)
  - VERB--NOUN-ADP(de) (1)
  - VERB--PRON (32)
  - VERB-Fin--NOUN (828)
  - VERB-Fin--NOUN-ADP(avec) (1)
  - VERB-Fin--NOUN-ADP(de) (17)
  - VERB-Fin--NOUN-ADP(environ) (1)
  - VERB-Fin--NOUN-ADP(par) (1)
  - VERB-Fin--NOUN-ADP(touchant) (4)
  - VERB-Fin--PRON (387)
  - VERB-Fin--PRON-ADP(de) (1)
  - VERB-Inf--NOUN (480)
  - VERB-Inf--NOUN-ADP(de) (4)
  - VERB-Inf--NOUN-ADP(par)-ADP(devers) (1)
  - VERB-Inf--NOUN-ADP(suivant) (1)
  - VERB-Inf--PRON (183)
  - VERB-Part--NOUN (260)
  - VERB-Part--NOUN-ADP(de) (4)
  - VERB-Part--NOUN-ADP(in) (1)
  - VERB-Part--NOUN-ADP(jusque) (1)
  - VERB-Part--NOUN-ADP(par) (2)
  - VERB-Part--PRON (156)
  - VERB-Part--PRON-ADP(de) (1)
- iobj
  - VERB--PRON (35)
  - VERB-Fin--PRON (254)
  - VERB-Inf--PRON (43)
  - VERB-Part--PRON (136)
  - VERB-Part--PRON-ADP(par) (2)
  - VERB-Part--PRON-ADP(à) (2)
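Patterns of the form `VERB-Fin--PRON` pair the parent verb (with its VerbForm, when present) with the UPOS of a nominal or pronominal child. The sketch below shows one way to tally them. It is a simplification under stated assumptions: sentences are separated by blank lines, the `-ADP(...)` suffix for case-marked children is not reproduced, and the function name and sample are invented.

```python
from collections import Counter


def verb_argument_patterns(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, "VERB[-VerbForm]--CHILD_UPOS") patterns."""
    patterns = Counter()
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols  # index words by their ID
        for cols in words.values():
            head = words.get(cols[6])  # HEAD column; "0" (root) has no entry
            if head is None or cols[7] not in relations:
                continue
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            feats = dict(f.split("=", 1) for f in head[5].split("|") if "=" in f)
            verb_form = feats.get("VerbForm")
            parent = f"VERB-{verb_form}" if verb_form else "VERB"
            patterns[(cols[7], f"{parent}--{cols[3]}")] += 1
    return patterns


# Invented sample: "il dit la cause".
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "il", "il", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "dit", "dire", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["3", "la", "le", "DET", "_", "_", "4", "det", "_", "_"]),
    "\t".join(["4", "cause", "cause", "NOUN", "_", "_", "2", "obj", "_", "_"]),
])

print(verb_argument_patterns(SAMPLE))
```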
Relations Overview
- This corpus uses 3 relation subtypes: acl:relcl, aux:pass, csubj:outer
- The following 6 relation types are not used in this corpus at all: clf, compound, list, goeswith, reparandum, dep
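The two statements above follow from a set comparison between the relations listed in the Relations section and the 37 universal relation types of UD v2. The sketch below reproduces that comparison; the constant names are illustrative.

```python
# The 37 universal dependency relation types of UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations used in this corpus, copied from the Relations list above.
USED_RELATIONS = {
    "acl", "acl:relcl", "advcl", "advmod", "amod", "appos", "aux", "aux:pass",
    "case", "cc", "ccomp", "conj", "cop", "csubj", "csubj:outer", "det",
    "discourse", "dislocated", "expl", "fixed", "flat", "iobj", "mark", "nmod",
    "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct", "root",
    "vocative", "xcomp",
}

# Subtypes carry a colon; the part before it is the universal relation.
subtypes = sorted(rel for rel in USED_RELATIONS if ":" in rel)
used_base = {rel.split(":", 1)[0] for rel in USED_RELATIONS}
unused = sorted(UNIVERSAL_RELATIONS - used_base)

print(subtypes)
print(unused)
```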