UD French ALTS
Language: French (code: fr)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Natalia Romanova, Rayan Ziane, Khensa Daoudi.
Repository: UD_French-ALTS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.16
License: CC BY-SA 4.0
Genre: legal
Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [natalia • romanova (æt) unicaen • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
ALTS (AUTOMATED Sixteenth-century corpus) is a treebank of sixteenth-century legal French. It currently contains one text, trial accounts from the Guernsey Greffe (register Crime I), transcribed directly from the manuscript and manually annotated for PoS, lemmata and syntactic functions. The text shows Norman dialectal features and forms.
The Guernsey Crime I text (1,269 sentences; 45,101 tokens) contains accounts of fifteen court cases held on the island of Guernsey from 1563 to 1569 (witchcraft, piracy, infanticide, etc.). It was first PoS-tagged, lemmatised and automatically parsed as part of the Franco-German MICLE project (2021-2024), led by Professor Pierre Larrivée (University of Caen) and Professor Cecilia Poletto (University of Frankfurt). Earlier versions of the text, annotated with the HT-CRISCO workflow, which incorporates the HOPS parser, can be consulted on the CRISCO Lab’s TXM server and via the website.
As part of the AUTOMATED project, the text was reannotated with the BertForDeprel parser and manually corrected following a bootstrapping methodology (Peng et al. 2022) in the ArboratorGrew software.
Set | Sentences | Tokens |
---|---|---|
Train | 811 | 30,140 |
Dev | 111 | 4,575 |
Test | 347 | 10,386 |
Total | 1,269 | 45,101 |
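The totals in the table can be rechecked with a few lines of Python. This is only a minimal sketch: the figures are copied from the table above, and the names `SPLITS`, `totals` and `token_shares` are illustrative, not part of any UD tooling.

```python
# Split sizes copied from the table above: (sentences, tokens).
SPLITS = {
    "train": (811, 30_140),
    "dev": (111, 4_575),
    "test": (347, 10_386),
}


def totals(splits):
    """Return (total sentences, total tokens) across all splits."""
    sentences = sum(s for s, _ in splits.values())
    tokens = sum(t for _, t in splits.values())
    return sentences, tokens


def token_shares(splits):
    """Return each split's share of the tokens as a fraction of the total."""
    _, total_tokens = totals(splits)
    return {name: t / total_tokens for name, (_, t) in splits.items()}


if __name__ == "__main__":
    print(totals(SPLITS))
    for name, share in token_shares(SPLITS).items():
        print(f"{name}: {share:.1%}")
```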
Acknowledgments
This work was made possible thanks to the generous support of the ANR-DFG Franco-German scheme (MICLE project (2021-2024)) and of the Normandy region AUTOMATED project (2023-2025).
We would like to thank the staff at the Guernsey Greffe archives and the Guernsey Museum & Art Gallery for giving us access to the manuscript and digital images in 2021 and 2023. We are also grateful to former island archivist Daryl Ogier for his assistance and advice when working with the original source. We are grateful to the team of student transcribers (Agathe Aubert, Lucie Marie-Leblanc, Marie Picart and Valentin Simenel) who helped with the transcription in 2022. We thank Patrice Lajoye and Stéphane Laîné for their assistance with lemmatisation and dialectal features of the text, and Mattis Le Scaer, who helped elucidate the historical context of the document.
Annotation was performed by Natasha Romanova and Rayan Ziane, technical assistance by Khensa Daoudi.
References
- (Ziane & Romanova 2024) Pistes pour l’optimisation de modèles de parsing syntaxique. LIFT 2 - 2024 : Journées de lancement, Nov 2024, Orléans, France.
Statistics of UD French ALTS
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Definite – ExtPos – Number – NumType – Polarity – PronType – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – conj – cop – csubj – det – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1269 sentences, 43088 tokens and 43832 syntactic words.
- This corpus contains 4624 tokens (11%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 99 types of words that contain both letters and punctuation. Examples: l', d', s', n', qu', .iii., .vi., c', .iiii., .viii., .ii., .v., .vcclxvi., .xx., .i., .vii., .xv., .xxx., .iiiixx., .ix., .vcclxiii., .vcclxv., .xxv., .vcclxvii., .xxxv., .vcclxix., .xvi., .xxiiii., m', .ixe., .x., .xvii., .xxviiie., .l., .vcclxviii., .viie., .xiiii., .xve., .xxi., .xxixe., .xxvi., .xxvie., .xxxe., .lx., .ve., .vie., .xiie., .xiiie., .xixe., .xl.
- This corpus contains 744 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 22 types of multi-word tokens. Examples: dudit, du, au, audit, des, desditz, es, auquell, aulx, desditez, desquelz, aulxditz, dez, aulxquelz, aux, duquell, esditez, aulxditez, auquelz, desquelles, dezditz, duquel.
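The distinction between tokens, syntactic words and multi-word tokens in the list above follows the CoNLL-U ID conventions: an integer ID is a syntactic word, and a range ID such as `1-2` is a multi-word token covering those words. The sketch below shows one way such counts could be derived. The function name `count_conllu` and the sample sentence are invented for illustration and are not taken from the treebank or from the UD statistics scripts.

```python
def count_conllu(lines):
    """Count sentences, surface tokens, syntactic words and multi-word tokens.

    CoNLL-U conventions relied on here:
      - comment lines start with '#'; sentences are separated by blank lines;
      - an integer ID (e.g. '3') is a syntactic word;
      - a range ID (e.g. '3-4') is a multi-word token covering those words;
      - a decimal ID (e.g. '3.1') is an empty node and is ignored.
    """
    sentences = words = mwts = covered = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if "." in token_id:
            continue  # empty node
        if not in_sentence:
            sentences += 1
            in_sentence = True
        if "-" in token_id:
            start, end = token_id.split("-")
            mwts += 1
            covered += int(end) - int(start) + 1
        else:
            words += 1
    # Words covered by a multi-word token surface as a single token.
    tokens = words - covered + mwts
    return {"sentences": sentences, "tokens": tokens,
            "words": words, "multiword_tokens": mwts}


# Invented three-word sample (not from the treebank): 'du' = 'de' + 'le'.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1-2", "du", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["1", "de", "de", "ADP", "_", "_", "3", "case", "_", "_"],
    ["2", "le", "le", "DET", "_", "_", "3", "det", "_", "_"],
    ["3", "roy", "roi", "NOUN", "_", "_", "0", "root", "_", "_"],
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE.splitlines()))
```

The same relation holds for the figures reported above: 43832 words, minus the 1488 words covered by the 744 multi-word tokens, plus those 744 tokens, gives 43088 tokens.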
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PART, SYM, X
- This corpus contains 35 lemmas tagged as pronouns (PRON): aucun, aultre, autre, autrui, ce, ceci, cela, celui, cil, dont, en, icelui, il, je, le, lequel, nous, on, où, personne, plusieurs, que, qui, quiconque, quoi, rien, se, sien, te, tout, tu, un, ung, vous, y
- This corpus contains 33 lemmas tagged as determiners (DET): aucun, ce, cedit, certain, cil, cist, de, du, icelui, il, ladit, le, ledit, lequel, les, leur, leurdit, maint, mon, notre, notredit, nul, plusieurs, quel, quelque, son, sondit, tel, ton, tout, un, votre, votredit
- Out of the above, 10 lemmas occurred sometimes as PRON and sometimes as DET: aucun, ce, cil, icelui, il, le, lequel, plusieurs, tout, un
- This corpus contains 3 lemmas tagged as auxiliaries (AUX): avoir, faire, être
- Out of the above, 3 lemmas occurred sometimes as AUX and sometimes as VERB: avoir, faire, être
- There are 3 (de)verbal forms:
  - Fin
    - AUX: a, avoet, fut, estoet, est, estoient, ont, avoient, furent, sont
    - VERB: dit, raporte, dyst, avoet, a, estoet, vynt, vyndrent, use, confesse
  - Inf
    - AUX: avoir, estre, faire, avoer
    - VERB: dire, croire, cuyder, aller, faire, avoir, queryr, mettre, venir, emporter
  - Part
    - AUX: estey, faict, esté
    - VERB: ouy, passey, pryntz, veu, desrobey, eu, faict, prestey, confessé, estey
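The PRON/DET overlap reported in the Tags list above is a plain set intersection of the two lemma inventories. As a minimal sketch, with the lemma lists copied from that list and the variable names chosen only for illustration:

```python
# Lemma inventories copied from the Tags list above.
PRON_LEMMAS = set(
    "aucun aultre autre autrui ce ceci cela celui cil dont en icelui il je le "
    "lequel nous on où personne plusieurs que qui quiconque quoi rien se sien "
    "te tout tu un ung vous y".split()
)
DET_LEMMAS = set(
    "aucun ce cedit certain cil cist de du icelui il ladit le ledit lequel les "
    "leur leurdit maint mon notre notredit nul plusieurs quel quelque son "
    "sondit tel ton tout un votre votredit".split()
)

# Lemmas that occur sometimes as PRON and sometimes as DET.
OVERLAP = sorted(PRON_LEMMAS & DET_LEMMAS)

if __name__ == "__main__":
    print(len(PRON_LEMMAS), len(DET_LEMMAS), len(OVERLAP))
    print(", ".join(OVERLAP))
```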
Nominal Features
- Plur
  - NOUN: ans, jurés, jour, jours, officiers, sepmainnes, foys, homes, navires, bestes
  - PROPN: Collas, Johan, le, Thomas, de, Martin, Nicollas, du, Port, Bequet
- Def
  - DET: le, la, l', les, ledit, lez, ung, dé
- Ind
  - DET: ledit, ung, ladite, une, du, des, lesditz, de, lesditez, plusieurs
Degree and Polarity
- Neg
  - ADV: ne, n', Non
Verbal Features
Pronouns, Determiners, Quantifiers
- Dem
  - PRON: ce, cela, c', ceulx, yceulx, ceux, ycelle, celluy, ycelluy, Se
- Ind
  - PRON: tout, ung, aultre, une, rien, aultres, aultruy, aulcuns, personne, tous
- Int
  - PRON: que, qu', qui, quy
- Prs
  - PRON: il, luy, elle, ilz, en, y, s', le, se, nous
- Rel
  - PRON: que, ou, qui, lequell, laquelle, don, lesquelz, lequel, qu', lesquelles
- Card
  - NUM: deulx, mille, troys, .iii., .vi., chinq, .iiii., .viii., .ii., .v.
- Ord
  - ADJ: premier, .ixe., .xxviiie., .viie., .xve., .xxixe., .xxxe., .ve., .vie., .xiie.
Other Features
- ExtPos
  - ADP
    - ADP: juscque, a, quant, par, d'
    - ADV: hors, auprés, pres, affin, ainsy, aprés, dehors, fors
    - PRON: il, yl
  - ADV
    - ADP: En
    - ADV: ainsy, fors
    - DET: ung
    - PRON: c'
    - VERB-Inf: scaver, sçavoir, sçaver
  - DET
    - DET: de, l', la
  - PRON
    - PRON: ce
  - SCONJ
    - ADP: en, juscque, a, pour, aprés, par, de, reservey, reservé, sans
    - ADV: aprés, dempuys, ainsy, alhors, devant, aussytost, fors, nonobstant, tellement, auttant
    - SCONJ: que
    - VERB: considerant, consideré, entendu, veu
    - VERB-Part: consideré, entendu, veu
- ADP
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: être.
- This corpus uses 3 lemmas as auxiliaries (aux). Examples: avoir, être, faire.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: être, avoir.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (8)
  - VERB--PRON (7)
  - VERB-Fin--NOUN (294)
  - VERB-Fin--NOUN-ADP(après) (1)
  - VERB-Fin--PRON (1174)
  - VERB-Fin--PRON-ADP(à) (1)
  - VERB-Inf--NOUN (1)
  - VERB-Inf--PRON (19)
  - VERB-Part--NOUN (191)
  - VERB-Part--NOUN-ADP(après) (1)
  - VERB-Part--PRON (470)
- obj
  - VERB--NOUN (49)
  - VERB--NOUN-ADP(de) (1)
  - VERB--PRON (21)
  - VERB-Fin--NOUN (565)
  - VERB-Fin--NOUN-ADP(avec) (1)
  - VERB-Fin--NOUN-ADP(de) (13)
  - VERB-Fin--NOUN-ADP(environ) (1)
  - VERB-Fin--NOUN-ADP(par) (1)
  - VERB-Fin--NOUN-ADP(touchant) (4)
  - VERB-Fin--PRON (293)
  - VERB-Fin--PRON-ADP(de) (1)
  - VERB-Inf--NOUN (171)
  - VERB-Inf--NOUN-ADP(de) (2)
  - VERB-Inf--PRON (95)
  - VERB-Part--NOUN (161)
  - VERB-Part--NOUN-ADP(de) (4)
  - VERB-Part--NOUN-ADP(jusque) (1)
  - VERB-Part--NOUN-ADP(par) (2)
  - VERB-Part--PRON (126)
- iobj
  - VERB--PRON (33)
  - VERB-Fin--PRON (247)
  - VERB-Inf--PRON (36)
  - VERB-Part--PRON (118)
  - VERB-Part--PRON-ADP(par) (1)
  - VERB-Part--PRON-ADP(à) (1)
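Pattern labels such as `VERB-Fin--PRON` combine the parent verb's VerbForm value with the child's UPOS tag. The sketch below shows how counts of this kind could be gathered from CoNLL-U data. It is a simplification under stated assumptions: it ignores the `-ADP(...)` suffix that records a case-marking adposition on the child, the function names are hypothetical, and the three-word sample is invented rather than taken from the treebank. It is not the script that produced the figures above.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'VerbForm=Fin' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_sentences(lines):
    """Yield one {id: columns} dict per sentence, keeping syntactic words only."""
    current = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if current:
                yield current
            current = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():  # skips ranges ('1-2') and empty nodes ('1.1')
            current[cols[0]] = cols
    if current:
        yield current


def core_argument_patterns(lines):
    """Count (relation, 'VERB[-Form]--UPOS') pairs for noun/pronoun children."""
    counts = Counter()
    for words in read_sentences(lines):
        for cols in words.values():
            upos, head = cols[3], cols[6]
            relation = cols[7].split(":")[0]  # drop any subtype
            if relation not in CORE_RELATIONS or upos not in {"NOUN", "PRON"}:
                continue
            parent = words.get(head)
            if parent is None or parent[3] != "VERB":
                continue
            form = parse_feats(parent[5]).get("VerbForm")
            parent_label = f"VERB-{form}" if form else "VERB"
            counts[(relation, f"{parent_label}--{upos}")] += 1
    return counts


# Invented sample (not from the treebank): 'il dit cela'.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "il", "il", "PRON", "_", "PronType=Prs", "2", "nsubj", "_", "_"],
    ["2", "dit", "dire", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"],
    ["3", "cela", "cela", "PRON", "_", "PronType=Dem", "2", "obj", "_", "_"],
])

if __name__ == "__main__":
    for key, n in sorted(core_argument_patterns(SAMPLE.splitlines()).items()):
        print(key, n)
```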