
This page pertains to UD version 2.

UD French ParisStories

Language: French (code: fr)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.9 release.

The following people have contributed to making this treebank part of UD: Kim Gerdes, Sylvain Kahane, Menel Mahamdi.

Repository: UD_French-ParisStories
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.9

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [gerdes (æt) lisn • fr]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS not available
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

Paris Stories is a corpus of spoken French collected and transcribed by linguistics students from Sorbonne Nouvelle and corrected by students in the Plurital Master’s program in Computational Linguistics (Inalco, Paris Nanterre, Sorbonne Nouvelle) between 2017 and 2021. It contains monologues and dialogues from speakers living in the Paris region.

For an assignment, students had to record a friend or a relative sharing an anecdote on a given theme (meaningful encounters, vacations, interesting stories, etc.). The corpus was created for the study of contemporary spoken French and to train a syntactic parser for spoken French. All data has been morpho-syntactically annotated following the SUD (Surface Syntactic Universal Dependencies) guidelines.

See the SUD guidelines: https://surfacesyntacticud.github.io/guidelines/u/

The treebank can be found here: http://match.grew.fr/?corpus=SUD_French-ParisStories@latest

The recordings can be downloaded via the URL given in the ‘# sound_url’ metadata.
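As an illustration of how the ‘# sound_url’ metadata could be read, here is a minimal sketch in Python. It assumes that the metadata is stored as standard CoNLL-U sentence comments of the form `# key = value` and that sentences are separated by blank lines; the function name `iter_sound_urls` and the sample URL are hypothetical and not part of the treebank.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_sound_urls(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sent_id, sound_url) for every sentence that has a sound_url comment.

    Assumption: comments follow the CoNLL-U convention '# key = value'.
    """
    meta: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("#") and "=" in line:
            # Split on the first '=' only, so URLs containing '=' stay intact.
            key, value = line[1:].split("=", 1)
            meta[key.strip()] = value.strip()
        elif not line:
            # A blank line closes the current sentence.
            if "sound_url" in meta:
                yield meta.get("sent_id", ""), meta["sound_url"]
            meta = {}
    # Handle a last sentence that is not followed by a blank line.
    if "sound_url" in meta:
        yield meta.get("sent_id", ""), meta["sound_url"]


# Invented toy input; the URL is a placeholder, not a real recording.
SAMPLE = [
    "# sent_id = demo_1",
    "# sound_url = https://example.org/audio/demo.wav",
    "# text = bonjour",
    "1\tbonjour\tbonjour\tINTJ\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = demo_2",
    "1\toui\toui\tINTJ\t_\t_\t0\troot\t_\t_",
]
```

To use it on a treebank file, open the file with `encoding="utf-8"` and pass the file object to `iter_sound_urls`; collecting the second element of each pair into a set gives the distinct recording URLs.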


– Paris Stories 2019 –

Creation year: 2017

Annotation year: 2019

Size:

Topics: travels, funny/unusual stories

– Paris Stories 2020 –

Creation year: 2018

Annotation year: 2020

Size:

Topics: vacation stories, funny/unusual stories

– Paris Stories 2021 –

Creation year: 2020

Annotation year: 2021

Size:

Topics: first encounters, funny/unusual stories

Development

The corpus is maintained here in the SUD framework and automatically converted into UD using the Grew software with the conversion rules described here.

Data Split

The file fr_parisstories-ud-test.conllu contains the following data:

The file fr_parisstories-ud-train.conllu contains the following data:

Acknowledgments

Annotation: Sylvain Kahane, Bruno Guillaume, Mariam Nakhlé, Vanessa Gaudray-Bouju, Menel Mahamdi

Annotation tools development: Kim Gerdes, Marine Courtin, Gaël Guibon

Conversion and handling of data validation: Bruno Guillaume

Direction of data collection: Cédric Gendrot, Kim Gerdes, Marine Courtin

We would like to thank all the students who participated in this project.

References

An article about the annotation of spoken French will soon be released (Kahane et al. 2021)

Statistics of UD French ParisStories

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Definite, Foreign, Gender, Mood, Number, Number[psor], Person, Person[psor], Polarity, PronType, Reflex, Tense, Typo, VerbForm

Relations

acl, acl:relcl, advcl, advcl:cleft, advmod, amod, appos, aux, aux:caus, aux:pass, aux:tense, case, cc, ccomp, compound, conj, cop, csubj, dep, dep:comp, det, discourse, dislocated, expl, expl:subj, fixed, flat, flat:name, goeswith, iobj, mark, nmod, nmod:appos, nsubj, nsubj:caus, nsubj:pass, nummod, obj, obj:lvc, obl, obl:arg, obl:mod, orphan, parataxis, parataxis:parenth, punct, reparandum, root, vocative, xcomp
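The three inventories listed in this section (POS tags, feature names, relations) could be recomputed from a CoNLL-U file along the following lines. This is only a sketch: it assumes the standard ten tab-separated CoNLL-U columns, it is not the script used to generate this page, and the function name `collect_inventory` and the toy sentence are invented for the example.

```python
from typing import Dict, Iterable, Set


def collect_inventory(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the UPOS tags, feature names and relations found in CoNLL-U lines."""
    inventory: Dict[str, Set[str]] = {"upos": set(), "feats": set(), "deprel": set()}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges (1-2), empty nodes (1.1)
        upos, feats, deprel = cols[3], cols[5], cols[7]
        inventory["upos"].add(upos)
        inventory["deprel"].add(deprel)
        if feats != "_":
            for pair in feats.split("|"):
                # Keep only the feature name, e.g. 'Number' from 'Number=Sing'.
                inventory["feats"].add(pair.split("=", 1)[0])
    return inventory


# Invented toy sentence, not taken from the treebank.
SAMPLE = [
    "# text = Marie dort",
    "1\tMarie\tMarie\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tdort\tdormir\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_",
    "",
]
```

Calling `sorted(collect_inventory(f)["deprel"])` on an open treebank file would then give an alphabetical list that can be compared with the relations above.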

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
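The restriction to verb parents and nominal children can be sketched as a filter over CoNLL-U tokens. The code below is a hedged illustration, not the procedure behind this page: it assumes that "verbs" means UPOS `VERB` and that "nouns or pronouns" means UPOS `NOUN` or `PRON` (whether `PROPN` is also included is not stated here). The function name `verb_nominal_relations` and the toy sentence are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

NOMINAL_UPOS = {"NOUN", "PRON"}  # assumption: PROPN is not counted


def verb_nominal_relations(lines: Iterable[str]) -> Counter:
    """Count relations whose parent is a VERB and whose child is a NOUN or PRON."""
    counts: Counter = Counter()
    sentence: List[List[str]] = []

    def flush() -> None:
        # Map each token ID to its UPOS so that heads can be looked up.
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            upos, head, deprel = cols[3], cols[6], cols[7]
            if upos in NOMINAL_UPOS and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
        sentence.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            flush()  # a blank line ends the sentence
        elif not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
    flush()  # last sentence, in case the input has no trailing blank line
    return counts


# Invented toy sentence, not taken from the treebank.
SAMPLE = [
    "# text = il mange une pomme",
    "1\til\til\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tmange\tmanger\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tune\tun\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tpomme\tpomme\tNOUN\t_\t_\t2\tobj\t_\t_",
]
```

In the toy sentence, only `il` and `pomme` qualify: the determiner is attached to a noun, and the verb itself is attached to the root.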

Verbs with Reflexive Core Objects

Relations Overview