
This page pertains to UD version 2.

UD French GSD

Language: French (code: fr)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Marie-Catherine de Marneffe, Bruno Guillaume, Ryan McDonald, Alane Suhr, Joakim Nivre, Matias Grioni, Carly Dickerson, Guy Perrier.

Repository: UD_French-GSD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: blog, news, reviews, wiki

Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [demarneffe • 1 (æt) osu • edu, bruno • guillaume (æt) inria • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas assigned by a program, with some manual corrections, but not a full manual verification
UPOS annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS not available
Features assigned by a program, with some manual corrections, but not a full manual verification
Relations annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
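Since XPOS is not available, the fifth column of every word line in the CoNLL-U files should contain only the placeholder `_`. The following is a minimal sketch of that check, assuming the standard 10-column CoNLL-U layout; the helper name `xpos_is_empty` and the two-word sample are hypothetical and not taken from the treebank.

```python
def xpos_is_empty(conllu_text: str) -> bool:
    """Return True if every token line has '_' in the XPOS column (column 5)."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence separators and comment lines carry no columns
        cols = line.split("\t")
        if len(cols) == 10 and cols[4] != "_":
            return False
    return True


# Hypothetical sample made up for illustration: "Il dort".
sample = "\n".join([
    "# text = Il dort",
    "1\tIl\til\tPRON\t_\tPerson=3\t2\tnsubj\t_\t_",
    "2\tdort\tdormir\tVERB\t_\tPerson=3\t0\troot\t_\t_",
    "",
])
print(xpos_is_empty(sample))  # True
```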

Description

UD_French-GSD was converted in 2015 from the content-head version of the Universal Dependency Treebank v2.0 (https://github.com/ryanmcd/uni-dep-tb). Since then, it has been updated independently of that original source. The README for the original project is available below.

Version 2.7 of the UD_French-GSD data consists of 400,399 words (16,341 sentences). Because the original resource had no sentence ids, new sent_id values were introduced automatically in the converted corpus: each one is a prefix (fr-ud-train, fr-ud-dev or fr-ud-test, according to the original file the sentence comes from) followed by a five-digit number giving the position of the sentence in that original file.
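A sent_id can therefore be split back into its original-file prefix and its position. Note that, since part of the original dev file was moved into the test file, the prefix records the original file, not necessarily the split file in which a sentence now appears. The sketch below assumes an optional `_` or `-` between prefix and number (for example `fr-ud-dev_00042`); the exact separator, and the names `parse_sent_id` and `count_by_prefix`, are assumptions to check against the actual files.

```python
import re
from collections import Counter

# Assumption: an optional "_" or "-" separates the prefix from the five-digit
# number (e.g. "fr-ud-dev_00042"); verify against the released .conllu files.
SENT_ID = re.compile(r"^(fr-ud-(?:train|dev|test))[_-]?(\d{5})$")


def parse_sent_id(sent_id: str) -> tuple[str, int]:
    """Split a sent_id into its original-file prefix and sentence position."""
    m = SENT_ID.match(sent_id)
    if m is None:
        raise ValueError(f"unexpected sent_id: {sent_id!r}")
    return m.group(1), int(m.group(2))


def count_by_prefix(conllu_text: str) -> Counter:
    """Count sentences per original-file prefix using '# sent_id =' comments."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            prefix, _ = parse_sent_id(line[len("# sent_id = "):].strip())
            counts[prefix] += 1
    return counts


print(parse_sent_id("fr-ud-dev_00042"))  # ('fr-ud-dev', 42)
```

Running `count_by_prefix` on the test file would, under these assumptions, show both `fr-ud-test` and `fr-ud-dev` prefixes.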

Warning: to meet the 10K-word size requirement for test data, part of the original dev file was moved to the test file. Since version 2.0, the data has been split as follows:

Sentences are shuffled, and there is no way to determine the source or the genre of a given sentence.

Probably because of a bug in a conversion program, version 1.2 contained many truncated sentences (with a missing date, for instance). Almost all truncated sentences come from Wikipedia, so it was possible to recover the original text. Most of them were completed in version 1.3, and some were completed later. A few truncated sentences probably remain.

Acknowledgments

The latest version of the corpus was produced by Marie-Catherine de Marneffe, Bruno Guillaume, Matias Grioni, Carly Dickerson and Guy Perrier. Automatic modifications and consistency checking were partly done using the Grew software.

See below for references and acknowledgments concerning the original corpus.

Statistics of UD French GSD

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
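UPOS tags are attached to syntactic words, not surface tokens. French contractions such as "du" (de + le) appear in CoNLL-U as a multiword token line with a range ID like `3-4`, followed by one line per word, and only the word lines carry a UPOS tag. The minimal sketch below counts UPOS over syntactic words, skipping range lines and empty nodes (decimal IDs); the function name and the sample sentence are hypothetical.

```python
from collections import Counter


def upos_counts(conllu_text: str) -> Counter:
    """Count UPOS tags (column 4) over syntactic words only."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword token range (e.g. "3-4") or empty node (e.g. "4.1")
        counts[cols[3]] += 1
    return counts


# Hypothetical sample: "Il parle du film", where "du" is split into "de le".
sample = "\n".join([
    "1\tIl\til\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tparle\tparler\tVERB\t_\t_\t0\troot\t_\t_",
    "3-4\tdu\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\tde\tde\tADP\t_\t_\t5\tcase\t_\t_",
    "4\tle\tle\tDET\t_\t_\t5\tdet\t_\t_",
    "5\tfilm\tfilm\tNOUN\t_\t_\t2\tobl\t_\t_",
])
print(sum(upos_counts(sample).values()))  # 5 syntactic words from 4 surface tokens
```

The same distinction between words and tokens matters when comparing word counts such as the one given for version 2.7.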

Features

Definite, Emph, ExtPos, Foreign, Gender, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, Poss, PronType, Reflex, Tense, Typo, VerbForm, Voice
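In CoNLL-U, the FEATS column holds `Name=Value` pairs separated by `|`, or `_` when a word has no features; a feature with several values lists them comma-separated. Layered features such as `Number[psor]` and `Person[psor]` describe the possessor rather than the word itself, so it is safest to keep the bracketed layer as part of the feature name. A minimal sketch; the function name and the feature string in the example are hypothetical.

```python
def parse_feats(feats: str) -> dict[str, list[str]]:
    """Parse a CoNLL-U FEATS string into {feature name: list of values}."""
    if feats == "_":
        return {}  # "_" means the word has no features
    parsed = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        # Layered names such as "Number[psor]" are kept verbatim;
        # multiple values of one feature are comma-separated.
        parsed[name] = value.split(",")
    return parsed


print(parse_feats("Gender=Fem|Number=Sing|Number[psor]=Plur|Person[psor]=3|Poss=Yes"))
# {'Gender': ['Fem'], 'Number': ['Sing'], 'Number[psor]': ['Plur'], 'Person[psor]': ['3'], 'Poss': ['Yes']}
```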

Relations

acl, acl:relcl, advcl, advcl:cleft, advmod, amod, appos, aux, aux:caus, aux:pass, aux:tense, case, cc, ccomp, compound, conj, cop, csubj, csubj:pass, dep, dep:comp, det, discourse, dislocated, expl:comp, expl:pass, expl:pv, expl:subj, fixed, flat, flat:foreign, flat:name, goeswith, iobj, iobj:agent, mark, nmod, nsubj, nsubj:caus, nsubj:outer, nsubj:pass, nummod, obj, obj:agent, obj:lvc, obl, obl:agent, obl:arg, obl:mod, orphan, parataxis, parataxis:insert, punct, reparandum, root, vocative, xcomp
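Many of these relations are subtypes written as `universal:subtype` (for example `aux:tense`, `expl:pv` or `obl:arg`). To compare against the universal inventory, a label can be split at its first colon. The sketch below groups some of the labels listed above by their universal part; the function names are hypothetical.

```python
from typing import Optional


def split_deprel(deprel: str) -> tuple[str, Optional[str]]:
    """Split 'nsubj:pass' into ('nsubj', 'pass'); plain labels get subtype None."""
    base, sep, subtype = deprel.partition(":")
    return base, (subtype if sep else None)


def subtypes_by_base(deprels: list[str]) -> dict[str, list[str]]:
    """Map each universal relation to the subtypes attested in `deprels`."""
    groups: dict[str, list[str]] = {}
    for deprel in deprels:
        base, subtype = split_deprel(deprel)
        groups.setdefault(base, [])
        if subtype is not None:
            groups[base].append(subtype)
    return groups


# A few of the relations listed above.
labels = ["expl:comp", "expl:pass", "expl:pv", "expl:subj", "obl", "obl:agent", "obl:arg", "obl:mod"]
print(subtypes_by_base(labels))
# {'expl': ['comp', 'pass', 'pv', 'subj'], 'obl': ['agent', 'arg', 'mod']}
```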

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
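This restriction can be made concrete: keep only words tagged as nouns or pronouns whose head is tagged VERB, and tally their relation labels. The minimal sketch below assumes that "nouns" includes proper nouns (PROPN), which the text does not state, and skips multiword token ranges and empty nodes; the function name and the sample sentence are hypothetical.

```python
from collections import Counter

# Assumption: "nouns" covers both NOUN and PROPN; adjust if that is not intended.
CHILD_UPOS = {"NOUN", "PROPN", "PRON"}


def verb_nominal_relations(conllu_text: str) -> Counter:
    """Count deprels of noun/pronoun words whose syntactic head is a VERB."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        words = {}
        for line in block.splitlines():
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges ("3-4") and empty nodes ("5.1")
            words[cols[0]] = cols
        for cols in words.values():
            head = words.get(cols[6])  # HEAD "0" (the root) is not a word
            if head is not None and head[3] == "VERB" and cols[3] in CHILD_UPOS:
                counts[cols[7]] += 1
    return counts


# Hypothetical sample: "Elle mange une pomme".
sample = "\n".join([
    "# text = Elle mange une pomme",
    "1\tElle\telle\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tmange\tmanger\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tune\tun\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tpomme\tpomme\tNOUN\t_\t_\t2\tobj\t_\t_",
])
print(dict(verb_nominal_relations(sample)))  # {'nsubj': 1, 'obj': 1}
```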

Reflexive Verbs

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview