This page pertains to UD version 2.

UD for French

In version 2.17, French is represented by nine treebanks, but two of them do not contain modern French:

The description below applies to the seven modern French treebanks.

Tokenization and Word Segmentation

For more details, see tokenization.

Morphology

Tags

This is an overview only. For more detailed discussion and examples, see the list of French POS tags and French features.

French uses all 17 universal POS categories:

Nominal Features

Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Note that since version 2.17, the four treebanks built from SUD (GSD, Sequoia, ParisStories and Rhapsodie) use a more detailed feature system:

See French features for links.

Syntax

This is an overview only. For more detailed discussion and examples, see the list of French relations.

Core Arguments, Oblique Arguments and Adjuncts

Relations Overview

The following relation subtypes are used in French:

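| Relation | FQB | GSD | ParisStories | ParTUT | PUD | Rhapsodie | Sequoia |
| --- | --- | --- | --- | --- | --- | --- | --- |
| acl:relcl | 77 | 3240 | 310 | 301 | 227 | 507 | 520 |
| advcl:cleft | 17 | 212 | 40 | | | 78 | 20 |
| aux:caus | 3 | 250 | 16 | 13 | 9 | 27 | 34 |
| aux:pass | 247 | 3401 | 105 | 241 | 226 | 134 | 759 |
| aux:tense | 503 | 3837 | 1012 | | 568 | 492 | 948 |
| csubj:pass | | 26 | | 1 | 2 | 1 | 4 |
| dep:comp | | 15 | 27 | | | 40 | 5 |
| expl:comp | 176 | 211 | 298 | | 28 | 293 | 44 |
| expl:pass | | 687 | 23 | | | 33 | 57 |
| expl:pv | | 1017 | 49 | | 2 | | 242 |
| expl:subj | 333 | 931 | 314 | | 83 | 425 | 237 |
| flat:foreign | 131 | 1075 | | 3 | 113 | 6 | 136 |
| flat:name | 581 | 7005 | 31 | 61 | 252 | 161 | 807 |
| iobj:agent | | 24 | 1 | 1 | | | 1 |
| nmod:appos | | | 4 | | | 121 | |
| nsubj:caus | 1 | 132 | 4 | 4 | 4 | 14 | 16 |
| nsubj:outer | | 23 | 23 | | | 14 | 3 |
| nsubj:pass | 240 | 3666 | 41 | 224 | 200 | 123 | 620 |
| obj:agent | | 111 | 3 | 9 | 4 | | 12 |
| obj:lvc | | 554 | 84 | | | 68 | 2 |
| obl:agent | 30 | 1554 | 2 | 69 | 1 | 3 | 281 |
| obl:arg | 570 | 8670 | 508 | | 80 | 812 | 1608 |
| obl:mod | 611 | 15927 | 1057 | | 81 | 1118 | 2392 |
| parataxis:insert | | 183 | | | | 15 | 126 |
| parataxis:parenth | | | 27 | | | 39 | |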
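
The page does not say how these counts were produced. As a rough illustration only, comparable numbers could be obtained by counting the values of the DEPREL column (the eighth column of the CoNLL-U format) that contain a colon. The function name `count_relation_subtypes` and the sample sentence are hypothetical and hand-made for this sketch; they are not taken from any of the treebanks.

```python
from collections import Counter


def count_relation_subtypes(conllu_text):
    """Count DEPREL values with a subtype (containing ':') in CoNLL-U text.

    Hypothetical helper for illustration; not the script used for the table.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, deprel = cols[0], cols[7]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in token_id or "." in token_id:
            continue
        if ":" in deprel:
            counts[deprel] += 1
    return counts


# Hand-made sample sentence (an assumption, not real treebank data).
SAMPLE_ROWS = [
    ("1", "Il", "il", "PRON", "_", "_", "4", "nsubj:pass", "_", "_"),
    ("2", "a", "avoir", "AUX", "_", "_", "4", "aux:tense", "_", "_"),
    ("3", "été", "être", "AUX", "_", "_", "4", "aux:pass", "_", "_"),
    ("4", "vu", "voir", "VERB", "_", "_", "0", "root", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "4", "punct", "_", "_"),
]
SAMPLE = "# text = Il a été vu.\n" + "\n".join(
    "\t".join(row) for row in SAMPLE_ROWS
) + "\n"

if __name__ == "__main__":
    for relation, n in sorted(count_relation_subtypes(SAMPLE).items()):
        print(relation, n)
```

To apply the same idea to a real treebank, the text of its `.conllu` files would be read and passed to the function, one call per file, and the resulting counters summed.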

Treebanks

There are nine French UD treebanks:

Note that UD_French-FTB has now been retired because it was not updated to follow the latest validation constraints.