home edit page issue tracker

This page pertains to UD version 2.

UD for Middle French

Tokenization and Word Segmentation

Middle French tokenization is mostly based on whitespaces and punctuation. Some work is still needed for a complete analysis of fused forms such as “dudit” = “de ledit” (ADP+DET) along the UD guidelines.

Instruction: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.



Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.


Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.


Instruction: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relations definitions if any.


There are 1 Middle French UD treebanks:

Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.