
This page pertains to UD version 2.


UD for Breton

Tokenisation and Word Segmentation

Tokenisation was originally done with the Apertium morphological analyser for Breton, which treats certain multiword expressions containing spaces as single tokens. Where the number of spaces in the original token matches the number of spaces in the multiword token, these are split into separate tokens in UD: the first token receives the part of speech of the multiword token, and each subsequent token receives the part of speech X and is attached with the fixed relation.
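The splitting rule described above can be sketched roughly as follows. This is a minimal illustration rather than the actual conversion script: the function name `split_spaced_token`, the dictionary layout, and the example input are all hypothetical, and attaching the later parts to the first part is an assumption based on the general UD convention for fixed.

```python
def split_spaced_token(form, upos, first_id=1):
    """Split an analyser token that contains spaces into separate UD words.

    The first part keeps the part of speech of the whole expression.
    Every later part gets UPOS "X" and the relation "fixed"; attaching it
    to the first part is an assumption (general UD convention for fixed).
    """
    words = []
    for offset, part in enumerate(form.split(" ")):
        if offset == 0:
            # Head and relation of the first part depend on the sentence,
            # so they are left unset in this sketch.
            words.append({"id": first_id, "form": part, "upos": upos,
                          "head": None, "deprel": None})
        else:
            words.append({"id": first_id + offset, "form": part, "upos": "X",
                          "head": first_id, "deprel": "fixed"})
    return words


# Illustrative input only, not taken from the treebank.
for word in split_spaced_token("abalamour da", "ADP"):
    print(word)
```

The sketch leaves out the check on the number of spaces, since the text does not spell out how that comparison is made.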

The most important tokenisation decision concerns the words traditionally described as inflected or conjugated prepositions. Here we analyse them as contractions of a preposition and a pronoun. For example, dit is tokenised as a multiword token constructed from da “to” and it “you”.
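As a rough sketch of how such a contraction looks in the ten-column CoNLL-U format, the snippet below builds the range line for the surface form and one line per syntactic word. The helper name `contraction_lines` is hypothetical, the UPOS tags ADP and PRON are assumed from the description "prepositions and pronouns", and all other columns are left as underscores because the text does not specify them.

```python
def contraction_lines(surface, parts, start_id=1):
    """Build CoNLL-U lines for a contraction analysed as a multiword token.

    `parts` is a list of (form, upos) pairs for the syntactic words.
    Columns not discussed in the text (lemma, features, head, ...) are "_".
    """
    end_id = start_id + len(parts) - 1
    # Range line: ID and FORM, then eight unspecified columns.
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + ["_"] * 8)]
    for offset, (form, upos) in enumerate(parts):
        # Word line: ID, FORM, LEMMA, UPOS, then six unspecified columns.
        cols = [str(start_id + offset), form, "_", upos] + ["_"] * 6
        lines.append("\t".join(cols))
    return lines


# dit = da "to" + it "you"; the UPOS values are an assumption.
for line in contraction_lines("dit", [("da", "ADP"), ("it", "PRON")]):
    print(line)
```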

Morphology

Features

Inherent:

Inflectional:

Aspect
Degree
Gender (other than NOUNs)
Mood
Number
Tense
VerbForm

Syntax

The following relation subtypes are used in the Breton data:

acl:relcl Relative adnominal clause
aux:pass Auxiliary verb used in the construction of the passive
flat:name Parts of a multi-word personal name attached to the first part
nmod:gen Nominal modification with genitive meaning using Celtic-style conjunctive genitive
nmod:poss Nominal modification with possessive meaning
nsubj:cop Nominal subject of a non-verbal or copular clause
obl:agent An oblique introduced with gant “with” that expresses the agent in a passive construction
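
To check which of these subtypes actually occur in a CoNLL-U file, one could count the DEPREL column along the following lines. This is only a sketch: `count_subtypes` is a hypothetical helper, and the sample sentence uses made-up placeholder forms (w1, w2, ...) rather than real Breton data.

```python
from collections import Counter


def count_subtypes(conllu_text):
    """Count subtyped relations (those containing ':') in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines and sentence-level comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword token ranges (e.g. 3-4) and empty nodes (e.g. 3.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        deprel = cols[7]
        if ":" in deprel:
            counts[deprel] += 1
    return counts


# Placeholder sentence for illustration only.
sample = "\n".join([
    "# placeholder sentence",
    "\t".join(["1", "w1", "_", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "w2", "_", "NOUN", "_", "_", "1", "nmod:gen", "_", "_"]),
    "\t".join(["3-4", "w3", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["3", "w3a", "_", "ADP", "_", "_", "4", "case", "_", "_"]),
    "\t".join(["4", "w3b", "_", "PRON", "_", "_", "1", "obl:agent", "_", "_"]),
])
print(count_subtypes(sample))
```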

Treebanks

There is 1 Breton UD treebank: