Multiword Expressions in UD v2
Since UD does not allow “words with spaces” (but see a partly new proposal under word segmentation), even completely fixed multiword expressions must be annotated with (dummy) dependency relations. To improve annotation consistency, we propose the following changes for v2:
- Rename u-dep/mwe to fixed and make clear that this should only be used for completely fixed expressions
- Change the direction of arrows (right-to-left instead of left-to-right) for this relation as well as the other non-dependency relations u-dep/name and u-dep/foreign (see also semantic categories)
Rename mwe to fixed
It seems that the label mwe (multiword expression) has led to a lot of confusion. It was never intended for multiword expressions like “kick the bucket” or Fr. “pomme de terre” (potato). It has always been restricted to the fixed expressions category of Sag et al., excluding any relations in the scope of u-dep/name or u-dep/compound. The label fixed might reflect this fact better.
The proposed change is therefore to rename the label mwe (multiword expression) to fixed and to make the guidelines more restrictive: fixed is used only for completely fixed, grammaticized expressions that behave like function words or short adverbials.
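As a rough illustration of the relabeling, the sketch below rewrites v1 labels in a list of relation triples. The triple layout `(label, head, dependent)` and the names `RENAMES` and `rename_labels` are assumptions made for this example, not part of any UD tool. The arrow direction is deliberately left unchanged.

```python
# v1 -> v2 label changes named in this proposal (mwe -> fixed; name -> flat is
# taken from the arrow-direction section). All other labels pass through.
RENAMES = {"mwe": "fixed", "name": "flat"}


def rename_labels(relations):
    """Relabel (label, head, dependent) triples; heads and dependents are untouched."""
    return [(RENAMES.get(label, label), head, dep) for label, head, dep in relations]


# "He cried because of you" with the v1 label and v1 (head-initial) direction.
v1_relations = [("mwe", "because", "of"), ("case", "you", "because")]
v2_relations = rename_labels(v1_relations)
print(v2_relations)
# [('fixed', 'because', 'of'), ('case', 'you', 'because')]
```

Relabeling alone does not check that an expression is actually fixed in the restricted sense described above; that remains an annotation decision.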
Change arrow direction
For non-dependency relations, it was (more or less) arbitrarily decided in v1 to draw arrows from left to right out of the first word. With hindsight, a more harmonious choice for most languages would have been to instead draw arrows from right to left out of the last word. We propose to make this change for fixed (currently u-dep/mwe), u-dep/foreign and flat (currently u-dep/name). Examples:
I like dogs as well as cats
fixed(as-6, well-5)
fixed(as-6, as-4)
He cried because of you
fixed(of, because)
Je préfère prendre un dessert plutôt qu' une entrée (I prefer getting a dessert rather than an appetizer)
fixed(qu', plutôt)
She said : ez esan lasai
parataxis(said, lasai)
foreign(lasai, esan)
foreign(lasai, ez)
Usain Bolt won the race
nsubj(won, Bolt)
flat(Bolt, Usain)
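The direction change can be sketched as a small tree transformation. The code below is a minimal illustration under stated assumptions: the token layout (dicts with `id`, `form`, `head`, `deprel`), the helper `tok`, and the function `reverse_headless` are hypothetical, and the input uses v1 labels with v1 (head-initial) direction. The sketch moves the head to the last word of each span, lets that word inherit the old head's attachment, and reattaches any other dependents of the old head to the new head. That last step is an assumption, and nested or overlapping spans are not handled. Labels are left as they are.

```python
# v1 labels of the non-dependency relations discussed above.
HEADLESS = ("mwe", "name", "foreign")


def tok(i, form, head, deprel):
    """Build one token; head 0 marks the root (hypothetical layout)."""
    return {"id": i, "form": form, "head": head, "deprel": deprel}


def reverse_headless(tokens, rels=HEADLESS):
    """Return a copy of `tokens` in which spans joined by `rels` are head-final."""
    out = [dict(t) for t in tokens]
    by_id = {t["id"]: t for t in out}
    for rel in rels:
        # Collect the dependents that carry `rel`, keyed by their v1 head.
        groups = {}
        for t in tokens:
            if t["deprel"] == rel:
                groups.setdefault(t["head"], []).append(t["id"])
        for old_head, deps in groups.items():
            new_head = max(deps + [old_head])
            if new_head == old_head:
                continue  # already head-final
            # The last word takes over the old head's attachment.
            by_id[new_head]["head"] = by_id[old_head]["head"]
            by_id[new_head]["deprel"] = by_id[old_head]["deprel"]
            # Every other word in the span now depends on the last word.
            for i in deps + [old_head]:
                if i != new_head:
                    by_id[i]["head"] = new_head
                    by_id[i]["deprel"] = rel
            # Assumption: remaining dependents of the old head follow the new head.
            for t in out:
                if t["head"] == old_head:
                    t["head"] = new_head
    return out


# "I like dogs as well as cats" in v1 style: as-4 heads the expression.
coordination = [
    tok(1, "I", 2, "nsubj"), tok(2, "like", 0, "root"), tok(3, "dogs", 2, "dobj"),
    tok(4, "as", 3, "cc"), tok(5, "well", 4, "mwe"), tok(6, "as", 4, "mwe"),
    tok(7, "cats", 3, "conj"),
]
result = reverse_headless(coordination)
for t in result[3:6]:
    print(f'{t["deprel"]}({t["head"]}, {t["form"]}-{t["id"]})')
# mwe(6, as-4)
# mwe(6, well-5)
# cc(3, as-6)
```

After the transformation, as-6 heads the expression and carries the cc attachment to dogs, which matches the direction shown in the first example above.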