Multiword Expressions in UD v2
Since UD does not allow “words with spaces” (but see a partly new proposal under word segmentation), even completely fixed multiword expressions must be annotated with (dummy) dependency relations. To improve annotation consistency, we propose the following change for v2:
- Rename mwe to fixed and make clear that this should only be used for completely fixed expressions.
- Rename name to flat and extend its use to semi-fixed multiword expressions that do not have a clear syntactic head. (See semantic categories.)
- Remove foreign and subsume its use under the new relation flat.
- Add a new subtype of compound for handling serial verb constructions, in analogy with particle verbs and grammaticized light verb constructions.
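The label changes listed above amount, at their simplest, to relabeling the DEPREL column of an existing treebank. The following is a minimal sketch, assuming CoNLL-U input with ten tab-separated columns and DEPREL in the eighth. The names V1_TO_V2 and relabel_line are hypothetical, and mapping foreign to flat:foreign rather than plain flat is an assumption made to preserve information. The sketch does not cover the narrowed scope of fixed or any structural changes, which would need manual review.

```python
# Hypothetical v1 -> v2 relabeling table based on the proposed renames.
V1_TO_V2 = {
    "mwe": "fixed",
    "name": "flat",
    "foreign": "flat:foreign",  # assumption: keep a subtype to preserve information
}


def relabel_line(line: str) -> str:
    """Relabel DEPREL on one CoNLL-U token line; return other lines unchanged."""
    # Blank lines separate sentences; lines starting with '#' are comments.
    if not line.strip() or line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        # Not a well-formed token line; leave it alone.
        return line
    # DEPREL is the eighth column (index 7); unknown labels pass through.
    cols[7] = V1_TO_V2.get(cols[7], cols[7])
    return "\t".join(cols) + newline
```

Labels outside the table, including existing subtypes such as compound:prt, pass through untouched, so the function can be applied to every line of a file.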
Rename mwe to fixed
It seems that the label mwe (multiword expression) has led to a lot of confusion. It was never intended for multiword expressions like “kick the bucket”, or Fr. “pomme de terre” (potato). It has always been restricted to the fixed expressions category of Sag et al., excluding any relations in the scope of name or compound. The label fixed reflects this fact better.
The proposed change is therefore to rename the label mwe (multiword expression) to fixed and to make the guidelines more restrictive: fixed is used only for completely fixed, grammaticized expressions that behave like function words or short adverbials.
Remove foreign
The foreign relation does not denote a proper dependency relation, and it now seems appropriate to subsume it under the new generalized flat relation. Note that there was never a special part-of-speech tag for foreign words, which were tagged X in cases where they could not be given a proper tag. The use of flat in syntax can be seen as parallel to this. A subtype flat:foreign can be used to preserve information in existing treebanks.
Serial verb constructions
Serial verb constructions are typologically important and inadequately covered by UD v1. In the absence of a deeper analysis of this class of expressions, which may be worth a special universal relation, we propose to treat them as a subtype of compound and use compound:svc, in analogy with the existing subtypes compound:prt for particle verbs and compound:lvc for grammaticized light verb constructions.
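Because subtypes such as compound:svc are written as a universal relation followed by a colon and a subtype, a tool that does not know a particular subtype can still fall back on the universal relation. A minimal sketch of that fallback follows; the function names split_deprel and is_compound are hypothetical and not part of any UD tooling.

```python
from typing import Optional, Tuple


def split_deprel(deprel: str) -> Tuple[str, Optional[str]]:
    """Split a relation label into (universal relation, subtype or None)."""
    base, sep, subtype = deprel.partition(":")
    return base, (subtype if sep else None)


def is_compound(deprel: str) -> bool:
    """True for compound and any of its subtypes (e.g. compound:svc)."""
    return split_deprel(deprel)[0] == "compound"
```

With this, compound:svc, compound:prt, and compound:lvc are all recognized as instances of compound, while the subtype remains available when a finer distinction is needed.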