
This page still pertains to UD version 1.

Uppsala Group on Multiword Expressions

(Aaron Smith, Alessandro Lenci, Anders Johannsen, Jan Hajic, Marie-Catherine de Marneffe, Veronika Vincze)

The group discussion covered three main topics: the treatment of light verb constructions, the treatment of other MWEs, and the treatment of multiword abbreviations.

Light verb constructions (LVCs)

Currently there are three different approaches in the UD treebanks concerning the treatment of LVCs (see a forthcoming poster at the next PARSEME meeting):

During the discussion, we argued that the syntactic and semantic heads of an LVC are different, and hence that its syntactic and semantic aspects should be treated separately. At the syntactic level, we propose a simple syntactic relation between the verb and its argument (the noun). Any argument of the construction as a whole should be attached to the noun whenever possible (i.e. when the noun and the argument can stand on their own without the verb). The LVC status of the construction should be marked at the semantic level, i.e. as an enhanced relation.
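As a rough illustration of this proposal, the following Python sketch encodes the sentence "She made a decision about the budget" with basic dependencies plus one enhanced relation. The sentence, the UD v1 labels chosen for the basic tree (nsubj, dobj, nmod, case, det) and in particular the enhanced label `lvc` are assumptions made for the example; the group did not fix a label name.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    idx: int          # 1-based position in the sentence
    form: str
    head: int         # index of the syntactic head, 0 for the root
    deprel: str       # basic (syntactic) relation
    enhanced: list = field(default_factory=list)  # extra (head, label) pairs


# Basic layer: plain syntactic relations only.
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "made", 0, "root"),
    Token(3, "a", 4, "det"),
    Token(4, "decision", 2, "dobj"),   # simple relation between verb and noun
    Token(5, "about", 7, "case"),
    Token(6, "the", 7, "det"),
    Token(7, "budget", 4, "nmod"),     # argument attached to the noun, not the verb
]

# Enhanced layer: mark the LVC between verb and noun.
# "lvc" is a hypothetical label used only for this sketch.
sentence[3].enhanced.append((2, "lvc"))


def lvc_pairs(tokens):
    """Return (verb form, noun form) for every enhanced 'lvc' relation."""
    by_idx = {t.idx: t for t in tokens}
    return [
        (by_idx[head].form, t.form)
        for t in tokens
        for head, label in t.enhanced
        if label == "lvc"
    ]


print(lvc_pairs(sentence))  # [('made', 'decision')]
```

Here "about the budget" is attached to "decision" because "a decision about the budget" can stand on its own without the verb; the basic tree carries no LVC-specific label at all.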

Multiword expressions

For idioms, we propose a similar annotation scheme: general syntactic labels should be assigned, and idiomaticity should be marked at the enhanced (semantic) level.

We clarified the definitions of the relations compound and mwe (the existing guidelines do not change):

Multiword abbreviations

Most tokenizers treat multiword (MW) abbreviations differently depending on whether they contain an internal delimiter (e.g. "e.g.") or not (e.g. "etc."). The issue is therefore tied to tokenization and is still to be resolved. As a preliminary suggestion, however, we agreed that if MW abbreviations are to be split up, the relation mwe (or the enhanced relation mwe:abbrev) should be applied between the resulting tokens.
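The preliminary suggestion could be sketched in Python as follows. The helper `split_abbreviation` is hypothetical, the splitting rule (break after each internal period) is only illustrative, and attaching every later token to the first one is an assumption based on the head-initial convention for mwe in UD v1.

```python
import re


def split_abbreviation(abbrev, label="mwe"):
    """Split an abbreviation after each period and link the parts.

    Returns (tokens, arcs), where each arc is (head, dependent, label)
    using 1-based positions within the abbreviation. The first token
    is taken as the head of all others (assumed convention).
    """
    # Each token is a run of non-period characters plus an optional period.
    tokens = re.findall(r"[^.]+\.?", abbrev)
    arcs = [(1, dep, label) for dep in range(2, len(tokens) + 1)]
    return tokens, arcs


print(split_abbreviation("e.g."))                      # (['e.', 'g.'], [(1, 2, 'mwe')])
print(split_abbreviation("etc."))                      # (['etc.'], [])
print(split_abbreviation("e.g.", label="mwe:abbrev"))  # (['e.', 'g.'], [(1, 2, 'mwe:abbrev')])
```

An abbreviation without an internal delimiter, such as "etc.", stays a single token and receives no relation, which mirrors the distinction tokenizers already make.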