
This page pertains to UD version 2.

flat: flat multiword expression

The flat relation is one of three relations for multiword expressions (MWEs) in UD (the other two being fixed and compound). It is used for exocentric (headless) semi-fixed MWEs like names (Hillary Rodham Clinton) and dates (24 December). It contrasts with fixed, which applies to completely fixed grammaticized (function word-like) MWEs (like in spite of), and with compound, which applies to endocentric (headed) MWEs (like apple pie).

Flat MWEs are annotated with a flat structure, where all subsequent words in the expression are attached to the first one using the flat label. The assumption is that in these expressions, the flat relations are not syntactic head-modifier relations, and that the structural annotation is in principle arbitrary. For consistency, UD specifies that the first word of the expression shall be the head of all flat dependents. These dependents may have other modifiers so long as they are not flat.
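The head-first convention described above can be sketched in a few lines of Python. This is only an illustration: the function name flat_edges and the (head, dependent, label) triple format are assumptions made for this sketch and are not part of UD or of any UD tool.

```python
def flat_edges(words):
    """Return (head, dependent, label) triples for a flat expression.

    Positions are 1-based, matching the numbered examples on this page.
    Hypothetical helper for illustration; not part of any UD tool.
    """
    # The first word heads every later word; later words never head each other.
    return [(1, i, "flat") for i in range(2, len(words) + 1)]


words = ["Carl", "XVI", "Gustaf"]
for head, dep, label in flat_edges(words):
    # Prints flat(Carl-1, XVI-2) and then flat(Carl-1, Gustaf-3).
    print(f"{label}({words[head - 1]}-{head}, {words[dep - 1]}-{dep})")
```

A single-word expression yields no edges, since there is nothing to attach.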

Below we describe some of the most common uses of flat across languages. Note that semantically equivalent expressions in different languages (or even in the same language) may require different analyses, depending on whether a given expression has a regular compositional syntactic structure.

Names

In many languages, there are multiword proper names with no clear internal syntactic structure and no clear evidence that one of the words is the syntactic head. Such names are annotated using the flat relation, with the optional subtype flat:name.

Hillary Rodham Clinton
flat(Hillary, Rodham)
flat(Hillary, Clinton)
Carl XVI Gustaf
flat(Carl-1, Gustaf-3)
flat(Carl-1, XVI-2)
New York
flat(New, York)

Titles/honorifics are also analyzed using the flat relation. Note that some titles are complex and have their own internal syntactic structure. Such structure is shown with regular relations embedded under flat:

Mr. Smith
flat(Mr., Smith)
President Obama
flat(President, Obama)
French actor Gaspard Ulliel
amod(actor-2, French-1)
flat(actor-2, Gaspard-3)
flat(actor-2, Ulliel-4)
Milliardär Ross Perot \n billionaire Ross Perot
flat(Milliardär-1, Ross-2)
flat(Milliardär-1, Perot-3)

However, if a descriptive title and a name appear to be two separate nominals, then analysis with flat is not appropriate, and appos should be used instead. These cases are often set off by punctuation, such as a comma, although the punctuation may be absent in more informal text. You can generally test for such cases by asking whether the two halves can be reversed; if they can, the relation is probably appos; see the examples given for appos.

In contrast to the above, names that have a regular syntactic structure, like The Lord of the Rings and Captured By Aliens, should be annotated with regular syntactic relations.

The Lord of the Rings
det(Lord, The)
nmod(Lord, Rings)
case(Rings, of)
det(Rings, the)
The king of Sweden
det(king-2, The-1)
nmod(king-2, Sweden-4)
case(Sweden-4, of-3)

For organization names with clear syntactic modification structure, the dependencies should also reflect the syntactic modification structure using regular syntactic relations, as in:

Natural Resources Conservation Service
amod(Resources-2, Natural-1)
compound(Conservation-3, Resources-2)
compound(Service-4, Conservation-3)

In addition, regular syntactic relations are used (i) for a modifying determiner or similar function word and (ii) to connect the words of a description or name that involves embedded prepositional phrases, sentences, etc. This holds when these relations are (a) recognized in the language being annotated (i.e., the analyses below are for French, German, and Spanish, not English) and (b) deemed not to be grammaticalized to the extent that the original role of the function words has been lost.

Le Japon
det(Japon-2, Le-1)
Ludwig van Beethoven
case(Beethoven, van)
nmod(Ludwig, Beethoven)
Miguel de Cervantes y Saavedra
conj(Cervantes, Saavedra)
cc(Saavedra, y)
case(Cervantes, de)
nmod(Miguel, Cervantes)
Río de la Plata
case(Plata-4, de-2)
det(Plata-4, la-3)
nmod(Río-1, Plata-4)

The above analyses of Ludwig van Beethoven and Miguel de Cervantes y Saavedra assume that van and de, respectively, are prepositions. This is true in the languages of the names’ origin, but it can be expected to change when the name is used in foreign text or when sufficient grammaticalization has taken place. For example, when names like this are annotated in English, the appropriate analysis is as a flat name:

Ludwig van Beethoven was a famous German composer .
flat(Ludwig, van)
flat(Ludwig, Beethoven)
det(composer, a)
amod(composer, famous)
amod(composer, German)
cop(composer, was)
nsubj(composer, Ludwig)
punct(composer, .)
Río de la Plata
flat(Río-1, de-2)
flat(Río-1, la-3)
flat(Río-1, Plata-4)
Al Arabiya is a Saudi-owned news organization
flat(Al-1, Arabiya-2)
nsubj(organization-7, Al-1)

And in Modern German or French, these prepositions have generally just become a fossilized part of a family name and regularly appear without the given name. Again, here, analysis as flat seems correct:

Von Hohenlohe gewann das Rennen . \n Von Hohenlohe won the race .
flat(Von-1, Hohenlohe-2)
nsubj(gewann-3, Von-1)

In the case of proper entities named after people, e.g. Leland Stanford Jr. University, the flat relation should only be used inside the person name, with the rest of the construction analyzed compositionally using normal syntactic relations:

Leland Stanford Jr. University
compound(University-4, Leland-1)
flat(Leland-1, Stanford-2)
flat(Leland-1, Jr.-3)

On occasion, an expression with no clear head at the top level will have internal syntactic modifiers or punctuation:

Dwayne " The Rock " Johnson
flat(Dwayne, Rock)
flat(Dwayne, Johnson)
det(Rock, The)
punct(Rock, "-2)
punct(Rock, "-5)

Likewise, in a Portuguese sentence, the surname “Paulo da Silva” would be analyzed with internal structure:

Roberto Paulo da Silva Júnior
flat(Roberto, Paulo)
flat(Roberto, Júnior)
nmod(Paulo, Silva)
case(Silva, da)

But a flat structure cannot be nested immediately under another flat structure. For example, the words of an embedded nickname would be treated as top-level parts of the flat expression:

Denise " Dee Dee " Bridgewater
flat(Denise, Dee-3)
flat(Denise, Dee-4)
flat(Denise, Bridgewater)
punct(Dee-3, "-2)
punct(Dee-4, "-5)
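The two structural constraints discussed on this page (the head of a flat relation precedes its dependents, and a flat dependent does not itself take flat dependents) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function check_flat and the triple format are hypothetical names chosen for this example, and it is not the official UD validator.

```python
def check_flat(edges):
    """Return a list of problems found in (head, dependent, label) triples.

    Positions are 1-based word indices. Subtypes such as flat:name are
    treated as flat. Hypothetical helper; not the official UD validator.
    """
    def is_flat(label):
        return label == "flat" or label.startswith("flat:")

    # Every word that is attached to something via flat.
    flat_deps = {dep for _, dep, label in edges if is_flat(label)}

    problems = []
    for head, dep, label in edges:
        if not is_flat(label):
            continue  # Other relations (punct, det, nmod, ...) are allowed.
        if head > dep:
            problems.append(f"flat head {head} follows its dependent {dep}")
        if head in flat_deps:
            problems.append(f"flat dependent {head} has its own flat dependent {dep}")
    return problems


# Denise(1) "(2) Dee(3) Dee(4) "(5) Bridgewater(6), as annotated above.
good = [(1, 3, "flat"), (1, 4, "flat"), (1, 6, "flat"), (3, 2, "punct"), (4, 5, "punct")]
# Same name, but with the nickname nested as its own flat structure.
bad = [(1, 3, "flat"), (3, 4, "flat"), (1, 6, "flat")]

print(check_flat(good))
print(check_flat(bad))
```

The first call reports no problems, while the second flags the second Dee being attached to the first Dee rather than to Denise.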

Some further notes on relations for names

This section briefly records some of the arguments that have been made in the past on relations for name structure. It is an issue over which there has historically been variation and about which there is some continuing debate.

Examples like French actor Gaspard Ulliel: Some treebanks have used nmod for titles and honorifics like Mr. or French actor. Most people think this is inappropriate, since an nmod dependent should be a full phrase, which will typically take its own case as a modifier in a cased language. In contrast, these titles seem to be part of the same phrase as the name that follows them; they show case agreement in a cased language.

Some grammatical traditions, descending from Latin, call French actor in such cases a “fixed (or close) apposition” and take the name as the head. UD has restricted the appos relation to following appositives (corresponding to “loose (or wide) apposition” in the Latin tradition). The relation appos is only used when you have two full nominals, typically joined loosely, and often separated by a punctuation mark like a comma. So appos is not correct for these cases.

Sometimes the relation compound has been used, but this does not seem right. It implies headedness, and titles do not usually behave like compounds: in German, they are not joined to the following words, as compounds are normally joined in German, and they appear at the beginning of names in both German and Hebrew, even though German compounds are head last and Hebrew compounds are head first. So compound does not seem appropriate either.

Some UDv1 treebanks used name for honorifics like Mr., although some felt that was wrong and name should be restricted to joining the proper nouns of multi-word names. In UDv2, name was removed and replaced by flat, which allowed a broader notion of a chunk of unheaded material. In the UDv2 guidelines, both titles and honorifics are joined to names with flat.

Dates and Complex Numerals

Date expressions come in many shapes and forms across languages. In some cases, they have a very clear syntactic structure, as in the 4th of July, and should be annotated with regular dependency relations. In other cases, they have a flat structure with no clearly discernible head, as in 1 December 2016, in which case the flat relation should be used.

the 4th of July
det(4th, the)
nmod(4th, July)
case(July, of)
1 December 2016
flat(1, December)
flat(1, 2016)

The flat relation can also be used for other numerals and other numerical expressions that lack phrasal structure.

four thousand
flat(four, thousand)

Foreign Phrases

The flat relation, with the optional subtype flat:foreign, should also be used when a foreign phrase cannot be given a compositional analysis. In this case, it replaces the foreign relation, which was used in v1 but is no longer part of the relation taxonomy.

And then she went : gjiko frac zen .
parataxis(went, gjiko)
flat(gjiko, frac)
flat(gjiko, zen)
