flat: flat expression
The flat relation is used to combine the elements of an expression where none of the immediate components can be identified as the sole head using standard substitution tests.
This includes both cases where more than one component passes the head test – as in the name John Smith, where either John or Smith can replace the whole in most contexts – and cases where no component does – as in San Francisco (in English).
Note also that the flat relation is appropriate in such cases only when no more specific relation applies.
For example, coordination structures are annotated with the conj relation rather than flat, even though any of the conjuncts can usually replace the whole.
Flat expressions are annotated with a flat structure, where all subsequent components in the expression are attached to the first one using the flat label. The assumption is that in these expressions, the flat relations are not syntactic head-modifier relations, and that the structural annotation is in principle arbitrary.
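As a minimal sketch of this attachment scheme, the following Python snippet attaches every subsequent component to the first one. The function name `flat_edges` and the representation of edges as 1-based (head, dependent) index pairs are illustrative assumptions, not part of any UD tool.

```python
from typing import List, Tuple


def flat_edges(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (head, dependent) pairs for a flat expression.

    Indices are 1-based positions within `tokens`. The first token is the
    technical head; every later token attaches to it with the flat label.
    """
    return [(1, i) for i in range(2, len(tokens) + 1)]


name = ["Hillary", "Rodham", "Clinton"]
for head, dep in flat_edges(name):
    # Print each edge in the flat(head, dependent) notation used on this page.
    print(f"flat({name[head - 1]}, {name[dep - 1]})")
```

Because the choice of the first token as head is a convention rather than a syntactic claim, the sketch needs no linguistic information beyond token order.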
The components of a flat expression may have their own dependents, including nested flat structures.
For example, in the name Mary Jane Tyler Smith, both the first name (Mary Jane) and the last name (Tyler Smith) are flat expressions, which are combined into a larger flat name (the tree appears below).
The prototypes for flat are: (i) personal names, (ii) foreign expressions, (iii) iconic sequences, and (iv) items separated for readability.
These are illustrated in the sections below.
The application of flat may extend beyond these prototypes to, e.g., various kinds of name and number expressions.
However, even if an expression is idiosyncratic or follows a specialized pattern, every effort should be made to find a head rather than employing flat.
If a head can be found but no substantive dependency relation is appropriate, dep can be used.
Note that what is considered to be transparent linguistic syntax (as opposed to flat structure) is subject to treebank-specific policies. (E.g., some treebanks might provide proper grammatical analyses in the presence of code-switching, or treat mathematical notation as following linguistic strategies like predication.)
Some languages opt to subcategorize usages of flat via subtypes.
In particular, many treebanks use the flat:name and flat:foreign subtypes converted from the v1 relations name and foreign.
The examples on this page simply use plain flat.
Names
A person’s name (or parts thereof) may lack the hallmarks of general constructions in the language, such that no single word can be identified as the head, in which case a flat structure applies.
Hillary Rodham Clinton
flat(Hillary, Rodham)
flat(Hillary, Clinton)
Nesting is possible:
Mary Jane Tyler Smith
flat(Mary, Jane)
flat(Tyler, Smith)
flat(Mary, Tyler)
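The nesting in this example can be sketched in Python as follows. This is an illustration under stated assumptions: the nested-list input format and the helper names `head_of` and `nested_flat` are hypothetical, and words are used as identifiers, so the sketch only works when no word form is repeated within the expression.

```python
from typing import List, Tuple, Union

# A component is either a single word or a list of components (a flat expression).
Node = Union[str, List["Node"]]


def head_of(node: Node) -> str:
    """Return the leftmost word, which serves as the technical head."""
    while isinstance(node, list):
        node = node[0]
    return node


def nested_flat(node: Node) -> List[Tuple[str, str]]:
    """Collect (head, dependent) flat edges, inner expressions first."""
    if isinstance(node, str):
        return []
    edges: List[Tuple[str, str]] = []
    for child in node:
        edges.extend(nested_flat(child))
    top = head_of(node)
    for child in node[1:]:
        # Each later component attaches, via its own head, to the first word.
        edges.append((top, head_of(child)))
    return edges


name = [["Mary", "Jane"], ["Tyler", "Smith"]]
for head, dep in nested_flat(name):
    print(f"flat({head}, {dep})")
```

Each inner expression is resolved first, and the head of the second inner expression (Tyler) is then attached to the head of the first (Mary).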
On occasion, an expression with no clear head at the top level will have internal syntactic modifiers or punctuation:
Dwayne " The Rock " Johnson
flat(Dwayne, Rock)
flat(Dwayne, Johnson)
det(Rock, The)
punct(Rock, "-2)
punct(Rock, "-5)
The scope of flat may extend beyond names of persons to names of other kinds of entities that depart from general headed structure.
The expressions under this category must be established by language-specific criteria.
The ExtPos feature may be used to signal the external syntactic distribution of the flat expression, e.g., ExtPos=PROPN for 17 in:
17/NUM[ExtPos=PROPN] Across/ADV is wrong in this crossword .
flat(17, Across)
Flat vs. non-flat names
Names that have a regular syntactic structure, like The Lord of the Rings and Captured By Aliens, should be annotated with regular syntactic relations rather than flat structures:
The Lord of the Rings
det(Lord, The)
nmod(Lord, Rings)
case(Rings, of)
det(Rings, the)
The king of Sweden
det(king-2, The-1)
nmod(king-2, Sweden-4)
case(Sweden-4, of-3)
For organization names with clear syntactic modification structure, the dependencies should also reflect the syntactic modification structure using regular syntactic relations, as in:
Natural Resources Conservation Service
amod(Resources-2, Natural-1)
compound(Conservation-3, Resources-2)
compound(Service-4, Conservation-3)
In addition, regular syntactic relations are used (a) for a modifying determiner or similar function word and (b) to connect the words of a description or name that involves embedded prepositional phrases, sentences, etc. This applies only when these relations are (i) recognized in the language being annotated (i.e., the analyses below are for French, German, and Spanish, not English) and (ii) deemed not to be grammaticalized to the extent that the original role of the function words has been lost.
Le Japon
det(Japon-2, Le-1)
Ludwig van Beethoven
case(Beethoven, van)
nmod(Ludwig, Beethoven)
Miguel de Cervantes y Saavedra
conj(Cervantes, Saavedra)
cc(Saavedra, y)
case(Cervantes, de)
nmod(Miguel, Cervantes)
Río de la Plata
case(Plata-4, de-2)
det(Plata-4, la-3)
nmod(Río-1, Plata-4)
A name may combine flat and non-flat structure. In a Portuguese text, the surname Paulo da Silva would be analyzed as follows:
Roberto Paulo da Silva
flat(Roberto, Paulo)
nmod(Paulo, Silva)
case(Silva, da)
The above analyses of Ludwig van Beethoven and Miguel de Cervantes y Saavedra assume that van and de, respectively, are prepositions.
This is true in the languages of the names’ origin, but it can be expected to change when the name is used in foreign text or when sufficient grammaticalization has taken place. For example, when names like this are annotated in English, the appropriate analysis is as a flat name:
Ludwig van Beethoven was a famous German composer .
flat(Ludwig, van)
flat(Ludwig, Beethoven)
det(composer, a)
amod(composer, famous)
amod(composer, German)
cop(composer, was)
nsubj(composer, Ludwig)
punct(composer, .)
Río de la Plata
flat(Río-1, de-2)
flat(Río-1, la-3)
flat(Río-1, Plata-4)
Al Arabiya is a Saudi-owned news organization
flat(Al-1, Arabiya-2)
nsubj(organization-7, Al-1)
In Modern German or French, these prepositions have generally become a fossilized part of a family name and regularly appear without the given name. Here again, the flat analysis seems correct:
Von Hohenlohe gewann das Rennen . \n Von Hohenlohe won the race .
flat(Von-1, Hohenlohe-2)
nsubj(gewann-3, Von-1)
Foreign expressions
This encompasses expressions that may have been borrowed or quoted, but whose original grammatical structure is not necessarily accessible to speakers of the language(s) being annotated.
And then she went : gjiko frac zen .
parataxis(went, gjiko)
flat(gjiko, frac)
flat(gjiko, zen)
“Foreign” includes not just natural languages but also notational systems that are considered external to natural language proper and are governed by separate rules (e.g., musical chord progressions, software code excerpts).
The Vienna Game move order is 1. e4 e5 2. Nc3 .
nsubj(1., order)
cop(1., is)
flat(1., e4)
flat(1., e5)
flat(1., 2.)
flat(1., Nc3)
See further discussion at Foreign Expressions and Code-Switching.
History: UD v1 had a foreign relation, but this is no longer part of the relation taxonomy and has been subsumed under flat.
Iconic sequences
Sequences for which neither head-dependent nor coordination relationships apply include onomatopoeia (quack quack quack), “filler” words (do re mi), and gibberish (blargety blarg blarg).
The duck said quack quack quack
obj(said, quack-4)
flat(quack-4, quack-5)
flat(quack-4, quack-6)
Items separated for readability
Here the units separated by spaces or punctuation cannot really be construed as separate lexemes. A common case is telephone numbers:
Call 0118 999 881 999 119 725 3
obj(Call, 0118)
flat(0118, 999-3)
flat(0118, 881)
flat(0118, 999-5)
flat(0118, 119)
flat(0118, 725)
flat(0118, 3)
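The telephone number example above repeats the token 999, which is why the notation adds a position suffix (999-3, 999-5) to those tokens only. A minimal Python sketch of that rendering is given below; the function name `flat_lines` and its span arguments are illustrative assumptions, not part of any UD tool.

```python
from collections import Counter
from typing import List


def flat_lines(tokens: List[str], start: int, end: int) -> List[str]:
    """Render flat edges for the 1-based, inclusive token span start..end.

    The first token of the span is the head. A "-position" suffix is added
    only to forms that occur more than once in the sentence.
    """
    counts = Counter(tokens)

    def label(i: int) -> str:
        form = tokens[i - 1]
        return f"{form}-{i}" if counts[form] > 1 else form

    return [f"flat({label(start)}, {label(i)})" for i in range(start + 1, end + 1)]


sentence = "Call 0118 999 881 999 119 725 3".split()
# The number occupies tokens 2 through 8; token 1 (Call) governs it via obj.
for line in flat_lines(sentence, 2, 8):
    print(line)
```

The sketch covers only the flat edges inside the number; the obj relation from Call to the first component is outside its scope.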
Filenames are another such case: they may contain spaces, and the components may or may not be recognizable as natural language strings, but in general filenames are not expected to follow regular syntactic structure. Using flat signals that filenames are a context where regular syntactic rules do not apply (whether the component tokens are analyzed morphologically like words of an art title, or simply tagged as X, or a mixture; the precise tokenization and morphological analysis is left to the discretion of treebanks). ExtPos=PROPN may be specified in the MISC column to signal that the whole filename functions externally as a proper noun. For example, the filename Mydoc CHQ2 - Wednesday DRAFT (2).txt might be analyzed as follows:
Mydoc/X[ExtPos=PROPN] CHQ2/X -/PUNCT Wednesday/PROPN DRAFT/PROPN (/PUNCT 2/NUM )/PUNCT .txt/X
flat(Mydoc, CHQ2)
flat(Mydoc, -)
flat(Mydoc, Wednesday)
flat(Mydoc, DRAFT)
flat(Mydoc, ()
flat(Mydoc, 2)
flat(Mydoc, ))
flat(Mydoc, .txt)
It is not expected that a language’s tokenization rules will make special exceptions for spaces in telephone numbers or filenames. That is, if spaces trigger token boundaries in general, they should also do so for telephone numbers and filenames; exceptional token-internal spaces will not be permitted.
Not all “unnecessary” spaces warrant flat, however:
- improper spacing within a word should be addressed with goeswith
- numerals with thousands separator spaces (e.g. 1 000 000) may be treated as single words in languages where this convention is widespread