
This page pertains to UD version 2.

flat: flat expression

The flat relation is used to combine the elements of an expression where none of the immediate components can be identified as the sole head using standard substitution tests. This includes both cases where more than one component passes the head test – as in the name John Smith, where either John or Smith can replace the whole in most contexts – and cases where no component does – as in San Francisco (in English). Note also that the flat relation is appropriate in such cases only when no more specific relation applies. For example, in coordination structures any of the conjuncts can usually replace the whole, yet these structures are annotated with the more specific conj relation rather than flat.

Flat expressions are annotated with a flat structure, where all subsequent components in the expression are attached to the first one using the flat label. The assumption is that in these expressions, the flat relations are not syntactic head-modifier relations, and that the structural annotation is in principle arbitrary. The components of a flat expression may have their own dependents, including nested flat structures. For example, in the name Mary Jane Tyler Smith, both the first name (Mary Jane) and the last name (Tyler Smith) are flat expressions, which are combined into a larger flat name (the tree appears below).
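The attachment rule described above can be sketched in Python. This is only an illustration under stated assumptions: the nested-list input format, the function name flat_arcs, and the (head, dependent, label) output triples are hypothetical and not part of UD, and words stand in for token IDs purely for readability.

```python
def flat_arcs(expr):
    """Return (head_word, arcs) for a flat expression.

    `expr` is a list whose items are either words (str) or nested lists
    (nested flat expressions). Every later component is attached to the
    first component with the label "flat".
    """
    arcs = []
    heads = []
    for component in expr:
        if isinstance(component, str):
            heads.append(component)
        else:
            # A nested flat expression is represented by its own first word.
            sub_head, sub_arcs = flat_arcs(component)
            heads.append(sub_head)
            arcs.extend(sub_arcs)
    first = heads[0]
    arcs.extend((first, h, "flat") for h in heads[1:])
    return first, arcs


# The first name and the last name are each flat, and so is their combination.
head, arcs = flat_arcs([["Mary", "Jane"], ["Tyler", "Smith"]])
for arc in arcs:
    print(arc)
```

Under these assumptions, Jane and Tyler both attach to Mary, and Smith attaches to Tyler, the first word of the nested last name.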

The prototypes for flat are: (i) personal names, (ii) foreign expressions, (iii) iconic sequences, and (iv) items separated for readability. These are illustrated in the sections below. The application of flat may extend beyond these prototypes to, e.g., various kinds of name and number expressions. However, even if an expression is idiosyncratic or follows a specialized pattern, every effort should be made to find a head rather than employing flat. If a head can be found but no substantive dependency relation is appropriate, dep can be used.

Note that what is considered to be transparent linguistic syntax (as opposed to flat structure) is subject to treebank-specific policies. (E.g., some treebanks might provide proper grammatical analyses in the presence of code-switching, or treat mathematical notation as following linguistic strategies like predication.)

Some languages opt to subcategorize usages of flat via subtypes. In particular, many treebanks use the flat:name and flat:foreign subtypes converted from the v1 relations name and foreign. The examples on this page simply use plain flat.
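For tools that need to compare treebanks with and without subtypes, one simple approach is to strip everything after the colon. The helper below is a minimal sketch; its name, universal_relation, is hypothetical and not part of any UD tool.

```python
def universal_relation(deprel):
    """Strip a subtype: 'flat:name' becomes 'flat'; plain relations are unchanged."""
    return deprel.split(":", 1)[0]


for label in ("flat:name", "flat:foreign", "flat"):
    print(label, "->", universal_relation(label))
```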

Names

A person’s name (or parts thereof) may lack the hallmarks of general constructions in the language, such that no single word can be identified as the head, in which case a flat structure applies.

Nesting is possible:

On occasion, an expression with no clear head at the top level will have internal syntactic modifiers or punctuation:

The scope of flat may extend beyond names of persons to names of other kinds of entities that depart from general headed structure. The expressions under this category must be established by language-specific criteria.

The ExtPos feature may be used to signal the external syntactic distribution of the flat expression—e.g., ExtPos=PROPN for 17 in:

Flat vs. non-flat names

Names that have a regular syntactic structure, like The Lord of the Rings and Captured By Aliens, should be annotated with regular syntactic relations rather than flat structures:

For organization names with clear syntactic modification structure, the dependencies should also reflect the syntactic modification structure using regular syntactic relations, as in:

In addition, regular syntactic relations are used (i) for a modifying determiner or similar function word and (ii) to connect the words of a description or name that involves embedded prepositional phrases, sentences, etc. This holds provided that these relations are (a) recognized in the language being annotated (the analyses below are for French, German, and Spanish, not English) and (b) deemed not to be grammaticalized to the extent that the original role of the function words has been lost.

A name may combine flat and non-flat structure. In a Portuguese text, the name Paulo da Silva would be analyzed as follows:

The above analyses of Ludwig van Beethoven and Miguel de Cervantes y Saavedra assume that van and de, respectively, are prepositions. This is true in the languages of the names’ origin, but the analysis can be expected to change when the name is used in foreign text or when sufficient grammaticalization has taken place. For example, when names like these are annotated in English, the appropriate analysis is as a flat name:

And in Modern German or French, these prepositions have generally just become a fossilized part of a family name and regularly appear without the given name. Again, here, the flat analysis seems correct:

Foreign expressions

This encompasses expressions that may have been borrowed or quoted, but whose original grammatical structure is not necessarily accessible to speakers of the language(s) being annotated.

“Foreign” includes not just natural languages but also notational systems that are considered external to natural language proper and are governed by separate rules (e.g., musical chord progressions, software code excerpts).

See further discussion at Foreign Expressions and Code-Switching.

History: UD v1 had a foreign relation, but this is no longer part of the relation taxonomy and has been subsumed under flat.

Iconic sequences

Sequences for which neither head-dependent nor coordination relationships apply include onomatopoeia (quack quack quack), “filler” words (do re mi), and gibberish (blargety blarg blarg).

Items separated for readability

Here the units separated by spaces or punctuation cannot really be construed as separate lexemes. A common case is telephone numbers:
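As a rough illustration, the sketch below attaches each space-separated chunk of a telephone number to the first chunk with flat. The number is made up, and the function name annotate_flat, the simplified (ID, FORM, HEAD, DEPREL) rows, and the default attachment of the first chunk to the root are all assumptions for the example, not prescribed by UD.

```python
def annotate_flat(forms, first_id=1, external_head=0, external_deprel="root"):
    """Build simplified (ID, FORM, HEAD, DEPREL) rows for a flat expression.

    The first token keeps whatever relation it has to the rest of the
    sentence; every later token is attached to the first one as "flat".
    """
    rows = []
    for offset, form in enumerate(forms):
        token_id = first_id + offset
        if offset == 0:
            rows.append((token_id, form, external_head, external_deprel))
        else:
            rows.append((token_id, form, first_id, "flat"))
    return rows


# Hypothetical number; spaces trigger ordinary token boundaries.
rows = annotate_flat("555 010 0199".split())
for row in rows:
    print("\t".join(str(field) for field in row))
```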

Filenames are another such case: they may contain spaces, and the components may or may not be recognizable as natural language strings, but in general filenames are not expected to follow regular syntactic structure. Using flat signals that filenames are a context in which regular syntactic rules do not apply (whether the component tokens are analyzed morphologically like words of an art title, or simply tagged as X, or a mixture; the precise tokenization and morphological analysis is left to the discretion of treebanks). ExtPos=PROPN may be specified in the FEATS column to signal that the whole filename functions externally as a proper noun. For example, the filename Mydoc CHQ2 - Wednesday DRAFT (2).txt might be analyzed as follows:
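One possible way to produce such an analysis programmatically is sketched below. It makes several assumptions that are labeled in the code: plain whitespace splitting stands in for a treebank's real tokenizer, tokens made up only of punctuation are attached as punct rather than flat, and the dictionary layout and the name annotate_filename are hypothetical. The head of the first token is left unset because it depends on the surrounding sentence.

```python
def annotate_filename(filename):
    """Return simplified token records for a filename treated as a flat expression."""
    forms = filename.split()  # assumption: stand-in for a real tokenizer
    tokens = []
    for i, form in enumerate(forms, start=1):
        if i == 1:
            # First token heads the expression and carries the external POS.
            token = {"id": i, "form": form, "head": None, "deprel": None,
                     "ExtPos": "PROPN"}
        elif not any(ch.isalnum() for ch in form):
            # Assumption: standalone punctuation is attached as punct.
            token = {"id": i, "form": form, "head": 1, "deprel": "punct"}
        else:
            token = {"id": i, "form": form, "head": 1, "deprel": "flat"}
        tokens.append(token)
    return tokens


tokens = annotate_filename("Mydoc CHQ2 - Wednesday DRAFT (2).txt")
for token in tokens:
    print(token)
```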

It is not expected that a language’s tokenization rules will make special exceptions for spaces in telephone numbers or filenames. That is, if spaces trigger token boundaries in general, they should also do so for telephone numbers and filenames; exceptional token-internal spaces will not be permitted.

Not all “unnecessary” spaces warrant flat, however:


flat in other languages: [bg] [bm] [cop] [cs] [de] [el] [en] [et] [eu] [fi] [fr] [ga] [gd] [hy] [it] [ka] [kk] [ky] [pcm] [pt] [ru] [sl] [swl] [tr] [u] [vi] [xcl] [yue] [zh]