This section contains a detailed discussion of particular linguistic constructions that fall outside (or cut across) the main categories of simple clauses, complex clauses, and nominal phrases.
- Multiword expressions
- Comparative constructions
- Paratactic constructions
As discussed in the section on complex clauses, we treat coordinate structures asymmetrically by attaching all non-first conjuncts to the first conjunct via the u-dep/conj relation. Coordinating conjunctions and punctuation delimiting the conjuncts are attached to the immediately following conjunct using the u-dep/cc and u-dep/punct relations respectively. This analysis is applied to all cases of coordination at the clause, phrase or word level.
He came home , took a shower and immediately went to bed . conj(came, took) conj(came, went) punct(took, ,-4) cc(went, and)
He read the newspaper and a good book . conj(newspaper, book) cc(book, and)
He read one or two books . conj(one, two) cc(two, or)
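The attachment rule for coordination can be sketched in a few lines of Python. This is a minimal illustration, not part of any UD tool: the function name `attach_coordination` and the tuple layout are assumptions made for the example, and it assumes that conjuncts are contiguous, so that the nearest conjunct head to the right of a delimiter belongs to the immediately following conjunct.

```python
from typing import List, Tuple

# (label, head index, dependent index); token indices are 1-based.
Relation = Tuple[str, int, int]


def attach_coordination(conjuncts: List[int],
                        delimiters: List[Tuple[int, str]]) -> List[Relation]:
    """Build conj/cc/punct relations for one coordinate structure.

    conjuncts:  token indices of the conjunct heads, in surface order.
    delimiters: (token index, "cc" or "punct") for every conjunction or
                punctuation mark that separates two conjuncts.
    """
    first, rest = conjuncts[0], conjuncts[1:]
    # Every non-first conjunct depends on the first conjunct.
    relations: List[Relation] = [("conj", first, c) for c in rest]
    for index, label in delimiters:
        # A delimiter attaches to the head of the conjunct that follows it
        # (assumption: that is the nearest conjunct head to its right).
        following = min(c for c in rest if c > index)
        relations.append((label, following, index))
    return relations


# He(1) came(2) home(3) ,(4) took(5) a(6) shower(7) and(8)
# immediately(9) went(10) to(11) bed(12) .(13)
relations = attach_coordination([2, 5, 10], [(4, "punct"), (8, "cc")])
```

For the first example sentence this yields conj(came, took), conj(came, went), punct(took, ,) and cc(went, and), matching the relations listed above.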
The UD approach to ellipsis can be summarized as follows:
- If the elided element has no overt dependents, we do nothing.
- If the elided element has overt dependents, we promote one of these to take the role of the head.
- If the elided element is a predicate and the promoted element a core argument, we use the orphan relation when attaching other non-functional dependents to the promoted head.
Ellipsis in Nominals
If the head nominal is elided, we promote dependents in the following order:
Er kauft sich ein grünes Auto und sie kauft sich ein rotes . \n He buys himself a green car and she buys herself a red . nsubj(kauft-2, Er-1) det(Auto-6, ein-4) amod(Auto-6, grünes-5) obj(kauft-2, Auto-6) conj(kauft-2, kauft-9) nsubj(kauft-9, sie-8) obj(kauft-9, rotes-12) det(rotes-12, ein-11)
She saw every animal at the zoo but he saw only some . nsubj(saw-2, She-1) det(animal-4, every-3) obj(saw-2, animal-4) conj(saw-2, saw-10) advmod(some-12, only-11) obj(saw-10, some-12)
She saw three monkeys and he saw two . nsubj(saw-2, She-1) nummod(monkeys-4, three-3) obj(saw-2, monkeys-4) conj(saw-2, saw-7) obj(saw-7, two-8)
Ellipsis in Clauses
If the main predicate is elided, we use simple promotion only if there is an aux or cop, or a mark in the case of an infinitival marker.
Sue likes pasta and Peter does , too . nsubj(likes-2, Sue-1) obj(likes-2, pasta-3) conj(likes-2, does-6) nsubj(does-6, Peter-5) advmod(does-6, too-8)
Sue is hungry and Peter is , too . nsubj(hungry-3, Sue-1) cop(hungry-3, is-2) conj(hungry-3, is-6) nsubj(is-6, Peter-5) advmod(is-6, too-8)
They will do it if they want to . nsubj(do-3, They-1) aux(do-3, will-2) obj(do-3, it-4) advcl(do-3, want-7) nsubj(want-7, they-6) xcomp(want-7, to-8)
In more complicated cases where a predicate is elided but no aux or cop is present, simple promotion (without orphan deprels) could lead to very unnatural and confusing relations. For example, in the sentence I like tea and you coffee, simple promotion would make you the subject of coffee, suggesting that the second clause contains a copular construction rather than an elided predicate.
In such cases, we promote dependents in the following order:
and for the non-promoted dependents, we use the special relation orphan to signal a non-standard dependency.
If it is necessary to select among several orphans of the same type (e.g. there are just two orphans and both bear the same relation), the orphan occurring first (closer to the sentence start) is promoted.
I like tea and you coffee . nsubj(like-2, I-1) obj(like-2, tea-3) conj(like-2, you-5) cc(you-5, and-4) orphan(you-5, coffee-6)
Mary wants to buy a book and Jenny a CD . nsubj(wants-2, Mary-1) xcomp(wants-2, buy-4) obj(buy-4, book-6) conj(wants-2, Jenny-8) orphan(Jenny-8, CD-10)
They had left the company , many for good . nsubj(left, They) obj(left, company) conj(left, many) orphan(many, good)
Mary wants to buy a book . ROOT And Jenny a CD . nsubj(wants-2, Mary-1) xcomp(wants-2, buy-4) obj(buy-4, book-6) root(ROOT, Jenny) orphan(Jenny, CD)
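The promotion step for an elided predicate can be sketched as follows. This is a hedged illustration only: the function `promote_orphans` is hypothetical, and the ordering in `ASSUMED_PRIORITY` is an assumption supplied for the example; substitute whatever order the guidelines you follow prescribe. Functional dependents such as cc keep their ordinary relation and are not passed in.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: an illustrative priority order, highest priority first.
ASSUMED_PRIORITY = ("nsubj", "obj", "iobj", "obl", "advmod",
                    "csubj", "xcomp", "ccomp", "advcl")


def promote_orphans(dependents: List[Tuple[int, str]],
                    priority: Sequence[str] = ASSUMED_PRIORITY):
    """Pick the dependent to promote and attach the others as orphans.

    dependents: (token index, relation the token would have borne to the
                elided predicate) for each overt non-functional dependent.
    Returns (promoted token index, list of ("orphan", head, dependent)).
    """
    def rank(dep: Tuple[int, str]) -> Tuple[int, int]:
        index, relation = dep
        # Relations missing from the priority list sort last; ties are
        # broken in favour of the token closer to the sentence start.
        position = priority.index(relation) if relation in priority else len(priority)
        return (position, index)

    promoted, _ = min(dependents, key=rank)
    orphans = [("orphan", promoted, index)
               for index, _ in sorted(dependents) if index != promoted]
    return promoted, orphans


# I(1) like(2) tea(3) and(4) you(5) coffee(6) .(7)
promoted, orphans = promote_orphans([(5, "nsubj"), (6, "obj")])
```

Here you (5) is promoted and coffee (6) is attached to it as orphan, which corresponds to orphan(you-5, coffee-6) in the first example. The promoted token then takes over the relation the elided predicate would have had, for instance conj.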
Note that the orphan relation is only used when an ordinary relation would be misleading (for example, when attaching an object to a subject). In particular, the ordinary cc relation should be used for the coordinating conjunction, which attaches to the pseudo-constituent formed through the orphan relation.
Multiword expressions (MWEs) are combinations of words that (in some respect and to different degrees) behave as lexical units rather than compositional syntactic phrases. The UD taxonomy contains three special relations for analyzing MWEs:
- u-dep/fixed is used to analyze fixed grammaticized MWEs like in spite of (see above)
- u-dep/flat is used to analyze exocentric semi-fixed MWEs like Barack Obama
- u-dep/compound is used to analyze (endocentric) compounds like noun phrase
Structures analyzed with u-dep/fixed and u-dep/flat are headless by definition and are consistently annotated by attaching all non-first elements to the first and only allowing outgoing dependents from the first element.
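The headless annotation of fixed and flat expressions amounts to a very small rule, sketched below. The helper `attach_headless` is a hypothetical name used only for this illustration.

```python
from typing import List, Tuple


def attach_headless(label: str, token_indices: List[int]) -> List[Tuple[str, int, int]]:
    """Attach every non-first token of a fixed/flat expression to the first.

    label:         "fixed" or "flat".
    token_indices: 1-based indices of the tokens in the expression,
                   in surface order.
    Returns (label, head index, dependent index) triples.
    """
    if label not in ("fixed", "flat"):
        raise ValueError("only fixed and flat are annotated as headless")
    first, *rest = token_indices
    # All non-first elements depend on the first element; by convention
    # only the first element may have dependents outside the expression.
    return [(label, first, index) for index in rest]


# in(1) spite(2) of(3)
fixed_relations = attach_headless("fixed", [1, 2, 3])
# Barack(1) Obama(2)
flat_relations = attach_headless("flat", [1, 2])
```

The first call produces fixed(in, spite) and fixed(in, of); the second produces flat(Barack, Obama).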
By contrast, compounds are annotated to show their modification structure, including a regular concept of head:
The syntax of comparative constructions poses various challenges for linguistic theory. For English, many of these are discussed in Bresnan (1973) and Huddleston and Pullum (2002, chapter 13). We give a discussion of equality comparisons (That car is as big as mine) and inequality scalar comparisons (Sue is taller than Jim).
In constructions of the form as X as Y or the same X as Y, X and Y can be of a range of syntactic types, leading to surface forms such as those exemplified below:
- Commitment is as important as a player’s talent.
- Get the cash to him as soon as possible.
- I put in as much flour as the recipe called for.
We note that the head of the whole construction appears to be the head of the X phrase. We can simply say:
- Commitment is important.
- Get the cash to him soon.
- I put in flour.
We then say that the first as is an independent modifier in the comparative, modifying something in the X phrase, in part because the following as Y is fairly optional:
- Commitment is (just) as important.
- ?Get the cash to him (just) as soon.
- I put in (just) as much flour.
However, this first as need not modify the head of X; it may instead modify an existing modifier of the head of X. Its role is adverbial (u-dep/advmod), consistent with other kinds of degree modification:
Commitment is as important as a player ’s talent . advmod(important, as-3)
I put in as much flour as the recipe called for . advmod(much, as-4) amod(flour, much)
We then take the complement of the comparative as an oblique dependent of the first part. It is clear that the material in the complement as Y can be clausal. It is also usually optional, as indicated above. For that reason, we usually make the complement an u-dep/advcl, with the second as analyzed as a mark. That gives us:
I do n't hear from my brother as often as I previously heard from him . nsubj(hear, I-1) aux(hear, do) advmod(hear, n't) case(brother, from-5) det(brother, my) obl(hear, brother) advmod(often, as-8) advmod(hear, often) mark(heard, as-10) nsubj(heard, I-11) advmod(heard, previously) advcl(often, heard) case(him, from-14) obl(heard, him) punct(hear, .)
We take the as Y clause as a dependent of the content word whose degree is being assessed (here often). We take its head to be the head of the clause, here heard. An initially plausible alternative analysis would be to make the clausal dependent headed by as a dependent of the comparative modifier as, more, or less, and indeed this is the analysis which Huddleston and Pullum (2002) argue for in English. However, there are several reasons to doubt this analysis. One is the general UD principle of favoring content words as heads. A second argument is motivated by a desire for crosslinguistic adequacy: in languages such as Finnish and Japanese, this functional element is not present.
“Y” より “X” が 面白い 。 \n Y than X NOM interesting . nsubj(面白い, “X”) case(“X”, が) case(“Y”, より) obl(面白い, “Y”) punct(面白い, 。)
Since the first as is a functional element, the dependent can be understood to modify the whole phrase as often, and therefore is attached to the head of that phrase. Additionally, it might be noted that comparatives without a comparative word occur in certain varieties of English. For example in Indian English you find usages such as So don’t worry if you think that you have a girl-friend, who is intelligent than you. One further argument from morphological comparatives is discussed below.
The same basic analysis is given for inequality scalar comparatives, with more or less or a comparative adjective and than, parallel to the two uses of as above, except that more can also directly modify a noun, and more is then taken to have the u-dep/amod relation to the noun. In this case, we take the comparative complement as directly depending on more, roughly seeing it as elliptical for more numerous. In general, the comparative complement always depends on an adjective or adverb, and is usually an advcl except when it is directly analyzed as an obl (as discussed at the end of this section).
more problems than you thought of last week amod(problems, more) advcl(more, thought) mark(thought, than)
more important than you thought advmod(important, more) advcl(important, thought) mark(thought, than)
more rapidly than you thought advmod(rapidly, more) advcl(rapidly, thought)
a more difficult problem than you thought advmod(difficult, more) amod(problem, difficult) advcl(difficult, thought)
In addition to crosslinguistic adequacy, we can see here another possible advantage of not attaching the than clause to more: This analysis then means that the dependency structure is more parallel between cases with a periphrastic comparative like more intelligent and a morphological comparative like taller (even though in bound morpheme cases, the -er could be argued to be the comparative head).
smarter than you thought advcl(smarter, thought) mark(thought, than)
fiksumpi kuin luulit \n smarter than you_thought advcl(fiksumpi, luulit) mark(luulit, kuin)
a smarter boy than you thought amod(boy, smarter) advcl(smarter, thought) mark(thought, than)
If the head is elided, then the functional element can be promoted.
Wheat raises blood sugar even more than sugar does . advcl(more, does)
Very commonly the complement clause in a comparative undergoes various amounts of partial reduction or ellipsis, sometimes to a quite extreme extent:
I put in as much flour as the recipe called for . nsubj(put, I) compound(put, in) advmod(much, as-4) amod(flour, much) obj(put, flour) mark(called, as-7) det(recipe, the) nsubj(called, recipe) advcl(much, called) obl(called, for) punct(put, .)
He plays better drunk than sober nsubj(plays, He) advmod(plays, better) acl(He, drunk) mark(sober, than) advcl(better, sober)
In general, we still treat whatever remnant remains as an u-dep/advcl, as above.
However, a limiting case of this is that only a nominal is present:
- as important as a player’s talent
- more important than a player’s talent
The analysis in this case is unclear: Should the comparative complement still be analyzed as an extremely reduced complement clause or analyzed as simply a nominal modifier? There are arguments for both positions. For English, there is a long discussion of the arguments in section 2.2 of chapter 13 of Huddleston and Pullum (2002). We err on the side of minimizing the postulation of unobserved structure and opt to treat these cases as just an oblique nominal complement:
as important as a player 's talent advmod(important, as-1) case(talent, as-3) obl(important, talent)
more important than a player 's talent advmod(important, more) case(talent, than) obl(important, talent)
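The choice between the two analyses of the comparative complement can be summarized in a short sketch. The function `attach_comparative_complement` and its parameters are hypothetical names for this illustration; the caller decides which word the complement depends on (normally the adjective or adverb being compared, or more when it directly modifies a noun).

```python
from typing import List, Tuple


def attach_comparative_complement(compared: int,
                                  marker: int,
                                  complement_head: int,
                                  is_bare_nominal: bool) -> List[Tuple[str, int, int]]:
    """Return (label, head index, dependent index) triples for as/than + Y.

    compared:        index of the word whose degree is assessed.
    marker:          index of the second "as" or of "than".
    complement_head: index of the head of the complement Y.
    is_bare_nominal: True when Y consists of a nominal only.
    """
    if is_bare_nominal:
        # Only a nominal is present: oblique nominal, marker is a case.
        return [("case", complement_head, marker),
                ("obl", compared, complement_head)]
    # Clausal or partially reduced complement: advcl, marker is a mark.
    return [("mark", complement_head, marker),
            ("advcl", compared, complement_head)]


# more(1) important(2) than(3) a(4) player(5) 's(6) talent(7)
nominal = attach_comparative_complement(2, 3, 7, is_bare_nominal=True)
# more(1) important(2) than(3) you(4) thought(5)
clausal = attach_comparative_complement(2, 3, 5, is_bare_nominal=False)
```

The first call gives case(talent, than) and obl(important, talent); the second gives mark(thought, than) and advcl(important, thought).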
More than as a multi-word expression
In certain contexts the comparative complement combines both the action or adjective that is being compared and the quantity it is compared to:
- more than 90 percent (= over 90 percent)
- more than likely
- Home prices have more than doubled in the past decade.
In these cases we consider more than to be a fixed multi-word expression because the two words are inseparable. One cannot say *more percent than 90.
That is more than likely . nsubj(likely, That) cop(likely, is) advmod(likely, more) fixed(more, than) punct(likely, .-6)
If the expression modifies a counted noun phrase, it attaches directly to the modified number:
more than two years ago nummod(years, two) fixed(more, than) advmod(two, more)
If there is no number (because the indefinite article functions as the number “one”), it attaches directly to the head noun:
more than a year ago det(year, a) fixed(more, than) advmod(year, more)
The parataxis relation is used to analyze a number of constructions where clauses are combined by relations that are looser than standard coordination.
Side-by-side sentences (“run-on sentences”)
The parataxis relation is used for a pair of what could have been standalone sentences, but which are being treated together as a single sentence. This may happen because sentence segmentation was done primarily on the basis of sentence-final punctuation, and these clauses are joined by punctuation such as a colon or comma, or not delimited by punctuation at all. In a spoken corpus, it may happen because what is labeled as a sentence is more commonly an utterance turn. Even if the treebanker is doing the sentence division, it may happen because there seems to be a clear discourse relation linking two clauses. Sometimes more than two sentences are joined in this way. In this case we make all the later sentences dependents of the first one, to maximize similarity to the analysis used for conjunction.
Bearded dragons are sight hunters , they need to see the food to move . parataxis(hunters, need)
This relation may also hold between units that are smaller than sentences:
Divided world the CIA amod(world, Divided) parataxis(world, CIA) det(CIA, the)
For this reported speech example:
The guy , John said , left early in the morning parataxis(left, said)
there are paraphrases that convey essentially the same meaning but with a different syntactic structure. When the reported speech is embedded in a subordinate clause (with or without an overt complementizer that), the subordinate clause is a ccomp of the speech verb. When the reported speech follows the speech verb and is separated by a colon, the reported speech forms a main clause that attaches to the preceding main clause with a parataxis relation, hence with the speech verb as its head. However, when the speech verb occurs as a medial or final parenthetical, the relation is reversed and the speech verb is treated as a parataxis of the reported speech. This analysis is not uncontroversial but follows many authorities, such as Huddleston and Pullum (2002), The Cambridge Grammar of the English Language (see chapter 11, section 9).
John said that the guy left early in the morning . ccomp(said, left)
John said the guy left early in the morning . ccomp(said, left)
John said : “ The guy left early in the morning . ” parataxis(said, left)
“ The guy left early in the morning ” , John said . parataxis(left, said)
The guy left early in the morning , John said . parataxis(left, said)
The guy , he said , left early in the morning . parataxis(left, said)
An argument for this analysis is that in the cases analyzed as embedding, the entire clause can be further embedded (I was taken aback when John said the guy left early in the morning.), while this is not possible with medial or final placement of the speech verb (*I was taken aback when the guy left early this morning, John said.).
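The three configurations for reported speech described above can be written down as a small decision function. This is only a sketch: the function `reported_speech_relation` and the configuration labels "embedded", "after_colon" and "parenthetical" are names invented for this illustration.

```python
from typing import Tuple


def reported_speech_relation(speech_verb: str,
                             reported_head: str,
                             configuration: str) -> Tuple[str, str, str]:
    """Return (label, head, dependent) linking speech verb and reported speech.

    configuration (hypothetical labels):
      "embedded"      reported speech is a subordinate clause,
                      with or without an overt complementizer
      "after_colon"   reported speech follows the speech verb after a colon
      "parenthetical" speech verb is a medial or final parenthetical
    """
    if configuration == "embedded":
        return ("ccomp", speech_verb, reported_head)
    if configuration == "after_colon":
        return ("parataxis", speech_verb, reported_head)
    if configuration == "parenthetical":
        # The direction is reversed: the speech verb hangs off the
        # predicate of the reported speech.
        return ("parataxis", reported_head, speech_verb)
    raise ValueError(f"unknown configuration: {configuration}")


# John said that the guy left early in the morning .
embedded = reported_speech_relation("said", "left", "embedded")
# The guy , he said , left early in the morning .
medial = reported_speech_relation("said", "left", "parenthetical")
```

The results correspond to ccomp(said, left) and parataxis(left, said) in the examples above.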
News Article Bylines
We have used the parataxis relation to connect the parts of a news article byline. There does not seem to be a better relation to use.
Washington ( CNN ) : parataxis(Washington, CNN)
Single word or phrase interjections are analyzed as discourse, but when a whole clause is interjected, we use the relation parataxis.
Calafia has great fries ( they are to die for ! ) parataxis(has, are)
Just to let you all know Matt has confirmed the booking for 3rd Dec is OK . parataxis(confirmed, let)
In the second example, we treat the second half as the head of the dependency because the first half feels like a whole clause interjection, not like the main clause of the utterance.
We also use the parataxis relation for tag questions such as isn’t it? or haven’t you?.
It 's not me , is it ? parataxis(me, is)
- Direct and reported speech: currently described under u-dep/parataxis
In a sentence starting with a feedback word such as yes or no and continuing with a main clause, we take the predicate of the main clause to be the root of the sentence and attach the feedback word to this predicate with a discourse relation:
yes , we should apply for membership . discourse(apply, yes)
However, when the feedback is expressed by a full clause instead of a feedback word, the predicate of this clause is taken as the root and the predicate of the following clause is attached with a parataxis relation:
I agree , we should apply for membership . parataxis(agree, apply)
Tokens with the relation u-dep/punct always attach to content words (except in cases of ellipsis) and can never have dependents. Since punct is not a normal dependency relation, the usual criteria for determining the head word do not apply.
Instead, we use the following principles:
- A punctuation mark separating coordinated units is attached to the immediately following conjunct.
- A punctuation mark preceding or following a subordinated unit is attached to this unit.
- Within the relevant unit, a punctuation mark is attached at the highest possible node that preserves projectivity.
- Paired punctuation marks (quotes and brackets) should be attached to the same word unless that would create non-projectivity. This word is usually the head of the phrase enclosed in the paired punctuation.
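Several of these principles depend on whether an attachment preserves projectivity, which can be checked mechanically. The sketch below is a simple quadratic check written for illustration; `is_projective` is a hypothetical helper, and the head vectors are an assumed analysis of the tag-question example (only parataxis(me, is) is taken from the text above).

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Check whether a dependency tree has no crossing arcs.

    heads[i] is the 1-based index of the head of token i + 1,
    with 0 standing for the artificial root.
    """
    # Represent each arc by its left and right endpoint.
    arcs = [(min(head, dep), max(head, dep))
            for dep, head in enumerate(heads, start=1)]
    for left_a, right_a in arcs:
        for left_b, right_b in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if left_a < left_b < right_a < right_b:
                return False
    return True


# It(1) 's(2) not(3) me(4) ,(5) is(6) it(7) ?(8)
# Assumed analysis: "me" is the root, "is" is its parataxis, and the
# comma is attached to "is", the head of the unit it precedes.
comma_on_is = [4, 4, 4, 0, 6, 4, 6, 4]
# Attaching the comma to "it"(7) instead would cross the arc me -> is.
comma_on_it = [4, 4, 4, 0, 7, 4, 6, 4]

print(is_projective(comma_on_is))
print(is_projective(comma_on_it))
```

Under these assumptions the first attachment is projective and the second is not, so only the first is compatible with the principles above.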