This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Specific constructions

Core clausal syntax: predicates and their arguments

Predicates

Main predicates in English are most often verbs, but they can also be adjectives, nouns and even adverbs. In UD, predicates are labeled with one of the clausal relations: root, ccomp, xcomp, advcl, acl (and its subtypes); or one of the loose-joining relations, conj and parataxis, under a head that has a clausal label.

Any dependent that can be said to attach at the clausal level (for example, core arguments, adverbial modifiers, complementizers, or conjoined clauses) will have the predicate word as its head.

UD does not distinguish light verbs from full verbs.

Copulas

This holds even for nonverbal predicates. It is a distinguishing feature of Universal Dependencies, and it is most evident in the UD treatment of copulas.

Here, the subject this takes the adjective interesting as its head. nsubj-labeled dependents attach at the clausal level, and the head of the lower clause is interesting, not the copula. Similarly, it is the predicate interesting that receives the clausal label ccomp.
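As a sketch of the copula analysis, the rows below encode the illustrative sentence Bill is honest (not one of this page's examples) as (id, form, head, deprel) tuples, loosely following CoNLL-U columns 1, 2, 7 and 8, with head 0 marking the root. The helper head_of is hypothetical. Both the subject and the copula take the adjective honest as their head.

```python
# Illustrative sentence "Bill is honest" as (id, form, head, deprel) rows.
# Head 0 marks the root, as in CoNLL-U.
bill_is_honest = [
    (1, "Bill", 3, "nsubj"),   # the subject attaches to the predicate adjective
    (2, "is", 3, "cop"),       # the copula is a dependent, not the head
    (3, "honest", 0, "root"),  # the nonverbal predicate heads the clause
]

def head_of(tokens, form):
    """Return the form of the head of the first token spelled `form`."""
    forms = {tid: f for tid, f, _, _ in tokens}
    forms[0] = "ROOT"
    for _, f, head, _ in tokens:
        if f == form:
            return forms[head]
    raise KeyError(form)

print(head_of(bill_is_honest, "Bill"))  # honest
print(head_of(bill_is_honest, "is"))    # honest
```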

In equative uses of copulas, the distinction between predicate and subject is somewhat arbitrary. In those cases, linear order is used as a cue: the subject is taken to be the element that appears first.

In some of these equative uses, the right-hand side may be a clause, finite or not. In those cases, exceptionally, the copular verb is treated as the predicate, and the clause is attached to it with the ccomp label.

Nonverbal predicates with no copular verb

The treatment UD adopts for copulas is consistent with its treatment of small clauses. In a surprisingly wide range of constructions in English, a nonverbal predicate forms a constituent with its core arguments without any mediating verb. These constructions are directly parallel to copulas in the UD scheme, since copular verbs do not mediate relations between nonverbal predicates and their dependents.

In some of these constructions, one might argue that there is an elided copular verb. We make no attempt to represent such a verb, and the dependency analysis loses nothing as a result.

In other cases, such as the constructions sometimes called absolute, it is harder to argue that there is an elided verb. The analysis is still parallel to that of other nonverbal predicates, and to that of absolute constructions with nonfinite verbal predicates.

Core arguments

UD makes a distinction between core arguments and other dependents of predicates. In English, the UD relations that can designate core arguments are nsubj, nsubjpass, dobj and iobj for nominal arguments, and ccomp, xcomp, csubj and csubjpass for clausal arguments.
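The inventory of core argument labels can be summarized as two sets. The sketch below is an illustration, not part of any UD tool: argument_status is a hypothetical helper, and stripping a language-specific subtype (the part after the colon) before the lookup is an assumption made here for convenience.

```python
# Core argument relations of UD v1 English, as listed above.
CORE_NOMINAL = {"nsubj", "nsubjpass", "dobj", "iobj"}
CORE_CLAUSAL = {"ccomp", "xcomp", "csubj", "csubjpass"}

def argument_status(deprel):
    """Classify a relation as a core nominal argument, a core clausal
    argument, or something else (noncore dependent, function word, ...)."""
    base = deprel.split(":")[0]  # e.g. "nmod:tmod" -> "nmod"
    if base in CORE_NOMINAL:
        return "core nominal"
    if base in CORE_CLAUSAL:
        return "core clausal"
    return "other"

for rel in ["nsubjpass", "ccomp", "nmod:tmod", "advcl"]:
    print(rel, "->", argument_status(rel))
```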

nsubj and nsubjpass are used for external arguments of any predicate (as in the examples above); the only difference is that nsubjpass is used in passive-voice clauses.

In the example above, a plausible alternative representation would analyze this as a nominal phrase with a reduced relative. However, when possible, we prefer to choose a predicate as the root of a sentence.

Expletives can occur in subject or object position, and are represented with the label expl.

Expletives can have a subject-labeled sister.

The internal argument labels, dobj and iobj, are exclusive to verbal predicates and a handful of adjectives (namely: worth, like and unlike, following Huddleston and Pullum (2001)).

The distinction between dobj and iobj is strictly syntactic; iobj is reserved for “second objects” with restricted theta-roles, and is relatively rare in English. Only when another internal argument is present can iobj occur.

The other internal argument need not be nominal. In English, some verbs can take a nominal complement and a clausal complement together. With these verbs, the nominal complement is always thematically restricted, which suggests it is an iobj serving as a “second object” to the clausal complement. For that reason, the clausal complement label ccomp never cooccurs with dobj, but does cooccur with iobj.

However, the same observation does not hold of verbs that take open complements, labeled xcomp (more on this label below). Under some verbs, these can clearly cooccur with thematically unrestricted objects. For that reason, nominal complements cooccurring with xcomp are uniformly labeled dobj, and never iobj.
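The co-occurrence rules above lend themselves to a mechanical check. check_object_labels below is a hypothetical helper, not the official UD validator; it assumes dependencies are given as (head, dependent, deprel) triples and checks only the three rules stated in this section: dobj never appears next to ccomp, iobj never appears next to xcomp, and iobj requires another internal argument.

```python
from collections import defaultdict

def check_object_labels(deps):
    """Check the co-occurrence rules for internal arguments described above.

    deps: iterable of (head_id, dependent_id, deprel) triples.
    Returns a list of human-readable problems (empty if none were found).
    """
    rels_by_head = defaultdict(set)
    for head, _dependent, rel in deps:
        rels_by_head[head].add(rel)

    problems = []
    for head, rels in sorted(rels_by_head.items()):
        if "ccomp" in rels and "dobj" in rels:
            problems.append(f"head {head}: dobj next to ccomp (expected iobj)")
        if "xcomp" in rels and "iobj" in rels:
            problems.append(f"head {head}: iobj next to xcomp (expected dobj)")
        if "iobj" in rels and not rels & {"dobj", "ccomp", "xcomp"}:
            problems.append(f"head {head}: iobj without another internal argument")
    return problems

# "She told me that it rained": told (id 2) takes iobj "me" and ccomp "rained".
print(check_object_labels([(2, 1, "nsubj"), (2, 3, "iobj"), (2, 6, "ccomp")]))  # []
```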

Clausal core arguments

Like other clausal labels, the clausal core argument labels apply to finite and nonfinite clauses without distinction. (In English xcomp can only be applied to nonfinite clauses because there is no control into finite clauses; but this is not part of the definition of xcomp.)

The distinction between csubj and csubjpass mirrors that between nsubj and nsubjpass.

The clausal subject labels apply to verbal as well as nonverbal predicates.

Much like nsubj(pass), csubj(pass) can (and often does) cooccur with an expletive.

Clausal core arguments are restricted to verbal and adjectival predicates. Nouns never take clausal core arguments. (See Clausal modifiers of nouns, below, for how to represent clausal dependents of nouns.)

Functional control

The label xcomp is used for predicates whose external argument is controlled by an argument of a higher clause. This applies in multiple types of constructions (often referred to as “small clauses”): raising, obligatory control, resultatives (obligatory and optional alike) and obligatory depictives.

This includes copula-like English verbs such as become and remain.
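The following sketch encodes the illustrative sentence Sue wants to buy a hat under the conventions above: buy is an xcomp of wants, to is its mark, and buy has no subject dependent of its own, since its external argument is supplied by the higher clause. The row format (id, form, head, deprel) and the helper has_overt_subject are assumptions made for illustration.

```python
# Illustrative sentence "Sue wants to buy a hat" as (id, form, head, deprel) rows.
tokens = [
    (1, "Sue", 2, "nsubj"),
    (2, "wants", 0, "root"),
    (3, "to", 4, "mark"),     # the infinitival marker is labeled mark
    (4, "buy", 2, "xcomp"),   # open complement: its subject is controlled
    (5, "a", 6, "det"),
    (6, "hat", 4, "dobj"),
]

SUBJECT_RELS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}

def has_overt_subject(tokens, head_id):
    """True if the predicate with id head_id has its own subject dependent."""
    return any(head == head_id and rel in SUBJECT_RELS
               for _, _, head, rel in tokens)

xcomp_heads = [tid for tid, _, _, rel in tokens if rel == "xcomp"]
print([has_overt_subject(tokens, h) for h in xcomp_heads])  # [False]
```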

Noncore arguments and predicate modifiers

UD marks core arguments, but it does not make a distinction between noncore arguments and modifiers of a predicate. In English, noncore arguments are introduced by prepositions or subordinating conjunctions (which largely overlap with each other). Optional modifiers can also be introduced by such words. In UD, the representation of noncore arguments and predicate modifiers, while distinct from that of core arguments, is uniform. The entire set will be referred to here as noncore dependents.

Noncore dependents are classified by their syntactic properties. Nominal dependents (i.e., phrases whose lexical head is a noun) are labeled nmod. Most of these, in English, are introduced by prepositions.

In the example above, note that in England and in the 1980s are annotated with the same label, even though the former is arguably a noncore argument of live, while the latter is certainly not.

Bare nominals receive the label nmod:npmod, which is an English-specific relation.

More narrowly, bare nominals denoting a point in time receive the label nmod:tmod, also English-specific.

Clausal noncore dependents, whether finite or nonfinite, receive the label advcl.

This label can also apply to nonverbal predicates, as shown in this example (repeated from Nonverbal predicates with no copular verb, above).

In the example above, an alternative analysis might represent no suggestions as a nominal dependent. However, we take the presence of so, which usually attaches to predicates, as evidence of clausal status.

Function words attaching to predicates

The labels mark, aux, auxpass and cop are used for function words that attach to predicates. While in some linguistic theories these are argued to be heads of constituents, in UD they are demoted to dependents of lexical heads, in line with the principle of primacy of content words.

These function words do not normally have dependents, but there are exceptions. They may have word-level dependents; they may also be coordinated (on the surface, due to VP-ellipsis), and have conjunction and conjunct dependents.

Unfortunately, not all conjunctions of function words attaching to predicates lend themselves to this analysis, which leads to a lack of parallelism across some constructions. In the following example, the first conjunct receives a promotion-by-head-elision treatment.

Complementizers, subordinating conjunctions and the infinitival marker

In English, the label mark applies uniformly to complementizers, subordinating conjunctions and the infinitival marker.

Copular verbs

The copular verb be is treated as a function word: it is attached to the predicate and labeled cop, a special label for copular verbs. In English, only be receives this treatment. See Functional control, above, for copula-like verbs such as become.

Auxiliaries

Modal and auxiliary verbs are uniformly labeled aux or auxpass in UD and attached to their main verb. (When there is no main verb, the auxiliary is promoted by head elision.) This is the case even when there are multiple auxiliaries: rather than being chained together to reflect scope properties, they are all attached directly to the main verb.
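For instance, in the illustrative sentence He could have been sleeping, all three auxiliaries attach to sleeping. The sketch below uses hypothetical (id, form, head, deprel) rows and a hypothetical dependents helper to show that no auxiliary heads another.

```python
# Illustrative sentence "He could have been sleeping" as (id, form, head, deprel) rows.
tokens = [
    (1, "He", 5, "nsubj"),
    (2, "could", 5, "aux"),
    (3, "have", 5, "aux"),
    (4, "been", 5, "aux"),
    (5, "sleeping", 0, "root"),
]

def dependents(tokens, head_id, deprel):
    """Forms of all dependents of head_id that carry the given relation."""
    return [form for _, form, head, rel in tokens
            if head == head_id and rel == deprel]

aux_ids = {tid for tid, _, _, rel in tokens if rel in ("aux", "auxpass")}
print(dependents(tokens, 5, "aux"))                      # ['could', 'have', 'been']
print(any(head in aux_ids for _, _, head, _ in tokens))  # False: no chaining
```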

The auxpass label applies only to passive auxiliaries.

The verb get can behave as a passive auxiliary, and when it does, it is annotated as such.

Below the clause

Word-level dependents: complex lexical units

While most types of dependents can be said to attach to phrases (e.g., nsubj dependents attach to verbal phrases; det dependents attach to noun phrases), some attach only at the word level. These dependencies form complex lexical units, which then enter, as a whole, into dependencies of their own.

Three relations can be used to form complex lexical units. The most straightforward one is goeswith, which can be used between any two tokens and serves to indicate that, as a result of input error, a single orthographic word is split into two space-separated tokens in the data.

The other two relations, mwe and compound, are more interesting. The main difference between them is that mwe applies between function words and other function words or lexical words, while compound applies only between lexical words.

The mwe relation is used sparingly. In general, the relation is used in grammaticalized uses of two or more function words together, often giving rise to noncompositional meaning. Since words joined by the mwe relation often have equal claim to the status of head, any such construction is, by convention, head-initial.

When the multiword expression is composed of more than two words, all non-head words attach directly to the head, in a flat structure.
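The head-initial, flat convention amounts to a simple rule: the first token of the expression is the head and every later token attaches to it with mwe. In the sketch below, attach_mwe is a hypothetical helper, and the token ids are assumed to belong to a three-word expression such as as well as.

```python
def attach_mwe(token_ids):
    """Return (head, dependent, "mwe") triples for a multiword expression:
    the first token is the head, and all other tokens attach to it directly."""
    if len(token_ids) < 2:
        raise ValueError("a multiword expression needs at least two tokens")
    head, *rest = token_ids
    return [(head, dep, "mwe") for dep in rest]

# A three-word expression occupying (hypothetical) token ids 3, 4 and 5.
print(attach_mwe([3, 4, 5]))  # [(3, 4, 'mwe'), (3, 5, 'mwe')]
```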

Deciding what should be annotated as a multiword expression is difficult, because such expressions lie on a continuum between phrases built by fully productive rules on the one hand and fixed lexicalized expressions on the other. Several criteria can be used to rule out the mwe label: one word in the construction is optional; the meaning is compositional; or variants exist in which one of the words is substituted.

The compound relation, on the other hand, can be used freely to represent productive phrase-building. The difference is that compound is used when a string of words joined together is analyzed as a single lexical unit that behaves as a head (i.e., an X^0 node) rather than as a constituent (i.e., an XP node) in the sentence.

A distinguished type of compound is the English particle verb. Particles that combine with verbs receive the language-specific label compound:prt.

Unlike multiword expressions, compounds can have inner structure, when appropriate.

The nominal domain: nominal and prepositional phrases

Nominal and prepositional phrases are uniformly organized around their nominal lexical head in UD. In addition to their argument roles, labeled nsubj, nsubjpass, dobj and iobj, nominal phrases can have roles as noncore dependents. In these roles, they are labeled nmod (and subtypes). Commonly, noncore dependents are realized as prepositional phrases.

Prepositions

Within prepositional phrases, prepositions are represented as dependents of their complements and labeled case.

Nested prepositional phrases are also organized around the single lexical head, in a flat representation parallel to that of verb groups.

Possessives

The label case is also used for the genitive ’s in English. The genitive nominal phrase receives the language-specific label nmod:poss.

The nmod:poss label is also used for possessive determiners.
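Putting the case and nmod:poss conventions together, the illustrative sentence John’s dog slept in the garden could be encoded as below; the (id, form, head, deprel) rows and the triples helper are hypothetical. Note that the nominals, not the preposition or the genitive marker, carry the attachments to higher heads.

```python
# Illustrative sentence "John's dog slept in the garden" as (id, form, head, deprel) rows.
tokens = [
    (1, "John", 3, "nmod:poss"),  # genitive phrase modifies the possessed noun
    (2, "'s", 1, "case"),         # genitive marker depends on its nominal
    (3, "dog", 4, "nsubj"),
    (4, "slept", 0, "root"),
    (5, "in", 7, "case"),         # preposition depends on its complement
    (6, "the", 7, "det"),
    (7, "garden", 4, "nmod"),     # the nominal, not "in", attaches to the verb
]

def triples(tokens):
    """Return (deprel, head form, dependent form) for every token."""
    forms = {tid: form for tid, form, _, _ in tokens}
    forms[0] = "ROOT"
    return [(rel, forms[head], form) for _, form, head, rel in tokens]

for rel, head, dep in triples(tokens):
    print(f"{rel}({head}, {dep})")
```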

This possessive modifier analysis is also used for genitives attaching to gerunds.

Determiners

In addition to case, the label det and its language-specific extension det:predet also designate function-word dependents of nominal heads. These labels are used for determiners: definite and indefinite articles, demonstrative determiners, quantifiers such as all, some, every and each.

Floating quantifiers are attached to the nominal head they modify.

In some English constructions, pronouns can cooccur with nominal heads and exhibit determiner-like behavior. In those constructions, these pronouns are annotated as det.

The label det:predet applies to a determiner that precedes another determiner attached to the same nominal head.

The label can only apply when det is also present.
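These two conditions can be checked mechanically. The sketch below is a hypothetical check_predet helper over (id, form, head, deprel) rows; it flags a det:predet that has no det sibling under the same head, or that does not precede one. In all boys, all would simply be det.

```python
def check_predet(tokens):
    """tokens: (id, form, head, deprel) rows. Flags det:predet dependents
    that lack a following det sibling under the same head."""
    problems = []
    for tid, form, head, rel in tokens:
        if rel != "det:predet":
            continue
        det_ids = [t for t, _, h, r in tokens if h == head and r == "det"]
        if not det_ids:
            problems.append(f"{form}: det:predet without a det sibling")
        elif not any(tid < d for d in det_ids):
            problems.append(f"{form}: det:predet does not precede its det sibling")
    return problems

all_the_boys = [(1, "all", 3, "det:predet"), (2, "the", 3, "det"), (3, "boys", 0, "root")]
all_boys = [(1, "all", 2, "det:predet"), (2, "boys", 0, "root")]
print(check_predet(all_the_boys))  # []
print(check_predet(all_boys))      # ['all: det:predet without a det sibling']
```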

Determiners with negative meaning receive the label neg instead of det.

Appositives

Optional modifiers: adverbial and adjectival phrases

Both predicates and nominals can be modified by optional phrases – adverbial and adjectival, respectively. Again, a distinction is made between clausal and nonclausal dependents. Adverbial clauses are labeled advcl. Adjectival clauses (of which relative clauses are a subtype) are labeled acl. Nonclausal adverbials are labeled advmod, and nonclausal adjectivals are labeled amod.

Clausal modifiers of nouns

Relative clauses are the canonical case of clausal modifiers of nouns, and they receive a special language-specific label, acl:relcl. In these clauses, the relative pronoun is analyzed in the function it takes in the lower clause, as illustrated here by that, labeled nsubj, and which, labeled nmod.
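The sketch below encodes the illustrative sentence I met the man that left: the relative clause headed by left attaches to man as acl:relcl, while that receives nsubj, its function inside the relative clause. The rows and the relative_clauses helper are hypothetical.

```python
# Illustrative sentence "I met the man that left" as (id, form, head, deprel) rows.
tokens = [
    (1, "I", 2, "nsubj"),
    (2, "met", 0, "root"),
    (3, "the", 4, "det"),
    (4, "man", 2, "dobj"),
    (5, "that", 6, "nsubj"),      # relative pronoun: its role in the lower clause
    (6, "left", 4, "acl:relcl"),  # the relative clause modifies the noun
]

def relative_clauses(tokens):
    """Pair each modified noun with the head of its acl:relcl dependent."""
    forms = {tid: form for tid, form, _, _ in tokens}
    return [(forms[head], form) for _, form, head, rel in tokens
            if rel == "acl:relcl"]

print(relative_clauses(tokens))  # [('man', 'left')]
```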

The acl:relcl relation is also used in free relatives, which are discussed under Free relatives, below.

Relative clauses are not, however, the only type of clausal modifiers of nouns. For example, reduced relative clauses are labeled acl rather than acl:relcl.

Additionally, many optional clausal dependents on nominals receive the acl label.

Depictives are also represented with the acl relation.

Quantifier phrases

The notion of quantifier phrase is applied loosely here to a variety of structures that modify nominals. The simplest type is probably the bare numerical modifier, which is labeled nummod.

Often some form of modification is applied to these numerical dependents, in the form of expressions such as more than (which is considered a multiword expression), about, over. These are analyzed as dependents of the numerical modifier, forming a complex quantifier phrase.

Ranges are also treated as numerical dependents. Note that in this case the dash - is represented as a preposition, because it is a functional equivalent of to (as becomes clear from the fact that it is normally read that way).

Beyond the clause

Beyond core clausal structures, there are many linguistic constructions, usually with discourse functions, that need to be represented in a complete dependency tree. Additionally, complex structures such as coordination and the juxtaposition of structures within the same orthographic sentence need to be analyzed. Finally, written communication includes a wealth of information structured by rules that exist at the fringes of (or perhaps outside) the grammar of a language. To provide a complete representation, we integrate even that information into syntax trees, which leads to some special dependency labels and some peculiar annotation conventions.

Discourse-level dependents

UD introduces two special relations for discourse-level dependents: discourse, which is used to type a limited range of discourse markers, and the informatively named vocative, which is used for vocatives. These always attach to predicates, not because they modify them directly, but to express the fact that they have the highest-possible level of attachment.

Coordination and loose joining

Coordination is, in a sense, below as well as beyond the clause, since it can occur at any level. But that property is exactly what distinguishes it, and justifies placing it outside of core clausal syntax.

The difficulty of representing coordination, which is symmetrical, with an inherently asymmetric dependency representation is well known. UD makes no attempt to disguise this and, by convention, adopts the first conjunct as the head of a coordinated phrase. All other conjuncts and conjunctions are attached to that first conjunct.
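The convention can be written as a small rule: given the token ids of the conjuncts and of the conjunctions, attach everything to the first conjunct. In the sketch below, attach_coordination is a hypothetical helper, the ids correspond to the illustrative phrase apples, pears and plums (1 apples, 2 comma, 3 pears, 4 and, 5 plums), and punctuation is left out for simplicity.

```python
def attach_coordination(conjuncts, conjunctions):
    """Attach every non-initial conjunct (conj) and every coordinating
    conjunction (cc) to the first conjunct, which heads the coordination.
    Returns the head id and the (head, dependent, deprel) edges."""
    head, *others = conjuncts
    edges = [(head, c, "conj") for c in others]
    edges += [(head, c, "cc") for c in conjunctions]
    return head, sorted(edges, key=lambda edge: edge[1])

head, edges = attach_coordination([1, 3, 5], [4])
print(head)   # 1
print(edges)  # [(1, 3, 'conj'), (1, 4, 'cc'), (1, 5, 'conj')]
```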

This creates some ambiguities: it is not possible to tell, from the representation alone, whether elated modified bride only, or bride and groom. Conversely, it is clear that songs is an object only of sang, since it attaches to that verb directly rather than to the head of the conjunction, which is danced. A change in the ordering of these constituents can introduce that ambiguity.

Another (much less frequent) difficulty is the representation of nested coordinations, which is not always possible. In the following example, the heterogeneous coordination of incarcerated, on probation and on parole forms a complex predicate for the first verbal phrase in this sentence. That first VP is then itself coordinated with once were in one of those categories. The fact that there are two levels of coordination does not come through in the UD representation.

The auxiliaries have and be occasionally appear outside coordinated predicates while having a different function with respect to each predicate, as shown below. In such cases, we annotate the verb only as a dependent of the first conjunct.

In this sentence, was is also a cop dependent of rough, but that edge is not represented.

Conjunctions

Loose joining: parataxis and list

Special annotation conventions

Dates, times, addresses

Contact information

Itemizations

Specific constructions

Unpronounced material

While auxiliaries are normally not analyzed as heads, when a verb has been elided through VP ellipsis, the auxiliary inherits head status. This also applies to the infinitival marker to.

Similarly, when a preposition is stranded in a passive construction, the preposition receives the nmod label on account of lacking a nominal head.

Gapping / Stripping

In ‘gapping’ constructions, the head of a clause has been elided but two arguments that contrast with arguments in the antecedent clause remain. In ‘stripping’ constructions, the head of a clause has been elided but one contrasting argument and one polarity adverbial such as not or only remain. In both cases, the remnant relation is used between the remaining constituents and the words they contrast with.

When an argument is ‘sprouted’–present in the second clause with no antecedent–it depends on the head of the antecedent clause.

Right-node raising

In right-node raising constructions where the head of the left conjunct has been elided under identity with the head of the right conjunct, the left conjunct undergoes “promotion by head elision”: its remaining dependent receives the label that would have been assigned to the head if it had been present.

Marginal disfluencies

In informal language usage, nonstandard constructions and disfluencies sometimes arise. When this involves a gapping-like construction–with one or more contrasting arguments that depend on an absent head–the remnant relation should be used.

If, however, the second clause is largely not parallel to the first, a different relation should be used: conj or parataxis if the smaller clause does not obviously modify the larger one, and acl or advcl otherwise.

Resultatives and depictives

Resultatives

Resultatives–predicate arguments of verbs that indicate how another argument of the verb has changed–are considered to be arguments, and therefore receive the xcomp relation instead of a modifier one.

Depictives

[May be subject to change]

Depictives are generally subject-less modifiers of predicates–consequently, they should be analyzed using the advmod relation.

Tough-constructions

Clauses with expletives

In constructions without any dislocation, of the form it is adj to pred, the it is an expl, meaning that the lower predicate must be a csubj.

This construction can optionally occur with for and a subject; in this case, there are two possible analyses. If the subject is interpreted as experiencing the adjective predicate in some way, it is analyzed as an nmod on the higher predicate; otherwise, it is analyzed exclusively as the subject of the lower clause, and for is analyzed as a mark.

Fronting in tough-constructions

When the subject is not an argument of the higher clause, then the lower clause can displace the expletive.

When the subject is an argument of the higher clause, either the lower verb phrase (in its gerund form) or its object (in its nominative form) can be fronted, displacing the expletive. A fronted gerund phrase maintains its csubj label. When the object is fronted instead, the clause is no longer a csubj and is analyzed as an xcomp.

Dependency-introducing Adverbs

Comparatives

Canonical comparatives are introduced using a comparative adverb (such as more, less, or as) depending on an adjective, and either a clause or prepositional phrase marked with than, which also depends on the adjective. In the clausal case, this normally means that the comparing clause is headed by an auxiliary or copula that has been “promoted by head elision”.

In many cases, the initial comparative adverb has been dropped or incorporated into the adjective.

When the quantity of a noun is being compared, the same rules apply. Modifiers of nouns are normally labeled amod, but in this construction the comparative marker is an advmod in all cases.

More than and less than–when not used synonymously with over and under in quantity expressions–complicate matters slightly, since the comparative adverb is being used without the head that it modifies. We use a “promotion by head elision” solution, making the dependent into the head when the head is absent.

When predicates are compared to predicates or modifiers are compared to modifiers, the comparing phrase is always labeled as an advcl.

When a noun phrase is used to restrict the meaning of a comparative, it receives the nmod:npmod dependency label.

The more, the merrier

In English there exists a very peculiar correlative construction exemplified in the sentences the more, the merrier and the faster, the better. Even though both parts of the construction seem equal, suggesting a paratactic relationship between them, it is possible to have the second half be a standard finite clause while the first half remains unchanged, suggesting that the first is actually an adverbial clause depending on the second. For example, the sentence The angrier he became, the funnier it got can be rephrased as It got funnier the angrier he became, suggesting the following structure:

The word the in this construction is not serving its usual purpose as definite article (and in fact, historically the construction required it to be in the instrumental case, rather than in a case dictated by the grammatical function of the word it modified), so instead of labeling it det we choose to label it mark.

The comparative morpheme or adjective can be followed by a clause as well, as in “the more people that show up, the merrier the party will be”. Because the word that can intervene between the comparative word and the following clause, the structure seems most consistent with a relative clause depending on the comparative, so we analyze it as such.

The sentence so far, so good should receive the same kind of analysis.

Similar constructions

The following is a non-exhaustive list of constructions whose analyses are very similar to that of standard comparatives.

X enough to/that…

So many… that…

Too X to…

Such… that…

Free relatives

Basic analysis

In the canonical case, wh-clauses function as interrogative clauses or as adverbial clauses. In these cases, the head of the wh-clause is taken to be the verb, and the wh-word is assigned the label corresponding to its grammatical function in the wh-clause:

In free relative constructions, the wh-clause functions as an argument in the higher clause. In these cases, the wh-phrase is deemed the head of the construction, thereby receiving a dependency relation reflective of its function in the higher clause, and the rest of the wh-clause is an acl:relcl dependent on it.
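The contrast with the interrogative analysis above can be sketched with two illustrative sentences, I eat what you cook (free relative) and I wonder what you cook (interrogative complement). The row format and the attachment helper are hypothetical. In the first, what attaches to the higher verb and heads an acl:relcl; in the second, the embedded verb heads the wh-clause and what keeps its role inside it.

```python
# Free relative: "I eat what you cook" (illustrative).
free_relative = [
    (1, "I", 2, "nsubj"),
    (2, "eat", 0, "root"),
    (3, "what", 2, "dobj"),       # wh-word heads the construction
    (4, "you", 5, "nsubj"),
    (5, "cook", 3, "acl:relcl"),  # rest of the wh-clause depends on it
]

# Interrogative complement: "I wonder what you cook" (illustrative).
interrogative = [
    (1, "I", 2, "nsubj"),
    (2, "wonder", 0, "root"),
    (3, "what", 5, "dobj"),       # wh-word keeps its role inside the clause
    (4, "you", 5, "nsubj"),
    (5, "cook", 2, "ccomp"),      # the verb heads the wh-clause
]

def attachment(tokens, form):
    """Return (deprel, head form) for the first token spelled `form`."""
    forms = {tid: f for tid, f, _, _ in tokens}
    forms[0] = "ROOT"
    for _, f, head, rel in tokens:
        if f == form:
            return rel, forms[head]
    raise KeyError(form)

print(attachment(free_relative, "what"))  # ('dobj', 'eat')
print(attachment(interrogative, "what"))  # ('dobj', 'cook')
```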

This analysis is also extended to cleft constructions.

The phrase no matter is analyzed as taking a dobj complement, as in no matter the cost. When it takes a free relative as its object, that object is also analyzed according to the rules above.

Cyclic cases

In some cases, the wh-phrase would be analyzed as the head of the wh-clause. For example, in the sentence I love how appreciative everyone was, the word appreciative would normally be a predicative head (since was is a copula and would receive the cop relation). Since appreciative cannot be an acl:relcl dependent on itself, the copula was is promoted to the head of the relative clause and assigned the acl:relcl relation.
