
This page pertains to UD version 2.

Guidelines Changes

This page summarizes the history of notable changes to the universal annotation guidelines.

Significant changes are classified as:

(Note: Many minor clarifications are not listed.)

Changes in UDv2

Updates to UDv2 will NOT alter the inventory of basic top-level dependency relations, UPOS tags, etc. However, some updates have been necessary to clarify how these should be applied to particular linguistic phenomena, and to specify formal constraints to be enforced by validation.

#  Date      Release  Type                      Title
5  2022-May  2.10     AMENDMENT, VALIDATOR      Multiple Subjects
4  2022-May  2.10     AMENDMENT                 Optional Depictives
3  2022-Feb  2.10     AMENDMENT                 Reported Speech
2  2022-Jan  2.10     AMENDMENT, VALIDATOR      Typos and goeswith
1  2021-Dec  2.10     CLARIFICATION, VALIDATOR  Deverbal Connectives

Multiple Subjects

In general, UD prohibits multiple subjects (i.e. a word may have at most one nsubj or csubj dependent), and enforcing this in validation is a useful way to catch errors. However, a clause may serve as the predicate in a copular construction (e.g. "The problem is that we already paid"), which poses a problem for this constraint. Until now, the guidelines carved out an exception for such cases: the copula of the outer clause would be promoted to head its subject, and the predicate of the inner clause would attach to it as ccomp (as explained in the v1 guidelines). But this yielded an odd interpretation of some copulas as transitive and offered no solution for zero copula constructions. A change was necessary.

The new policy, a product of extensive deliberation, is that the predicate of the inner clause can have multiple subject dependents. The subject(s) of the non-innermost clause(s) can be subtyped with :outer to signify nesting: nsubj:outer, csubj:outer. The :outer subtype, like all subtypes, is (at least for now) technically optional. Therefore, as an alternative, the validator will allow a treebank’s maintainers to manually verify that any instances of multiple subjects are correct.
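As a rough illustration of the constraint described above, the following Python sketch flags any head that has more than one subject dependent not subtyped with :outer. It is not the logic of the official validator: the simplified four-field token tuples, the name find_multiple_subject_heads, and the analysis of the example sentence are assumptions made for this sketch.

```python
from collections import defaultdict

# Simplified tokens: (id, form, head, deprel).
# Real CoNLL-U lines have ten columns; this layout is an assumption of the sketch.
SENTENCE = [
    (1, "The", 2, "det"),
    (2, "problem", 7, "nsubj:outer"),  # subject of the outer (copular) clause
    (3, "is", 7, "cop"),
    (4, "that", 7, "mark"),
    (5, "we", 7, "nsubj"),             # subject of the inner clause
    (6, "already", 7, "advmod"),
    (7, "paid", 0, "root"),
]

SUBJECT_RELATIONS = {"nsubj", "csubj"}


def find_multiple_subject_heads(tokens):
    """Return ids of heads with more than one subject not marked :outer."""
    plain_subjects = defaultdict(int)
    for _id, _form, head, deprel in tokens:
        # Split "nsubj:outer" into base relation "nsubj" and subtype "outer".
        base, _, subtype = deprel.partition(":")
        if base in SUBJECT_RELATIONS and subtype != "outer":
            plain_subjects[head] += 1
    return sorted(head for head, count in plain_subjects.items() if count > 1)


# With :outer on the outer subject, nothing is flagged.
print(find_multiple_subject_heads(SENTENCE))  # prints []

# Without the subtype, token 7 ("paid") has two plain subjects and is flagged.
unsubtyped = [(i, f, h, "nsubj" if d == "nsubj:outer" else d)
              for i, f, h, d in SENTENCE]
print(find_multiple_subject_heads(unsubtyped))  # prints [7]
```

Because the :outer subtype is optional, a flagged head in a check of this kind is a candidate for manual verification rather than necessarily an error.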

Note that using :outer just for subjects does not fully disambiguate the compositional structure: for example, cop, aux, mark, advmod, and obl dependents of the predicate may belong to either the inner or outer clause. In order to avoid a proliferation of subtyped relations, the trees in the new guidelines do not apply the :outer label to anything other than subjects. Treebanks are, of course, welcome to innovate in their use of subtypes and/or MISC attributes.

Optional Depictives

Reanalyzed optional depictives as adverbial (advcl) rather than adnominal (acl), given that the predicand may not always be overt in the sentence, and even when it is overt it doesn’t form a nominal phrase with the depictive. The secondary predication can instead be expressed via an enhanced dependency, similar to control. (A precise naming recommendation for the enhanced edge is deferred for further discussion.)

Reported Speech

Revised the policy regarding reported speech: the quoted material attaches as ccomp to the speech verb regardless of order and punctuation; parataxis should be used only if the quotation is interrupted.
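The revised rule can be stated as a small decision function. This is only a sketch: the function name and its parameters are hypothetical, and it covers nothing beyond the choice of relation label described above.

```python
def reported_speech_relation(quotation_interrupted: bool,
                             quote_precedes_verb: bool = False,
                             has_quotation_marks: bool = True) -> str:
    """Pick the relation used for reported speech under the revised policy.

    Order and punctuation are accepted as arguments only to make explicit
    that they do not affect the outcome.
    """
    if quotation_interrupted:
        return "parataxis"
    return "ccomp"


# Order and punctuation make no difference for an uninterrupted quotation.
print(reported_speech_relation(False, quote_precedes_verb=True))   # prints ccomp
print(reported_speech_relation(False, has_quotation_marks=False))  # prints ccomp
# Only an interrupted quotation calls for parataxis.
print(reported_speech_relation(True))                              # prints parataxis
```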

Typos and goeswith

Updates to the policy on typos to clarify treatment of goeswith:

Deverbal Connectives

Deverbal connectives may be tagged as VERB while attaching as case or mark. Documented at ADP.

UDv1 and transition to UDv2


Data Releases