This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.
home v2/v2 issue tracker

Function words in UD v2

One of the central design decisions of UD is to put priority on syntactic relations between content words and to treat function words essentially as “features” that attach to the content word they modify with special relations like u-dep/aux, u-dep/cop, u-dep/mark and u-dep/case. We propose the following changes to the treatment of function words in v2:

Nonverbal auxiliaries

The v1 guidelines say that the u-dep/aux relation is reserved for auxiliary verbs. However, some languages (for example Bulgarian, see example) use particles to construct periphrastic verb forms, hence we should also allow nonverbal particles. More generally, we should define u-dep/aux as a grammaticalized expression of tense-aspect-mood-voice-evidentiality categories. (We propose a parallel extension of the part-of-speech tag u-pos/AUX; see part-of-speech tags.)

Като се прибереш, ще съм почистил къщата. \n When you return , will I.have cleaned the.house
aux(почистил, ще)
aux(почистил, съм)

Note that this does not necessarily mean that all non-verb aux dependents in the current data are correct. See this query for an instance. They should be revised and each language-specific documentation should clearly state which lemmas may occur as auxiliaries and what TAMVE categories they are used in. That also applies to verbs —- in some UD treebanks, the list of verbs that are attached as auxiliaries is very long and some of the verbs probably should not be aux.

The use of case and mark with subordinate clauses

The u-dep/case relation is used for adpositions and other particles that perform a case-marking function (in a wide sense). For example:

he depends on her
nmod(depends, her)
case(her, on)

The u-dep/mark relation is used for subordinate conjunctions and other words that mark subordinate clauses (without having a referential function like relative pronouns). Thus:

he believes that it is raining
ccomp(believes, raining)
mark(raining, that)

The assumption in v1 is that if the head is a nominal, only u-dep/case can be used, and if the head is a clausal predicate, only u-dep/mark can be used. However, applying this rule strictly means missing important generalizations in languages that allow case-marking elements to attach to subordinate clauses. Consider the following Swedish examples:

han tror henne \n he believes her
dobj(tror, henne)
han litar på henne \n he trusts in her
nmod(litar, henne)
case(henne, på)
han tror att det regnar \n he believes that it is raining
ccomp(tror, regnar)
mark(regnar, att)
han litar på att det regnar \n he trusts in that it is raining
advcl(litar, regnar)
mark(regnar, att)
case(regnar, på)

The function of the preposition “på” is exactly the same in the second and fourth example, namely to convert a structure that can occur in a core role (u-dep/dobj, u-dep/ccomp) to a structure that occurs in an oblique role (u-dep/nmod, u-dep/advcl). This function is completely distinct from the function of “att” in the third and fourth example, namely to convert a main clause structure into a subordinate or embedded clause. If we insist that “på” must be labed u-dep/mark in the fourth example, because it attaches to a clausal predicate, then its function will be misrepresented.