This page pertains to UD version 2.

Function words in UD v2

One of the central design decisions of UD is to prioritize syntactic relations between content words and to treat function words essentially as “features” of the content word they modify, attached with special relations such as u-dep/aux, u-dep/cop, u-dep/mark and u-dep/case. We propose the following changes to the treatment of function words in v2:

Classifiers

A classifier is a word which accompanies a noun in certain grammatical contexts, and generally reflects some kind of conceptual classification of nouns, based principally on features of their referents. Here are some examples from Mandarin Chinese:

Syntactically, the classifier groups with the numeral, rather than the noun, and the proposal for v2 is to treat classifiers as functional dependents of numerals (or possessives) using the new relation u-dep/clf:

sān gè xuéshēng \n three clf student
nummod(xuéshēng, sān)
clf(sān, gè)
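The attachment described above can be checked mechanically. The following is a minimal sketch, not part of the guidelines: the `Token` type and the `misattached_classifiers` helper are hypothetical names introduced for illustration, the UPOS of the classifier is left unspecified (`_`), and the default set of allowed head tags covers only numerals. Treebanks that attach classifiers to possessives would need to extend it.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str    # "_" means unspecified, as in CoNLL-U
    head: int    # 0 marks the root
    deprel: str


# "sān gè xuéshēng" (three clf student), encoded as in the example above.
tokens = [
    Token(1, "sān", "NUM", 3, "nummod"),
    Token(2, "gè", "_", 1, "clf"),
    Token(3, "xuéshēng", "NOUN", 0, "root"),
]


def misattached_classifiers(sentence, allowed_head_upos=("NUM",)):
    """Return clf dependents whose head is missing or not an allowed category.

    allowed_head_upos is an assumption of this sketch; it defaults to numerals only.
    """
    by_id = {t.id: t for t in sentence}
    bad = []
    for t in sentence:
        if t.deprel == "clf":
            head = by_id.get(t.head)
            if head is None or head.upos not in allowed_head_upos:
                bad.append(t)
    return bad


problems = misattached_classifiers(tokens)
```

For the example sentence, `problems` is empty because gè attaches to the numeral sān. A tree that attached gè directly to xuéshēng would be reported.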

Nonverbal auxiliaries

The v1 guidelines said that the u-dep/aux relation is reserved for auxiliary verbs. However, some languages (for example Bulgarian, see example) use particles to construct periphrastic verb forms, so the relation should also admit nonverbal particles. More generally, we should define u-dep/aux as marking a grammaticalized expression of TAME (tense, aspect, mood, evidentiality) categories. (We propose a parallel extension of the part-of-speech tag u-pos/AUX; see part-of-speech tags.)

Като се прибереш, ще съм почистил къщата. \n When you return , will I.have cleaned the.house
aux(почистил, ще)
aux(почистил, съм)

Note that this does not necessarily mean that all non-verb aux dependents in the current data are correct. See this query for an instance. The existing annotations should be revised, and each language-specific documentation should clearly state which lemmas may occur as auxiliaries and which TAME categories they express. The same applies to verbs: in some UD treebanks, the list of verbs that are attached as auxiliaries is very long, and some of those verbs probably should not be aux.
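One way to start such a revision is to list the lemmas that a treebank currently attaches as aux. The sketch below assumes the standard ten-column, tab-separated CoNLL-U layout. The function name `aux_lemma_inventory` is hypothetical, and the sample rows are a hand-written fragment of the Bulgarian main clause above, with lemma and UPOS values chosen for illustration rather than taken from any treebank.

```python
from collections import Counter


def aux_lemma_inventory(conllu_text):
    """Count (lemma, UPOS) pairs of tokens attached with the aux relation."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, lemma, upos, deprel = cols[0], cols[2], cols[3], cols[7]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        # Subtypes such as aux:pass are counted together with plain aux.
        if deprel.split(":")[0] == "aux":
            counts[(lemma, upos)] += 1
    return counts


# Illustrative fragment only; lemma and UPOS values are assumptions.
rows = [
    ["1", "ще", "ще", "AUX", "_", "_", "3", "aux", "_", "_"],
    ["2", "съм", "съм", "AUX", "_", "_", "3", "aux", "_", "_"],
    ["3", "почистил", "почистя", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "къщата", "къща", "NOUN", "_", "_", "3", "obj", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows) + "\n"

inventory = aux_lemma_inventory(sample)
for (lemma, upos), n in sorted(inventory.items()):
    print(lemma, upos, n)
```

Run over a whole treebank file, the resulting inventory can be compared against the list of permitted auxiliaries in the language-specific documentation. Lemmas that appear in the data but not in the documentation are candidates for review.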