Function words in UD v2
One of the central design decisions of UD is to put priority on syntactic relations between content words and to treat function words essentially as “features” that attach to the content word they modify with special relations like u-dep/aux, u-dep/cop, u-dep/mark and u-dep/case. We propose the following changes to the treatment of function words in v2:
- Add a new relation u-dep/clf for classifiers (see below)
- Allow u-dep/aux with nonverbal TAME particles (see below)
- Remove auxpass from the universal relations (see core dependents for discussion)
- Limit u-dep/cop to pure linking words (whether verbal or nonverbal) (see copula for discussion)
Classifiers
A classifier is a word which accompanies a noun in certain grammatical contexts, and generally reflects some kind of conceptual classification of nouns, based principally on features of their referents. Here are some examples from Mandarin Chinese:
- 三个学生 (三個學生) sān gè xuéshēng = “three students”, literally “three [human-classifier] student”
- 三棵树 (三棵樹) sān kē shù = “three trees”, literally “three [tree-classifier] tree”
- 三只鸟 (三隻鳥) sān zhī niǎo = “three birds”, literally “three [bird-classifier] bird”
- 三条河 (三條河) sān tiáo hé = “three rivers”, literally “three [long-wavy-classifier] river”
Syntactically, the classifier groups with the numeral rather than with the noun. The proposal for v2 is therefore to treat classifiers as functional dependents of numerals (or possessives) using the new relation u-dep/clf:
sān gè xuéshēng \n three clf student
nummod(xuéshēng, sān)
clf(sān, gè)
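The analysis above can also be written out as a small data structure. The following Python snippet is a minimal sketch, not part of the guidelines: the `Token` type and the `clf_attachments` helper are hypothetical names introduced for illustration, and treating the noun as the root is an assumption made only because the phrase stands alone here.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int      # 1-based position in the phrase
    form: str    # word form (pinyin, as in the example)
    head: int    # id of the governing token, 0 for the root
    deprel: str  # dependency relation to the head


# The phrase from the example: the numeral modifies the noun,
# and the classifier attaches to the numeral.
# Assumption: the noun is marked as root because the phrase is isolated.
PHRASE = [
    Token(1, "sān", 3, "nummod"),
    Token(2, "gè", 1, "clf"),
    Token(3, "xuéshēng", 0, "root"),
]


def clf_attachments(tokens):
    """Return (classifier form, head form) for every clf dependent."""
    by_id = {t.id: t for t in tokens}
    return [
        (t.form, by_id[t.head].form)
        for t in tokens
        if t.deprel == "clf"
    ]


print(clf_attachments(PHRASE))  # [('gè', 'sān')]
```

The point of the sketch is only that the classifier's head is the numeral, not the noun; a real treebank would store this in its usual annotation format.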
Nonverbal auxiliaries
The v1 guidelines said that the u-dep/aux relation is reserved for auxiliary verbs. However, some languages (for example Bulgarian, as in the example below) use particles to construct periphrastic verb forms, so u-dep/aux should also be allowed with nonverbal particles. More generally, we should define u-dep/aux as a grammaticalized expression of TAME (tense, aspect, mood, evidentiality) categories. (We propose a parallel extension of the part-of-speech tag u-pos/AUX; see part-of-speech tags.)
Като се прибереш, ще съм почистил къщата. \n When you return , will I.have cleaned the.house
aux(почистил, ще)
aux(почистил, съм)
Note that this does not necessarily mean that all non-verb aux dependents in the current data are correct. See this query for an instance. They should be revised, and each language-specific documentation should clearly state which lemmas may occur as auxiliaries and which TAME categories they express. That also applies to verbs: in some UD treebanks, the list of verbs that are attached as auxiliaries is very long, and some of these verbs probably should not be aux.
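One way to act on this recommendation is to check aux dependents against a documented list of lemmas per language. The Python snippet below is a rough sketch under stated assumptions: `ALLOWED_AUX` and `unlisted_aux` are hypothetical names, the Bulgarian inventory contains only the two auxiliaries from the example above and is not an official list, and the lemma given for the main verb is an assumption.

```python
# Illustrative inventory only: it lists just the two auxiliaries that
# appear in the Bulgarian example, not an official language-specific list.
ALLOWED_AUX = {
    "bg": {"ще", "съм"},
}

# (form, lemma, deprel) for the three tokens relevant to the example.
# Assumption: the lemma of the main verb is given for illustration only.
SENTENCE = [
    ("ще", "ще", "aux"),
    ("съм", "съм", "aux"),
    ("почистил", "почистя", "root"),
]


def unlisted_aux(tokens, lang):
    """Return (form, lemma) for aux dependents whose lemma is not documented."""
    allowed = ALLOWED_AUX.get(lang, set())
    return [
        (form, lemma)
        for form, lemma, deprel in tokens
        if deprel == "aux" and lemma not in allowed
    ]


print(unlisted_aux(SENTENCE, "bg"))  # []
```

Any pair the function returns would point to either a gap in the documentation or an attachment that should be revised. Which of the two applies is a linguistic decision that such a check cannot make.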