Universal Dependencies
The following table lists the 37 universal syntactic relations used in UD v2. It is a revised version of the relations originally described in Universal Stanford Dependencies: A cross-linguistic typology (de Marneffe et al. 2014).
The upper part of the table follows the main organizing principles of the UD taxonomy:
- Rows correspond to functional categories in relation to the head:
  - Core arguments of clausal predicates
  - Non-core dependents of clausal predicates
  - Dependents of nominals
- Columns correspond to structural categories of the dependent:
  - Nominals
  - Clauses
  - Modifier words
  - Function words
The lower part of the table lists relations that are not dependency relations in the narrow sense:
- Relations used to analyze coordination
- Relations used to analyze multiword expressions (MWE)
- Loose joining relations
- Special relations for ellipsis, disfluencies, and orthographic errors
- Special relations for clausal heads, punctuation and other relations
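| | Nominals | Clauses | Modifier words | Function words |
|---|---|---|---|---|
| Core arguments | nsubj, obj, iobj | csubj, ccomp, xcomp | | |
| Non-core dependents | obl, vocative, expl, dislocated | advcl | advmod, discourse | aux, cop, mark |
| Dependents of nominals | nmod, appos, nummod | acl | amod | det, clf, case |

| Coordination | MWE | Loose | Special | Other |
|---|---|---|---|---|
| conj, cc | fixed, flat, compound | list, parataxis | orphan, goeswith, reparandum | punct, root, dep |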
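As a rough sketch, the taxonomy summarized above can be encoded as two lookup tables, for example to group the relation labels found in a treebank. The names `UPPER`, `LOWER`, `all_relations` and `classify` are hypothetical; the sketch relies only on the label:subtype convention used on this page (acl:relcl, compound:prt) to reduce a subtyped label to its universal relation.

```python
# Hypothetical lookup tables for the UD v2 relation taxonomy.
# Upper part: (functional category, structural category) -> relations.
UPPER = {
    ("core", "nominal"): ["nsubj", "obj", "iobj"],
    ("core", "clause"): ["csubj", "ccomp", "xcomp"],
    ("non-core", "nominal"): ["obl", "vocative", "expl", "dislocated"],
    ("non-core", "clause"): ["advcl"],
    ("non-core", "modifier"): ["advmod", "discourse"],
    ("non-core", "function"): ["aux", "cop", "mark"],
    ("nominal-dep", "nominal"): ["nmod", "appos", "nummod"],
    ("nominal-dep", "clause"): ["acl"],
    ("nominal-dep", "modifier"): ["amod"],
    ("nominal-dep", "function"): ["det", "clf", "case"],
}

# Lower part: relations that are not dependency relations in the narrow sense.
LOWER = {
    "coordination": ["conj", "cc"],
    "mwe": ["fixed", "flat", "compound"],
    "loose": ["list", "parataxis"],
    "special": ["orphan", "goeswith", "reparandum"],
    "other": ["punct", "root", "dep"],
}


def all_relations():
    """Return every universal relation label, upper part first."""
    return [rel for group in (*UPPER.values(), *LOWER.values()) for rel in group]


def classify(label):
    """Map a (possibly subtyped) label such as 'acl:relcl' to its table cell."""
    base = label.split(":", 1)[0]  # drop the language-particular subtype
    for cell, rels in UPPER.items():
        if base in rels:
            return cell
    for group, rels in LOWER.items():
        if base in rels:
            return group
    return None  # not one of the universal relations


print(len(all_relations()))      # 37
print(classify("acl:relcl"))     # ('nominal-dep', 'clause')
print(classify("compound:prt"))  # mwe
```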
acl
: clausal modifier of noun (adnominal clause)
acl stands for finite and non-finite clauses that modify a nominal. The acl relation contrasts with the advcl relation, which is used for adverbial clauses that modify a predicate. The head of the acl relation is the noun that is modified, and the dependent is the head of the clause that modifies the noun.
the issues as he sees them
acl(issues, sees)
There are many online sites offering booking facilities .
acl(sites, offering)
I have a parakeet named cookie .
acl(parakeet, named)
A president certain that they are correct is dangerous .
acl(president, certain)
ccomp(certain, correct)
nsubj(dangerous, president)
I just want a simple way to get my discount .
acl(way, get)
Cette affaire à suivre \n This case to follow
acl(affaire, suivre)
A relative clause is an instance of acl, characterized by finiteness and usually by omission of the modified noun in the embedded clause. Some languages use a language-particular subtype, acl:relcl, for the traditional class of relative clauses.
I saw the man you love
acl:relcl(man, love)
Some languages allow a subset of nouns, such as fact or report, to take finite clausal complements. These look roughly like relative clauses, but have no omitted role in the dependent clause. This is the class of “content clauses” in Huddleston and Pullum (2002). These are also analyzed as acl.
the fact that nobody cares
acl(fact, cares)
This relation is no longer used for optional depictives: advcl should be used instead.
acl:relcl
: relative clause modifier
A relative clause modifier of a nominal is a clause that modifies the nominal and in which the nominal is coreferential with a constituent inside the relative clause (the constituent may be realized as a relative pronoun or another relative word, or it may not be overtly realized at all). The acl:relcl relation points from the head of the modified nominal to the head of the relative clause.
Depending on the language, relative clauses may be required to be finite. For example, English non-finite clauses are traditionally not called relative clauses; therefore, the girl that was born today is a relative clause because it is finite, while the girl born today is non-finite (the participle is not accompanied by a finite auxiliary) and uses the plain acl relation. In other languages, however, the distinction between finite and non-finite clauses may not exist or may not be used as a criterion for relative clauses.
I saw the man you love
acl:relcl(man, love)
I saw the book which you bought
acl:relcl(book, bought)
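For English, the finiteness criterion can be approximated from CoNLL-U features. The sketch below is a hypothetical heuristic, not part of the guidelines: it assumes the adnominal clause has already been identified as relative-like (content clauses such as the fact that nobody cares are finite but still plain acl), and it treats the clause as finite if its head or any of its auxiliaries carries VerbForm=Fin. The function names are illustrative.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS column such as 'Tense=Past|VerbForm=Fin'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def adnominal_clause_label(head_feats, aux_feats=()):
    """Hypothetical English heuristic: finite relative-like clause -> acl:relcl, else acl."""
    finite = any(
        parse_feats(feats).get("VerbForm") == "Fin"
        for feats in (head_feats, *aux_feats)
    )
    return "acl:relcl" if finite else "acl"


# the girl that was born today: 'born' is a participle, but the auxiliary 'was' is finite
print(adnominal_clause_label("Tense=Past|VerbForm=Part|Voice=Pass",
                             ["Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin"]))
# the girl born today: no finite auxiliary
print(adnominal_clause_label("Tense=Past|VerbForm=Part|Voice=Pass"))
```

The auxiliaries are checked because in English the finite form is often carried by the auxiliary rather than by the clause head.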
advcl
: adverbial clause modifier
An adverbial clause modifier is a clause which modifies a verb or other predicate (adjective, etc.) as a modifier, not as a core complement. This includes, for example, temporal, consequence, conditional, and purpose clauses. The dependent must be clausal (otherwise it is an advmod), and the dependent is the main predicate of the clause.
The accident happened as night was falling
advcl(happened, falling)
If you know who did it, you should tell the teacher
advcl(tell, know)
He talked to him in order to secure the account
advcl(talked, secure)
He was upset when I talked to him
advcl(upset, talked)
They heard about you missing classes.
advcl(heard, missing)
With the kids in school , I have plenty of free time
advcl(have, school)
mark(school, With)
nsubj(school, kids)
case(school, in)
She entered the room while sad
advcl(entered, sad)
Modifying Nominal Predicates
An advcl never modifies a nominal as such (then it would be acl instead), but it can modify a clausal predicate that is realized as a nominal, with or without a copula. One has to distinguish whether the modifier clause modifies the whole predication of the matrix clause or just the entity denoted by the nominal. Hence we have advcl in
He is a teacher , although he no longer teaches .
advcl(teacher, teaches)
but acl:relcl in
He is a teacher whom the students really love .
acl:relcl(teacher, love)
Optional Depictives
This relation is also used for optional depictive adjectives, where the adjective is introduced in clause structure independently of the nominal it describes (contrast: acl if the adjective is an adnominal predicate). The depictive adjective is treated as an adverbial clause modifier of the higher clause. The adjective also provides a secondary predication, where the nominal predicand may or may not be overt; if it is overt, the secondary predication can be represented with an enhanced dependency. See xcomp for further discussion of resultatives and depictives.
She entered the room sad
advcl(entered, sad)
Sad describes the person entering the room, not the manner of entering—but is still taken to modify the verb. Note the similarity to the while sad example above. Omitting the nominal predicand she does not change the basic analysis:
Entering the room sad is not recommended
advcl(Entering, sad)
advcl:relcl
: adverbial relative clause modifier
This relation applies to a relative clause that modifies a clause (as opposed to typical relative clauses, which are adnominal and use acl:relcl).
For example, the antecedent is a clause in:
I tried to explain myself – which was a bad idea .
advcl:relcl(tried, idea)
nsubj(idea, which)
advmod
: adverbial modifier
An adverbial modifier of a word is a (non-clausal) adverb or adverbial phrase that serves to modify a predicate or a modifier word.
In some languages, a limited set of adverbs can also modify nominals (e.g., only on Monday). The advmod relation or one of its subtypes is used in such cases as well (see also advmod:emph).
Note that in some grammatical traditions, the term adverbial modifier covers constituents that function like adverbs regardless of whether they are realized by adverbs, adpositional phrases, or nouns in particular morphological cases.
We differentiate adverbials realized as adverbs (advmod) from adverbials realized by noun phrases or adpositional phrases (obl). However, we do not differentiate between modifiers of predicates (adverbials in a narrow sense) and modifiers of other modifier words like adjectives or adverbs (sometimes called qualifiers). These functions are all subsumed under advmod.
Genetically modified food
advmod(modified, Genetically)
less often
advmod(often, less)
Where/ADV do/AUX you/PRON want/VERB to/ADP go/VERB later/ADV ?/PUNCT
advmod(go, Where)
advmod(go, later)
This is where/ADV I lived when/ADV I was born
nsubj(where, This)
cop(where, is)
advcl:relcl(where, lived)
advcl(lived, born)
advmod(born, when)
About 200 people came to the party
advmod(200, About)
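The split between advmod and obl depends on how the adverbial is realized, not on what it modifies. A minimal sketch of that decision, assuming the dependent's UPOS tag is known, that its head is a predicate or modifier word, and that clausal adverbials have already been recognized; the function name and the set of nominal tags are illustrative choices, not part of the guidelines.

```python
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON", "NUM"}


def adverbial_label(dep_upos, is_clause=False):
    """Choose a relation for an adverbial dependent from its realization."""
    if is_clause:
        return "advcl"   # clausal adverbials are adverbial clause modifiers
    if dep_upos == "ADV":
        return "advmod"  # same label whether the head is a verb, adjective or adverb
    if dep_upos in NOMINAL_UPOS:
        return "obl"     # noun phrases, including adpositional phrases headed by the noun
    raise ValueError(f"no adverbial analysis for UPOS {dep_upos!r}")


print(adverbial_label("ADV"))                  # advmod (Genetically modified)
print(adverbial_label("NOUN"))                 # obl (after the rehearsal)
print(adverbial_label("ADJ", is_clause=True))  # advcl (while sad)
```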
advmod:emph
: emphasizing word, intensifier
This is a special class of adverbial modifiers.
It corresponds to the words that are attached with the label AuxZ in the analytical layer of the PDT (Prague Dependency Treebank). In the tectogrammatical layer they often get the label (functor) RHEM (rhematizers).
While other adverbial modifiers usually modify verbs, adjectives or adverbs, these emphasizers often modify noun phrases, including prepositional phrases.
zvlášť v pondělí \n especially on Monday
advmod:emph(pondělí, zvlášť)
advmod:emph(Monday, especially)
jen 15 procent \n only 15 percent
advmod:emph(procent, jen)
advmod:emph(percent, only)
Other examples:
- Mohli by obvinit i některého ministra. “They could prosecute also/even a minister.”
- Začnou až o měsíc později. lit. They-will-start even by month later. “They will start one month later.” (Až expresses that the speaker or the listener did not expect the thing to happen that late.)
- Ani vojáci o to nemají zájem. “Not even soldiers are interested in it.”
- Hraje už v sobotu. “He will play already on Saturday.”
- Chceme se sejít ještě tento týden. lit. We-want to meet still this week. “We want to meet before this week ends.”
- u asi 20 titulů “by around/approximately 20 items”
- Dá se to dokumentovat právě na početné skupině dětí. “It can be shown just on a large group of children.”
advmod:lmod
: locative adverbial modifier
A locative adverbial modifier is a subtype of the advmod relation: if the modifier specifies a location, it is labeled advmod:lmod.
Danish:
Han bøjer sig ned . \n He bends himself down .
advmod:lmod(bøjer, ned)
amod
: adjectival modifier
An adjectival modifier of a noun (or pronoun) is any adjectival phrase that serves to modify the noun (or pronoun). The relation applies whether the meaning of the noun is modified in a compositional way (e.g., large house) or an idiomatic way (hot dogs).
An amod dependent may have its own modifiers (e.g., very large house), but the dependent should not be a clause. If it is a clause, then acl should be used.
Sam eats large hot dogs
amod(dogs, large)
amod(dogs, hot)
There is nothing wrong with it
amod(nothing, wrong)
appos
: appositional modifier
An appositional modifier of a noun is a nominal immediately following the first noun that serves to define, modify, name, or describe that noun. It includes parenthesized examples, as well as defining abbreviations in one of these structures.
Sam , my brother , arrived
appos(Sam-1, brother-4)
Bill ( John 's cousin )
appos(Bill-1, cousin-5)
The Australian Broadcasting Corporation ( ABC )
appos(Corporation-4, ABC-6)
appos is intended to be used between two nominals. In general, modulo punctuation, the two halves of an apposition can be switched. For example, you could also say My brother, Sam, arrived. There are somewhat similar constructions with titles where the title is less than a full nominal, such as state senator Paul Mnuchin, where reversal is impossible or would require inserting a determiner to make a full nominal. Some grammatical traditions, descending from Latin, call state senator in such cases a “fixed (or close) apposition” and take the name as the head. However, here we seem to have only one nominal, not two. For example:
President Obama
*Obama President
state senator Paul Mnuchin
*Paul Mnuchin state senator
appos should not be used in such cases. However, the examples can usually be rendered in a fuller form, corresponding to “loose (or wide) apposition” in the Latin tradition, where there are two full phrases. Then the relation appos is appropriate, for example:
Paul Mnuchin , the senior Oregon state senator
appos(Paul-1, senator-8)
As is often the case, there are borderline cases. In formal writing, punctuation is usually a good signal of apposition, but there are certainly cases of apposition where no punctuation is used:
the leader of the militant Lebanese Shiite group Hassan Nasrallah
appos(leader-2, Hassan-9)
flat(Hassan-9, Nasrallah-10)
Good tests include asking whether the two halves are full nominals, whether the two halves can be swapped, and whether there is case or agreement concord (in a language with rich morphology). So we have:
I met the French actor Gaspard Ulliel
nsubj(met-2, I-1)
det(actor-5, the-3)
amod(actor-5, French-4)
obj(met-2, actor-5)
appos(actor-5, Gaspard-6)
flat(Gaspard-6, Ulliel-7)
I met Gaspard Ulliel the French actor
nsubj(met-2, I-1)
obj(met-2, Gaspard-3)
flat(Gaspard-3, Ulliel-4)
det(actor-7, the-5)
amod(actor-7, French-6)
appos(Gaspard-3, actor-7)
I met Gaspard Ulliel , the French actor
nsubj(met-2, I-1)
obj(met-2, Gaspard-3)
flat(Gaspard-3, Ulliel-4)
punct(Gaspard-3, ,-5)
det(actor-8, the-6)
amod(actor-8, French-7)
appos(Gaspard-3, actor-8)
I met French actor Gaspard Ulliel
nsubj(met-2, I-1)
amod(actor-4, French-3)
obj(met-2, actor-4)
flat(actor-4, Gaspard-5)
flat(actor-4, Ulliel-6)
While items like abbreviations are generally reversible, the determiner test suggested above doesn’t quite work there, since the determiner seems to belong with the main item:
The ABC ( Australian Broadcasting Corporation )
appos(ABC-2, Corporation-6)
In the rare cases of more than one appositive nominal, all nouns should be marked as modifying the first noun, rather than being chained:
Sam , my brother , John 's cousin , arrived
appos(Sam-1, brother-4)
appos(Sam-1, cousin-8)
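The no-chaining rule can be stated as a small function over (form, index) pairs that produces relations in the notation used on this page. The function name is hypothetical; it takes the heads of the nominals in sentence order and attaches every appositive to the first one. It does not try to cover the nested apposition with coordination discussed next.

```python
def appos_relations(nominal_heads):
    """nominal_heads: [(form, index), ...] in sentence order; the first is the anchor."""
    (anchor, anchor_idx), *appositives = nominal_heads
    # Every appositive attaches to the first nominal; none attaches to its predecessor.
    return [f"appos({anchor}-{anchor_idx}, {form}-{idx})" for form, idx in appositives]


print(appos_relations([("Sam", 1), ("brother", 4), ("cousin", 8)]))
# ['appos(Sam-1, brother-4)', 'appos(Sam-1, cousin-8)']
```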
Note however that nested apposition cannot be completely excluded. It may occur in combination with coordination:
You can choose between four subjects , language ( German or French ) , economy , technology and art .
appos(subjects, language)
conj(language, economy)
conj(language, technology)
conj(language, art)
cc(art, and)
appos(language, German)
conj(German, French)
cc(French, or)
appos is also used to link key-value pairs in addresses, signature blocks, etc. (see also the list relation):
Steve Jones Phone: 555-9814 Email: jones@abc.edf
flat:name(Steve-1, Jones-2)
list(Steve-1, Phone:-3)
list(Steve-1, Email:-5)
appos(Phone:-3, 555-9814-4)
appos(Email:-5, jones@abc.edf-6)
aux
: auxiliary
An aux (auxiliary) of a clause is a function word associated with a verbal predicate that expresses categories such as tense, mood, aspect, voice or evidentiality. It is often a verb (which may have non-auxiliary uses as well), but many languages have nonverbal TAME markers, and these are also treated as instances of aux.
New from v2: Auxiliaries used to construct the passive voice are now also labeled aux, although we strongly encourage the use of the subtype aux:pass in languages that have a grammaticalized (periphrastic) passive.
Reagan has died
aux(died-3, has-2)
He should leave
aux(leave-3, should-2)
Do you think that he will have left by the time we come ?
aux(think, Do)
aux(left, will)
aux(left, have)
aux:pass
: passive auxiliary
In Czech, a passive auxiliary of a clause is a form of the auxiliary verb být “to be” used to construct the periphrastic passive voice (in any tense or in the infinitive).
Kennedy byl zabit . \n Kennedy was killed .
aux:pass(zabit, byl)
aux:pass(killed, was)
Kennedy bude zabit . \n Kennedy will-be killed .
aux:pass(zabit, bude)
aux:pass(killed, will-be)
Kennedy netušil , že jeho osudem je být zabit . \n Kennedy did-not-anticipate that his fate is to-be killed .
aux:pass(zabit, být)
aux:pass(killed, to-be)
Note that the passive participle may also be used as a nominal predicate with a copula, so it may be difficult to distinguish a passive construction from a copula construction. The former focuses on the process, while the latter emphasizes the result.
- Passive:
Smlouva byla podepsána v Bílém domě . \n Contract was signed in White House .
aux:pass(podepsána, byla)
aux:pass(signed, was)
- Copula:
Smlouva byla podepsána červeným inkoustem . \n Contract was signed in-red ink .
cop(podepsána, byla)
cop(signed, was)
case
: case marking
The case relation is used for any case-marking element which is treated as a separate syntactic word (including prepositions, postpositions, and clitic case markers). Case-marking elements are treated as dependents of the noun they attach to or introduce. (Thus, contrary to SD, UD abandons treating a preposition as a mediator between a modified word and its object.) The case relation aims at providing a more uniform analysis of nominal elements, prepositions and case in morphologically rich languages: a nominal in an oblique case will receive the same dependency structure as a nominal introduced by an adposition.
the Chair 's office
det(Chair-2, the-1)
nmod(office-4, Chair-2)
case(Chair-2, 's-3)
the office of the Chair
det(office-2, the-1)
nmod(office-2, Chair-5)
case(Chair-5, of-3)
det(Chair-5, the-4)
French:
le bureau de le président \n the office of the Chair
det(bureau, le-1)
nmod(bureau, président)
case(président, de)
det(président, le-4)
Hebrew:
hwa/PRON rah/VERB at/PART[Case=Acc] h/DET klb/NOUN \n he saw ACC the dog
obj(rah-2, klb-5)
case(klb-5, at-3)
When case markers are morphemes, they are not split off from the noun as a separate case dependent; instead, the noun as a whole is analyzed as obl (if dependent on a predicate) or nmod (if dependent on a noun). Case is then marked overtly through POS tags and morphological features, as in the following Russian example:
# I wrote the letter with a quill.
1 Я ja PRON _ Case=Nom|Number=Sing|Person=1|PronType=Prs 2 nsubj _ I
2 написал napisat' VERB _ Gender=Masc|Number=Sing|VerbForm=Part|Voice=Act 0 root _ wrote
3 письмо pis'mo NOUN _ Case=Acc|Gender=Neut|Number=Sing 2 obj _ the-letter
4 пером pero NOUN _ Case=Ins|Gender=Neut|Number=Sing 2 obl _ with-a-quill
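The Russian example can also be read programmatically. The sketch below parses CoNLL-U lines (tab-separated in real files; it also accepts the space-separated layout shown above, which works only because no column here contains spaces) and renders each word in the rel(head, dependent) notation used on this page, with the root's head shown as ROOT. The function names are hypothetical. Note that the instrumental case of пером is visible only in FEATS, while its relation is simply obl.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats):
    """Turn 'Case=Ins|Number=Sing' into {'Case': 'Ins', 'Number': 'Sing'}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_words(conllu):
    """Return one dict per syntactic word; comments, ranges and empty nodes are skipped."""
    words = []
    for line in conllu.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Real CoNLL-U is tab-separated; fall back to whitespace for the layout above.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        word = dict(zip(FIELDS, cols))
        if "-" in word["id"] or "." in word["id"]:
            continue  # multiword-token range or empty node, not a basic-tree word
        word["feats"] = parse_feats(word["feats"])
        words.append(word)
    return words


def relations(words):
    """Render each word as deprel(head, dependent), using ROOT for head 0."""
    forms = {w["id"]: w["form"] for w in words}
    forms["0"] = "ROOT"
    return [f"{w['deprel']}({forms[w['head']]}, {w['form']})" for w in words]


sample = """
# I wrote the letter with a quill.
1 Я ja PRON _ Case=Nom|Number=Sing|Person=1|PronType=Prs 2 nsubj _ I
2 написал napisat' VERB _ Gender=Masc|Number=Sing|VerbForm=Part|Voice=Act 0 root _ wrote
3 письмо pis'mo NOUN _ Case=Acc|Gender=Neut|Number=Sing 2 obj _ the-letter
4 пером pero NOUN _ Case=Ins|Gender=Neut|Number=Sing 2 obl _ with-a-quill
"""
words = read_words(sample)
print(relations(words))
# ['nsubj(написал, Я)', 'root(ROOT, написал)', 'obj(написал, письмо)', 'obl(написал, пером)']
print(words[3]["feats"]["Case"])  # Ins
```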
This treatment provides parallelism between different constructions across and within languages. In particular, it yields greater parallelism between prepositional phrases and subordinate clauses, which in some languages are introduced by a preposition (although the relation is then mark rather than case):
Sue left after the rehearsal
nsubj(left-2, Sue-1)
obl(left-2, rehearsal-5)
det(rehearsal-5, the-4)
case(rehearsal-5, after-3)
Sue left after we did
nsubj(left-2, Sue-1)
advcl(left-2, did-5)
mark(did-5, after-3)
nsubj(did-5, we-4)
We also obtain parallel constructions for
- the possessive alternation
the Chair 's office
det(Chair-2, the-1)
nmod(office-4, Chair-2)
case(Chair-2, 's-3)
the office of the Chair
det(office-2, the-1)
nmod(office-2, Chair-5)
case(Chair-5, of-3)
det(Chair-5, the-4)
- variant forms with case, a preposition or a postposition in Finnish
etsiä ilman johtolankaa \n to_search without clue.PARTITIVE
obl(etsiä, johtolankaa)
case(johtolankaa, ilman)
etsiä taskulampun kanssa \n to_search torch.GENITIVE with
obl(etsiä, taskulampun)
case(taskulampun, kanssa)
etsiä johtolangatta \n to_search clue.ABESSIVE
obl(etsiä, johtolangatta)
- the dative alternation where the prepositional construction gets a similar analysis to the double object construction
give the children the toys
obj(give, toys)
iobj(give, children)
give the toys to the children
obj(give, toys)
obl(give, children)
case(children, to)
French:
# give the toys to the children
1 donner donner VERB _ VerbForm=Inf 0 root _ give
2 les le DET _ Definite=Def|Number=Plur 3 det _ the
3 jouets jouet NOUN _ Gender=Masc|Number=Plur 1 obj _ toys
4-5 aux _ _ _ _ _ _ _ _
4 à à ADP _ _ 6 case _ to
5 les le DET _ Definite=Def|Number=Plur 6 det _ the
6 enfants enfant NOUN _ Gender=Masc|Number=Plur 1 obl _ children
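The line 4-5 aux above is a multiword token: one orthographic token (aux) split into two syntactic words (à, les), and only the words carry annotation. The sketch below recovers both the surface tokens and the syntactic words from such a sentence. It assumes tab-separated columns but also accepts the space-separated layout shown above; the function name is hypothetical.

```python
def tokens_and_words(conllu):
    """Split one CoNLL-U sentence into surface tokens and syntactic words."""
    tokens, words = [], []
    covered_until = 0  # highest word id already spelled out by a multiword token
    for line in conllu.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t") if "\t" in line else line.split()
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            # Range line such as '4-5 aux': the surface form of words 4 and 5.
            tokens.append(form)
            covered_until = int(tok_id.split("-")[1])
        elif "." in tok_id:
            continue  # empty node: neither a surface token nor a basic-tree word
        else:
            words.append(form)
            if int(tok_id) > covered_until:
                tokens.append(form)  # this word is also its own surface token
    return tokens, words


sample = """
# give the toys to the children
1 donner donner VERB _ VerbForm=Inf 0 root _ give
2 les le DET _ Definite=Def|Number=Plur 3 det _ the
3 jouets jouet NOUN _ Gender=Masc|Number=Plur 1 obj _ toys
4-5 aux _ _ _ _ _ _ _ _
4 à à ADP _ _ 6 case _ to
5 les le DET _ Definite=Def|Number=Plur 6 det _ the
6 enfants enfant NOUN _ Gender=Masc|Number=Plur 1 obl _ children
"""
tokens, words = tokens_and_words(sample)
print(" ".join(tokens))  # donner les jouets aux enfants
print(" ".join(words))   # donner les jouets à les enfants
```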
Another advantage of this analysis is that a prepositional phrase serving as a predicative complement of “be” is treated consistently with a nominal predicative complement:
Sue is in shape
nsubj(shape-4, Sue-1)
cop(shape-4, is-2)
case(shape-4, in-3)
When prepositions are stacked (that is, there is a sequence of prepositions), there are two possible analyses. If the sequence is a frozen combination with a specific meaning, then the best analysis is as fixed. An English example of this is out of:
Out of all this , something good will come .
case(this-4, Out-1)
fixed(Out-1, of-2)
det(this-4, all-3)
obl(come, this-4)
However, if various combinations of prepositions can be used to express different meaning combinations or nuances, then each preposition is analyzed independently as a case dependent. English examples include up beside (which can alternate with down beside or up near) and except during (which can alternate with as during or except after):
The cafe up beside the lookout
det(cafe-2, The-1)
case(lookout-6, up-3)
case(lookout-6, beside-4)
det(lookout-6, the-5)
nmod(cafe-2, lookout-6)
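Which of the two analyses applies depends on a language-specific judgment about which sequences are frozen. The sketch below assumes that judgment is available as a hand-written lexicon (the `FIXED_SEQUENCES` set and the function name are hypothetical) and otherwise attaches every preposition to the nominal as case, reproducing the two English analyses above.

```python
# Hypothetical lexicon of frozen preposition sequences for English.
FIXED_SEQUENCES = {("out", "of")}


def preposition_relations(preps, nominal):
    """preps: [(form, index), ...] in order; nominal: (form, index) of the case host."""
    noun, noun_idx = nominal
    if tuple(form.lower() for form, _ in preps) in FIXED_SEQUENCES:
        # Frozen sequence: the first preposition is case, the rest attach to it as fixed.
        (first, first_idx), *rest = preps
        return [f"case({noun}-{noun_idx}, {first}-{first_idx})"] + [
            f"fixed({first}-{first_idx}, {form}-{idx})" for form, idx in rest
        ]
    # Free combination: each preposition is an independent case dependent.
    return [f"case({noun}-{noun_idx}, {form}-{idx})" for form, idx in preps]


print(preposition_relations([("Out", 1), ("of", 2)], ("this", 4)))
# ['case(this-4, Out-1)', 'fixed(Out-1, of-2)']
print(preposition_relations([("up", 3), ("beside", 4)], ("lookout", 6)))
# ['case(lookout-6, up-3)', 'case(lookout-6, beside-4)']
```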
cc
: coordinating conjunction
A cc is the relation between a conjunct and an associated coordinating conjunction.
Bill is big and honest
conj(big, honest)
cc(honest, and)
A coordinating conjunction may also appear at the beginning of a sentence. This is also attached as cc, even though the sentence lacks multiple conjuncts joined with a conj relation.
And then we left .
cc(left, And)
cc:preconj
: preconjunct
A preconjunct is the relation between the head of coordination and the word that appears at the beginning of the coordination (which could be seen as the first part of a multi-word coordinating conjunction). English examples include either … or, neither … nor, both … and.
Both the boys and the girls are here
cc:preconj(boys, Both)
ccomp
: clausal complement
A clausal complement of a verb or adjective is a dependent clause which is a core argument. That is, it functions like an object of the verb or adjective.
He says that you like to swim
ccomp(says, like)
mark(like, that)
He says you like to swim
ccomp(says, like)
Such clausal complements may be finite or nonfinite. However, if the subject of the clausal complement is controlled (that is, must be the same as the higher subject or object, with no other possible interpretation) the appropriate relation is xcomp.
The boss said to start digging
ccomp(said, start)
mark(start, to)
We started digging
xcomp(started, digging)
The key difference here is that, while it is possible to interpret the first sentence to mean that the boss will not be doing any digging, in the second sentence it is clear that the subject of digging can only be we. This is what distinguishes ccomp and xcomp.
Adjectives may also license ccomp:
I was afraid/ADJ that this would happen
ccomp(afraid, happen)
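One practical consequence of control is that the understood subject of an xcomp can be recovered from the higher clause, whereas nothing can be inferred for a ccomp. The sketch below makes that concrete over (relation, head, dependent) triples, using word forms as node names for readability. Its control rule is a simplifying assumption: an xcomp counts as object-controlled whenever the higher predicate has an object and as subject-controlled otherwise, which verbs like promise violate. The function name is hypothetical.

```python
def understood_subjects(triples):
    """Return (controlled predicate, controller) pairs for each xcomp.

    A ccomp yields nothing: its subject is not fixed by the higher clause.
    """
    subj = {head: dep for rel, head, dep in triples if rel == "nsubj"}
    obj = {head: dep for rel, head, dep in triples if rel == "obj"}
    pairs = []
    for rel, head, dep in triples:
        if rel == "xcomp":
            # Simplified control rule: prefer the object, fall back to the subject.
            controller = obj.get(head, subj.get(head))
            if controller is not None:
                pairs.append((dep, controller))
    return pairs


print(understood_subjects([("nsubj", "started", "We"), ("xcomp", "started", "digging")]))
# [('digging', 'We')]
print(understood_subjects([("nsubj", "said", "boss"), ("ccomp", "said", "start")]))
# []
```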
Reported Speech
With a speech verb like say, the content of reported speech is considered to be part of the verb’s valency. It therefore attaches as ccomp—not only when integrated within the clause as an indirect quotation (said that…), but also when set off as a direct quotation, even with inverted order:
He said that he knew the muffin man .
ccomp(said, knew)
I asked : " Do you know the muffin man ? "
ccomp(asked, know)
" Do you know the muffin man ? " I asked .
ccomp(asked, know)
" I had hoped to remain anonymous , " said the muffin man , who was tracked down Sunday at his home on Drury Lane .
ccomp(said, hoped)
nsubj(said, man)
Quoted content is considered to be ccomp even if it is a sentence fragment:
" Three/NUM muffins/NOUN , " he answered .
nummod(muffins, Three)
ccomp(answered, muffins)
If the speech verb interrupts the reported speech content, parataxis is used instead. The speech verb attaches to the root of the reported speech (all in the following example):
" Three muffins , " he answered , " are all that I need today . "
parataxis(all, answered)
nsubj(all, muffins)
Weapons of mass destruction , the report explained , are designed to target civilian populations .
parataxis(designed, explained)
nsubj:pass(designed, Weapons)
the impact that the group 's practices , law enforcement officials say , are having on the most vulnerable within the sect
acl:relcl(impact, having)
nsubj(having, practices)
parataxis(having, say)
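The word-order criterion that separates the two analyses can be sketched as follows, assuming the reported content and its root have already been identified. Word indices (punctuation excluded from the content) are used to test whether the speech verb falls strictly inside the span of the reported content; the function name is hypothetical.

```python
def speech_relation(content_ids, verb_id, content_root, verb):
    """Relate a speech verb to reported content by word order.

    content_ids: indices of the words of the reported content, punctuation excluded.
    """
    if min(content_ids) < verb_id < max(content_ids):
        # The verb interrupts the content: it attaches to the content's root.
        return f"parataxis({content_root}, {verb})"
    # Content wholly before or after the verb (direct or indirect speech): ccomp.
    return f"ccomp({verb}, {content_root})"


# " Three muffins , " he answered , " are all that I need today . "
print(speech_relation([2, 3, 10, 11, 12, 13, 14, 15], 7, "all", "answered"))
# parataxis(all, answered)
# " Do you know the muffin man ? " I asked .
print(speech_relation([2, 3, 4, 5, 6, 7], 11, "know", "asked"))
# ccomp(asked, know)
```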
Changed:
- In earlier versions of SD/USD, complement clauses with nouns like fact or report were also analyzed as ccomp. However, we now analyze them as acl. Hence, ccomp does not appear in nominals. This makes sense, since nominals normally do not take core arguments.
- The policy for copular constructions with a full clause as predicate has been changed to no longer use ccomp to nest the predicate clause under the copula.
clf
: classifier
A clf (classifier) is a word which accompanies a noun in certain grammatical contexts. The most canonical use is numeral classifiers, where the word is used with a number for counting objects. A classifier generally reflects some kind of conceptual classification of nouns, based principally on features of their referents. Classifiers are usually derived historically from nouns, and the words may still be used as independent nouns, but in their classifier use they have little semantics left.
In most cases, the most appropriate UPOS for classifiers will still be NOUN, though you may wish to give the words a feature indicating their special status as classifiers. (There is at present no universal feature for classifiers, but NounType=Clf might be apt.)
The clf function is intended for languages which have highly grammaticalized systems of classifiers. The greatest density of such languages is in Asia.
As well as core classifiers, there are often other words, sometimes called “massifiers”, that are used in counting with similar behavior to classifiers. These typically include words for containers (“cup”, “box”) and units (“month”, “inch”), such as Chinese 袋 ‘bag’ in 一袋米 [one bag rice] ‘a bag of rice’. In a classifier language, it is usually most appropriate to also analyze these words as classifiers. Most other languages also count things with units; however, in languages such as English, clf is not used, and standard noun phrase relations are used instead (despite incipient grammaticalization in many cases, including English). See the examples for English at the end.
Here are some examples from Mandarin/Putonghua Chinese:
- 三个学生 (三個學生) sān gè xuéshēng = “three students”, literally “three [human-classifier] student”
- 三棵树 (三棵樹) sān kē shù = “three trees”, literally “three [tree-classifier] tree”
- 三只鸟 (三隻鳥) sān zhī niǎo = “three birds”, literally “three [bird-classifier] bird”
- 三条河 (三條河) sān tiáo hé = “three rivers”, literally “three [long-wavy-classifier] river”
Analogous examples from Thai:
- นักเรียนสามคน nâkríán sám gʰn = “three students”, literally “student three [human-classifier]”
- ต้นไม้สามต้น t²nmai² sám t²n = “three trees”, literally “tree three [tree-classifier]”
- นกสามตัว nk sám túá = “three birds”, literally “bird three [animal-classifier]”
- แม่น้ำสามสาย mǽ¹nã² sám sáy = “three rivers”, literally “river three [river-classifier]”
Syntactically, the classifier groups with the numeral rather than the noun, and we therefore treat classifiers as functional dependents of numerals (or possessives) using the new clf relation. (This is one of Greenberg’s universals and is true in almost all cases. A couple of exceptions are noted in Aikhenvald (2000: 105), Classifiers, OUP, but it is noticeable that in those languages the putative head noun is in the genitive case.)
三/NUM 个/NOUN 学生/NOUN \n sān gè xuéshēng \n three CLF student
nummod(学生, 三)
clf(三, 个)
nummod(xuéshēng, sān)
clf(sān, gè)
nummod(student, three)
clf(three, CLF)
แมว/NOUN สาม/NUM ตัว/NOUN \n mǽw sám túá \n cat three CLF
nummod(แมว, สาม)
clf(สาม, ตัว)
nummod(mǽw, sám)
clf(sám, túá)
nummod(cat, three)
clf(three, CLF)
Sometimes a classifier is inserted between a demonstrative and a noun (instead of between a numeral and a noun), as in this Chinese example:
乘坐 這 輛 巴士 \n Chéngzuò zhè liàng bāshì \n Take this CLF bus
obj(乘坐, 巴士)
det(巴士, 這)
clf(這, 輛)
obj(Chéngzuò, bāshì)
det(bāshì, zhè)
clf(zhè, liàng)
obj(Take, bus)
det(bus, this)
clf(this, CLF)
Classifier words also occur in various other constructions, so it is important to distinguish the word in a particular language from the universal classifier function proposed in UD. Here are some further examples with Chinese classifiers.
The number and classifier may appear without the counted noun. In this case, the classifier takes the role of the missing noun, and we promote the classifier to be the head. So 我 買 兩 本 “I am buying two” is regarded as “I am buying two [books-CLF]”.
我 買 兩 本 \n I buy two CLF
obj(買, 本)
nummod(本, 兩)
In some languages, including Chinese, a classifier can also appear without a number, and it then frequently has some sort of determinative function. We use the relation det for such uses of a classifier. For instance, in Cantonese ‘She bought a/the book’:
佢 買 咗 本 書 \n keoi maai zo bun syu \n 3sg buy PERF CLF book
obj(買, 書)
det(書, 本)
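The four classifier patterns described in this section (numeral + classifier + noun, demonstrative + classifier + noun, numeral + classifier with the noun omitted, and a bare classifier before a noun) can be summarized in one function. This is a sketch rather than a treebank tool: it assumes the numeral, demonstrative, classifier and noun have already been identified, and its name and keyword arguments are hypothetical.

```python
def classifier_relations(classifier, noun=None, numeral=None, demonstrative=None):
    """Return UD relations for a classifier phrase, using the forms as node names."""
    if noun is None:
        # No counted noun: the classifier is promoted to head of the phrase.
        return [f"nummod({classifier}, {numeral})"] if numeral is not None else []
    if numeral is not None:
        # The classifier groups with the numeral: nummod(noun, num), clf(num, clf).
        return [f"nummod({noun}, {numeral})", f"clf({numeral}, {classifier})"]
    if demonstrative is not None:
        return [f"det({noun}, {demonstrative})", f"clf({demonstrative}, {classifier})"]
    # A bare classifier before a noun has a determinative function.
    return [f"det({noun}, {classifier})"]


print(classifier_relations("个", noun="学生", numeral="三"))      # ['nummod(学生, 三)', 'clf(三, 个)']
print(classifier_relations("輛", noun="巴士", demonstrative="這"))  # ['det(巴士, 這)', 'clf(這, 輛)']
print(classifier_relations("本", numeral="兩"))                  # ['nummod(本, 兩)']
print(classifier_relations("本", noun="書"))                     # ['det(書, 本)']
```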
For languages without highly grammaticalized classifier systems, standard nominal modification relationships are used even when things are being counted in groups (with “massifiers”). For example, in English:
three cups of rolled oats
nummod(cups, three)
case(oats, of)
amod(oats, rolled)
nmod(cups, oats)
three cups rolled oats
nummod(cups, three)
amod(oats, rolled)
nmod(cups, oats)
compound
: compound
The compound relation is used to analyze compounds, that is, combinations of lexemes that morphosyntactically behave as single words. Commonly occurring cases are:
- Nominal compounds written as separate words, for example English apple juice.
- Particle verbs where the particle is realized as a separate word (which may alternate with affixed particles), for example Swedish byta ut (‘exchange’; cf. utbytt, ‘exchanged’). The subtype compound:prt is commonly used in this case.
- Serial verbs, for which the subtype compound:svc is commonly used, as in this Nupe example (Tallerman 2014):
Musa bé lá èbi \n Musa came took knife \n Musa came to take the knife
nsubj(bé, Musa)
compound:svc(bé, lá)
obj(bé, èbi)
Each language that uses compound should develop its own specific criteria based on morphosyntax (rather than lexicalization or semantic idiomaticity), though elsewhere the term “compound” may be used more broadly.
English Examples
phone book
compound(book, phone)
ice cream flavors
compound(cream, ice)
compound(flavors, cream)
Sam took out a 3 million dollar loan
compound(loan, dollar)
Sam took out a $ 3 million loan
compound(loan, $)
put up
compound:prt(put, up)
Not compound
Just because an expression is lexicalized or idiomatic does not mean compound applies.
In English, adjective-noun combinations, prepositional phrases, and light verb constructions are better described with other relations:
hot dog
amod(dog, hot)
the state of play
det(state, the)
nmod(state, play)
case(play, of)
make a decision
obj(make, decision)
det(decision, a)
compound:lvc
: light verb construction
This subtype of compound covers light verbs. In a light-verb construction the verb does not have much semantic content. The semantics of the construction are determined by the non-head word, often a noun or adjective.
Onlar treni tercih ediyor . \n They prefer the train .
compound:lvc(ediyor, tercih)
obj(ediyor, treni)
nsubj(ediyor, Onlar)
The most common verb acting as a light verb in Turkish is et-. However, many others are possible.
Yıllarca çile çektiler . \n They suffered for years .
compound:lvc(çektiler, çile)
Although the semantically loaded component of a light-verb construction is generally an adjective or a noun, it is common to observe verbs in this position particularly in code-switching settings.
Partiyi cancel ettik . \n We canceled the party
compound:lvc(ettik, cancel)
compound:prt
: phrasal verb particle
The phrasal verb particle relation identifies an idiomatic phrasal verb, and holds between the verb and its particle (tagged as ADP). It is a subtype of the compound relation.
They shut down the station
compound:prt(shut, down)
They shut the station down
compound:prt(shut, down)
This relation excludes literal/directional uses of prepositions/particles, such as up, down, in, out, etc. These would typically become an ADV with the relation advmod:
The house was on fire and they ran out screaming.
advmod(ran, out)
compound:redup
: reduplicated compounds
This subtype of compound covers a range of reduplicated forms in Turkish. Reduplication is a common process especially for adverbs and adjectives. Except for m-reduplication (see below), the head is the last word.
The reduplication typically involves two identical words, but some morpho-phonological alternations are possible (as in the m-reduplication in the third example below).
Koca koca adamlar oyun oynuyorlar . \n _Big (+emph)_ men are playing games .
compound:redup(koca-2, Koca-1)
Açık açık söylüyorum . \n I am telling it _clearly_
compound:redup(açık-2, Açık-1)
Araba maraba almışlar . \n They bought (a) car (and things like that)
compound:redup(Araba, maraba)
For lexicalized multiword items with repetition where one or more of the words are not free lexemes (e.g. paldır küldür, ufak tefek), we use fixed.
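The head rule for compound:redup can be sketched as a small Python function. This is a minimal illustration under stated assumptions: the m-reduplication test (the second word repeats the first with its initial consonant replaced by m, or with m prefixed to a vowel-initial word, as in Araba maraba) is a simplified heuristic, the function names are hypothetical, and Turkish-specific casing (İ/ı) is not handled.

```python
# Turkish vowels (upper and lower case), used to detect vowel-initial words.
VOWELS = set("aeıioöuüAEIİOÖUÜ")


def is_m_reduplication(first: str, second: str) -> bool:
    """Heuristic (assumption): the second word copies the first, with the
    initial consonant replaced by m, or with m prefixed if the first word
    begins with a vowel (araba -> maraba, kitap -> mitap)."""
    if not first or not second:
        return False
    a, b = first.lower(), second.lower()
    # Words already starting with m cannot be m-reduplicated this way.
    if a.startswith("m") or not b.startswith("m"):
        return False
    if first[0] in VOWELS:
        return b == "m" + a
    return b == "m" + a[1:]


def redup_head(first: str, second: str) -> int:
    """Index (0 or 1) of the head of a two-word reduplicated compound:
    the first word for m-reduplication, otherwise the last word."""
    return 0 if is_m_reduplication(first, second) else 1


for pair in [("Koca", "koca"), ("Açık", "açık"), ("Araba", "maraba")]:
    h = redup_head(*pair)
    print(f"compound:redup({pair[h]}, {pair[1 - h]})")
```

The three printed relations reproduce the examples above: compound:redup(koca, Koca), compound:redup(açık, Açık) and compound:redup(Araba, maraba).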
compound:svc
: serial verb compounds
The relation compound:svc is used for serial verb constructions. In this type of construction, several verbs are combined to describe the same action.
# visual-style 2 4 compound:svc color:blue
# visual-style 4 bgColor:blue
# visual-style 4 fgColor:white
# visual-style 2 bgColor:blue
# visual-style 2 fgColor:white
1 dem them PRON PRON _ 2 nsubj _ _
2 enter enter VERB VERB _ 0 root _ _
3 bus bus NOUN NOUN _ 2 obj _ _
4 go go VERB VERB _ 2 compound:svc _ _
5 work work NOUN NOUN _ 4 obj _ _
Gloss: they enter bus go work
Translation: They take the bus to work
The verbs in a serial verb construction share the same subject but not necessarily the same object.
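The shared-subject property can be turned into a simple consistency check. The sketch below is a hypothetical validation rule, not part of any official UD tool: it parses token lines (splitting on any whitespace, to accept the space-separated layout used on this page, whereas real CoNLL-U files are tab-separated) and reports compound:svc verbs that carry their own nsubj.

```python
from typing import Dict, List

COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Parse CoNLL-U token lines into dicts, skipping comments and blanks.
    Splits on any whitespace, so word forms containing spaces are not supported."""
    tokens = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens.append(dict(zip(COLUMNS, line.split())))
    return tokens


def svc_with_own_subject(tokens: List[Dict[str, str]]) -> List[str]:
    """Return ids of compound:svc verbs that have their own nsubj dependent
    (nsubj or any subtype), which would contradict the shared subject."""
    svc_ids = {t["id"] for t in tokens if t["deprel"] == "compound:svc"}
    offenders = {t["head"] for t in tokens
                 if t["deprel"].split(":")[0] == "nsubj" and t["head"] in svc_ids}
    return sorted(offenders, key=int)


EXAMPLE = """
1 dem them PRON PRON _ 2 nsubj _ _
2 enter enter VERB VERB _ 0 root _ _
3 bus bus NOUN NOUN _ 2 obj _ _
4 go go VERB VERB _ 2 compound:svc _ _
5 work work NOUN NOUN _ 4 obj _ _
"""

print(svc_with_own_subject(parse_conllu(EXAMPLE)))  # []
```

The check only looks for an explicit subject on the serialized verb; it says nothing about objects, which, as noted above, need not be shared.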
# visual-style 4 7 compound:svc color:blue
# visual-style 4 bgColor:blue
# visual-style 4 fgColor:white
# visual-style 7 bgColor:blue
# visual-style 7 fgColor:white
# visual-style 13 15 compound:svc color:blue
# visual-style 13 bgColor:blue
# visual-style 13 fgColor:white
# visual-style 15 bgColor:blue
# visual-style 15 fgColor:white
1 so so ADV SCONJ _ 4 advmod _ _
2 we we PRON PRON _ 4 nsubj _ _
3 don don AUX AUX _ 4 aux _ _
4 carry carry VERB VERB _ 0 root _ _
5 di the DET DET _ 6 det _ _
6 matter matter NOUN NOUN _ 4 obj _ _
7 come come VERB VERB _ 4 compound:svc _ _
8 again again ADV ADV _ 7 advmod _ _
9 as as SCONJ ADP _ 13 mark _ _
10 we we PRON PRON _ 13 nsubj _ _
11 dey be AUX AUX _ 13 aux _ _
12 always always ADV ADV _ 13 advmod _ _
13 carry carry VERB VERB _ 7 advcl _ _
14 am he PRON PRON _ 13 obj _ _
15 come come VERB VERB _ 13 compound:svc _ _
Gloss: so we have carry the matter come again as we be always carry it come
Translation: So we have brought the issue again as we always do
An adjective may be used in place of a verb in a serial verb construction.
# visual-style 3 4 compound:svc color:blue
# visual-style 4 bgColor:blue
# visual-style 4 fgColor:white
# visual-style 3 bgColor:blue
# visual-style 3 fgColor:white
1 di the DET DET _ 2 det _ _
2 guy guy NOUN NOUN _ 3 nsubj _ _
3 fine fine ADJ ADJ _ 0 root _ _
4 reach arrive VERB VERB _ 3 compound:svc _ _
5 me I PRON PRON _ 4 obj _ _
Gloss: the guy fine reach me
Translation: The guy is as handsome as I am
Comparatives
In Naija, serial verb constructions are also used for comparatives. In these constructions, the adjective expressing the property being compared is followed by the verb pass.
# visual-style 2 3 compound:svc color:blue
# visual-style 3 bgColor:blue
# visual-style 3 fgColor:white
# visual-style 2 bgColor:blue
# visual-style 2 fgColor:white
1 farmer farmer NOUN NOUN _ 2 nsubj _ _
2 happy happy ADJ ADJ _ 0 root _ _
3 pass pass VERB VERB _ 2 compound:svc _ _
4 when when ADV ADV _ 6 mark _ _
5 rain rain NOUN NOUN _ 6 nsubj _ _
6 fall fall VERB VERB _ 2 advcl _ _
7 like like ADP ADP _ 8 case _ _
8 dis this DET DET _ 6 obl _ _
Gloss: farmers happy exceed when rain fall like this
Translation: Farmers become happier when rain falls like this
conj
: conjunct
A conjunct is the relation between two elements connected by a coordinating conjunction, such as and, or, etc. Coordinate structures are in principle symmetrical, but by convention the first conjunct is treated as the parent (or “technical head”) of all subsequent conjuncts via the conj relation.
Bill is big and honest
conj(big, honest)
Coordinated clauses are treated the same way as coordination of other constituent types:
He came home , took a shower and immediately went to bed .
conj(came, took)
conj(came, went)
punct(took, ,-4)
cc(went, and)
Coordination may be asyndetic, which means that the coordinating conjunction is omitted. Typically, commas or other punctuation symbols delimit the conjuncts. Asyndetic coordination may be more frequent in some languages, while in others a conjunction will appear between every two conjuncts (John and Mary and Bill).
Veni , vidi , vici .
conj(Veni, vidi)
conj(Veni, vici)
punct(vidi, ,-2)
punct(vici, ,-4)
Shared Dependents and Effective Parents in Coordination
Note that the current basic annotation scheme cannot distinguish between a dependent of the first conjunct and a shared dependent of the whole coordination:
He met her at the station and kissed her .
conj(met, kissed)
nsubj(met, He)
vs.
He met her at the station and she kissed him .
conj(met, kissed)
nsubj(met, He)
nsubj(kissed, she)
In contrast, the additional dependencies in the enhanced representation can encode the fact that in the first case, He is also the subject of kissed:
He met her at the station and kissed her .
conj(met, kissed)
nsubj(met, He)
nsubj(kissed, He)
Furthermore, the enhanced representation can also capture the relation of each conjunct to the parent of the coordination. Unlike shared dependents, however, these effective parents can be found algorithmically from the basic tree, so showing them explicitly is only a convenience; the information about shared dependents is not otherwise available.
I saw that he met her at the station and kissed her .
conj(met, kissed)
nsubj(met, he)
nsubj(kissed, he)
ccomp(saw, met)
ccomp(saw, kissed)
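Recovering effective parents from a basic tree can be sketched as follows. This is a simplified illustration, not the official conversion procedure: each conjunct climbs conj links to the first conjunct and inherits that conjunct's head and relation. The tuple layout and function name are assumptions made for the example, and shared dependents are deliberately not recovered, since the basic tree does not encode them.

```python
from typing import Dict, List, Tuple

# A basic-tree token: (id, form, head, deprel); head 0 marks the root.
Token = Tuple[int, str, int, str]


def effective_parents(tokens: List[Token]) -> Dict[int, Tuple[int, str]]:
    """Map each token id to its effective (head, deprel).

    A conjunct inherits the head and relation of the first conjunct, found by
    climbing conj links. Shared dependents are not recovered.
    """
    by_id = {t[0]: t for t in tokens}
    result = {}
    for tid, _form, _head, _deprel in tokens:
        node = tid
        while by_id[node][3] == "conj":
            node = by_id[node][2]  # climb to the conjunct this one attaches to
        result[tid] = (by_id[node][2], by_id[node][3])
    return result


# "I saw that he met her at the station and kissed her ."
SENT: List[Token] = [
    (1, "I", 2, "nsubj"), (2, "saw", 0, "root"), (3, "that", 5, "mark"),
    (4, "he", 5, "nsubj"), (5, "met", 2, "ccomp"), (6, "her", 5, "obj"),
    (7, "at", 9, "case"), (8, "the", 9, "det"), (9, "station", 5, "obl"),
    (10, "and", 11, "cc"), (11, "kissed", 5, "conj"), (12, "her", 11, "obj"),
    (13, ".", 2, "punct"),
]

print(effective_parents(SENT)[11])  # (2, 'ccomp')
```

The result for kissed corresponds to the enhanced arc ccomp(saw, kissed); nsubj(kissed, he), by contrast, cannot be derived this way.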
If a dependent is shared among conjuncts, the basic representation always links it to the first conjunct (coordination head), while the enhanced representation shows all dependencies. In the following example, relations that are only part of the enhanced representation are shown in red.
# visual-style 6 1 amod color:red
# visual-style 4 3 amod color:red
# visual-style 6 3 amod color:red
1 American _ _ _ _ 4 amod 6:amod _
2 and _ _ _ _ 3 cc _ _
3 British _ _ _ _ 1 conj 4:amod|6:amod _
4 professors _ _ _ _ 0 root _ _
5 and _ _ _ _ 6 cc _ _
6 students _ _ _ _ 4 conj 0:root _
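On this page, the DEPS (ninth) column of the example lists only the arcs added by the enhanced representation, i.e. the red arcs plus the enhanced root of students; in a regular CoNLL-U file, DEPS normally lists the complete enhanced graph, basic relations included. The sketch below reads DEPS into (head, dependent, relation) triples. It assumes whitespace-separated columns and integer heads (no empty nodes such as 8.1), and the function name is hypothetical.

```python
from typing import List, Tuple

EXAMPLE = """
1 American _ _ _ _ 4 amod 6:amod _
2 and _ _ _ _ 3 cc _ _
3 British _ _ _ _ 1 conj 4:amod|6:amod _
4 professors _ _ _ _ 0 root _ _
5 and _ _ _ _ 6 cc _ _
6 students _ _ _ _ 4 conj 0:root _
"""


def enhanced_arcs(conllu: str) -> List[Tuple[int, int, str]]:
    """Read the DEPS column into (head, dependent, relation) triples.

    Relations may carry subtypes (e.g. nsubj:pass), so only the first
    colon separates the head from the relation.
    """
    arcs = []
    for line in conllu.strip().splitlines():
        if line.startswith("#"):
            continue
        cols = line.split()
        dep_id, deps = int(cols[0]), cols[8]
        if deps == "_":
            continue
        for item in deps.split("|"):
            head, rel = item.split(":", 1)
            arcs.append((int(head), dep_id, rel))
    return arcs


print(enhanced_arcs(EXAMPLE))
# [(6, 1, 'amod'), (4, 3, 'amod'), (6, 3, 'amod'), (0, 6, 'root')]
```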
Nested Coordination
Note also that the basic annotation scheme has only a limited capability to capture nested coordination such as apples and pears or oranges and lemons. Consider the coordinations
- A, B, C
- (A, B), C
- A, (B, C)
The first two cases, i.e., (A, B, C) and ((A, B), C), lead to the same tree:
A B C
conj(A, B)
conj(A, C)
Only the right-nesting case (A, (B, C)) can be distinguished because its tree is different:
A B C
conj(B, C)
conj(A, B)
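The collapse of the first two bracketings can be checked mechanically. The following Python sketch encodes nesting as nested lists (an assumption made for illustration) and applies the convention that every later conjunct attaches to the head of the first conjunct.

```python
from typing import Set, Tuple, Union

# A coordination is a single conjunct (a string) or a list of coordinations;
# nesting is expressed with nested lists.
Coord = Union[str, list]


def head(c: Coord) -> str:
    """The technical head of a coordination is the head of its first conjunct."""
    return c if isinstance(c, str) else head(c[0])


def conj_edges(c: Coord) -> Set[Tuple[str, str]]:
    """Collect conj(parent, child) edges, attaching every later conjunct
    to the head of the first one (the basic UD convention)."""
    if isinstance(c, str):
        return set()
    edges: Set[Tuple[str, str]] = set()
    for part in c:
        edges |= conj_edges(part)        # edges inside nested coordinations
    first = head(c[0])
    for part in c[1:]:
        edges.add((first, head(part)))   # later conjuncts hang off the first
    return edges


flat = conj_edges(["A", "B", "C"])      # A, B, C
left = conj_edges([["A", "B"], "C"])    # (A, B), C
right = conj_edges(["A", ["B", "C"]])   # A, (B, C)

print(flat == left)    # True: the two bracketings yield identical trees
print(flat == right)   # False: right nesting is distinguishable
```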
Etc.
The item etc., used as a set-expander (especially in coordinations after at least two other items, and typically not preceded by a conjunction, though and etc. is attested in English), is treated as a NOUN and final conjunct. Its distribution is, however, atypical of nouns: it is restricted to enumeration contexts, does not permit modification except by reduplication, and may be post-coordinated with things that are not nominals. Note that this guideline applies to English and other languages that borrowed the string etc. from Latin. The situation may be different in languages that have their own equivalent of etc. For example, German usw. (und so weiter) and Czech atd. (a tak dále), both meaning literally “and so further”, are ADV rather than NOUN, because their main element is an adverb; yet they are still attached as conj to the head of the preceding list or coordination.
cop
: copula
A cop (copula) is the relation of a function word used to link a subject to a nonverbal predicate, including the expression of identity predication (e.g. sentences like “Kim is the President”).
It is often a verb, but nonverbal (pronominal) copulas are also frequent in the world’s languages.
Verbal copulas are tagged AUX, not VERB. Pronominal copulas are tagged PRON or DET.
The cop relation should be used only for pure copulas, which add at most TAME categories to the meaning of the predicate (so most languages have at most one copula), and only when the nonverbal predicate is treated as the head of the clause.
As a concrete example, in many European languages the equivalent of the English verb to be is the only word that can appear with the cop relation. In Spanish and related languages, both ser and estar can be copulas. In Czech and related languages, both být and bývat are copulas (because they are morphological variants of the same lexeme, and the reason they have two lemmas is that aspect-related morphology is treated as derivational in these languages). In contrast, the equivalents of to become are not copulas despite the fact that traditional grammar may label them as such. Existential to be can be a copula only if it is the same verb as in equivalence clauses (John is a teacher). If a language uses two different verbs, then the existential one is not a copula. Some more discussion of the topic is archived here.
Bill is honest
nsubj(honest, Bill)
cop(honest, is)
Ivan is the best dancer
nsubj(dancer-5, Ivan-1)
cop(dancer-5, is-2)
det(dancer-5, the-3)
amod(dancer-5, best-4)
The copula be is not treated as the head of the clause; the nonverbal predicate is, as exemplified above.
This analysis is motivated by the fact that many languages often or always lack an overt copula in such constructions, as in the following Russian and Hebrew examples:
Ivan lučšij tancor \n Ivan best dancer
nsubj(tancor, Ivan)
amod(tancor, lučšij)
ani Kim \n I am Kim
nsubj(Kim-2, ani-1)
Copula omission can also occur in informal English:
Email usually free if you have Wifi.
nsubj(free, Email)
This analysis is also adopted when the predicate is a prepositional phrase, provided that the same copula (or lack of one) is used; in that case the nominal part of the prepositional phrase is the head of the clause.
Sue is in shape
nsubj(shape, Sue)
cop(shape, is)
case(shape, in)
If the copula is accompanied by other verbal auxiliaries for tense, aspect, etc., then they are also given a flat structure, and taken as dependents of the lexical predicate:
Sue has been helpful
nsubj(helpful, Sue)
cop(helpful, been)
aux(helpful, has)
The motivation for this choice is that this structure is parallel to the flat structure which we give to auxiliary verbs accompanying verbs. In particular, in languages such as English, it is often very difficult to decide whether to regard a participle as a verb or an adjective. Perhaps the following sentence is such a case:
The presence of troops will be destabilizing .
nsubj(destabilizing, presence)
cop/aux(destabilizing, be)
aux(destabilizing, will)
While a part of speech (and the associated deprel, cop vs. aux) has to be decided in such cases, it would be unfortunate if the choice of part of speech also changed the dependency structure. Note, however, that the exact distribution of the copula construction is subject to language-specific variation.
Finally, a cop dependent may attach to a predicate clause, i.e., a full clause serving as the predicate within an outer copular clause.
In such cases, nsubj:outer or csubj:outer can be used to distinguish the outer subject:
-ROOT- The problem is that this has never been tried .
nsubj:outer(tried, problem)
cop(tried, is)
mark(tried, that)
nsubj:pass(tried, this)
aux(tried, has)
advmod(tried, never)
aux:pass(tried, been)
root(-ROOT-, tried)
The important thing is to keep calm .
nsubj:outer(keep, thing)
cop(keep, is)
mark(keep, to)
xcomp(keep, calm)
csubj
: clausal subject
A clausal subject is a clausal syntactic subject of a clause, i.e., the subject is itself a clause. The governor of this relation might not always be a verb: when the verb is a copular verb, the root of the clause is the complement of the copular verb. The dependent is the main lexical verb or other predicate of the subject clause.
That he lied surprised me .
csubj(surprised, lied)
mark(lied, That)
nsubj(lied, he)
obj(surprised, me)
Whether he lied is unknown .
csubj(unknown, lied)
mark(lied, Whether)
nsubj(lied, he)
cop(unknown, is)
New from v2: The csubj relation is also used for the clausal subject of a passive verb or verb group. For languages that have a grammaticalized passive transformation, it is strongly recommended to use the subtype csubj:pass in such cases. If the clause is the subject of a copular clause whose predicate is itself a clause, csubj:outer may be used.
See also the expletive subject examples under expl that use csubj.
csubj:outer
: outer clause clausal subject
This relation specifies a clausal subject of a copular clause whose predicate is itself a clause, to signal that it is not the subject of the nested clause. See discussion of Predicate Clauses.
-ROOT- To hike in the mountains is to experience the best of nature .
root(-ROOT-, experience)
csubj:outer(experience, hike)
obl(hike, mountains)
mark(hike, To)
cop(experience, is)
mark(experience, to)
obj(experience, best)
For us to not attempt to solve the problem is for us to acknowledge defeat .
mark(attempt, For)
nsubj(attempt, us-2)
mark(attempt, to-3)
xcomp(attempt, solve)
csubj:outer(acknowledge, attempt)
cop(acknowledge, is)
mark(acknowledge, for)
nsubj(acknowledge, us-12)
obj(acknowledge, defeat)
The nominal counterpart of this relation is nsubj:outer.
The :outer subtype is not intended for most clausal subjects of copular clauses, only for those where the predicate is itself a clause.
Plain csubj (or another subtype) will be appropriate if the copular clause predicate is a nominal, adjective, etc.:
It is very important that your students respect you .
expl(important, It)
csubj(important, respect)
csubj:pass
: clausal passive subject
A clausal passive subject is a clausal syntactic subject of a passive clause.
That she lied was suspected by everyone
csubj:pass(suspected, lied)
Bylo mi doporučeno , abych to velmi dobře zvážil . \n It-has-been to-me recommended , that-I it very well weigh .
csubj:pass(doporučeno, zvážil)
csubj:pass(recommended, weigh)
Reflexive passive (the meaning is “You are not expected to come before nine o’clock.”)
Nepředpokládá se , že přijdete před devátou . \n It-does-not-expect itself , that you-will-come before nine .
csubj:pass(Nepředpokládá, přijdete)
csubj:pass(It-does-not-expect, you-will-come)
dep
: unspecified dependency
A dependency can be labeled dep when it is impossible to determine a more precise relation.
This may be because of an unusual grammatical construction or a limitation in conversion or parsing software.
The use of dep should be avoided as much as possible.
my dad does nt really not that good
nmod(dad, my)
nsubj(does, dad)
advmod(does, nt)
advmod(does, really)
dep(does, good)
advmod(good, not)
advmod(good, that)
det
: determiner
The relation determiner (det) holds between a nominal head and its determiner. Most commonly, a word of POS DET will have the relation det and vice versa. The known exceptions at present are:
- In some of the datasets, a possessive determiner like [en] my is currently given the POS tag DET but the relation nmod, so that it is parallel with other possessive constructions. This is not yet completely parallel across languages; in some languages, it is much clearer than in English how possessive determiners relate to adjectives, and the nmod relation is out of the question.
The man is here
det(man, The)
Which book do you prefer ?
det(book, Which)
det:numgov
: pronominal quantifier governing the case of the noun
Pronominal quantifiers in Slavic languages are labeled det:numgov instead of det because they normally do not agree with the quantified noun in case (unlike non-quantifying determiners).
The quantifier requires the counted noun to be in its genitive form. The whole phrase (quantifier + noun) is treated as a singular neuter noun phrase and it can fill roles where nominative, accusative or vocative noun phrases are expected.
To increase parallelism across languages (and also across morphological cases within one language), the quantifier is not annotated as the head of the nominal. However, the det:numgov label is used to preserve the information about case conditions.
Czech:
Kolik mužů hrálo karty ? \n How-many men played cards ?
det:numgov(mužů, Kolik)
nsubj(hrálo, mužů)
obj(hrálo, karty)
punct(hrálo, ?-5)
det:numgov(men, How-many)
nsubj(played, men)
obj(played, cards)
punct(played, ?-11)
See also nummod:gov and det:nummod.
det:nummod
: pronominal quantifier agreeing in case with the noun
Pronominal quantifiers in Slavic languages are labeled det:nummod or det:numgov instead of det because they normally do not agree with the quantified noun in case (unlike non-quantifying determiners).
They agree only if the whole phrase (quantifier + noun) fills a role where genitive, dative, locative or instrumental noun phrases are expected.
In these situations they are labeled det:nummod.
Czech:
Nepamatuji si , s kolika muži jsem hrál karty . \n I-do-not-remember myself , with how-many men I-have played cards .
ccomp(Nepamatuji, hrál)
expl:pv(Nepamatuji, si)
punct(hrál, ,-3)
aux(hrál, jsem)
obj(hrál, karty)
iobj(hrál, muži)
case(muži, s)
det:nummod(muži, kolika)
punct(Nepamatuji, .-10)
ccomp(I-do-not-remember, played)
expl:pv(I-do-not-remember, myself)
punct(played, ,-14)
aux(played, I-have)
obj(played, cards)
iobj(played, men)
case(men, with)
det:nummod(men, how-many)
punct(I-do-not-remember, .-21)
See also nummod:gov and det:numgov.
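The choice between det:numgov and det:nummod described above depends only on the case required by the role of the whole quantifier + noun phrase. A minimal Python sketch, assuming that case is already known and expressed with UD Case feature values (the function name is hypothetical):

```python
# UD Case values for the role filled by the whole quantifier + noun phrase.
GOVERNING_CASES = {"Nom", "Acc", "Voc"}       # noun in genitive, no agreement
AGREEING_CASES = {"Gen", "Dat", "Loc", "Ins"}  # quantifier agrees with noun


def quantifier_deprel(phrase_case: str) -> str:
    """Choose det:numgov or det:nummod for a Slavic pronominal quantifier."""
    if phrase_case in GOVERNING_CASES:
        return "det:numgov"
    if phrase_case in AGREEING_CASES:
        return "det:nummod"
    raise ValueError(f"unexpected case value: {phrase_case!r}")


print(quantifier_deprel("Nom"))  # det:numgov  (Kolik mužů hrálo karty ?)
print(quantifier_deprel("Ins"))  # det:nummod  (s kolika muži)
```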
det:poss
: possessive determiner
Whenever there is a possessive determiner, det:poss should be used instead of det. All possessive determiners have the feature Possessive defined as Yes, and the only instances of the det:poss relation attested in the Italian Treebank appear with those elements.
Sarà mia cura verificare .
det:poss(cura, mia)
Ha da poco annunciato le proprie dimissioni .
det:poss(dimissioni, proprie)
discourse
: discourse element
This is used for interjections and other discourse particles and
elements (which are not clearly linked to the structure of the
sentence, except in an expressive way). In English, for example, this includes:
interjections (oh, uh-huh, Welcome), fillers (um, ah),
and non-adverbial discourse markers (well, like, but not you know or actually).
We also use discourse for list enumerators (e.g. 1., (a) marking an item in a sequence).
(Bullets, by contrast, are considered punctuation and attach as punct.)
These discourse elements are attached to the head of the most relevant nearby unit, often a clause.
I am 21 , well , will be in November .
nsubj(21, I)
cop(21, am)
parataxis(21, be)
discourse(be, well)
aux(be, will)
obl(be, November)
case(November, in)
punct(21, .)
punct(well, ,-4)
punct(well, ,-6)
Iguazu is in Argentina :)
discourse(Argentina-4, :)-5)
5/NUM . Cool for 10 minutes and serve .
discourse(Cool, 5)
punct(5, .-2)
To enter the stadium , you must not have — ( a ) a weapon ; ( b ) any food ; and ( c ) any drink .
discourse(weapon, a-12)
discourse(food, b)
discourse(drink, c)
conj(weapon, food)
conj(weapon, drink)
cc(drink, and)
dislocated
: dislocated elements
The dislocated relation is used for fronted or postposed elements that do not fulfill the usual core grammatical relations of a sentence. These elements often appear to be in the periphery of the sentence, and may be separated off with a comma intonation.
It is used for fronted elements that introduce the topic of a sentence, as in the following Japanese and Greek examples. The dislocated element attaches to the head of the clause to which it belongs:
象 は 鼻 が 長い \n zoo wa hana ga naga-i \n elephant TOPIC nose SUBJ long-PRES
dislocated(長い-5, 象-1)
to jani ton kserume poli kala \n the John-Acc him know-1pl very well
dislocated(kserume, jani)
However, it would not be used for a topic-marked noun that is also the subject of the sentence; this would be an nsubj.
It is also used for postposed elements. The dislocated elements attach to the same governor as the dependent that they double for. Right dislocated elements are frequent in spoken languages. French and Greek examples follow.
Il faut pas la manger , la plasticine \n It must not it eat , the playdough
obj(manger, la-4)
dislocated(manger, plasticine)
obj(eat, it-13)
dislocated(eat, playdough)
ton kserume oli mas edho poli kala, to jani
dislocated(kserume, jani)
expl
: expletive
This relation captures expletive or pleonastic nominals. These are nominals that appear in an argument position of a predicate but which do not themselves satisfy any of the semantic roles of the predicate. The main predicate of the clause (the verb or predicate adjective or noun) is the governor. In English, this is the case for some uses of it and there: the existential there, and it when used in extraposition constructions. (Note that both it and there also have non-expletive uses.)
There is a ghost in the room
expl(is, There)
It is clear that we should decline .
expl(clear, It)
Some languages do not have expletives of the English sort, including most languages with free pro-drop (the ability to use zero anaphora rather than overt pronouns). In languages that do have such expletives, they can occupy positions where a core argument normally appears: the subject and direct object (and even indirect object) slots, as in the examples below. Note that in the analysis of these examples, we treat the postposed subject or clausal argument as a regular core argument and mark the expletive with expl.
There is a ghost in the room
expl(is, There)
nsubj(is, ghost)
obl(is, room)
I believe there to be a ghost in the room
nsubj(believe, I)
expl(believe, there)
xcomp(believe, be)
nsubj(be, ghost)
obl(be, room)
It is clear that we should decline .
expl(clear, It)
csubj(clear, decline)
That we should decline is clear .
csubj(clear, decline)
I mentioned it to Mary that Sue is leaving
nsubj(mentioned, I)
expl(mentioned, it)
obl(mentioned, Mary)
ccomp(mentioned, leaving)
A second, related use of the expl relation is for cases of true clitic doubling. For languages in which clitics and lexical nominals are usually in complementary distribution (languages, such as French, which obey “Kayne’s generalization”), whichever of a clitic or a lexical nominal occurs will get the appropriate role, such as obj or iobj. In such languages, when doubling does occur, such as in spoken French, the right analysis is to regard the lexical nominal as dislocated (see the examples under dislocated). As such, the analysis will be the same as when a noun phrase doubles another noun phrase or a regular pronoun that fills a nominal argument position. However, other languages, such as Greek and Bulgarian, standardly allow doubling of a lexical nominal and a pronominal clitic, with the former still appearing in its regular role as an argument of the predicate. In these cases, if only one of the lexical nominal and the clitic appears in a clause, then whichever appears will be given the grammatical role of obj, iobj, etc., parallel to the treatment of lexical nominals and pronouns in other languages, modulo the clitic pronoun having a different position in the sentence. However, if both occur, the lexical nominal will be given the grammatical role of obj, iobj, etc., and the clitic will be treated as a pronominal copy, which does not receive its own semantic role, and hence will get the role expl. Modulo the different word order, this is fairly parallel to the treatment of it and there in English mentioned above, where another phrase satisfies the semantic role of the predicate. Examples from Greek and Bulgarian follow:
Της τον έδωσε της Καίτης τον αναπτήρα \n PRON.Fem.Gen PRON.Masc.Acc gave ART.Fem.Gen Keti.Gen ART.Masc.Acc lighter.Acc
expl(έδωσε, Της-1)
iobj(έδωσε, Καίτης)
det(Καίτης, της-4)
expl(έδωσε, τον-2)
obj(έδωσε, αναπτήρα)
det(αναπτήρα, τον-6)
Marija mu izprati pismo na rabotnika \n Maria 3.S.M.IO sent letter to the.worker
expl(izprati, mu)
obj(izprati, pismo)
iobj(izprati, rabotnika)
case(rabotnika, na)
Reflexives
The expletive relation is also used for reflexive pronouns (see the feature u-feat/Reflex) attached to inherently reflexive verbs, i.e. verbs that cannot occur without the reflexive pronoun and thus the pronoun does not play the role of a normal object (otherwise it would be possible to substitute it with an irreflexive pronoun or other nominal).
UD recognizes several functions of reflexive pronouns (clitics) that are usually distinguished with the help of subtypes of the expl relation (see also the report from the 2015 Uppsala discussion of clitics where this approach was approved):
- expl:pv for reflexive clitics attached to inherently reflexive verbs (also called pronominal verbs in some grammars)
- expl:pass for reflexive clitics attached to transitive verbs and acting as a voice marker (passive or mediopassive)
- expl:impers for impersonal usage (works also with intransitive verbs)
A Czech example:
Martin se bojí zvířat . \n Martin REFLEX fears animals .
expl:pv(bojí, se)
expl:pv(fears, REFLEX)
Further general discussion of expletives can be found in Postal, P. M., and G. K. Pullum (1988) “Expletive Noun Phrases in Subcategorized Positions,” Linguistic Inquiry 19(4): 635–670. The status of clitic doubling, and arguments for the lexical nominal being an argument with the clitic a kind of pronominal copy, appear inter alia in Boris Harizanov (2014) Clitic doubling at the syntax-morphology interface: A-movement and morphological merger in Bulgarian. Natural Language and Linguistic Theory.
expl:impers
: impersonal expletive
The relation expl:impers is a subtype of expl, specific to the impersonal use of the reflexive clitic pronoun.
While the default function of a reflexive pronoun is to signal that the subject applies a transitive action to itself (i.e., the
reflexive pronoun is an object coreferential with the subject), the impersonal construction can be used with any verb, transitive
or intransitive. The clitic is formally identical with a reflexive object but it does not fill the object slot in these constructions
and if the verb is transitive, its real object still occurs in the clause and fills the slot. The reflexive clitic is not a subject
either; in fact there is no subject at all, which is the defining property of impersonal constructions. If the verb must express
subject agreement, it will take a default form (this depends on the language; a typical example is the 3rd person singular).
Impersonal constructions should be distinguished from reflexive passives, in which the reflexive clitic is attached as expl:pass. Reflexive passives are formed from transitive verbs: the object is promoted to subject, the verb stays in its active form (although agreement morphemes may have to be adjusted to the new subject), and the object slot is filled by the reflexive pronoun.
[it] Si prevede che viaggerà. “He is expected to travel.”
Si prevede che viaggerà . \n REFL expects that will-travel .
expl:impers(prevede, Si)
expl:impers(expects, REFL)
punct(prevede, .-5)
punct(expects, .-11)
ccomp(prevede, viaggerà)
ccomp(expects, will-travel)
mark(viaggerà, che)
mark(will-travel, that)
In Italian, if there’s a clitic in a construction with a modal or an auxiliary verb, then generally it is an impersonal construction.
[it] Si può procedere a sequestro. “Seizure can be carried out.”
Si può procedere a sequestro . \n REFL can proceed to seizure .
expl:impers(procedere, Si)
expl:impers(proceed, REFL)
aux(procedere, può)
aux(proceed, can)
punct(procedere, .-6)
punct(proceed, .-13)
obl(procedere, sequestro)
obl(proceed, seizure)
case(sequestro, a)
case(seizure, to)
In the following Polish example, wystawę archeologiczną “archaeological exhibition” is in the accusative case, so it is still the object and not the subject; hence this is a reflexive impersonal construction and not a reflexive passive.
[pl] Przygotowuje się również wystawę archeologiczną. “An archaeological exhibition is also being prepared.”
Przygotowuje się również wystawę archeologiczną . \n Prepares REFL also exhibition archaeological .
punct(Przygotowuje, .-6)
punct(Prepares, .-13)
expl:impers(Przygotowuje, się)
expl:impers(Prepares, REFL)
advmod(Przygotowuje, również)
advmod(Prepares, also)
obj(Przygotowuje, wystawę)
obj(Prepares, exhibition)
amod(wystawę, archeologiczną)
amod(exhibition, archaeological)
Compare the Polish example with Czech, where the archaeological exhibition is in the nominative and has become the subject, so we are dealing with the reflexive passive construction instead.
[cs] Rovněž se připravuje archeologická výstava. “An archaeological exhibition is also being prepared.”
Rovněž se připravuje archeologická výstava . \n Also REFL prepares archaeological exhibition .
punct(připravuje, .-6)
punct(prepares, .-13)
expl:pass(připravuje, se)
expl:pass(prepares, REFL)
advmod(připravuje, Rovněž)
advmod(prepares, Also)
nsubj:pass(připravuje, výstava)
nsubj:pass(prepares, exhibition)
amod(výstava, archeologická)
amod(exhibition, archaeological)
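The distinctions drawn in this section can be summarized as a small decision procedure. The Python sketch below is a simplification distilled from the prose here, not an official UD algorithm: it assumes the annotator already knows whether the verb is inherently reflexive and the case of the nominal that could be either the promoted subject or the object, and the function name is hypothetical.

```python
from typing import Optional


def reflexive_clitic_deprel(inherently_reflexive: bool,
                            nominal_case: Optional[str]) -> str:
    """Pick the expl subtype for a reflexive clitic (simplified).

    inherently_reflexive: the verb never occurs without the clitic.
    nominal_case: UD Case of the nominal that is either the promoted subject
    or the object ("Nom" or "Acc"), or None if there is no such nominal.
    """
    if inherently_reflexive:
        return "expl:pv"        # pronominal verb, e.g. Czech bát se
    if nominal_case == "Nom":
        return "expl:pass"      # object promoted to subject: reflexive passive
    if nominal_case in ("Acc", None):
        return "expl:impers"    # object stays object, or no object: impersonal
    raise ValueError(f"unexpected case value: {nominal_case!r}")


print(reflexive_clitic_deprel(False, "Nom"))  # expl:pass   (Czech: archeologická výstava)
print(reflexive_clitic_deprel(False, "Acc"))  # expl:impers (Polish: wystawę archeologiczną)
print(reflexive_clitic_deprel(False, None))   # expl:impers (Italian: Si prevede che viaggerà)
```

Real annotation also has to consider constructions this sketch ignores, such as genuine reflexive objects, which remain obj.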
expl:pass
: reflexive pronoun used in reflexive passive
Reflexive pronouns (see the feature cs-feat/Reflex) are used in various constructions in Czech, including the so-called reflexive passive.
In PDT, their relation to the verb is labeled AuxR.
The corresponding label in Czech UD is expl:pass (since UD 2.0; in previous versions it was labeled auxpass:reflex).
To se řekne snadno . \n It is said easily .
expl:pass(řekne, se)
expl:pass(said, is)
In the following Czech example, archeologická výstava “archaeological exhibition” is in the nominative and has become the subject, so we are dealing with the reflexive passive and not with an impersonal construction (expl:impers).
[cs] Rovněž se připravuje archeologická výstava. “An archaeological exhibition is also being prepared.”
Rovněž se připravuje archeologická výstava . \n Also REFL prepares archaeological exhibition .
punct(připravuje, .-6)
punct(prepares, .-13)
expl:pass(připravuje, se)
expl:pass(prepares, REFL)
advmod(připravuje, Rovněž)
advmod(prepares, Also)
nsubj:pass(připravuje, výstava)
nsubj:pass(prepares, exhibition)
amod(výstava, archeologická)
amod(exhibition, archaeological)
Compare the Czech example with Polish, where wystawę archeologiczną “archaeological exhibition” is in the accusative case, so it is still the object and not the subject; hence it is a reflexive impersonal construction and not a reflexive passive.
[pl] Przygotowuje się również wystawę archeologiczną. “An archaeological exhibition is also being prepared.”
Przygotowuje się również wystawę archeologiczną . \n Prepares REFL also exhibition archaeological .
punct(Przygotowuje, .-6)
punct(Prepares, .-13)
expl:impers(Przygotowuje, się)
expl:impers(Prepares, REFL)
advmod(Przygotowuje, również)
advmod(Prepares, also)
obj(Przygotowuje, wystawę)
obj(Prepares, exhibition)
amod(wystawę, archeologiczną)
amod(exhibition, archaeological)
expl:pv
: reflexive clitic with an inherently reflexive verb
Reflexive pronouns (see the feature cs-feat/Reflex) usually replace objects of verbs. However, some verbs are inherently reflexive, i.e. the verb always occurs with a reflexive pronoun, and the pronoun cannot be replaced by a non-reflexive pronoun.
With these verbs, the reflexive pronoun is attached as expl:pv instead of obj.
(Note that the expl relation was first used for this purpose in UD release 1.2, and it has been further subtyped as expl:pv since UD 2.0 to increase parallelism with other languages. In previous releases this usage of reflexive se/si was labeled compound:reflex.)
Martin se bojí zvířat . \n Martin REFLEX fears animals .
expl:pv(bojí, se)
expl:pv(fears, REFLEX)
fixed
: fixed multiword expression
The fixed relation is used for certain fixed grammaticized expressions. Such expressions tend to behave like function words. For example, in spite of is a fixed expression functioning as a preposition in English; bien que (‘although’, lit. ‘well that’) functions as a subordinating conjunction in French; and vare sig (‘either’, lit. ‘be itself’) functions as a (pre)conjunction in Swedish.
The scope of fixed MWEs corresponds roughly to the fixed expressions category of Sag et al.; the relation should not be used for multiword expressions that are morphosyntactically flexible.
Criteria
Fixed expressions typically do not allow intervening words, except in a few special cases such as clitics that go in a fixed position in the clause and can interrupt even fixed expressions. In addition, there may be inherently discontiguous fixed expressions, such as för … sedan in Swedish, corresponding to the English ago, which is syntactically irregular and always encloses a temporal expression, as in för 10 år sedan [“10 years ago”].
The creation of fixed multiword expressions is the end phase of a process of grammaticalization and there are always going to be cases of multiword expressions that are only somewhat grammaticalized. For practical treebanking, it is recommended to restrict this relation to the most grammaticalized cases and to treat them as a closed class by writing language-specific documentation listing the fixed expressions of the language.
Structure
Fixed MWEs are annotated in a flat structure, where all subsequent words in the expression are attached to the first one using the fixed label. The assumption is that these expressions do not have any internal syntactic structure (except from a historical perspective) and that the structural annotation is in principle arbitrary. In practice, however, it is highly desirable to use a consistent annotation of all fixed MWEs in all languages.
Fixed MWEs should not have any internal modification. Therefore, if a word attaches as fixed, it should not have any dependents (except perhaps punct, goeswith, and reparandum dependents, as these are not true syntactic relations).
The ExtPos feature should be specified on the first word of the fixed expression to indicate the UPOS that the expression would have were it a single word. This indicates what external dependency relations the expression is compatible with.
I like dogs as/[ExtPos=CCONJ] well as cats
fixed(as-4, well-5)
fixed(as-4, as-6)
He cried because/[ExtPos=ADP] of you
fixed(because, of)
Bien/ADV[ExtPos=SCONJ] que/SCONJ malingre quand il était enfant, il devient néanmoins un athlète accompli et un grimpeur de talent. \n Although sickly when he was a child, he nevertheless became an accomplished athlete and a talented climber.
fixed(Bien, que)
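The structural constraints above can be checked mechanically. The Python sketch below assumes a simplified, hypothetical token representation (Token, check_fixed) with features stored as a dict. It tests three things stated in this section: later words of a fixed expression attach to an earlier (first) word, a word attached as fixed has no dependents other than punct, goeswith, or reparandum, and the first word carries ExtPos. It is not a substitute for the official UD validator, whose checks may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    id: int                                     # 1-based position in the sentence
    form: str
    head: int                                   # 0 = root
    deprel: str
    feats: dict = field(default_factory=dict)   # e.g. {"ExtPos": "CCONJ"}


# Relations that are not true syntactic dependents and may hang off a fixed word.
ALLOWED_UNDER_FIXED = {"punct", "goeswith", "reparandum"}


def base(deprel: str) -> str:
    """Drop a subtype, e.g. 'nmod:poss' -> 'nmod'."""
    return deprel.split(":")[0]


def check_fixed(tokens: list[Token]) -> list[str]:
    """Report violations of the fixed-MWE conventions in one sentence."""
    by_id = {t.id: t for t in tokens}
    fixed = [t for t in tokens if base(t.deprel) == "fixed"]
    fixed_ids = {t.id for t in fixed}
    problems = []
    for t in fixed:
        # Subsequent words attach to the first word, so the head must precede.
        if t.head > t.id:
            problems.append(f"{t.form}-{t.id}: fixed head follows its dependent")
    for t in tokens:
        # A word attached as fixed should have no true syntactic dependents.
        if t.head in fixed_ids and base(t.deprel) not in ALLOWED_UNDER_FIXED:
            problems.append(f"{t.form}-{t.id}: {t.deprel} dependent of a fixed word")
    # ExtPos is expected on the first word of each fixed expression.
    first_words = {t.head for t in fixed if t.head in by_id and t.head not in fixed_ids}
    for i in sorted(first_words):
        if "ExtPos" not in by_id[i].feats:
            problems.append(f"{by_id[i].form}-{i}: ExtPos missing on first word")
    return problems


sentence = [
    Token(1, "I", 2, "nsubj"),
    Token(2, "like", 0, "root"),
    Token(3, "dogs", 2, "obj"),
    Token(4, "as", 7, "cc", {"ExtPos": "CCONJ"}),
    Token(5, "well", 4, "fixed"),
    Token(6, "as", 4, "fixed"),
    Token(7, "cats", 3, "conj"),
]
print(check_fixed(sentence))  # []
```

Attaching the second as to well instead of to the first as (a chain rather than a flat structure) would be reported as a fixed dependent of a fixed word, and omitting ExtPos on the first as would be flagged as well.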
New from v2: The fixed relation replaces the old mwe relation to prevent misunderstanding regarding its scope.
For v2.14, this page has been revised to more clearly articulate the relationship to multiword expressions.
For v2.15, the use of ExtPos has been added.
flat
: flat expression
The flat relation is used to combine the elements of an expression where none of the immediate components can be identified as the sole head using standard substitution tests.
This includes both cases where more than one component passes the head test (as in the name John Smith, where either John or Smith can replace the whole in most contexts) and cases where no component does (as in San Francisco in English).
Note also that the flat relation is appropriate in such cases only when no more specific relation applies. For example, in coordination structures annotated with the conj relation, any of the conjuncts can usually replace the whole.
Flat expressions are annotated with a flat structure, where all subsequent components in the expression are attached to the first one using the flat label. The assumption is that in these expressions, the flat relations are not syntactic head-modifier relations, and that the structural annotation is in principle arbitrary.
The components of a flat expression may have their own dependents, including nested flat structures.
For example, in the name Mary Jane Tyler Smith, both the first name (Mary Jane) and the last name
(Tyler Smith) are flat expressions, which are combined into a larger flat name (the tree appears below).
The prototypes for flat are: (i) personal names, (ii) foreign expressions, (iii) iconic sequences, and (iv) items separated for readability.
These are illustrated in the sections below.
The application of flat may extend beyond these prototypes to, e.g., various kinds of name and number expressions. However, even if an expression is idiosyncratic or follows a specialized pattern, every effort should be made to find a head rather than employing flat. If a head can be found but no substantive dependency relation is appropriate, dep can be used.
Note that what is considered to be transparent linguistic syntax (as opposed to flat structure) is subject to treebank-specific policies. (E.g., some treebanks might provide proper grammatical analyses in the presence of code-switching, or treat mathematical notation as following linguistic strategies like predication.)
Some languages opt to subcategorize usages of flat via subtypes. In particular, many treebanks use the flat:name and flat:foreign subtypes converted from the v1 relations name and foreign. The examples on this page simply use plain flat.
Names
A person’s name (or parts thereof) may lack the hallmarks of general constructions in the language, such that no single word can be identified as the head, in which case a flat structure applies.
Hillary Rodham Clinton
flat(Hillary, Rodham)
flat(Hillary, Clinton)
Nesting is possible:
Mary Jane Tyler Smith
flat(Mary, Jane)
flat(Tyler, Smith)
flat(Mary, Tyler)
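Nested flat structures such as the one above can be produced from an annotator-supplied grouping of the name into head-initial parts. In this hypothetical Python helper (flat_arcs), the grouping itself (Mary Jane vs. Tyler Smith) is an input: deciding where the sub-parts are is a linguistic judgment the code does not make, and only two levels of nesting are handled.

```python
def flat_arcs(groups: list[list[str]]) -> list[tuple[str, str, str]]:
    """Build (relation, head, dependent) arcs for a name split into groups.

    Within each group, later words attach to the group's first word; the
    first words of later groups then attach to the first word of the first
    group. A single group gives a plain head-initial flat structure.
    """
    arcs = []
    for group in groups:
        for word in group[1:]:
            arcs.append(("flat", group[0], word))
    for group in groups[1:]:
        arcs.append(("flat", groups[0][0], group[0]))
    return arcs


for rel, head, dep in flat_arcs([["Mary", "Jane"], ["Tyler", "Smith"]]):
    print(f"{rel}({head}, {dep})")
# flat(Mary, Jane)
# flat(Tyler, Smith)
# flat(Mary, Tyler)
```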
On occasion, an expression with no clear head at the top level will have internal syntactic modifiers or punctuation:
Dwayne " The Rock " Johnson
flat(Dwayne, Rock)
flat(Dwayne, Johnson)
det(Rock, The)
punct(Rock, "-2)
punct(Rock, "-5)
The scope of flat may extend beyond names of persons to names of other kinds of entities that depart from general headed structure. The expressions under this category must be established by language-specific criteria.
The ExtPos feature may be used to signal the external syntactic distribution of the flat expression, e.g., ExtPos=PROPN for 17 in:
17/NUM[ExtPos=PROPN] Across/ADV is wrong in this crossword .
flat(17, Across)
Flat vs. non-flat names
Names that have a regular syntactic structure, like The Lord of the Rings and Captured By Aliens, should be annotated with regular syntactic relations rather than flat structures:
The Lord of the Rings
det(Lord, The)
nmod(Lord, Rings)
case(Rings, of)
det(Rings, the)
The king of Sweden
det(king-2, The-1)
nmod(king-2, Sweden-4)
case(Sweden-4, of-3)
For organization names with clear syntactic modification structure, the dependencies should also reflect the syntactic modification structure using regular syntactic relations, as in:
Natural Resources Conservation Service
amod(Resources-2, Natural-1)
compound(Conservation-3, Resources-2)
compound(Service-4, Conservation-3)
In addition, regular syntactic relations are used (i) for a modifying determiner or similar function word and (ii) to connect the words of a description or name that involves embedded prepositional phrases, sentences, etc. This applies when these relations are recognized in the language being annotated (the analyses below are for French, German, and Spanish, not English) and when the function words are not grammaticalized to the extent that their original role has been lost.
Le Japon
det(Japon-2, Le-1)
Ludwig van Beethoven
case(Beethoven, van)
nmod(Ludwig, Beethoven)
Miguel de Cervantes y Saavedra
conj(Cervantes, Saavedra)
cc(Saavedra, y)
case(Cervantes, de)
nmod(Miguel, Cervantes)
Río de la Plata
case(Plata-4, de-2)
det(Plata-4, la-3)
nmod(Río-1, Plata-4)
A name may combine flat and non-flat structure. In a Portuguese text, the surname Paulo da Silva would be analyzed as follows:
Roberto Paulo da Silva
flat(Roberto, Paulo)
nmod(Paulo, Silva)
case(Silva, da)
The above analyses of Ludwig van Beethoven and Miguel de Cervantes y Saavedra assume that van and de, respectively, are prepositions. This is true in the languages of the names’ origin, but it can be expected to change when the name is used in foreign text or when sufficient grammaticalization has taken place. For example, when names like these are annotated in English, the appropriate analysis is as a flat name:
Ludwig van Beethoven was a famous German composer .
flat(Ludwig, van)
flat(Ludwig, Beethoven)
det(composer, a)
amod(composer, famous)
amod(composer, German)
cop(composer, was)
nsubj(composer, Ludwig)
punct(composer, .)
Río de la Plata
flat(Río-1, de-2)
flat(Río-1, la-3)
flat(Río-1, Plata-4)
Al Arabiya is a Saudi-owned news organization
flat(Al-1, Arabiya-2)
nsubj(organization-7, Al-1)
In Modern German and French, such prepositions have generally become a fossilized part of the family name and regularly appear without the given name. Here, too, the flat analysis seems correct:
Von Hohenlohe gewann das Rennen . \n Von Hohenlohe won the race .
flat(Von-1, Hohenlohe-2)
nsubj(gewann-3, Von-1)
Foreign expressions
This encompasses expressions that may have been borrowed or quoted, but whose original grammatical structure is not necessarily accessible to speakers of the language(s) being annotated.
And then she went : gjiko frac zen .
parataxis(went, gjiko)
flat(gjiko, frac)
flat(gjiko, zen)
“Foreign” includes not just natural languages but also notational systems that are considered external to natural language proper and are governed by separate rules (e.g., musical chord progressions, software code excerpts).
The Vienna Game move order is 1. e4 e5 2. Nc3 .
nsubj(1., order)
cop(1., is)
flat(1., e4)
flat(1., e5)
flat(1., 2.)
flat(1., Nc3)
See further discussion at Foreign Expressions and Code-Switching.
History: UD v1 had a foreign relation, but this is no longer part of the relation taxonomy and has been subsumed under flat.
Iconic sequences
Sequences for which neither head-dependent nor coordination relationships apply include onomatopoeia (quack quack quack), “filler” words (do re mi), and gibberish (blargety blarg blarg).
The duck said quack quack quack
obj(said, quack-4)
flat(quack-4, quack-5)
flat(quack-4, quack-6)
Items separated for readability
Here the units separated by spaces or punctuation cannot really be construed as separate lexemes. A common case is telephone numbers:
Call 0118 999 881 999 119 725 3
obj(Call, 0118)
flat(0118, 999-3)
flat(0118, 881)
flat(0118, 999-5)
flat(0118, 119)
flat(0118, 725)
flat(0118, 3)
Filenames are another such case: they may contain spaces, and the components may or may not be recognizable as natural language strings, but in general filenames are not expected to follow regular syntactic structure. flat signals that filenames are a context where regular syntactic rules do not apply (whether the component tokens are analyzed morphologically like words of an art title, simply tagged as X, or a mixture; the precise tokenization and morphological analysis is left to the discretion of treebanks). ExtPos=PROPN may be specified in the MISC column to signal that the whole filename functions externally as a proper noun. For example, the filename Mydoc CHQ2 - Wednesday DRAFT (2).txt might be analyzed as follows:
Mydoc/X[ExtPos=PROPN] CHQ2/X -/PUNCT Wednesday/PROPN DRAFT/PROPN (/PUNCT 2/NUM )/PUNCT .txt/X
flat(Mydoc, CHQ2)
flat(Mydoc, -)
flat(Mydoc, Wednesday)
flat(Mydoc, DRAFT)
flat(Mydoc, ()
flat(Mydoc, 2)
flat(Mydoc, ))
flat(Mydoc, .txt)
It is not expected that a language’s tokenization rules will make special exceptions for spaces in telephone numbers or filenames. That is, if spaces trigger token boundaries in general, they should also do so for telephone numbers and filenames; exceptional token-internal spaces will not be permitted.
Not all “unnecessary” spaces warrant flat, however:
- improper spacing within a word should be addressed with goeswith
- numerals with thousands separator spaces (e.g. 1 000 000) may be treated as single words in languages where this convention is widespread
flat:foreign
: foreign words
Some treebanks use flat:foreign to label sequences of foreign words. These are given a linear analysis: the head is the first token in the foreign phrase.
flat:foreign does not apply to loanwords or to foreign names. It applies to quoted foreign text incorporated in a sentence/discourse of the host language (unless we want to and know how to annotate the internal structure according to the syntax of the foreign language).
Jarmusch se objevil ve Wangově snímku Modrá ve tváři ( Blue in the Face ) .
flat:foreign(Blue, in)
flat:foreign(Blue, the)
flat:foreign(Blue, Face)
See the general policy on Foreign Expressions and Code-Switching.
flat:name
: names
The flat:name relation is a specialization of flat used for names.
Ecco l'arringa di Tiziana Maiolo .
flat:name(Tiziana, Maiolo)
Names are annotated in a flat, head-initial structure, in which all words in the name modify the first one using the flat:name label. This also works for prepositions, determiners, and numerals that are part of the names.
Formula 1/NUM .
flat:name(Formula, 1)
Marcello Dell' Utri .
flat:name(Marcello, Dell')
flat:name(Marcello, Utri)
Words joined by flat:name should all be part of a minimal noun phrase; otherwise regular syntactic relations should be used. For organization names with clear syntactic modification structure, the dependencies should reflect the syntactic modification structure using regular syntactic relations.
L' ordine Mauriziano
det(ordine, L')
amod(ordine, Mauriziano)
Il Ministero di gli Interni
det(Ministero, Il)
nmod(Ministero, Interni)
det(Interni, gli)
case(Interni, di)
In addition, regular syntactic relations are used:
- for a modifying determiner or
- to connect together the words of a description or name which involve embedded prepositional phrases, sentences, etc.
Mariatersa Di Lascia
nmod(Mariatersa, Lascia)
case(Lascia, Di)
Università di Pristina
nmod(Università, Pristina)
case(Pristina, di)
goeswith
: goes with
This relation links two or more parts of a word that are separated in text that is not well edited. These parts should be written together as one word according to the orthographic rules of the given language. The head is always the first part, and the other parts are attached to it with the goeswith relation (for consistency, as with flat, fixed, and conj).
The first part of the word is given the part of speech that the word would have been given if written together, while the later parts of the word are given the POS X. Similarly, only the first part can have a lemma and morphological features. While the annotation of morphological features is optional, if the treebank does have features, then Typo=Yes must be used with the goeswith head.
Note also that only the last word part may be annotated with SpaceAfter=No.
They come here with/ADP[Typo=Yes] out/X legal permission
goeswith(with-4, out-5)
never/ADV[Typo=Yes] the/X less/X[SpaceAfter=No] ,
goeswith(never, the)
goeswith(never, less)
For/VERB[Mood=Imp|Typo=Yes|VerbForm=Fin] get/X that !
goeswith(For, get)
obj(For, that)
punct(For, !)
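The goeswith conventions listed above (head-initial attachment, UPOS X and no features on later parts, Typo=Yes on the head when the treebank has features, SpaceAfter=No only on the last part) can be checked with a short script. This is a minimal Python sketch over a hypothetical token class (Tok, check_goeswith) with FEATS and MISC stored as dicts; it is not the official UD validator.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    id: int
    form: str
    upos: str
    head: int                                   # 0 = root
    deprel: str
    feats: dict = field(default_factory=dict)   # FEATS column as a dict
    misc: dict = field(default_factory=dict)    # MISC column as a dict


def check_goeswith(tokens: list[Tok], has_feats: bool = True) -> list[str]:
    """Report violations of the goeswith conventions in one sentence.

    has_feats: set to False for treebanks without morphological features,
    where Typo=Yes is not required.
    """
    by_id = {t.id: t for t in tokens}
    words: dict[int, list[Tok]] = {}   # first part id -> all parts of the word
    problems = []
    for t in tokens:
        if t.deprel != "goeswith":
            continue
        first = by_id.get(t.head)
        # The head is always the first part: it precedes and is not goeswith itself.
        if first is None or first.id > t.id or first.deprel == "goeswith":
            problems.append(f"{t.form}-{t.id}: must attach to the first part")
            continue
        if t.upos != "X":
            problems.append(f"{t.form}-{t.id}: later parts get UPOS X")
        if t.feats:
            problems.append(f"{t.form}-{t.id}: features belong on the first part")
        words.setdefault(first.id, [first]).append(t)
    for first_id, parts in words.items():
        first = by_id[first_id]
        if has_feats and first.feats.get("Typo") != "Yes":
            problems.append(f"{first.form}-{first_id}: Typo=Yes missing on head")
        # Only the last part of the split word may carry SpaceAfter=No.
        for part in sorted(parts, key=lambda p: p.id)[:-1]:
            if part.misc.get("SpaceAfter") == "No":
                problems.append(f"{part.form}-{part.id}: SpaceAfter=No on non-final part")
    return problems


sentence = [
    Tok(1, "never", "ADV", 0, "root", {"Typo": "Yes"}),
    Tok(2, "the", "X", 1, "goeswith"),
    Tok(3, "less", "X", 1, "goeswith", misc={"SpaceAfter": "No"}),
    Tok(4, ",", "PUNCT", 1, "punct"),
]
print(check_goeswith(sentence))  # []
```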
iobj
: indirect object
⚠️ WARNING: The traditional term “indirect object”, associated with morphosyntactic encoding of certain types of arguments (especially datives/recipients) in a clause, has a wide range of interpretations across languages and linguistic frameworks. In UD, universal-level relations do not distinguish arguments and adjuncts; rather, the distinction is between core arguments and oblique modifiers. iobj must only be used for core arguments, never for obliques, as described below. The naming of this relation may be changed in the next major revision of the UD guidelines.
In UD, the indirect object of a verb is any nominal phrase that is a core argument of the verb but is not its subject or (direct) object. The prototypical example is the recipient of ditransitive verbs of exchange:
She gave me a raise
iobj(gave, me)
nsubj(gave, She)
However, many languages allow other semantic roles as additional objects. The most common case is benefactives, but some languages allow other roles, such as instruments (as in the Kinyarwanda example below) or comitatives. At the other extreme, some languages lack indirect objects altogether.
Umukoóbwa a-ra-andik-iish-a íbárúwa íkárámu \n girl 1-PRS-write-APPL-ASP letter pen
obj(a-ra-andik-iish-a, íbárúwa)
iobj(a-ra-andik-iish-a, íkárámu)
In languages distinguishing morphological cases, the recipient will often be marked by the dative case. However, the iobj relation can be used only for a core argument. The morphological dative may signal a core argument in some languages (such as Basque), but in many others it is just oblique (like the English preposition to). For instance, in many Indo-European languages, the recipient should be attached as obl and not iobj, regardless of traditional grammars that may label it an “indirect object”.
In the following Czech example, the verb takes two objects. Both are nouns in the accusative case, which is rather unusual: for most other verbs, one of the arguments would be in the dative and would thus be treated as oblique in UD. However, a bare accusative signals a core object, and a verb with one nominative and two accusatives is ditransitive in UD. One of the accusatives is the direct object (patient), the other is the indirect object (recipient). This parallels how the English translation would be annotated (where there is no morphological case marking) and also verbs of giving in English (consider the similar sentence he gave my daughter a maths lesson).
On učí mou dceru matematiku . \n He teaches my daughter.Acc maths.Acc .
obj(učí, matematiku)
iobj(učí, dceru)
obj(teaches, maths.Acc)
iobj(teaches, daughter.Acc)
Predicates in Basque can cross-reference (by morphological agreement on the auxiliary verb) up to three arguments in different morphological cases: ergative, absolutive, and dative. The morphological cross-reference is a strong indicator that all three are core arguments. Therefore, if all three are present, we have a double-object situation and the dative argument will be iobj (while the ergative argument will be nsubj and the absolutive obj).
Even if the absolutive argument is omitted for a verb which licenses three arguments, the dative argument is still iobj.
(Nik)/Case=Erg (zuri)/Case=Dat liburua/Case=Abs eman dizut . \n (I) (you) book given I-have-you-it .
nsubj(eman, (Nik))
iobj(eman, (zuri))
obj(eman, liburua)
aux(eman, dizut)
punct(eman, .-6)
nsubj(given, (I))
iobj(given, (you))
obj(given, book)
aux(given, I-have-you-it)
punct(given, .-13)
Mariari/Case=Dat eman nion liburua/Case=Abs . \n To-Maria given I-have-her-it book .
iobj(eman, Mariari)
obj(eman, liburua)
aux(eman, nion)
punct(eman, .-5)
iobj(given, To-Maria)
obj(given, book)
aux(given, I-have-her-it)
punct(given, .-11)
Mariari/Case=Dat eman nion . \n To-Maria given I-have-her-it .
iobj(eman, Mariari)
aux(eman, nion)
punct(eman, .-4)
iobj(given, To-Maria)
aux(given, I-have-her-it)
punct(given, .-9)
Liburua/Case=Abs eman nion . \n Book given I-have-her-it .
obj(eman, Liburua)
aux(eman, nion)
punct(eman, .-4)
obj(given, Book)
aux(given, I-have-her-it)
punct(given, .-9)
Nevertheless, Basque also has a class of verbs that license only two core arguments, one ergative and one dative. Here the ergative has the A function and the dative the P function (Zúñiga and Fernández 2014), meaning that the dative is obj rather than iobj, as in “The teacher has looked angrily at the students.”
Irakasleak/Case=Erg haserre begiratu die ikasleei/Case=Dat . \n Teacher angrily looked he-has-them to-students .
nsubj(begiratu, Irakasleak)
advmod(begiratu, haserre)
aux(begiratu, die)
obj(begiratu, ikasleei)
punct(begiratu, .-6)
nsubj(looked, Teacher)
advmod(looked, angrily)
aux(looked, he-has-them)
obj(looked, to-students)
punct(looked, .-13)
Another class of transitive verbs in Basque licenses one dative and one absolutive argument. Here the dative has the A function and the absolutive the P function, meaning that the dative is nsubj and the absolutive is obj, as in “The boy likes the soup very much.”
Zopa/Case=Abs izugarri gustatzen zaio mutilari/Case=Dat . \n Soup greatly pleasing it-is-him to-boy .
obj(gustatzen, Zopa)
advmod(gustatzen, izugarri)
aux(gustatzen, zaio)
nsubj(gustatzen, mutilari)
punct(gustatzen, .-6)
obj(pleasing, Soup)
advmod(pleasing, greatly)
aux(pleasing, it-is-him)
nsubj(pleasing, to-boy)
punct(pleasing, .-13)
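The Basque frames described above can be summarized as a lookup from the set of cases the auxiliary cross-references to a UD relation for each case. The Python sketch below (FRAMES and core_relation are hypothetical names) encodes only the three frames discussed here and raises KeyError for any other frame. It keys on the agreement on the auxiliary rather than on the overt nominals, which is why an omitted absolutive does not turn the dative of eman into an obj.

```python
# UD relations for the cross-referenced core arguments of a Basque verb,
# keyed by the set of cases the auxiliary agrees with (frames from the text above).
FRAMES: dict[frozenset[str], dict[str, str]] = {
    # eman 'give' with dizut / nion: three-argument frame
    frozenset({"Erg", "Abs", "Dat"}): {"Erg": "nsubj", "Abs": "obj", "Dat": "iobj"},
    # begiratu 'look at' with die: the dative is the P argument
    frozenset({"Erg", "Dat"}): {"Erg": "nsubj", "Dat": "obj"},
    # gustatzen 'pleasing' with zaio: the dative is the A argument
    frozenset({"Dat", "Abs"}): {"Dat": "nsubj", "Abs": "obj"},
}


def core_relation(frame: set[str], case: str) -> str:
    """Label an argument in `case`, given the cases the verb cross-references.

    The frame comes from agreement on the auxiliary, not from the overt
    nominals, so a dropped absolutive does not change the dative's label.
    """
    return FRAMES[frozenset(frame)][case]


print(core_relation({"Erg", "Abs", "Dat"}, "Dat"))  # iobj  (Mariari eman nion)
print(core_relation({"Erg", "Dat"}, "Dat"))          # obj   (ikasleei)
print(core_relation({"Dat", "Abs"}, "Dat"))          # nsubj (mutilari)
```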
In Tagalog, core arguments are marked by the prepositions ang and ng (or by the corresponding inflection of personal pronouns), while oblique dependents are typically marked by the preposition sa (sometimes glossed as the dative). A verb of giving somebody something is therefore a (mono)transitive predicate.
- Nagbigay ang lalaki ng libro sa babae. “The man gave a book to the woman.” (agent voice)
# text = Nagbigay ang lalaki ng libro sa babae.
# text_en = The man gave a book to the woman.
1 Nagbigay bigay VERB _ Aspect=Perf|Mood=Ind|VerbForm=Fin|Voice=Act 0 root _ Gloss=gave
2 ang ang ADP _ Case=Nom 3 case _ Gloss=the
3 lalaki lalaki NOUN _ _ 1 nsubj _ Gloss=man
4 ng ng ADP _ Case=Gen 5 case _ _
5 libro libro NOUN _ _ 1 obj _ Gloss=book
6 sa sa ADP _ Case=Dat 7 case _ Gloss=DIR
7 babae babae NOUN _ _ 1 obl _ Gloss=woman|SpaceAfter=No
8 . . PUNCT _ _ 1 punct _ Gloss=.
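Examples like this one are given in the CoNLL-U format: comment lines starting with #, then one line per word with ten columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). In real files the columns are separated by tabs; the copy above shows spaces. The following minimal Python reader (parse_conllu is a hypothetical helper, not a published library) handles a single sentence, falls back to whitespace splitting for copies like the one above, skips multiword-token ranges and empty nodes, and prints each dependency in the relation(head, dependent) notation used on this page.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_pairs(column: str) -> dict[str, str]:
    """'Case=Nom|Number=Sing' -> {'Case': 'Nom', 'Number': 'Sing'}; '_' -> {}."""
    if column == "_":
        return {}
    return {key: value for key, _, value in
            (item.partition("=") for item in column.split("|"))}


def parse_conllu(block: str) -> tuple[dict[str, str], list[dict]]:
    """Parse one CoNLL-U sentence into (comment metadata, list of word rows)."""
    meta: dict[str, str] = {}
    words: list[dict] = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Copies on web pages often lose their tabs; this fallback
            # assumes no column (e.g. FORM) contains an internal space.
            cols = line.split()
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns: {line!r}")
        row: dict = dict(zip(FIELDS, cols))
        if "-" in row["id"] or "." in row["id"]:
            continue  # skip multiword-token ranges and empty nodes
        row["id"], row["head"] = int(row["id"]), int(row["head"])
        row["feats"], row["misc"] = parse_pairs(row["feats"]), parse_pairs(row["misc"])
        words.append(row)
    return meta, words


SAMPLE = """
# text = Nagbigay ang lalaki ng libro sa babae.
# text_en = The man gave a book to the woman.
1 Nagbigay bigay VERB _ Aspect=Perf|Mood=Ind|VerbForm=Fin|Voice=Act 0 root _ Gloss=gave
2 ang ang ADP _ Case=Nom 3 case _ Gloss=the
3 lalaki lalaki NOUN _ _ 1 nsubj _ Gloss=man
4 ng ng ADP _ Case=Gen 5 case _ _
5 libro libro NOUN _ _ 1 obj _ Gloss=book
6 sa sa ADP _ Case=Dat 7 case _ Gloss=DIR
7 babae babae NOUN _ _ 1 obl _ Gloss=woman|SpaceAfter=No
8 . . PUNCT _ _ 1 punct _ Gloss=.
"""

meta, words = parse_conllu(SAMPLE)
forms = {w["id"]: w["form"] for w in words}
for w in words:
    if w["head"]:  # skip the root (HEAD = 0)
        print(f'{w["deprel"]}({forms[w["head"]]}, {w["form"]})')
```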
However, locative dependents can be topicalized if the verb morphology signals the “locative voice”. The locative noun phrase then switches to the nominative and becomes a core argument, while the two original core arguments keep their core coding. Therefore we have a ditransitive clause with three core arguments, even for verbs that are not associated with ditransitives in other languages:
- Aalisan ng babae ng bigas ang sako para sa bata. “A/the woman will take some rice out of the sack for a/the child.” (locative voice)
# sent_id = 3.111c/tl
# text = Aalisan ng babae ng bigas ang sako para sa bata.
# gloss = FUT-take.out-DP ACT woman OBJ rice PIV sack BEN child
# text_en = A/the woman will take some rice out of the sack for a/the child.
# DP = directional pivot; PIV = pivot marker
1 Aalisan alis VERB _ Aspect=Prog|Mood=Ind|VerbForm=Fin|Voice=Lfoc 0 root _ Gloss=will-take-out|MSeg=a-alis-an|MGloss=FUT-take.out-DP
2 ng ng ADP _ Case=Gen 3 case _ _
3 babae babae NOUN _ _ 1 iobj:agent _ Gloss=woman
4 ng ng ADP _ Case=Gen 5 case _ _
5 bigas bigas NOUN _ _ 1 obj:patient _ Gloss=rice
6 ang ang ADP _ Case=Nom 7 case _ Gloss=the
7 sako sako NOUN _ _ 1 nsubj:loc _ Gloss=sack
8 para para ADP _ _ 10 case _ Gloss=for
9 sa sa ADP _ Case=Dat 10 case _ Gloss=BEN
10 bata bata NOUN _ _ 1 obl _ Gloss=child|SpaceAfter=No
11 . . PUNCT _ _ 1 punct _ Gloss=.
In Plains Cree (Wolvengrey 2011), transitive verbs cross-reference subjects and animate objects but not inanimate objects. With a verb of giving, the theme is typically inanimate while the recipient is typically animate. Assuming that nsubj and obj are reserved for the two core arguments cross-referenced by the verb, the theme has to be iobj (if it is a core argument at all; otherwise it would have to be obl, but real oblique nominals in Plains Cree take a locative case affix, which is not present here).
- Nikī-miyāw anima masinahikan. “I gave him that book.”
# text = Nikī-miyāw anima masinahikan.
# text_en = I gave him/her that book.
1 Nikī-miyāw miy VERB _ Animacy=Anim|Mood=Ind|Number[high]=Sing|Number[low]=Sing|Person[high]=1|Person[low]=3|Tense=Past|Voice=Dir 0 root _ Gloss=I-gave-him/her|MSeg=ni-kī-miy-ā-w|MGloss=1-PAST-give.to-DIR-3SG
2 anima anima DET _ Animacy=Inan|Number=Sing|PronType=Dem 3 det _ Gloss=that|MGloss=DEM.0's
3 masinahikan masinahikan NOUN _ Animacy=Inan|Number=Sing 1 iobj _ Gloss=book|SpaceAfter=No
4 . . PUNCT _ _ 1 punct _ Gloss=.
In the above example, the verb stem used is the one for animate objects, while masinahikan “book” is inanimate. This shows that the 3rd person singular cross-reference on the verb does not refer to the book but to an animate recipient that is not overtly represented in the sentence.
If the language has a prototypical iobj (occurring in a double object construction with obj), then morphosyntactic criteria need to be established for when a sole object is obj and when it is iobj.[1]
Depending on the language, potential reasons to consider a sole object in a clause as an iobj include:
- It has case marking distinct from that of a prototypical obj, e.g. dative rather than accusative
- Another, more patient-like object may be inserted into the clause without affecting the morphosyntax of the object in question
- The verb licenses the object in combination with a ccomp (the ccomp may be analyzed as taking the place of an obj)
For example, in English, the verb teach may occur with obj, iobj, or both:
She teaches the students introductory logic .
iobj(teaches, students)
obj(teaches, logic)
She teaches introductory logic .
obj(teaches, logic)
She teaches the first-year students .
iobj(teaches, students)
She teaches her students that good writing is important .
iobj(teaches, students)
ccomp(teaches, important)
She teaches her students to write well .
iobj(teaches, students)
xcomp(teaches, write)
However, not all verbs license two objects (or an object plus ccomp), in which case the sole object should be plain obj even if it has recipient-like semantics:
She questions her students about their interests .
obj(questions, students)
obl(questions, interests)
She helps her students to succeed .
obj(helps, students)
xcomp(helps, succeed)
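One way to apply the last point consistently across a treebank is to record which verbs are actually attested with a two-object frame (iobj together with obj or ccomp, following the criteria above) and to flag sole iobj dependents of verbs never seen in such a frame. The Python sketch below (double_object_verbs and suspicious_sole_iobj are hypothetical) is only a consistency heuristic under that assumption: a verb may license two objects without the frame being attested in the data, and case-based criteria are not modelled.

```python
# Each clause is (verb lemma, set of base relations of its core dependents).
Clause = tuple[str, frozenset[str]]

# Per the criteria above, a ccomp can take the place of the obj.
SECOND_OBJECT = {"obj", "ccomp"}


def double_object_verbs(clauses: list[Clause]) -> set[str]:
    """Lemmas attested with an iobj plus an obj or ccomp."""
    return {lemma for lemma, rels in clauses
            if "iobj" in rels and rels & SECOND_OBJECT}


def suspicious_sole_iobj(clauses: list[Clause]) -> list[str]:
    """Lemmas whose only object is labeled iobj although the verb is
    never attested in a two-object frame in this data."""
    licensed = double_object_verbs(clauses)
    return [lemma for lemma, rels in clauses
            if "iobj" in rels and not rels & SECOND_OBJECT
            and lemma not in licensed]


clauses = [
    ("teach", frozenset({"nsubj", "iobj", "obj"})),     # She teaches the students introductory logic
    ("teach", frozenset({"nsubj", "iobj"})),            # She teaches the first-year students
    ("question", frozenset({"nsubj", "iobj", "obl"})),  # deliberately mislabeled
    ("help", frozenset({"nsubj", "obj", "xcomp"})),     # She helps her students to succeed
]
print(suspicious_sole_iobj(clauses))  # ['question']
```

Here question is flagged because its sole object is labeled iobj although question never occurs with a second object, while the sole iobj of teach passes because teach is attested with iobj plus obj.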
References
- Fernando Zúñiga, Beatriz Fernández (draft 26.6.2014): Grammatical relations in Basque
- Arok Elessar Wolvengrey (2011): Semantic and pragmatic functions in Plains Cree syntax (PhD thesis). Utrecht: LOT. ISBN 978-94-6093-051-5.
[1] This is an amended policy as described on the changes page.
list
: list
The list relation is used for chains of comparable items. In lists with more than two items, all items of the list should modify the first one. If a list is something like a list of paragraphs (for example, describing items in a catalogue), then each item will be one or more sentences and no list relations appear, as we do not have relations between sentences.
However, informal and web text often contains passages which are meant to be interpreted as lists but are parsed as single sentences. For example, email signatures often contain these structures, in the form of contact information: the different contact information items are labeled as list.
Steve Jones sj@abc.xyz University of Arizona
flat:name(Steve, Jones)
list(Steve, sj@abc.xyz)
list(Steve, University)
nmod(University, Arizona)
case(Arizona, of)
If the fields in the list are explicit and have a key-value structure, the key-value pair relations are labeled as appos.
Steve Jones Phone: 555-9814 Email: jones@abc.edf
flat:name(Steve-1, Jones-2)
list(Steve-1, Phone:-3)
list(Steve-1, Email:-5)
appos(Phone:-3, 555-9814-4)
appos(Email:-5, jones@abc.edf-6)
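The attachment pattern for list items (every later item attaches to the first) plus appos for explicit key-value fields can be generated mechanically once the items have been identified. The Python sketch below (list_arcs is a hypothetical helper) takes the head word of each item and, optionally, its value; it does not build item-internal structure such as flat:name(Steve, Jones), and deciding that a passage is a list at all is left to the annotator.

```python
from typing import Optional


def list_arcs(items: list[tuple[str, Optional[str]]]) -> list[tuple[str, str, str]]:
    """items: (head word of each list item, value word or None), in order.

    Every item after the first attaches to the first item with `list`;
    an item with an explicit key-value structure also gets appos(key, value).
    """
    first = items[0][0]
    arcs = []
    for position, (key, value) in enumerate(items):
        if position > 0:
            arcs.append(("list", first, key))
        if value is not None:
            arcs.append(("appos", key, value))
    return arcs


signature = [("Steve", None), ("Phone:", "555-9814"), ("Email:", "jones@abc.edf")]
for rel, head, dep in list_arcs(signature):
    print(f"{rel}({head}, {dep})")
# list(Steve, Phone:)
# appos(Phone:, 555-9814)
# list(Steve, Email:)
# appos(Email:, jones@abc.edf)
```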
Another place where list has been used is for a sequence of attributes or descriptive terms used as the title line of a review (such as product or restaurant reviews, etc.):
Long Lines , Silly Rules , Rude Staff , Ok Food
list(Lines, Rules)
list(Lines, Staff)
list(Lines, Food)
amod(Lines, Long)
amod(Rules, Silly)
amod(Staff, Rude)
amod(Food, Ok)
punct(Rules, ,-3)
punct(Staff, ,-6)
punct(Food, ,-9)
However, list should not be over-used. If a construction can easily be analyzed using the grammatical relations of standard sentences, typically as a coordinated structure, then it should be analyzed with these more standard relations, even if it is laid out as a list typographically. In particular, when the list is written as a single sentence, with commas and overt coordination, then it should be analyzed as a coordinated structure.
For list item markers, see discourse.
mark
: marker
A marker is the word marking a clause as subordinate to another clause. For a complement clause, this is words like [en] that or whether. For an adverbial clause, the marker is typically a subordinating conjunction like [en] while or although. The marker is a dependent of the subordinate clause head. In a relative clause, it is a normally uninflected word which simply introduces a relative clause, such as [he] še. (In this last use, one needs to distinguish relative clause markers, which are mark, from relative pronouns such as [en] who or that, which fill a regular verbal argument or modifier grammatical relation.)
Forces engaged in fighting after insurgents attacked
mark(attacked, after)
He says that you like to swim
mark(like, that)
Infinitive markers (e.g. English to, German zu) in infinitival clauses are also attached as mark:
Er kam wieder , um das Werk zu Ende zu bringen \n He came again , so-that the work to end to bring
mark(bringen, um)
mark(bringen, zu-10)
mark(bring, so-that)
mark(bring, to-22)
nmod
: nominal modifier
The nmod relation is used for nominal dependents of another noun or noun phrase and functionally corresponds to an attribute or genitive complement.
New from v2: The nmod relation was previously used also for nominal dependents of verbs, adjectives, and adverbs. These are now covered by the new obl relation.
In conjunction with the case relation, nmod provides a uniform analysis for the possessive alternation (with the option of a subtype like nmod:poss to distinguish non-adpositional case):
the office of the Chair
det(office-2, the-1)
nmod(office-2, Chair-5)
case(Chair-5, of-3)
det(Chair-5, the-4)
the Chair 's office
det(Chair-2, the-1)
nmod:poss(office-4, Chair-2)
case(Chair-2, 's-3)
nmod:poss
: possessive nominal modifier
nmod:poss is used for a possessive nominal modifier. In English, for example, it is marked with the genitive case clitic 's or one of its variant forms.
Marie 's book
nmod:poss(book, Marie)
case(Marie, 's)
nmod:poss must not be confused with the feature Poss=Yes used for possessive pronouns.
nmod:poss is only relevant for languages that have a particular construction, such as the English possessive construction (also called the Saxon genitive), that we want to distinguish from other nmod constructions.
nmod:tmod
: temporal modifier
A temporal nominal modifier of another nominal is labeled nmod:tmod, a subtype of the nmod relation used when the modifier specifies a time.
Are you free for lunch some day this week ?
nmod:tmod(day, week)
nsubj
: nominal subject
A nominal subject (nsubj
) is a nominal which is the syntactic subject and the proto-agent of a clause.
That is, it is in the position that passes typical grammatical test for subjecthood, and this argument is the more agentive,
the do-er, or the proto-agent of the clause. This nominal may be headed by a noun,
or it may be a pronoun or relative pronoun or, in ellipsis contexts, other things such as an adjective.
New from v2: The nsubj
relation is also used for the nominal subject of a passive verb or verb group, even
though the subject is then not typically the proto-agent argument due to valency changing operations. For languages
that have a grammaticalized passive transformation, it is strongly recommended to use the subtype nsubj:pass in
such cases. If the subject is of a copular clause whose predicate is itself a clause, nsubj:outer may be used.
The governor of the nsubj relation might not always be a verb: when the verb is a copular verb, the root of the clause is the complement of the copular verb, which can be an adjective or noun, including a noun marked by a preposition, as in the examples below.
The nsubj role is only applied to semantic arguments of a predicate. When there is an empty argument in a grammatical subject position (sometimes called a pleonastic or expletive), it is labeled as expl. If there is then a displaced subject in the clause, as in the English existential there construction, it is labeled as nsubj.
Clinton defeated Dole
nsubj(defeated, Clinton)
Dole was defeated by Clinton
nsubj:pass(defeated, Dole)
The car is red .
nsubj(red, car)
Sue is a true patriot .
nsubj(patriot, Sue)
We are in the barn .
nsubj(barn, We)
Agatha is in trouble .
nsubj(trouble, Agatha)
There is a ghost in the room .
expl(is, There)
nsubj(is, ghost)
These links present the many viewpoints that existed .
acl:relcl(viewpoints, existed)
nsubj(existed, that)
nsubj:outer
: outer clause nominal subject
This relation specifies a nominal subject of a copular clause whose predicate is itself a clause, to signal that it is not the subject of the nested clause. See discussion of Predicate Clauses.
-ROOT- The problem is that this has never been tried .
nsubj:outer(tried, problem)
cop(tried, is)
mark(tried, that)
nsubj:pass(tried, this)
aux(tried, has)
advmod(tried, never)
aux:pass(tried, been)
root(-ROOT-, tried)
The title is Some Like It Hot .
nsubj:outer(Like, title)
cop(Like, is)
nsubj(Like, Some)
obj(Like, It)
xcomp(Like, Hot)
There may be an outer subject with no inner subject:
The important thing is to keep calm .
nsubj:outer(keep, thing)
cop(keep, is)
mark(keep, to)
xcomp(keep, calm)
The clausal counterpart of this relation is csubj:outer.
Only subjects are required to be distinguished in this way. There may, for example, be inner and outer copulas, both attaching as cop:
The important thing is to be calm .
nsubj:outer(calm, thing)
cop(calm, is)
mark(calm, to)
cop(calm, be)
The :outer subtype is not intended for most nominal subjects of copular clauses, only for those where the predicate is itself a clause. Plain nsubj (or another subtype) is appropriate if the copular clause predicate is a nominal, adjective, etc.:
That book is very good .
nsubj(good, book)
The title is Green Eggs and Ham .
nsubj(Eggs, title)
nsubj:pass
: passive nominal subject
A passive nominal subject is a noun phrase which is the syntactic subject of a passive clause.
Schwarzenberg byl poražen Zemanem . \n Schwarzenberg was defeated by-Zeman .
nsubj:pass(poražen, Schwarzenberg-1)
nsubj:pass(defeated, Schwarzenberg-7)
Reflexive passive (the meaning is “This will be solved tomorrow.”)
Tohle se bude řešit zítra . \n This itself will solve tomorrow .
nsubj:pass(řešit, Tohle)
nsubj:pass(solve, This)
nummod
: numeric modifier
A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity.
Sam ate 3 sheep
nummod(sheep, 3)
Sam spent forty dollars
nummod(dollars, forty)
Sam spent $ 40
nummod($, 40)
Note that indefinite quantifiers such as few and many are tagged DET rather than NUM. Therefore their relation to the quantified noun is not nummod but det:
Sam ate many sheep
det(sheep, many)
Furthermore, a number that serves as a label for an entity rather than denoting a quantity is not nummod. For example, in The meeting will be in room 4, the number is the name of a particular room, which is different from the expression 4 rooms. Note that the label of the room could also be non-numeric, as in The meeting will be in room A. UD analyzes the number as a nominal (even if it keeps the UPOS tag NUM). Hence the number is attached as nmod to the noun it modifies, unless there is clear morphosyntactic evidence in the language for the opposite direction. See also §3.6.3 of de Marneffe et al. (2021).
The meeting will be in room 4
det(meeting, The)
nsubj(room, meeting)
aux(room, will)
cop(room, be)
case(room, in)
nmod(room, 4)
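The choice between nummod, det and nmod for a number-like dependent of a noun depends on two things: the word's UPOS tag, and whether it counts or names. Below is a minimal Python sketch of that decision. The function and its is_label flag are hypothetical. Deciding whether a number is a label is left to the annotator. The sketch also ignores two things: the language-specific exception where morphosyntax favors the opposite attachment direction, and nummod:gov.

```python
def quantity_relation(upos: str, is_label: bool = False) -> str:
    """Hypothetical helper: relation for a number-like word attached to a noun.

    upos     -- UPOS tag of the dependent word ('NUM' or 'DET')
    is_label -- True if the number names an entity (room 4) rather than counting
    """
    if is_label:
        # A naming number is analyzed as a nominal, even if it keeps UPOS NUM.
        return "nmod"
    if upos == "NUM":
        return "nummod"  # Sam ate 3 sheep
    if upos == "DET":
        return "det"  # Sam ate many sheep
    raise ValueError(f"no quantity rule for UPOS {upos!r}")


print(quantity_relation("NUM"), quantity_relation("NUM", is_label=True))  # nummod nmod
```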
nummod:gov
: numeric modifier governing the case of the noun
nummod:gov differs from nummod in that the numeral requires the counted noun to be in its genitive form. The whole phrase (numeral + noun) is treated as a singular neuter noun phrase, and it can fill roles where nominative, accusative or vocative noun phrases are expected. This construction occurs in many Slavic languages. To increase parallelism across languages (and also across morphological cases within one language), the numeral is not annotated as the head of the nominal. However, the nummod:gov label is used to preserve the information about case conditions.
Czech:
Pět mužů hrálo karty . \n Five men played cards .
nummod:gov(mužů, Pět)
nsubj(hrálo, mužů)
obj(hrálo, karty)
punct(hrálo, .-5)
nummod:gov(men, Five)
nsubj(played, men)
obj(played, cards)
punct(played, .-11)
See also det:numgov and det:nummod.
obj
: object
The object of a verb is the second most core argument of a verb after the subject. Typically, it is the noun phrase that denotes the entity acted upon or which undergoes a change of state or motion (the proto-patient).
She gave me a raise
obj(gave, raise)
In languages distinguishing morphological cases, the object will often be marked by the accusative case. If a verb dictates another case (dative, genitive…), the fundamental question is whether such cases qualify as core in the given language. Often these cases are oblique, regardless of the presence or absence of an adposition. Consequently they cannot use the obj relation and must use obl, even if the traditional grammar calls such dependents “objects”.
If there are two or more objects, one of them should be obj and the others should be iobj. In such cases it is necessary to decide which object is the most directly affected one (the patient). If there is just one object, it should likely be obj unless it is morphosyntactically more similar to clear cases of iobj in the language than it is to prototypical patient arguments.
There is further discussion of the two kinds of object at iobj. If possible, language-specific documentation should be available to help identify the primary (or direct) object.
obl
: oblique nominal
The obl relation is used for a nominal (noun, pronoun, noun phrase) functioning as a non-core (oblique) argument or adjunct. This means that it functionally corresponds to an adverbial attaching to a verb, adjective or other adverb.
The obl relation can be further specified by the case. In conjunction with the case relation, it provides a uniform analysis for:
- variant forms with case, a preposition or a postposition, as in Finnish for example:
etsiä ilman johtolankaa \n to_search without clue.PARTITIVE
obl(etsiä, johtolankaa)
case(johtolankaa, ilman)
etsiä taskulampun kanssa \n to_search torch.GENITIVE with
obl(etsiä, taskulampun)
case(taskulampun, kanssa)
etsiä johtolangatta \n to_search clue.ABESSIVE
obl(etsiä, johtolangatta)
- the dative alternation where the prepositional construction gets a similar analysis to the double object construction:
give the children the toys
obj(give, toys)
iobj(give, children)
give the toys to the children
obj(give, toys)
obl(give, children)
case(children, to)
French, where à and les contract into the multiword token aux (range 4-5):
# give the toys to the children
1 donner donner VERB _ VerbForm=Inf 0 root _ give
2 les le DET _ Definite=Def|Number=Plur 3 det _ the
3 jouets jouet NOUN _ Gender=Masc|Number=Plur 1 obj _ toys
4-5 aux _ _ _ _ _ _ _ _
4 à à ADP _ _ 6 case _ to
5 les le DET _ Definite=Def|Number=Plur 6 det _ the
6 enfants enfant NOUN _ Gender=Masc|Number=Plur 1 obl _ children
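In the CoNLL-U rows above, the range line 4-5 holds only the surface token aux. The syntactic words 4 (à) and 5 (les) carry the HEAD and DEPREL columns. Below is a minimal Python sketch that recovers the rel(head, dep) edges from such a block by skipping range lines. The function name is hypothetical. The sketch splits on tabs, or on whitespace when the tabs have been lost as in the copy above. For real treebank work, the official UD tools or a dedicated CoNLL-U reader would be a safer choice.

```python
from typing import Dict, List, Tuple

SAMPLE = """\
# give the toys to the children
1 donner donner VERB _ VerbForm=Inf 0 root _ give
2 les le DET _ Definite=Def|Number=Plur 3 det _ the
3 jouets jouet NOUN _ Gender=Masc|Number=Plur 1 obj _ toys
4-5 aux _ _ _ _ _ _ _ _
4 à à ADP _ _ 6 case _ to
5 les le DET _ Definite=Def|Number=Plur 6 det _ the
6 enfants enfant NOUN _ Gender=Masc|Number=Plur 1 obl _ children
"""


def conllu_edges(block: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: (DEPREL, head form, dependent form) per syntactic word."""
    rows = []
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no tokens
        # Real CoNLL-U separates columns with tabs; fall back to whitespace.
        cols = line.split("\t") if "\t" in line else line.split()
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword token ranges (4-5 aux) and empty nodes
        rows.append(cols)
    forms: Dict[str, str] = {cols[0]: cols[1] for cols in rows}
    forms["0"] = "ROOT"  # HEAD 0 points to the implicit root node
    # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    return [(cols[7], forms[cols[6]], cols[1]) for cols in rows]


for rel, head, dep in conllu_edges(SAMPLE):
    print(f"{rel}({head}, {dep})")
```

The printed edges obl(donner, enfants) and case(enfants, à) parallel obl(give, children) and case(children, to) in the English analysis.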
obl is also used for temporal and locational nominal modifiers:
Last night , I swam in the pool
obl(swam, night)
obl(swam, pool)
and for the agent of a passive verb (with the optional subtype obl:agent):
the cat was chased by the dog
nsubj:pass(chased, cat)
obl:agent(chased, dog)
obl:agent
: agent modifier
The relation obl:agent is used for agents of passive verbs. In Czech, the agent is a nominal in the instrumental case.
Cena byla udělena děkanem fakulty . \n Prize was awarded by-dean of-faculty .
obl:agent(udělena, děkanem)
obl:agent(awarded, by-dean)
Typical agents are animate, but this is not a rule. Inanimate agents may sometimes be difficult to distinguish from instruments, which are also coded by the instrumental case. Instruments are attached using the plain relation obl.
Consider the following two examples: the first is active and the second is passive.
Praštil psa klackem . \n He-hit dog with-a-stick .
obl(Praštil, klackem)
obl(He-hit, with-a-stick)
Pes byl praštěn klackem . \n Dog was hit with-a-stick .
obl(praštěn, klackem)
obl(hit, with-a-stick)
However, in passive sentences like Byl přejet autem “He was run over by a car,” the car could be analyzed as an inanimate agent, but also as an instrument, which is supported by the plausibility of the active counterpart, Přejeli ho autem “They ran over him with a car.”
obl:arg
: oblique argument
The relation obl:arg is used for oblique arguments and distinguishes them from adjuncts, which use the plain obl relation. It is thus possible to preserve the notion of object as it is defined in the traditional grammar of some languages, where it essentially follows the distinction between arguments and adjuncts (which is otherwise not reflected in the main UD relation types; see the discussion here).
A Czech example:
Spoléhám se na jeho instinkt . \n I-rely REFL on his instinct .
obl:arg(Spoléhám, instinkt)
obl:arg(I-rely, instinct)
case(instinkt, na)
case(instinct, on)
Arguments are selected by the predicate. Their coding (preposition and morphological case) is determined by the predicate; within the set of arguments of this predicate, the coding maps the argument to a particular semantic role. In contrast, the semantics of an adjunct is relatively independent of the predicate, and typical adjuncts (such as specifications of time, location, manner or instrument) can combine with a large number of different predicates.
Hence in the above example, the preposition na “on” and the accusative case of the noun instinkt “instinct” are selected by the verb spoléhat “to rely”. Other verbs may also select the same preposition and case but the meaning will be different: for instance, myslet na někoho “to think of someone.” Finally, the preposition na itself has an adessive or allative meaning (see the corresponding values of the Case feature). This meaning is suppressed when the preposition is selected by a predicate but it is more recognizable in adjuncts. In the following example, the preposition combines with a noun phrase in the locative case and marks a locational modifier:
Konference se koná na Slovensku . \n Conference REFL takes-place in Slovakia .
obl(koná, Slovensku)
obl(takes-place, Slovakia)
case(Slovensku, na)
case(Slovakia, in)
obl:lmod
: locative modifier
A locative modifier is a subtype of the obl relation: if the modifier specifies a location, it is labeled obl:lmod.
Danish: Drive the road you are told.
Kør den vej , du får besked på . \n Drive the road , you get order to .
obl:lmod(Kør, vej)
obl:tmod
: temporal modifier
A temporal modifier is a subtype of the obl relation: if the modifier specifies a time, it is labeled obl:tmod.
Last night , I swam in the pool
obl:tmod(swam, night)
You need to turn in your homework by next week
obl:tmod(turn, week)
orphan
: orphan
The ‘orphan’ relation is used in cases of head ellipsis where simple promotion would result in an unnatural and misleading dependency relation. The typical case is predicate ellipsis where one of the core arguments has to be promoted to clausal head.
Marie won gold and Peter bronze
nsubj(won, Marie)
obj(won, gold)
conj(won, Peter)
cc(Peter, and)
orphan(Peter, bronze)
In this example, the subject Peter is promoted to the head position in the second conjunct. Attaching the object bronze to the subject is necessary to preserve the integrity of the clause, but using the standard relation obj would be misleading because bronze is not the object of Peter. Therefore, the orphan relation is used to indicate that this is a non-standard attachment. By contrast, the coordinating conjunction and performs essentially the same function as in the non-elliptical case and therefore retains its normal relation cc.
See further discussion of ellipsis.
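The promotion step for an elided predicate can be sketched in Python. This is a minimal illustration under stated assumptions. The function name is hypothetical. The priority list (subject before object before other obliques) is an assumption made for the example, not a quotation of the ellipsis guidelines. As in the example above, cc and punct keep their normal relations, and every other remaining dependent becomes an orphan of the promoted word. In a full tree, the promoted word would then attach to the first conjunct as conj, as Peter does.

```python
from typing import List, Tuple

# Illustrative assumption only: subjects are promoted before objects, and
# objects before other obliques. Check the UD ellipsis guidelines for the real order.
PROMOTION_ORDER = ["nsubj", "obj", "iobj", "obl"]
KEEP_RELATION = {"cc", "punct"}  # function-like dependents keep their label


def promote(deps: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Hypothetical helper.

    deps: (relation, word) pairs that would attach to the elided predicate.
    Returns the promoted word and the edges (rel, head, dep) hanging off it.
    """
    def rank(rel: str) -> int:
        base = rel.split(":")[0]  # nsubj:pass ranks like nsubj
        return PROMOTION_ORDER.index(base) if base in PROMOTION_ORDER else len(PROMOTION_ORDER)

    candidates = [i for i, (rel, _) in enumerate(deps) if rel not in KEEP_RELATION]
    top = min(candidates, key=lambda i: rank(deps[i][0]))
    head = deps[top][1]
    edges = []
    for i, (rel, word) in enumerate(deps):
        if i == top:
            continue
        # cc and punct keep their relation; other dependents become orphans.
        edges.append((rel if rel in KEEP_RELATION else "orphan", head, word))
    return head, edges


# Second conjunct of "Marie won gold and Peter bronze"
print(promote([("cc", "and"), ("nsubj", "Peter"), ("obj", "bronze")]))
```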
parataxis
: parataxis
The parataxis relation (from Greek for “place side by side”) is a relation between a word (often the main predicate of a sentence) and other elements, such as a sentential parenthetical or a clause after a “:” or a “;”, placed side by side without any explicit coordination, subordination, or argument relation with the head word. Parataxis is a discourse-like equivalent of coordination, and so usually obeys an iconic ordering. Hence it is normal for the first part of a sentence to be the head and the second part to be the parataxis dependent, regardless of the headedness properties of the language. Things get more complicated with parentheticals, however, which appear sentence-medially.
Let 's face it we 're annoyed
parataxis(Let, annoyed)
The guy , John said , left early in the morning
parataxis(left, said)
punct(said, ,-3)
punct(said, ,-6)
An inventory of constructions to which parataxis has been applied
The following material is duplicated in the syntax overview.
Side-by-side sentences (“run-on sentences”)
The parataxis relation is used for a pair of what could have been standalone sentences, but which are being treated together as a single sentence. This may happen because sentence segmentation was done primarily on the basis of sentence-final punctuation, and these clauses are joined by punctuation such as a colon or comma, or not delimited by punctuation at all. In a spoken corpus, it may happen because what is labeled as a sentence is more commonly an utterance turn. Even if the treebanker is doing the sentence division, it may happen because there seems to be a clear discourse relation linking two clauses. Sometimes more than two sentences are joined in this way. In this case we make all the later sentences dependents of the first one, to maximize similarity to the analysis used for conjunction.
Bearded dragons are sight hunters , they need to see the food to move .
parataxis(hunters, need)
punct(need, ,)
This relation may also be used for units that are smaller than sentences:
Divided world the CIA
amod(world, Divided)
parataxis(world, CIA)
det(CIA, the)
Paired clauses with non-conjunction connective (“X so Y” etc.)
The relation is also used for clauses connected by a word like so, then, therefore, or however if neither clause is interpreted as modifying the other, and there is no coordinating conjunction:
He claimed to be a wizard ; however/ADV , he turned out to be a humbug .
parataxis(claimed, turned)
advmod(turned, however)
I 'm hungry , so/ADV I 'm getting a bagel .
parataxis(hungry, getting)
advmod(getting, so)
The following, by contrast, are advcl modifiers:
Eat now so/ADV you wo n't be hungry later .
advcl(Eat, hungry)
advmod(hungry, so)
If/SCONJ you build it , then/ADV they will come .
advcl(come, build)
mark(build, If)
advmod(come, then)
Note that if-clauses should almost always be analyzed as subordinate, even when then is present.
Reported speech
When a speech verb interrupts reported speech content, the interruption is treated as a parenthetical parataxis:
The guy , John said , left early in the morning
parataxis(left, said)
punct(said, ,-3)
punct(said, ,-6)
See further discussion of reported speech at ccomp.
News article bylines
We have used the parataxis relation to connect the parts of a news article byline. There does not seem to be a better relation to use.
Washington ( CNN ) :
parataxis(Washington, CNN)
punct(CNN, ()
punct(CNN, ))
punct(CNN, :)
Interjected clauses
Single word or phrase interjections are analyzed as discourse, but when a whole clause is interjected, we use the relation parataxis.
Calafia has great fries ( they are to die for ! )
parataxis(has, are)
punct(are, ()
punct(are, ))
Just to let you all know Matt has confirmed the booking for 3rd Dec is OK .
parataxis(confirmed, let)
In the second example, we treat the second half as the head of the dependency because the first half feels like a whole clause interjection, not like the main clause of the utterance.
Tag questions
We also use the parataxis relation for tag questions such as isn’t it? or haven’t you?
It 's not me , is it ?
parataxis(me, is)
punct(is, ,)
punct
: punctuation
This is used for any piece of punctuation in a clause, if punctuation is being retained in the typed dependencies. Note that symbols tagged SYM are not punctuation and cannot be attached via the punct relation.
Go home !
punct(Go, !)
Tokens with the relation punct always attach to content words (except in cases of ellipsis) and can never have dependents.
Since punct is not a normal dependency relation, the usual criteria for determining the head word do not apply. Instead, we use the following principles:
- A punctuation mark separating coordinated units is attached to the following conjunct.
- A punctuation mark preceding or following a dependent unit is attached to that unit.
- Within the relevant unit, a punctuation mark is attached at the highest possible node that preserves projectivity.
- Paired punctuation marks (e.g. quotes and brackets, sometimes also dashes, commas and others) should be attached to the same word unless that would create non-projectivity. This word is usually the head of the phrase enclosed in the paired punctuation.
See also examples at parataxis.
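The projectivity condition in the principles above can be checked mechanically. An arc from head h to dependent d is projective if every word between them is a descendant of h. Below is a minimal Python sketch, assuming the tree is given as a list of CoNLL-U HEAD values with no cycles. best_punct_head is a hypothetical helper: it tries candidate heads from the highest (shallowest) node downward and returns the first one that keeps the tree projective. Choosing the candidates, that is, delimiting the relevant unit, is left to the caller.

```python
from typing import List, Optional


def is_projective(heads: List[int]) -> bool:
    """heads[i] is the HEAD of word i+1 (0 = ROOT), as in CoNLL-U."""
    def dominated(node: int, ancestor: int) -> bool:
        # Walk up the tree from node; assumes the heads form a tree (no cycles).
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        return ancestor == 0

    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # every word lies under ROOT, so root arcs are projective
        lo, hi = sorted((head, dep))
        if not all(dominated(w, head) for w in range(lo + 1, hi)):
            return False
    return True


def best_punct_head(heads: List[int], punct: int, candidates: List[int]) -> Optional[int]:
    """Hypothetical helper: highest candidate head for word `punct` that
    keeps the tree projective. The current heads[punct - 1] is ignored."""
    def depth(node: int) -> int:
        d = 0
        while node != 0:
            node = heads[node - 1]
            d += 1
        return d

    for cand in sorted(candidates, key=depth):  # shallowest (highest) first
        trial = list(heads)
        trial[punct - 1] = cand
        if is_projective(trial):
            return cand
    return None


# "bought fresh , ripe apples": attaching the comma to "bought" would make
# the arc from "apples" to "fresh" non-projective, so "apples" (word 5) wins.
print(best_punct_head([0, 5, 0, 5, 1], 3, [1, 5]))  # 5
```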
reparandum
: overridden disfluency
We use reparandum to indicate disfluencies overridden in a speech repair. The disfluency is the dependent of the repair.
Go to the righ- to the left .
obl(Go-1, left-7)
reparandum(left-7, righ-)
case(righ-, to-2)
det(righ-, the-3)
case(left-7, to-5)
det(left-7, the-6)
root
: root
The root grammatical relation points to the root of the sentence. A fake node ROOT is used as the governor. The ROOT node is indexed with 0, since the indexing of real words in the sentence starts at 1. (The ROOT node is not represented explicitly in CoNLL-U.)
ROOT I love French fries .
root(ROOT, love)
New from v2: There should be just one node with the root dependency relation in every tree.
If the main predicate is not present (due to ellipsis) and there are multiple orphaned dependents,
one of these is promoted to the head (root) position and the other orphans are attached to it.
(This rule has in practice been followed since release v1.2 but was not explicitly stated in the
original v1 guidelines.)
ROOT And Robert the fourth place .
root(ROOT, Robert)
cc(Robert, And)
orphan(Robert, place)
punct(Robert, .)
amod(place, fourth)
det(place, the)
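Because ROOT has index 0 and exactly one node may bear the root relation, both constraints can be checked from the HEAD and DEPREL columns alone. Below is a minimal Python sketch, assuming the tree is given as (ID, HEAD, DEPREL) triples. root_errors is a hypothetical helper, not part of the official UD validation tools, and it checks only these two conditions.

```python
from typing import List, Tuple


def root_errors(rows: List[Tuple[int, int, str]]) -> List[str]:
    """Hypothetical helper. rows: (ID, HEAD, DEPREL) per syntactic word;
    HEAD 0 is the ROOT node."""
    errors = []
    roots = [word for word, head, _ in rows if head == 0]
    if len(roots) != 1:
        errors.append(f"expected exactly one word with HEAD 0, found {len(roots)}")
    for word, head, rel in rows:
        # A word attaches to ROOT if and only if its relation is root.
        if (head == 0) != (rel == "root"):
            errors.append(f"word {word}: HEAD {head} does not match DEPREL {rel}")
    return errors


# ROOT And Robert the fourth place .
tree = [(1, 2, "cc"), (2, 0, "root"), (3, 5, "det"),
        (4, 5, "amod"), (5, 2, "orphan"), (6, 2, "punct")]
print(root_errors(tree))  # []
```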
vocative
: vocative
The vocative relation is used to mark a dialogue participant addressed in a text (common in conversations, dialogue, emails, newsgroup postings, etc.). The relation links the addressee’s name to its host sentence. A vocative commonly co-occurs with a null subject, as in the first example below. If the nominal is clearly vocative in intent, the preference is to use the vocative relation.
Guys , take it easy !
vocative(take, Guys)
Marie , comment vas - tu ?
vocative(vas, Marie)
xcomp
: open clausal complement
An open clausal complement (xcomp) of a verb or an adjective is (i) a core argument of the verb, (ii) which has no subject of its own, and (iii) for which the reference of the subject is necessarily determined by an argument external to the xcomp. The third requirement is often referred to as obligatory control.
An xcomp can also be described as a predicative complement. The subject of the xcomp is normally, but not always, controlled by the object of the next higher clause, if there is one, or else by the subject of the next higher clause. These clauses tend to be non-finite in many languages, but they can be finite as well. The name xcomp is borrowed from Lexical-Functional Grammar (see Joan Bresnan, 2001, Lexical-Functional Syntax, chapter on “Predication Relations”).
We expect them to change their minds
xcomp(expect, change)
obj(expect, them)
Sue asked George to respond to her offer
xcomp(asked, respond)
iobj(asked, George)
I started to work there yesterday
xcomp(started, work)
You look great
xcomp(look, great)
I consider him a fool
obj(consider, him)
xcomp(consider, fool)
Louise struck me as a fool
obj(struck, me)
case(fool, as)
xcomp(struck, fool)
I consider her honest
obj(consider, her)
xcomp(consider, honest)
I regard her as honest
obj(regard, her)
mark(honest, as)
xcomp(regard, honest)
We got COVID-19 under control
obj(got, COVID-19)
case(control, under)
xcomp(got, control)
Susan is liable to be arrested
cop(liable, is)
xcomp(liable, arrested)
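The default control pattern described above uses the object of the higher clause if there is one, and otherwise its subject. This pattern also determines the extra subject edge that the enhanced representation adds, as in nsubj(beautiful, cake) in the secondary-predicate example later on this page. Below is a minimal Python sketch of that default. The function names are hypothetical. Treating iobj as an object (so that George controls respond in the example above) is an assumption. The sketch cannot detect the exceptions the text allows for ("normally, but not always").

```python
from typing import Dict, Optional, Tuple


def xcomp_controller(higher: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: default guess for the controller of an xcomp.

    higher maps relation -> dependent for the clause governing the xcomp.
    Object first (treating iobj as an object is an assumption), else subject.
    Lexical exceptions to this default are not handled.
    """
    for rel in ("obj", "iobj", "nsubj"):
        if rel in higher:
            return higher[rel]
    return None


def enhanced_subject(xcomp_head: str, higher: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: extra enhanced edge (rel, head, dep) for the controlled subject."""
    controller = xcomp_controller(higher)
    return ("nsubj", xcomp_head, controller) if controller else None


print(enhanced_subject("change", {"nsubj": "We", "obj": "them"}))  # ('nsubj', 'change', 'them')
print(enhanced_subject("work", {"nsubj": "I"}))  # ('nsubj', 'work', 'I')
```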
The predicative complement can be headed by various parts of speech, including a VERB, ADJ, or NOUN. A nominal predicative complement can be marked by a preposition (in English, often but not always by as). The xcomp-taking predicate of the higher clause can be a VERB or ADJ.
Contrast xcomp with other complement clauses where there is an overt subject or no obligatory control, which use ccomp:
He says that you like to swim
ccomp(says, like)
I suggest eating now before the food gets cold
ccomp(suggest, eating)
The Inherited Subject Criterion
In examples like “I consider her honest”, the UD analysis corresponds to traditional grammar and what was termed “raising to object” in early generative grammar: the nominal “her” in these constructions is treated as the object of the higher clause (as its accusative morphology and ability to passivize suggests).
Note that the above condition “without its own subject” does not mean that a clause is an xcomp just because its subject is not overt. The subject must necessarily be inherited from a fixed position in the higher clause. That is, there should be no available interpretation where the subject of the lower clause may be distinct from the specified role of the upper clause. In cases where the missing subject may or must be distinct from a fixed role in the higher clause, ccomp should be used instead, as below. This includes cases of arbitrary subjects and anaphoric control. In the following examples, the subject of start or starting does not have to be the boss; it can be any contextually relevant person or group of people. In addition, in these cases, the complement clause can often be replaced by a pronoun like it or that, and it can sometimes be passivized (Starting the project was recommended by the boss).
The boss said to start the project
ccomp(said, start)
The boss recommended starting the project
ccomp(recommended, starting)
Pro-drop languages have clauses where the subject is not present as a separate word, yet it is inherently present (and often deducible from the form of the verb). The relation between clauses with pro-drop may or may not be xcomp.
The implicit subjects of a subordinate clause and a higher clause may be coincidentally coreferent, warranting ccomp or advcl:
Píšu , protože jsem to slíbil . \n I-write , because I-have it promised .
advcl(Píšu, slíbil)
advcl(I-write, promised)
aux(slíbil, jsem)
aux(promised, I-have)
obj(slíbil, to)
obj(promised, it)
mark(slíbil, protože)
mark(promised, because)
Slíbil jsem , že budu psát . \n Promised I-have , that I-will write .
ccomp(Slíbil, psát)
ccomp(Promised, write)
aux(Slíbil, jsem)
aux(Promised, I-have)
aux(psát, budu)
aux(write, I-will)
mark(psát, že)
mark(write, that)
It is only xcomp if the implicit subject depends on an argument from a higher clause (one cannot be varied without the other):
Slíbil jsem psát . \n Promised I-have to-write .
xcomp(Slíbil, psát)
xcomp(Promised, to-write)
aux(Slíbil, jsem)
aux(Promised, I-have)
Secondary Predicates
The following is excerpted from a more detailed discussion of secondary predicates.
The xcomp relation is also used in constructions that are known as secondary predicates or predicatives.
Examples:
- She declared the cake beautiful.
- She declared the cake a success.
We could paraphrase the sentence using a subordinate clause: She declared that the cake was beautiful.
There are two predicates mixed in one clause: 1. she declared something, and 2. the cake was beautiful (according to her opinion).
The secondary predicate will be attached to the main predicate as an xcomp:
She declared the cake beautiful .
nsubj(declared, She)
obj(declared, cake)
xcomp(declared, beautiful)
The subject of “beautiful” is again obligatorily controlled by a role in the higher clause, here the object cake. In the enhanced representation, there is an additional subject link showing the secondary predication:
She declared the cake beautiful .
nsubj(declared, She)
obj(declared, cake)
xcomp(declared, beautiful)
nsubj(beautiful, cake)
A Czech example:
jmenovat někoho generálem \n to-appoint someone as-a-general
obj(jmenovat, někoho)
xcomp(jmenovat, generálem)
Remember that xcomp is used for core arguments of predicates, so it will not be used for non-core instances of secondary predication. For instance, in She entered the room sad we also have a double predication (she entered the room; she was sad). But sad is not a core argument of enter: leaving it out will neither affect grammaticality nor significantly alter the meaning of the verb. On the other hand, leaving out beautiful in She declared the cake beautiful will either render the sentence ungrammatical or lead to a different interpretation of declared. The result is that in She entered the room sad, sad is considered a modifier (not a complement) of the verb, with the relation advcl instead of xcomp.
(This was changed from the previous approach, which analyzed the secondary predication directly with acl, because the nominal predicand is not always overt, and even when it is, the adjective does not really belong to the same nominal phrase.)
She entered the room sad .
nsubj(entered, She)
det(room, the)
obj(entered, room)
advcl(entered, sad)
punct(entered, .)
Entering the room sad is not recommended .
csubj(recommended, Entering)
det(room, the)
obj(Entering, room)
advcl(Entering, sad)
cop(recommended, is)
advmod(recommended, not)
punct(recommended, .)
Notice that the word while can be inserted before sad (She entered the room while sad), clearly marking it as a clause.
A Czech example:
Vstoupila do místnosti smutná . \n She-entered to room sad .
advcl(Vstoupila, smutná)
advcl(She-entered, sad)
There is no need to decide whether an example like the following is a depictive or a manner adverbial:
Linda found the money walking our dog .
nsubj(found, Linda)
det(money, the)
obj(found, money)
advcl(found, walking)
det(dog, our)
obj(walking, dog)
punct(found, .)
The optional secondary predication or controlled adjunct subject relation can be represented with an enhanced dependency edge in addition to the advcl relation.
Some other cases that could be regarded as secondary predicates are just treated as obliques. In particular, locative arguments of verbs are always treated as obliques:
She put a book on the table .
nsubj(put, She)
det(book, a)
obj(put, book)
case(table, on)
det(table, the)
obl(put, table)
punct(put, .)