
This page pertains to UD version 2.

Nonverbal Predication and Copulas in UD v2

The treatment of copula constructions (non-verbal intransitive predication) varies considerably across the current treebanks (see the table below for the status quo). To provide more concrete guidelines, and to achieve better consistency both across languages and within a single language, we propose the following changes for v2:

Problems with the current copula analysis

The main problem is the lack of standardisation. Leaving aside Galician, whose figures appear to result from a conversion error, the Spanish treebank uses the cop relation with 229 distinct lemmas, whereas the Swedish treebank uses it with only one. As a result, parallel sentences receive different analyses:

Éste quedó sorprendido . \n He was/stayed surprised
cop(sorprendido, quedó)
nsubj(sorprendido, Éste)
Han blev överraskad . \n He was/became surprised
nsubj(blev, Han)
xcomp(blev, överraskad)
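To make the contrast concrete, the two analyses can be compared mechanically. The sketch below is a hypothetical helper, not part of any UD tool. It parses the `rel(head, dependent)` arc notation used in the examples on this page and reports which word the subject attaches to, that is, which word each treebank treats as the head of the clause.

```python
import re

# One arc in the rel(head, dependent) notation used on this page,
# e.g. "nsubj(blev, Han)" or "xcomp:pred(tá, ann)".
ARC = re.compile(r"^\s*([\w:]+)\((.+?),\s*(.+?)\)\s*$")

def parse_arcs(lines):
    """Return a list of (relation, head, dependent) triples."""
    arcs = []
    for line in lines:
        m = ARC.match(line)
        if m is None:
            raise ValueError(f"not an arc: {line!r}")
        arcs.append(m.groups())
    return arcs

def predicate_head(arcs):
    """Return the word governing nsubj, i.e. the head of the clause."""
    for rel, head, dep in arcs:
        if rel == "nsubj":
            return head
    return None

spanish = parse_arcs(["cop(sorprendido, quedó)", "nsubj(sorprendido, Éste)"])
swedish = parse_arcs(["nsubj(blev, Han)", "xcomp(blev, överraskad)"])
print(predicate_head(spanish))  # sorprendido
print(predicate_head(swedish))  # blev
```

The Spanish analysis makes the participle the head, while the Swedish one makes the verb the head, even though the two sentences are close translations.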

Treebanks also differ in whether they treat the PP or case-marked nominal as the head: in Swedish it is the head, while in Finnish it is a dependent:

Hon är i huset
nsubj(huset, Hon)
cop(huset, är)
Se on talossa
nsubj(on, Se)
nmod(on, talossa)
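Harmonising such treebanks means re-heading the clause on the nonverbal predicate. The following is a minimal sketch under several assumptions:

- arcs are (relation, head, dependent) triples;
- word forms are unique within the clause;
- the predicate is the copula's first dependent carrying one of a hypothetical set of complement relations.

It is an illustration, not the conversion procedure used for any treebank.

```python
# Hypothetical relations that may attach a nonverbal predicate to a
# verb-headed copula in a source treebank (cf. nmod in the Finnish example).
COMPLEMENT_RELS = ("xcomp", "nmod", "obl")

def promote_predicate(arcs, copula, complement_rels=COMPLEMENT_RELS):
    """Re-head a verb-headed copular clause on its nonverbal predicate.

    arcs: list of (relation, head, dependent) triples.
    copula: word form of the copula verb (assumed unique in the clause).
    """
    # Take the copula's first complement-like dependent as the predicate.
    predicate = next(
        (dep for rel, head, dep in arcs if head == copula and rel in complement_rels),
        None,
    )
    if predicate is None:
        # No nonverbal predicate (e.g. a pure existential): keep the verb as head.
        return list(arcs)
    result = []
    for rel, head, dep in arcs:
        if head == copula and dep == predicate:
            result.append(("cop", predicate, copula))  # invert the copula arc
        elif head == copula:
            result.append((rel, predicate, dep))  # move other dependents to the predicate
        elif dep == copula:
            result.append((rel, head, predicate))  # predicate takes over the copula's attachment
        else:
            result.append((rel, head, dep))
    return result

finnish = [("nsubj", "on", "Se"), ("nmod", "on", "talossa")]
print(promote_predicate(finnish, "on"))
# [('nsubj', 'talossa', 'Se'), ('cop', 'talossa', 'on')]
```

The output has the same shape as the Swedish analysis of Hon är i huset. A clause with no complement-like dependent, such as an existential, is returned unchanged.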

There is also variation within a single language, for example in the English existential construction with a copula:

There is a book on the table .
expl(is, There)
nsubj(is, book)

Compared to the bare copula:

A book is on the table .
nsubj(table, book)
cop(table, is)

We also do not provide a consistent analysis when one side of the copula is a clause:

The important thing is to keep calm
nsubj(is, thing)
ccomp(is, keep)

Guidelines for UDv2

In order to achieve a more consistent treatment of nonverbal predication in v2, we first define six categories of nonverbal predication that can be found cross-linguistically (with or without a copula):

  1. Equation (aka identification): “she is my mother”
  2. Attribution: “she is nice”
  3. Location: “she is in the bathroom”
  4. Possession: “the book is hers”
  5. Benefaction: “the book is for her”
  6. Existence: “there is food (in the kitchen)”

We then give the following guidelines for the analysis of these constructions:

Language-specific examples

We now exemplify how these guidelines apply to different languages.

English

The English analysis largely follows that of the current UD_English treebank. A non-head copula is used in categories (1-5), except when the equated element is a clause (1b). Existentials (6) are treated differently because the verb must be the head in pure existentials.
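The English choices reduce to a small decision rule. The sketch below restates them in Python. The category constants and the function name are hypothetical; they only mirror the numbering of the six categories defined above.

```python
# Hypothetical constants mirroring categories (1)-(6).
EQUATION, ATTRIBUTION, LOCATION, POSSESSION, BENEFACTION, EXISTENCE = range(1, 7)

def english_be_is_head(category, clausal_predicate=False):
    """Return True if 'be' heads the clause under the English analysis."""
    if category not in range(1, 7):
        raise ValueError(f"unknown category: {category}")
    if category == EXISTENCE:
        return True   # (6a), (6b): expl and nsubj attach to "is"
    if category == EQUATION and clausal_predicate:
        return True   # (1b): the equated clause is a ccomp of "is"
    return False      # (1a), (2)-(5): "is" is a cop dependent of the predicate
```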

(1a)

she is my mother
nsubj(mother, she)
cop(mother, is)

(1b)

the fact is that she is my mother
nsubj(is-3, fact)
ccomp(is-3, mother)
nsubj(mother, she)
cop(mother, is-6)

(2)

she is nice
nsubj(nice, she)
cop(nice, is)

(3)

she is in the kitchen
nsubj(kitchen, she)
cop(kitchen, is)
case(kitchen, in)

(4)

it is hers
nsubj(hers, it)
cop(hers, is)

(5)

it is for her
nsubj(her, it)
cop(her, is)
case(her, for)

(6a)

there is food
expl(is, there)
nsubj(is, food)

(6b)

there is food in the kitchen
expl(is, there)
nsubj(is, food)
obl(is, kitchen)
case(kitchen, in)

Irish

Irish uses the copula is in categories 1, 4 and 5, and the substantive verb tá in categories 2, 3 and 6. The two constructions differ not only in the verb but also in word order.
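In the examples for this section, the copula is appears in categories 1, 4 and 5, where the predicate is the head. The verb tá appears in categories 2, 3 and 6, where it heads the clause and the predicate is an xcomp. A hypothetical lookup capturing this split:

```python
# Categories (numbered as above) in which Irish uses each verb.
COPULA_IS = {1, 4, 5}        # equation, possession, benefaction
SUBSTANTIVE_TA = {2, 3, 6}   # attribution, location, existence

def irish_be(category):
    """Return (verb, verb_is_head) for a nonverbal-predication category."""
    if category in COPULA_IS:
        return ("is", False)   # predicate is the head: cop(predicate, is)
    if category in SUBSTANTIVE_TA:
        return ("tá", True)    # tá heads the clause; the predicate is its xcomp
    raise ValueError(f"unknown category: {category}")
```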

(1)

Is ise mo mháthair \n is she my mother
nsubj(ise, mháthair)
cop(ise, Is)

(2)

tá sí deas \n is she nice
nsubj(tá, sí)
xcomp(tá, deas)

(3)

tá sí sa seomra folctha \n is she in room bath
nsubj(tá, sí)
xcomp(tá, seomra)

(4)

Is lei an leabhar \n Is with-her the book
nsubj(lei, leabhar)
cop(lei, Is)

(5)

Is di féin an leabhar \n is for-her the book
nsubj(di, leabhar)
cop(di, Is)

Irish has no special existential construction: examples (6a) and (6b) have the same structure as (3).

(6a)

tá bia ann \n is food there
nsubj(tá, bia)
xcomp:pred(tá, ann)

(6b)

tá bia sa chistin \n is food in-the kitchen
nsubj(tá, bia)
xcomp:pred(tá, chistin)

Czech

The Czech verb být is used in all categories (1-6) and can be analyzed as a copula everywhere except in pure existentials (without a location), where no predicate other than the verb is available. Czech word order is free, so constructions do not always surface as subject-copula-predicate. The reversed order (predicate-copula-subject) is less common but possible, even in the “equation” category (1). Czech is also a pro-drop language, which means that pronominal subjects are optional.
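Because the rule for být depends only on whether a nonverbal predicate is present, it can be sketched as a function that builds the core arcs. This is a hypothetical illustration with three simplifications:

- it ignores case, punctuation and other dependents;
- it identifies words by form rather than by position, so the result does not depend on word order;
- it represents a dropped pronominal subject by passing no subject.

```python
def czech_byt_arcs(byt, subject=None, predicate=None):
    """Core arcs for a clause with být.

    byt: the form of být (e.g. "je"); subject: None when dropped (pro-drop);
    predicate: the nonverbal predicate, or None for a pure existential.
    """
    # With no other predicate available (pure existential), být is the head.
    head = byt if predicate is None else predicate
    arcs = []
    if subject is not None:
        arcs.append(("nsubj", head, subject))
    if predicate is not None:
        arcs.append(("cop", predicate, byt))
    return arcs

print(czech_byt_arcs("je", subject="jídlo"))                       # (6a)
# [('nsubj', 'je', 'jídlo')]
print(czech_byt_arcs("je", subject="jídlo", predicate="kuchyni"))  # (6b), core arcs only
# [('nsubj', 'kuchyni', 'jídlo'), ('cop', 'kuchyni', 'je')]
print(czech_byt_arcs("je", predicate="milá"))                      # (2) with ona dropped
# [('cop', 'milá', 'je')]
```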

(1a)

(ona) je moje matka \n she is my mother
nsubj(matka, (ona))
nsubj(mother, she)
cop(matka, je)
cop(mother, is)

(1b)

faktem je , že ona je moje matka \n fact is , that she is my mother
cop(faktem, je-2)
cop(fact, is-11)
csubj(faktem, matka)
csubj(fact, mother)
nsubj(matka, ona)
nsubj(mother, she)
cop(matka, je-6)
cop(mother, is-15)

(2)

(ona) je milá \n she is nice
nsubj(milá, (ona))
nsubj(nice, she)
cop(milá, je)
cop(nice, is)

(3)

(ona) je v kuchyni \n she is in kitchen
nsubj(kuchyni, (ona))
nsubj(kitchen, she)
cop(kuchyni, je)
cop(kitchen, is)
case(kuchyni, v)
case(kitchen, in)

(4)

to je její \n it is hers
nsubj(její, to)
nsubj(hers, it)
cop(její, je)
cop(hers, is)

(5)

to je pro ni \n it is for her
nsubj(ni, to)
nsubj(her, it)
cop(ni, je)
cop(her, is)
case(ni, pro)
case(her, for)

(6a)

jídlo je \n food is
nsubj(je, jídlo)
nsubj(is, food)

(6b)

v kuchyni je jídlo \n in kitchen is food
nsubj(kuchyni, jídlo)
nsubj(kitchen, food)
cop(kuchyni, je)
cop(kitchen, is)
case(kuchyni, v)
case(kitchen, in)

Russian

Russian has no overt copula verb in the present tense; in the past and future tenses, the verb быть “be” is used. Categories (1-5) all receive the same analysis.

(1a)

она моя мать \n she my mother
nsubj(мать, она)
nsubj(mother, she)

(1b)

When the second part of an equation is a clause, the demonstrative pronoun то must be inserted. This yields a different syntactic structure, and hence a different analysis: the clause formally modifies a nominal represented by the demonstrative.

дело в том , что она моя мать \n fact in that , that she my mother
nsubj(том, дело)
nsubj(that-12, fact)
acl(том, мать)
acl(that-12, mother)
nsubj(мать, она)
nsubj(mother, she)

(2)

она милая \n she nice
nsubj(милая, она)
nsubj(nice, she)

(3)

она на кухне \n she in kitchen
nsubj(кухне, она)
nsubj(kitchen, she)
case(кухне, на)
case(kitchen, in)

(4)

это ее \n it hers
nsubj(ее, это)
nsubj(hers, it)

(5)

это для нее \n it for her
nsubj(нее, это)
nsubj(her, it)
case(нее, для)
case(her, for)

(6a)

Historically, the form есть is the third person singular present indicative of the verb быть (which also functions as the copula). In modern Russian, however, it is used only in existential statements, for both numbers and all persons. Past and future existential statements use the ordinary forms of быть.

есть еда \n is food
nsubj(есть, еда)
nsubj(is, food)

(6b)

есть еда на кухне \n is food in kitchen
nsubj(есть, еда)
nsubj(is, food)
obl(есть, кухне)
obl(is, kitchen)
case(кухне, на)
case(kitchen, in)
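The distribution of быть and есть described for Russian can be summarised as a lookup. This is a minimal sketch; the function name and the tense labels are hypothetical. For the past and future it returns only the lemma быть, since the actual form inflects.

```python
def russian_be(tense, existential):
    """Surface realization of 'be' in a Russian nonverbal predication.

    Returns None when no verb appears; otherwise a form (есть) or the lemma быть.
    """
    if tense not in ("past", "present", "future"):
        raise ValueError(f"unknown tense: {tense}")
    if tense == "present":
        # No copula in the present tense; existentials use the invariant form есть.
        return "есть" if existential else None
    # Past and future use ordinary (inflected) forms of быть, given here as the lemma.
    return "быть"
```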

Finnish

In Finnish the copula verb is olla “to be”.

Turkish

Turkish has two copula verbs, i- and ol-. The “true” copula is i-, which cliticises and is defective: it has only a limited number of tense forms (aorist and past). When a copula is needed in another tense, ol- is used instead. Where a form of i- exists, however, the equivalent form of ol- takes on the meaning “become”.

In the present tense, the non-formal third person singular aorist has no overt copula suffix. In Russian the copula is absent from the entire present-tense paradigm. In Turkish, by contrast, it appears in every person except the third (compare 1a and 1b). In this respect the Turkish copula resembles the nominative case, which also has a zero (-Ø) suffix, more than it resembles the Russian copula.

In Turkish (as in most Turkic languages), existence is expressed by a syntactically different construction (see 6a and 6b) using the adjective var “existent”, and so receives a different structure.

(1a)

O benim annem . \n she my mother
nsubj(annem, O)
nsubj(mother, she)

(1b)

Ben senin baban -ım . \n I your father am
nsubj(baban, Ben)
cop(baban, -ım)
nsubj(father, I)

(1c)

O benim annem -di . \n she my mother was
nsubj(annem, O)
cop(annem, -di)
nsubj(mother, she)

(2)

O hoş . \n She nice
nsubj(hoş, O)
nsubj(nice, She)

(3)

O mutfakta . \n She kitchen-in .
nsubj(mutfakta, O)
nsubj(kitchen-in, She)

(4)

Bu onun . \n It hers .
nsubj(onun, Bu)
nsubj(hers, It)

(5)

Bu onun için . \n It hers for .
nsubj(onun, Bu)
case(onun, için)
nsubj(hers, It)
case(hers, for)

(6a)

Yemek var . \n Food existing .
nsubj(var, Yemek)
nsubj(existing, Food)

(6b)

Mutfakta yemek var . \n Kitchen-in food existing .
nsubj(var, yemek)
obl(var, Mutfakta)
nsubj(existing, food)
obl(existing, Kitchen-in)

Status quo

The table below lists the UD treebanks that contain tokens with the cop relation. The vast majority will need converting in light of the new guidelines.

Treebank  Distinct cop lemma[POS] pairs  Top-5 lemmas[POS] with cop relation (count/lemma[POS])
UD-Galician 1112 121/de[ADP], 40/necesario[ADJ], 38/como[PRON], 24/posible[ADJ], 23/importante[ADJ]
UD-Dutch 253 2491/ben[AUX], 283/word[AUX], 91/vind[VERB], 73/blijf[AUX], 67/maak[VERB]
UD-Spanish 229 5136/ser[VERB], 353/estar[VERB], 78/llamado[VERB], 66/encontrar[VERB], 48/hacer[VERB]
UD-Arabic 216 384/كَان[VERB], 75/لَيس[VERB], 31/عَدّ[VERB], 27/اِعتَبَر[VERB], 25/زَال[VERB]
UD-Portuguese 135 2120/ser[VERB], 370/estar[VERB], 176/como[ADV], 91/ficar[VERB], 38/parecer[VERB]
UD-French 99 4878/être[VERB], 232/devenir[VERB], 91/appeler[VERB], 70/nommer[VERB], 51/rester[VERB]
UD-Greek 67 531/είμαι[VERB], 86/αποτελώ[VERB], 34/θεωρώ[VERB], 27/γίνομαι[VERB], 20/καθίσταμαι[VERB]
UD-Catalan 57 3609/ser[AUX], 810/estar[VERB], 722/ser[VERB], 136/cop[NOUN], 53/semblar[VERB]
UD-Polish 18 764/być[VERB], 98/to[VERB], 42/być[AUX], 17/stać[VERB], 12/stawać[VERB]
UD-Basque 15 1993/izan[VERB], 266/egon[VERB], 124/ukan[VERB], 31/izan[AUX], 20/ibili[VERB]
UD-German 11 4698/-[VERB], 86/-[NOUN], 31/-[ADJ], 27/-[ADP], 23/-[PROPN]
UD-Estonian 9 3373/olema[VERB], 37/ole[VERB], 29/tunduma[VERB], 5/paistma[VERB], 4/näima[VERB]
UD-Czech 6 20480/být[VERB], 110/bývat[VERB], 3/stát[VERB], 3/bývávat[VERB], 1/moci[VERB]
UD-Hungarian 6 92/van[VERB], 61/lesz[VERB], 11/lehet[VERB], 3/marad[VERB], 1/hoz[VERB]
UD-Bulgarian 5 1940/съм[VERB], 3/съм[AUX], 1/стана[VERB], 1/разпространявам-(се)[VERB], 1/докосна-(се)[VERB]
UD-Buryat 5 70/байха[VERB], 22/болохо[VERB], 2/ябаха[VERB], 2/үнгэхэ[VERB], 2/байха[AUX]
UD-Croatian 5 1236/biti[AUX], 1/željeti[VERB], 1/težiti[VERB], 1/davati[VERB], 1/bivati[VERB]
UD-English 4 5593/be[VERB], 8/`s[VERB], 5/be[AUX], 1/’[VERB]
UD-Kazakh 4 131/е[VERB], 42/бол[VERB], 1/тұр[VERB], 1/атан[VERB]
UD-Uyghur 4 66/-[VERB], 4/-[NOUN], 3/-[ADJ], 1/-[PART]
UD-Hindi 3 3014/है[VERB], 497/था[VERB], 1/बशर्ते[SCONJ]
UD-Irish 3 369/is[VERB], 3/is[PART], 1/má[SCONJ]
UD-Russian 3 538/-[VERB], 5/-[NOUN], 1/-[ADP]
UD-Russian-SynTagRus 3 4457/БЫТЬ[AUX], 622/ЭТО[NOUN], 4/ВОТ[PART]
UD-Chinese 2 1795/-[VERB], 8/-[ADJ]
UD-Coptic 2 30/ⲡⲉ[PART], 2/ⲡ[DET]
UD-Danish 2 1576/være[AUX], 185/blive[AUX]
UD-Hebrew 2 387/-[VERB], 7/-[PRON]
UD-Persian 2 4662/-[VERB], 3/-[ADJ]
UD-Turkish 2 751/i[AUX], 113/değil[VERB]
UD-Faroese 1 1081/vera[VERB]
UD-Finnish 1 3279/olla[VERB]
UD-Indonesian 1 1055/-[VERB]
UD-Italian 1 2767/essere[VERB]
UD-Norwegian 1 7217/være[VERB]
UD-Slovenian 1 2820/biti[VERB]
UD-Swedish 1 1629/vara[VERB]
UD-Tamil 1 1/முயல்[VERB]
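Counts like those in the table can be computed from CoNLL-U files. The sketch below is one plausible way to do it, not the script actually used for the table. It assumes that the second column counts distinct lemma[POS] pairs attached by cop. It also skips multiword-token ranges and empty nodes, which carry no basic dependency relation.

```python
from collections import Counter

def cop_lemma_counts(conllu_lines):
    """Count tokens attached by cop, keyed by "lemma[UPOS]"."""
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, lemma, upos, deprel = cols[0], cols[2], cols[3], cols[7]
        if "-" in tok_id or "." in tok_id:
            continue  # multiword-token range or empty node
        if deprel.split(":")[0] == "cop":  # also matches any language-specific subtype
            counts[f"{lemma}[{upos}]"] += 1
    return counts

def summarise(counts, top=5):
    """Return (number of distinct lemma[POS] pairs, "n/lemma[POS], ..." string)."""
    top_items = ", ".join(f"{n}/{key}" for key, n in counts.most_common(top))
    return len(counts), top_items

sample = """# text = Han är glad .
1\tHan\than\tPRON\t_\t_\t3\tnsubj\t_\t_
2\tär\tvara\tVERB\t_\t_\t3\tcop\t_\t_
3\tglad\tglad\tADJ\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
""".splitlines()

print(summarise(cop_lemma_counts(sample)))  # (1, '1/vara[VERB]')
```

For a real treebank, pass an open file instead of the sample, e.g. `with open(path, encoding="utf-8") as f: counts = cop_lemma_counts(f)`.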

UD-internal references

Further reading

For wider cross-linguistic applicability, it is well worth looking at the following book:

The following publications have also been cited: