This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Copula in UD v2

The treatment of copula constructions (non-verbal intransitive predication) is quite diverse in the current version of the treebanks (see table below for the status quo). In order to provide more concrete guidelines and to achieve better consistency cross-lingually and within a single language, we propose the following changes:

Problems with the current copula analysis

The main problem is the lack of standardisation. Leaving aside the Galician example, which appears to be a conversion error, the Spanish treebank has 229 distinct lemma/POS pairs with the cop relation, whereas the Swedish treebank has one.

Treebanks also differ in whether they treat the PP or case-marked nominal as head. In Swedish it is the head, while in Finnish it is a dependent:

There are also inconsistencies within a language, for example the existential construction with copula in English:

Compared to the bare copula:

We also do not provide a consistent analysis when one side of the copula is a clause:

Copula constructions in UDv2

For language-specific examples, see below, but here is a summary:

Nominals

The structure will remain the same, but the relation will be changed to nsubj:cop:

When there is more than one PP, the head should be the least oblique argument/modifier according to relevant language-specific tests. For example:
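As a minimal sketch of this relabelling, assuming a simplified token tuple of (id, form, head, deprel) rather than full CoNLL-U rows, the change can be expressed as follows. The function name `relabel_copula_subjects` and the toy sentence are illustrative assumptions, not part of any UD tool.

```python
# Each token is (id, form, head, deprel): a simplified stand-in for CoNLL-U rows.
def relabel_copula_subjects(tokens):
    """Return a copy of tokens where nsubj becomes nsubj:cop
    whenever the token's head also has a cop dependent."""
    heads_with_cop = {head for (_id, _form, head, deprel) in tokens if deprel == "cop"}
    relabelled = []
    for tok_id, form, head, deprel in tokens:
        if deprel == "nsubj" and head in heads_with_cop:
            deprel = "nsubj:cop"
        relabelled.append((tok_id, form, head, deprel))
    return relabelled


# Toy input: "She is a student" with the nominal predicate as root.
sentence = [
    (1, "She", 4, "nsubj"),
    (2, "is", 4, "cop"),
    (3, "a", 4, "det"),
    (4, "student", 0, "root"),
]
converted = relabel_copula_subjects(sentence)
for tok in converted:
    print(*tok, sep="\t")
```

The structure (heads) is untouched; only the subject label changes. This naive rule relabels every nsubj of a head that has a cop dependent, so it only suits simple nominal predicates; a clausal predicate with its own subject would need a more careful rule.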

The omission test could be used:

Only in cases where no tests apply should we resort to general heuristics such as “closest to the copula” and so on:

and:

Clausals

When the predicate is a clause, the head of that clause becomes the head of the whole copula clause:

We distinguish copula subjects from non-copula subjects, so that when there is a clausal predicate we do not get a double subject:
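The effect of the distinction can be checked mechanically. The sketch below is only an illustration under stated assumptions: the sentence "The problem is that he left", both annotations of it, and the helper `heads_with_double_subject` are hypothetical and not taken from any treebank. Tokens are simplified to (id, form, head, deprel).

```python
from collections import Counter


def heads_with_double_subject(tokens):
    """Return the ids of heads that govern more than one plain nsubj."""
    counts = Counter(head for (_id, _form, head, deprel) in tokens if deprel == "nsubj")
    return sorted(head for head, n in counts.items() if n > 1)


# Hypothetical annotation where both subjects carry plain nsubj.
single_label = [
    (1, "The", 2, "det"),
    (2, "problem", 6, "nsubj"),
    (3, "is", 6, "cop"),
    (4, "that", 6, "mark"),
    (5, "he", 6, "nsubj"),
    (6, "left", 0, "root"),
]

# Same structure, with the copula subject labelled nsubj:cop.
split_label = [
    (tok_id, form, head, "nsubj:cop" if form == "problem" else deprel)
    for (tok_id, form, head, deprel) in single_label
]

print(heads_with_double_subject(single_label))  # [6]
print(heads_with_double_subject(split_label))   # []
```

With the separate label, the head of the clausal predicate governs one nsubj and one nsubj:cop instead of two indistinguishable subjects.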

To discuss

However, we still get duplication of the cop relation where you have a copula on the right:

And in the case of having an expressed subject, we get two subjects for the main predicate:

Language-specific examples

For the purposes of demonstrating the new classification system a number of examples have been prepared for a range of UD languages. The examples are in English, but where they are ambiguous in a given language multiple variants will be given.

  1. She is a student
  2. I am a student
  3. She was a student
  4. I was a student
  5. She is happy
  6. I am happy
  7. She is in shape
  8. She is in the house
  9. I am in the house
  10. She was in the house
  11. There is a house in the village
  12. The house is in the village
  13. There was a house in the village
  14. The house was in the village

English

The English analysis more or less follows the analysis in the UD_English treebank, with the addition of the relation nsubj:cop for subjects of copula constructions. There is a difference, however, in how (11) and (13) are treated.

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(10)

(11)

(12)

(13)

(14)

Swedish

(1)

(2)

(3)

(4)

(5)

(6)

(7)

Example needed

(8)

(9)

(10)

(11)

Existential constructions in Swedish do not use the copula verb.

(12)

(13)

(14)

Spanish

The UD_Spanish treebank classifies a large number of verbs as copulas. We propose reducing this to the single verb “ser”.
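To find the tokens such a reduction would affect, one could scan a CoNLL-U file for cop dependents whose lemma is not on an allowed list. The following is a rough sketch: the function `disallowed_copulas`, its `allowed` parameter, and the toy sentences are assumptions made for illustration. Only the standard ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) is relied on.

```python
def disallowed_copulas(conllu_text, allowed=frozenset({"ser"})):
    """Return (form, lemma) pairs for tokens attached as cop
    whose lemma is not in the allowed set."""
    found = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only ordinary token rows: ten columns and an integer ID.
        # This skips comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, deprel = cols[1], cols[2], cols[7]
        if deprel == "cop" and lemma not in allowed:
            found.append((form, lemma))
    return found


# Toy data (hypothetical annotation), written with spaces and joined with tabs.
rows = [
    "1 Ella él PRON _ _ 3 nsubj _ _",
    "2 está estar VERB _ _ 3 cop _ _",
    "3 feliz feliz ADJ _ _ 0 root _ _",
    "",
    "1 Ella él PRON _ _ 3 nsubj _ _",
    "2 es ser VERB _ _ 3 cop _ _",
    "3 feliz feliz ADJ _ _ 0 root _ _",
]
sample = "\n".join("\t".join(row.split()) for row in rows)
print(disallowed_copulas(sample))  # [('está', 'estar')]
```

The sketch only reports candidates; how each one should be reanalysed is a separate decision.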

(1)

(2)

(3)

(4)

(5)

In Spanish you can say either Soy feliz “I am happy” or Estoy feliz “I am happy”/“I feel happy”. In the following examples, the subject pronouns are expressed to illustrate the difference in relation for the subject. They may equally well be dropped.

(6)

(7)

Instead of “in shape” we’ll use “de puta madre”, which means “really great”:

(8)

In Spanish location/position uses the verb estar and not ser.

Note that in Catalan, this would be “Ella és a la casa”, using the ser verb, not the estar verb. This would be analysed as:

(9)

(10)

(11)

Existential constructions in Spanish do not use the copula verb.

(12)

(13)

(14)

Russian

In Russian, there is no copula verb in the present tense. In the future and past tenses, the verb быть “be” is used. Note that when the copula verb is used, the complement can be in either the nominative or the instrumental case. When it is instrumental the meaning is “is category of”, and when it is nominative it is more like “has quality of”. We propose using the same structure for both.

(1)

(2)

(3)

(4)

(5)

The same goes with adjectival uses:

(6)

(7)

Instead of “in shape”, we’ll use “в курсе”, which means “on the ball”:

(8)

In Russian, there is no verb used for locative predication in the present tense.

(9)

As with (8),

(10)

In the past tense we have the verb and we make it a dependent:

(11)

In Russian, in the present tense, existential constructions use “есть” which is sometimes described as a “predicative”:

(12)

(13)

In the past tense (and future tense), the verb быть is employed. Syntactically (13) and (14) are equivalent in Russian aside from word order considerations.

(14)

Finnish

In Finnish the copula verb is olla “to be”. Its complement is typically in the nominative, although it may also be in the essive case -nA.

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(10)

(11)

In Finnish, existential and non-existential are identical aside from word order.

(12)

(13)

(14)

Turkish

In Turkish, there are two copula verbs, i- and ol-. The “true” copula is i- which is defective, only having a limited number of tense forms (aorist and past), and cliticising. When a copula is needed in another tense, ol- is employed. However, if there is a form of i- then the equivalent form of ol- takes on the meaning “become”.

(1)

In the non-formal third person singular of the present tense (aorist), there is no overt suffix. Unlike Russian, where the copula verb does not appear in any part of the present tense paradigm, in Turkish it appears in all persons except the third person. This makes it more like the nominative case in the paradigm (which also has a -Ø suffix) than like the Russian copula.

In the following examples the hyphen is used to separate cliticised syntactic words.

(2)

(3)

In more formal styles, the copula verb here can also be written separately instead of cliticised:

(4)

(5)

(6)

(7)

Example needed

(8)

(9)

(10)

(11)

In Turkish (and indeed in most Turkic languages), existence is syntactically different, using the adjective var “existent”, and so gets a different structure.

(12)

(13)

(14)

Irish

Irish distinguishes between a copula verb “is” and what is called a substantive verb “bí”. Only the copula verb receives the cop relation. The substantive verb is the head and takes an argument with xcomp. Teresa’s thesis has an in-depth description of the treatment of the copula in Irish.

(1)

Example needed

(2)

Example needed

(3)

(4)

Not applicable.

(5)

Not applicable.

(6)

Example needed

(7)

Example needed

(8)

Example needed

(9)

Example needed

(10)

Example needed

(11)

Example needed

(12)

Example needed

(13)

Example needed

(14)

Example needed

Status quo

The table below lists, for each UD treebank, the lemmas that occur with the cop relation. If we adopt the above recommendations, the vast majority will need converting.

Treebank; number of unique lemma/POS pairs with cop; top-5 lemmas[POS] with cop relation (as count/lemma[POS])
UD-Galician 1112 121/de[ADP], 40/necesario[ADJ], 38/como[PRON], 24/posible[ADJ], 23/importante[ADJ]
UD-Dutch 253 2491/ben[AUX], 283/word[AUX], 91/vind[VERB], 73/blijf[AUX], 67/maak[VERB]
UD-Spanish 229 5136/ser[VERB], 353/estar[VERB], 78/llamado[VERB], 66/encontrar[VERB], 48/hacer[VERB]
UD-Arabic 216 384/كَان[VERB], 75/لَيس[VERB], 31/عَدّ[VERB], 27/اِعتَبَر[VERB], 25/زَال[VERB]
UD-Portuguese 135 2120/ser[VERB], 370/estar[VERB], 176/como[ADV], 91/ficar[VERB], 38/parecer[VERB]
UD-French 99 4878/être[VERB], 232/devenir[VERB], 91/appeler[VERB], 70/nommer[VERB], 51/rester[VERB]
UD-Greek 67 531/είμαι[VERB], 86/αποτελώ[VERB], 34/θεωρώ[VERB], 27/γίνομαι[VERB], 20/καθίσταμαι[VERB]
UD-Catalan 57 3609/ser[AUX], 810/estar[VERB], 722/ser[VERB], 136/cop[NOUN], 53/semblar[VERB]
UD-Polish 18 764/być[VERB], 98/to[VERB], 42/być[AUX], 17/stać[VERB], 12/stawać[VERB]
UD-Basque 15 1993/izan[VERB], 266/egon[VERB], 124/ukan[VERB], 31/izan[AUX], 20/ibili[VERB]
UD-German 11 4698/-[VERB], 86/-[NOUN], 31/-[ADJ], 27/-[ADP], 23/-[PROPN]
UD-Estonian 9 3373/olema[VERB], 37/ole[VERB], 29/tunduma[VERB], 5/paistma[VERB], 4/näima[VERB]
UD-Czech 6 20480/být[VERB], 110/bývat[VERB], 3/stát[VERB], 3/bývávat[VERB], 1/moci[VERB]
UD-Hungarian 6 92/van[VERB], 61/lesz[VERB], 11/lehet[VERB], 3/marad[VERB], 1/hoz[VERB]
UD-Bulgarian 5 1940/съм[VERB], 3/съм[AUX], 1/стана[VERB], 1/разпространявам-(се)[VERB], 1/докосна-(се)[VERB]
UD-Buryat 5 70/байха[VERB], 22/болохо[VERB], 2/ябаха[VERB], 2/үнгэхэ[VERB], 2/байха[AUX]
UD-Croatian 5 1236/biti[AUX], 1/željeti[VERB], 1/težiti[VERB], 1/davati[VERB], 1/bivati[VERB]
UD-English 4 5593/be[VERB], 8/`s[VERB], 5/be[AUX], 1/’[VERB]
UD-Kazakh 4 131/е[VERB], 42/бол[VERB], 1/тұр[VERB], 1/атан[VERB]
UD-Uyghur 4 66/-[VERB], 4/-[NOUN], 3/-[ADJ], 1/-[PART]
UD-Hindi 3 3014/है[VERB], 497/था[VERB], 1/बशर्ते[SCONJ]
UD-Irish 3 369/is[VERB], 3/is[PART], 1/má[SCONJ]
UD-Russian 3 538/-[VERB], 5/-[NOUN], 1/-[ADP]
UD-Russian-SynTagRus 3 4457/БЫТЬ[AUX], 622/ЭТО[NOUN], 4/ВОТ[PART]
UD-Chinese 2 1795/-[VERB], 8/-[ADJ]
UD-Coptic 2 30/ⲡⲉ[PART], 2/ⲡ[DET]
UD-Danish 2 1576/være[AUX], 185/blive[AUX]
UD-Hebrew 2 387/-[VERB], 7/-[PRON]
UD-Persian 2 4662/-[VERB], 3/-[ADJ]
UD-Turkish 2 751/i[AUX], 113/değil[VERB]
UD-Faroese 1 1081/vera[VERB]
UD-Finnish 1 3279/olla[VERB]
UD-Indonesian 1 1055/-[VERB]
UD-Italian 1 2767/essere[VERB]
UD-Norwegian 1 7217/være[VERB]
UD-Slovenian 1 2820/biti[VERB]
UD-Swedish 1 1629/vara[VERB]
UD-Tamil 1 1/முயல்[VERB]
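Figures of this kind can be recomputed from a treebank's CoNLL-U file. The sketch below is an assumption about how such counts might be produced, not the script that generated the table: `cop_statistics` and the toy sentences are hypothetical. It counts each lemma/POS pair separately, which matches rows such as UD-English, where be[VERB] and be[AUX] are listed as distinct entries.

```python
from collections import Counter


def cop_statistics(conllu_text, top_n=5):
    """Return (number of unique lemma/POS pairs with cop,
    'count/lemma[POS]' summary of the most frequent pairs)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Ordinary token rows have ten columns and an integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        lemma, upos, deprel = cols[2], cols[3], cols[7]
        if deprel == "cop":
            counts[(lemma, upos)] += 1
    top = ", ".join(
        f"{n}/{lemma}[{upos}]" for (lemma, upos), n in counts.most_common(top_n)
    )
    return len(counts), top


# Toy data (hypothetical annotation), written with spaces and joined with tabs.
rows = [
    "1 She she PRON _ _ 3 nsubj _ _",
    "2 is be VERB _ _ 3 cop _ _",
    "3 happy happy ADJ _ _ 0 root _ _",
    "",
    "1 I I PRON _ _ 3 nsubj _ _",
    "2 was be VERB _ _ 3 cop _ _",
    "3 happy happy ADJ _ _ 0 root _ _",
    "",
    "1 It it PRON _ _ 3 nsubj _ _",
    "2 's be AUX _ _ 3 cop _ _",
    "3 fine fine ADJ _ _ 0 root _ _",
]
sample = "\n".join("\t".join(row.split()) for row in rows)
print(cop_statistics(sample))  # (2, '2/be[VERB], 1/be[AUX]')
```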

UD-internal references

Further reading

For wider cross-linguistic applicability, it is well worth looking at the following book:

The following publications have also been cited:
