Copula in UD v2
The treatment of copula constructions (non-verbal intransitive predication) is quite diverse in the current version of the treebanks (see table below for the status quo). In order to provide more concrete guidelines and to achieve better consistency cross-lingually and within a single language, we propose the following changes:
- We should be maximally restrictive with respect to which words can be copulas (only one word in most languages)
- The copula word should never be the root, except through promotion (“he is not happy, but she is”)
- When there is more than one possible candidate head, the rules to establish it should be determined on a language-specific basis
- We should add the subtype nsubj:cop to signal that the subject in copula constructions is special, and to partially solve the problem of having to flip dependencies when the predicate is a clause (see below)
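As a rough illustration of how the first and last proposals above could be checked mechanically, here is a minimal sketch. The `Token` layout, the `ALLOWED_COPULAS` table and the function name `check_copulas` are all hypothetical and not part of any UD tool; the lemma whitelist holds illustrative values only. The rule about the copula never being the root is left out, since cases of promotion would probably need manual review.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical, simplified token record (not a real UD library type)."""
    id: int
    form: str
    lemma: str
    head: int    # 0 means the token is the root
    deprel: str


# Assumed per-language whitelist of copula lemmas (illustrative values only).
ALLOWED_COPULAS = {"en": {"be"}, "sv": {"vara"}, "es": {"ser"}}


def check_copulas(sentence, lang):
    """Return human-readable violations of two of the proposed rules."""
    problems = []
    allowed = ALLOWED_COPULAS.get(lang, set())
    # Ids of all predicates that have a cop dependent.
    heads_with_cop = {t.head for t in sentence if t.deprel == "cop"}
    for t in sentence:
        # Rule: only the whitelisted word(s) may carry the cop relation.
        if t.deprel == "cop" and t.lemma not in allowed:
            problems.append(f"{t.form}: lemma '{t.lemma}' is not an allowed copula")
        # Rule: subjects of copula clauses get the nsubj:cop subtype.
        if t.deprel == "nsubj" and t.head in heads_with_cop:
            problems.append(f"{t.form}: subject of a copula clause should be nsubj:cop")
    return problems
```

A sentence that follows the proposal yields an empty list, while a sentence that labels estar as cop and keeps a plain nsubj yields two messages.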
Problems with the current copula analysis
The main problem is the lack of standardisation. Leaving aside the Galician example, which appears to be a conversion error, the Spanish treebank has over 229 verbs with the cop relation, whereas the Swedish treebank has one.
Treebanks differ in whether they treat the PP/case-marked nominal as the head: in Swedish it is the head, while in Finnish it is a dependent:
There are also inconsistencies within a language, for example the existential construction with copula in English:
Compared to the bare copula:
We also do not provide a consistent analysis when one side of the copula is a clause:
Copula constructions in UDv2
For language-specific examples, see below, but here is a summary:
Nominals
The structure will remain the same, but the relation will be changed to nsubj:cop:
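The relabelling for nominal predicates could be sketched as a small conversion over a CoNLL-U sentence. This is only an illustration under stated assumptions: the function name `relabel_copula_subjects` is hypothetical, the input is assumed to be a single sentence in the standard ten-column CoNLL-U layout, and the example analysis of "She is a student" is ours rather than taken from a treebank.

```python
def relabel_copula_subjects(conllu_sentence):
    """Rename nsubj to nsubj:cop for subjects whose head has a cop dependent.

    Assumes one sentence in ten-column CoNLL-U; comment lines pass through.
    """
    lines = conllu_sentence.strip("\n").split("\n")
    rows = [line.split("\t") for line in lines]

    def is_token(row):
        return len(row) == 10 and not row[0].startswith("#")

    # Column 7 (index 6) is HEAD, column 8 (index 7) is DEPREL.
    cop_heads = {row[6] for row in rows if is_token(row) and row[7] == "cop"}
    for row in rows:
        if is_token(row) and row[7] == "nsubj" and row[6] in cop_heads:
            row[7] = "nsubj:cop"
    return "\n".join("\t".join(row) for row in rows)


# Illustrative input: "She is a student", with the nominal as head.
EXAMPLE = "\n".join("\t".join(cols) for cols in [
    ["1", "She", "she", "PRON", "_", "_", "4", "nsubj", "_", "_"],
    ["2", "is", "be", "AUX", "_", "_", "4", "cop", "_", "_"],
    ["3", "a", "a", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "student", "student", "NOUN", "_", "_", "0", "root", "_", "_"],
])

print(relabel_copula_subjects(EXAMPLE))
```

Only the label of the subject changes; heads and all other relations are left untouched, which matches the statement that the structure stays the same.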
When there is more than one PP, the head should be the least oblique argument/modifier according to relevant language-specific tests. For example:
The omission test could be used:
- She was in Prague
- *She was on Tuesday
Only in cases where no tests apply should we resort to general heuristics such as “closest to the copula” and so on:
and:
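The head-selection procedure for multiple PPs described in this section can be sketched as follows. This is a hedged illustration, not a proposed implementation: `choose_head` and its `obliqueness` callback are hypothetical names, the callback stands in for whatever language-specific test (such as the omission test) is available, and the fallback is the "closest to the copula" heuristic mentioned above.

```python
from typing import Callable, Optional, Sequence


def choose_head(
    candidates: Sequence[int],
    copula_pos: int,
    obliqueness: Optional[Callable[[int], Optional[int]]] = None,
) -> int:
    """Pick the head among candidate token positions.

    obliqueness(pos) returns a lower score for a less oblique candidate,
    or None when no language-specific test applies to that candidate.
    """
    if obliqueness is not None:
        scored = [(obliqueness(c), c) for c in candidates]
        # Use the test result only if it covers every candidate.
        if all(score is not None for score, _ in scored):
            return min(scored)[1]
    # Fallback heuristic: the candidate closest to the copula.
    return min(candidates, key=lambda c: abs(c - copula_pos))


# Illustrative sentence (our own): "She was on Tuesday in Prague"
# positions: She=1 was=2 on=3 Tuesday=4 in=5 Prague=6
# The omission test favours "Prague" ("She was in Prague" is fine,
# "*She was on Tuesday" is not), so it receives the lower score.
omission_scores = {4: 1, 6: 0}
print(choose_head([4, 6], copula_pos=2, obliqueness=omission_scores.get))
print(choose_head([4, 6], copula_pos=2))
```

With the test available the locative is selected even though it is further from the copula; without it, the heuristic picks the nearer candidate.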
Clausals
When the predicate is a clause, the head of that clause becomes the head of the whole copula clause:
We distinguish copula subjects from non-copula subjects, so that when there is a clausal predicate we do not get a double subject:
To discuss
However, we still get duplication of the cop relation where you have a copula on the right:
And in the case of having an expressed subject, we get two subjects for the main predicate:
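To make the double-subject issue concrete, the following sketch lists predicates that end up with more than one subject-type dependent, together with the labels involved. The function name `multiple_subjects` is hypothetical, and the analysis of "The plan is that she leaves" is our own illustration of the proposal (clausal head as root, copula subject as nsubj:cop), not an example from the original page.

```python
from collections import defaultdict


def multiple_subjects(deps):
    """deps: iterable of (token_id, head_id, deprel) triples.

    Returns {head_id: [(token_id, deprel), ...]} for every head that has
    more than one nsubj/csubj dependent, subtypes included.
    """
    subjects = defaultdict(list)
    for token_id, head_id, deprel in deps:
        if deprel.split(":")[0] in {"nsubj", "csubj"}:
            subjects[head_id].append((token_id, deprel))
    return {h: s for h, s in subjects.items() if len(s) > 1}


# Assumed analysis of "The plan is that she leaves":
# The=1 plan=2 is=3 that=4 she=5 leaves=6
deps = [
    (1, 2, "det"),
    (2, 6, "nsubj:cop"),
    (3, 6, "cop"),
    (4, 6, "mark"),
    (5, 6, "nsubj"),
    (6, 0, "root"),
]
print(multiple_subjects(deps))
```

The predicate still has two subject dependents, but the nsubj:cop subtype lets a consumer tell the copula subject apart from the subject of the embedded clause.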
Language-specific examples
For the purposes of demonstrating the new classification system a number of examples have been prepared for a range of UD languages. The examples are in English, but where they are ambiguous in a given language multiple variants will be given.
- She is a student
- I am a student
- She was a student
- I was a student
- She is happy
- I am happy
- She is in shape
- She is in the house
- I am in the house
- She was in the house
- There is a house in the village
- The house is in the village
- There was a house in the village
- The house was in the village
English
The English analysis more or less follows the analysis in the UD_English treebank, with the addition of the relation nsubj:cop for subjects of copula constructions. There is, however, a difference in how (11) and (13) are treated.
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
Swedish
(1)
(2)
(3)
(4)
(5)
(6)
(7)
Example needed
(8)
(9)
(10)
(11)
Existential constructions in Swedish do not use the copula verb.
(12)
(13)
(14)
Spanish
The UD_Spanish treebank has a large number of verbs classified as copulas. We propose reducing this to the single verb “ser”.
(1)
(2)
(3)
(4)
(5)
In Spanish you can say either Soy feliz “I am happy” or Estoy feliz “I am happy”/“I feel happy”. In the following examples, the subject pronouns are expressed to illustrate the difference in relation for the subject. They may equally well be dropped.
(6)
(7)
Instead of “in shape” we’ll use “de puta madre”, which means “really great”.
(8)
In Spanish location/position uses the verb estar and not ser.
Note that in Catalan, this would be “Ella és a la casa”, using the ser verb, not the estar verb. This would be analysed as:
(9)
(10)
(11)
Existential constructions in Spanish do not use the copula verb.
(12)
(13)
(14)
Russian
In Russian, there is no copula verb in the present tense. In the future and past tenses, the verb быть “be” is used.
Note that when the copula verb is used, the complement can be either in nominative or instrumental case.
When it is instrumental, the meaning is closer to “is category of”, and when it is nominative it is more like “has quality of”. We propose using the same structure for both.
(1)
(2)
(3)
(4)
(5)
The same goes with adjectival uses:
(6)
(7)
Instead of “in shape”, we’ll use “в курсе”, which means “on the ball”.
(8)
In Russian, there is no verb used for locative predication in the present tense.
(9)
As with (8),
(10)
In the past tense we have the verb and we make it a dependent:
(11)
In Russian, in the present tense, existential constructions use “есть” which is sometimes described as a “predicative”:
(12)
(13)
In the past tense (and future tense), the verb быть is employed. Syntactically (13) and (14) are equivalent in Russian aside from the word order considerations.
(14)
Finnish
In Finnish the copula verb is olla “to be”. Its complement is typically in the nominative, although it may also be in the essive case -nA.
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
In Finnish, existential and non-existential constructions are identical aside from word order.
(12)
(13)
(14)
Turkish
In Turkish, there are two copula verbs, i- and ol-. The “true” copula is i- which is defective, only having a limited number of tense forms (aorist and past), and cliticising. When a copula is needed in another tense, ol- is employed. However, if there is a form of i- then the equivalent form of ol- takes on the meaning “become”.
(1)
In the present tense (third person singular, aorist, non-formal), there is no overt suffix for the third person singular. Unlike Russian, where the copula verb does not appear in any part of the present tense paradigm, in Turkish it appears in all persons except the third person. This makes it more like the nominative case in the paradigm (which also has a -Ø suffix) than like the Russian copula.
In the following examples the hyphen is used to separate cliticised syntactic words.
(2)
(3)
The copula verb here can also be written separately, instead of cliticised, in more formal styles.
(4)
(5)
(6)
(7)
Example needed
(8)
(9)
(10)
(11)
In Turkish (and indeed in most Turkic languages), existence is syntactically different: it is expressed using the adjective var “existent”, and so gets a different structure.
(12)
(13)
(14)
Irish
Irish distinguishes between a copula verb “is” and what is called a substantive verb “bí”. Only the copula verb receives the cop relation. The substantive verb is the head and takes an argument with xcomp.
Teresa’s thesis has an in-depth description of the treatment of the copula in Irish.
(1)
Example needed
(2)
Example needed
(3)
(4)
Not applicable.
(5)
Not applicable.
(6)
Example needed
(7)
Example needed
(8)
Example needed
(9)
Example needed
(10)
Example needed
(11)
Example needed
(12)
Example needed
(13)
Example needed
(14)
Example needed
Status quo
The table below lists the UD treebanks together with the tokens which have the cop relation. If we adopt the above recommendations, the vast majority will need converting.
Treebank | Unique cop | Top-5 lemmas[POS] with cop relation |
---|---|---|
UD-Galician | 1112 | 121/de[ADP], 40/necesario[ADJ], 38/como[PRON], 24/posible[ADJ], 23/importante[ADJ] |
UD-Dutch | 253 | 2491/ben[AUX], 283/word[AUX], 91/vind[VERB], 73/blijf[AUX], 67/maak[VERB] |
UD-Spanish | 229 | 5136/ser[VERB], 353/estar[VERB], 78/llamado[VERB], 66/encontrar[VERB], 48/hacer[VERB] |
UD-Arabic | 216 | 384/كَان[VERB], 75/لَيس[VERB], 31/عَدّ[VERB], 27/اِعتَبَر[VERB], 25/زَال[VERB] |
UD-Portuguese | 135 | 2120/ser[VERB], 370/estar[VERB], 176/como[ADV], 91/ficar[VERB], 38/parecer[VERB] |
UD-French | 99 | 4878/être[VERB], 232/devenir[VERB], 91/appeler[VERB], 70/nommer[VERB], 51/rester[VERB] |
UD-Greek | 67 | 531/είμαι[VERB], 86/αποτελώ[VERB], 34/θεωρώ[VERB], 27/γίνομαι[VERB], 20/καθίσταμαι[VERB] |
UD-Catalan | 57 | 3609/ser[AUX], 810/estar[VERB], 722/ser[VERB], 136/cop[NOUN], 53/semblar[VERB] |
UD-Polish | 18 | 764/być[VERB], 98/to[VERB], 42/być[AUX], 17/stać[VERB], 12/stawać[VERB] |
UD-Basque | 15 | 1993/izan[VERB], 266/egon[VERB], 124/ukan[VERB], 31/izan[AUX], 20/ibili[VERB] |
UD-German | 11 | 4698/-[VERB], 86/-[NOUN], 31/-[ADJ], 27/-[ADP], 23/-[PROPN] |
UD-Estonian | 9 | 3373/olema[VERB], 37/ole[VERB], 29/tunduma[VERB], 5/paistma[VERB], 4/näima[VERB] |
UD-Czech | 6 | 20480/být[VERB], 110/bývat[VERB], 3/stát[VERB], 3/bývávat[VERB], 1/moci[VERB] |
UD-Hungarian | 6 | 92/van[VERB], 61/lesz[VERB], 11/lehet[VERB], 3/marad[VERB], 1/hoz[VERB] |
UD-Bulgarian | 5 | 1940/съм[VERB], 3/съм[AUX], 1/стана[VERB], 1/разпространявам-(се)[VERB], 1/докосна-(се)[VERB] |
UD-Buryat | 5 | 70/байха[VERB], 22/болохо[VERB], 2/ябаха[VERB], 2/үнгэхэ[VERB], 2/байха[AUX] |
UD-Croatian | 5 | 1236/biti[AUX], 1/željeti[VERB], 1/težiti[VERB], 1/davati[VERB], 1/bivati[VERB] |
UD-English | 4 | 5593/be[VERB], 8/`s[VERB], 5/be[AUX], 1/’[VERB] |
UD-Kazakh | 4 | 131/е[VERB], 42/бол[VERB], 1/тұр[VERB], 1/атан[VERB] |
UD-Uyghur | 4 | 66/-[VERB], 4/-[NOUN], 3/-[ADJ], 1/-[PART] |
UD-Hindi | 3 | 3014/है[VERB], 497/था[VERB], 1/बशर्ते[SCONJ] |
UD-Irish | 3 | 369/is[VERB], 3/is[PART], 1/má[SCONJ] |
UD-Russian | 3 | 538/-[VERB], 5/-[NOUN], 1/-[ADP] |
UD-Russian-SynTagRus | 3 | 4457/БЫТЬ[AUX], 622/ЭТО[NOUN], 4/ВОТ[PART] |
UD-Chinese | 2 | 1795/-[VERB], 8/-[ADJ] |
UD-Coptic | 2 | 30/ⲡⲉ[PART], 2/ⲡ[DET] |
UD-Danish | 2 | 1576/være[AUX], 185/blive[AUX] |
UD-Hebrew | 2 | 387/-[VERB], 7/-[PRON] |
UD-Persian | 2 | 4662/-[VERB], 3/-[ADJ] |
UD-Turkish | 2 | 751/i[AUX], 113/değil[VERB] |
UD-Faroese | 1 | 1081/vera[VERB] |
UD-Finnish | 1 | 3279/olla[VERB] |
UD-Indonesian | 1 | 1055/-[VERB] |
UD-Italian | 1 | 2767/essere[VERB] |
UD-Norwegian | 1 | 7217/være[VERB] |
UD-Slovenian | 1 | 2820/biti[VERB] |
UD-Swedish | 1 | 1629/vara[VERB] |
UD-Tamil | 1 | 1/முயல்[VERB] |
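Figures of the kind shown in the table could be recomputed from a treebank file along the following lines. This is a sketch under assumptions, not the script that produced the table: `cop_statistics` is a hypothetical name, the input is assumed to be ten-column CoNLL-U text, entries are counted per lemma and POS pair, and subtyped relations such as cop:xyz are counted together with plain cop.

```python
from collections import Counter


def cop_statistics(conllu_text, top_n=5):
    """Return (number of distinct lemma[POS] pairs with cop, top-N summary)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        lemma, upos, deprel = cols[2], cols[3], cols[7]
        # Assumption: language-specific subtypes count as cop too.
        if deprel.split(":")[0] == "cop":
            counts[(lemma, upos)] += 1
    top = ", ".join(
        f"{n}/{lemma}[{upos}]" for (lemma, upos), n in counts.most_common(top_n)
    )
    return len(counts), top
```

Reading a file and calling `cop_statistics(open(path, encoding="utf-8").read())` would give one row of the table; `path` here is a placeholder for a local treebank file.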
UD-internal references
- http://universaldependencies.org/u/dep/cop.html
- https://github.com/UniversalDependencies/docs/issues/329
- http://universaldependencies.org/2015-08-23-uppsala/copula.html
- https://github.com/UniversalDependencies/docs/issues/256
Further reading
For wider cross-linguistic applicability, it is well worth looking at the following book:
- Stassen, L. (1997), Intransitive Predication. Oxford: OUP.
The following publications have also been cited:
- Hengeveld, K. (1992), Non-verbal Predication. Berlin & New York: Mouton de Gruyter.
- Katz, A. (1996), Cyclical Grammaticalization and the Cognitive Link between Pronoun and Copula. PhD Thesis, Rice University.
- Pustet, R. (2003), Copulas. Universals in the Categorization of the Lexicon. Oxford: OUP.