This document describes specific issues we encountered when applying the schema of Universal Dependencies to Japanese syntax.
サ変 / Sahen verbs
Many Japanese verbs consist of a noun followed by the verbal suffix する / suru “do”.
- 勉強 する / benkyo suru “study”
- 登校 する / toko suru “go to school”
Since the first part (勉強, 登校) can also be used as a noun (e.g. 勉強 が 好き / benkyo ga suki “(I) like studying”), the problem is which word should be treated as the main verb. There are two candidate analyses.
勉強 する aux(勉強, する)
勉強 する nmod(する, 勉強)
The first analysis regards 勉強 as a main verb, and する as an auxiliary verb. The second analysis regards する as a main verb, and 勉強 as a modifier.
We choose the first analysis, mainly because the noun part carries both the semantic content and the syntactic frame.
Suffixes changing POS
A similar issue appears in other constructions. The suffix さ / sa changes an adjective into a noun, and っぽい / ppoi changes a noun into an adjective.
- かわい さ / kawai sa “cuteness”
- 春 っぽい / haru ppoi “spring-like”
We adopt an analysis parallel to the one for サ変 verbs: the first content word is the head, and the suffix is a function word.
かわい さ mark(かわい, さ)
The distinction between main verbs and auxiliary verbs is unclear in some cases.
- 走っ た / hashit ta “ran”
- 走っ て いる / hashit te iru “running”
- 走っ て ほしい / hashit te hoshii “want (you) to run”
- 走り 始める / hashiri hajimeru “begin to run”
The first example is a clear case of an auxiliary verb, because た never appears independently. The other cases are less clear, because verbs like いる, ほしい, and 始める can also be used as main verbs. In the above examples, however, these verbs have lost their lexical meanings (much like light verbs) and instead add an auxiliary meaning to the preceding verb.
These verbs are defined as 非自立 hijiritsu (“non-independent”) in UniDic, and we treat a 非自立 verb preceded by another verb as an auxiliary verb. When such a verb appears independently, it is treated as a main verb.
太郎 が 走り 始める nsubj(走り, 太郎) aux(走り, 始める)
商売 を 始め た dobj(始め, 商売) aux(始め, た)
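The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Token` class, the coarse tags ("NOUN", "VERB", "AUX", "PART"), and the `hijiritsu` flag are hypothetical names that stand in for whatever the lexicon provides, and skipping an intervening て is an assumption made here to cover 走っ て いる.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str
    pos: str                 # hypothetical coarse tag: "NOUN", "VERB", "AUX", "PART"
    hijiritsu: bool = False  # True if the lexicon marks the word as 非自立


def is_auxiliary(tokens, i):
    """Return True if tokens[i] should attach to a preceding verb as aux."""
    tok = tokens[i]
    if tok.pos == "AUX":
        # Bound forms such as た never appear independently.
        return True
    if not tok.hijiritsu:
        return False
    j = i - 1
    # Assumption of this sketch: look past a connective て.
    if j >= 0 and tokens[j].surface == "て":
        j -= 1
    # A 非自立 verb is auxiliary only when another verb precedes it.
    return j >= 0 and tokens[j].pos == "VERB"


hashiri_hajimeru = [
    Token("太郎", "NOUN"), Token("が", "PART"),
    Token("走り", "VERB"), Token("始める", "VERB", hijiritsu=True),
]
shobai_o_hajimeta = [
    Token("商売", "NOUN"), Token("を", "PART"),
    Token("始め", "VERB", hijiritsu=True), Token("た", "AUX"),
]
```

With these inputs, 始める is classified as an auxiliary in the first sentence, while 始め in the second sentence is kept as a main verb because no verb precedes it.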
Dependency labels of Universal Dependencies are sensitive to the distinction between a clause and a non-clause; for example, csubj and acl are reserved for clausal dependents. However, it is not evident what counts as a “clause” in Japanese. For the distinction between nsubj and csubj, we have the following gradation.
- 食べる の が 好き / taberu no ga suki “(I) like eating”
- 食べる こと が 好き / taberu koto ga suki “(I) like eating”
- 食べる ところ が 好き / taberu tokoro ga suki “(I) like where (one) eats”
- 食べる まで が 好き / taberu made ga suki “(I) like (the situation) before eating”
The first one is a clear case: の does not appear independently, so it can be regarded as a complementizer. The following cases are less clear. こと and ところ are used as nouns, but in these examples they carry little lexical meaning. In particular, the second example has almost the same meaning as the first one. In the last example, まで is a function word, but here it contributes an additional meaning.
In the current definition, we treat only the first case, i.e., a phrase marked by の, as a clausal subject, while the other cases are treated as noun phrases.
食べる の が 好き mark(食べる, の) case(食べる, が) csubj(好き, 食べる)
食べる こと が 好き acl(こと, 食べる) case(こと, が) nsubj(好き, こと)
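The two analyses above can be written as a small Python sketch. The function name `analyze_subject` and the `(label, head, dependent)` tuple layout are hypothetical choices made for illustration. Applying the こと pattern to every word other than の is also an assumption: the text states only that the other cases are noun phrases and gives the full structure for こと alone.

```python
def analyze_subject(verb, nominalizer, predicate, case_marker="が"):
    """Return (label, head, dependent) triples for 'verb nominalizer が predicate'."""
    if nominalizer == "の":
        # の is a complementizer: the verb heads a clausal subject.
        return [
            ("mark", verb, nominalizer),
            ("case", verb, case_marker),
            ("csubj", predicate, verb),
        ]
    # Otherwise the word is treated as a noun modified by the clause.
    return [
        ("acl", nominalizer, verb),
        ("case", nominalizer, case_marker),
        ("nsubj", predicate, nominalizer),
    ]


no_case = analyze_subject("食べる", "の", "好き")
koto_case = analyze_subject("食べる", "こと", "好き")
```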
A similar issue appears for the distinction between amod and acl. In Japanese, relative clauses are not accompanied by a relativizer, and a simple adjective-noun construction is formally no different from a relative clause.
- かわいい 人形 / kawaii ningyo “cute doll”
- とても かわいい 人形 / totemo kawaii ningyo “very cute doll”
- 服 が かわいい 人形 / fuku ga kawaii ningyo “a doll whose clothes are cute”
- かわいかっ た 人形 / kawaikat ta ningyo “a doll which was cute”
There is no clear boundary among these examples. One possible solution is to regard everything as acl and never use amod. However, this analysis reduces the parallelism with other languages. Therefore, we give the following definition.
amod: an adjective with no arguments (e.g. nsubj) and no auxiliary verbs (e.g. た / ta)
In the above examples, the first two cases are annotated as amod, while the others are annotated as acl.
かわいい 人形 amod(人形, かわいい)
服 が かわいい 人形 nsubj(かわいい, 服) acl(人形, かわいい)
This definition yields analyses that mostly correspond to the English translations. However, it is not linguistically justified, and a better solution is needed.
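As a rough sketch, the amod/acl decision can be expressed as a check on the labels of the adjective's own dependents. The function name `adnominal_label` and the exact contents of `ARGUMENT_LABELS` are assumptions for illustration (the text names only nsubj as an example of an argument), as is the use of advmod for とても.

```python
# Assumed set of argument labels; the text gives only nsubj as an example.
ARGUMENT_LABELS = {"nsubj", "dobj", "iobj", "csubj"}


def adnominal_label(dependent_labels):
    """Pick amod or acl for an adjective that modifies a noun.

    dependent_labels holds the labels of the adjective's own dependents.
    """
    for label in dependent_labels:
        if label in ARGUMENT_LABELS or label == "aux":
            # An argument or an auxiliary makes the modifier clausal.
            return "acl"
    return "amod"


print(adnominal_label([]))          # かわいい 人形
print(adnominal_label(["advmod"]))  # とても かわいい 人形
print(adnominal_label(["nsubj"]))   # 服 が かわいい 人形
print(adnominal_label(["aux"]))     # かわいかっ た 人形
```

Under these assumptions, the first two calls return amod and the last two return acl, matching the annotation of the four examples.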
In Universal Dependencies, passive voice is marked with special dependency labels such as auxpass. This is useful for recognizing semantic dependencies. However, Japanese has other voice constructions that also involve case alternations.
- causative: 太郎 が 次郎 に りんご を 食べ させる / Taro ga Jiro ni ringo o tabe saseru “Taro makes Jiro eat an apple”
- benefactive: 太郎 が 次郎 に りんご を 食べ て もらう / Taro ga Jiro ni ringo o tabe te morau “Taro asks Jiro to eat an apple”
The problem here is that auxiliary verbs like させる and もらう change case markers (e.g. 次郎 is the subject of 食べ, but is marked with に). In addition, these constructions introduce an additional argument (太郎 in these cases), which is the causer in the first example and the beneficiary in the second. We have no way to indicate these case alternations in Universal Dependencies.
Currently, we give dependency labels based on surface expressions, without any markings of case alternations.
太郎 が 次郎 に りんご を 食べ させる nsubj(食べ, 太郎) dobj(食べ, りんご) iobj(食べ, 次郎) aux(食べ, させる)
太郎 が 次郎 に りんご を 食べ て もらう nsubj(食べ, 太郎) dobj(食べ, りんご) iobj(食べ, 次郎) aux(食べ, もらう)
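The surface-based labeling used in these two examples can be sketched as a lookup from case particle to dependency label. The names `SURFACE_CASE_TO_LABEL` and `surface_labels` are hypothetical, and the mapping covers only the three particles that occur in the examples. Like the examples, the sketch leaves the connective て unlabeled.

```python
# Particle-to-label mapping read off the two examples above.
SURFACE_CASE_TO_LABEL = {"が": "nsubj", "を": "dobj", "に": "iobj"}


def surface_labels(arguments, verb, auxiliaries):
    """Label each (noun, particle) pair by its surface case marker only.

    Returns (label, head, dependent) triples; no case alternation is
    recorded, so 次郎 stays iobj even though it is the eater.
    """
    deps = [(SURFACE_CASE_TO_LABEL[particle], verb, noun)
            for noun, particle in arguments]
    deps += [("aux", verb, aux) for aux in auxiliaries]
    return deps


arguments = [("太郎", "が"), ("次郎", "に"), ("りんご", "を")]
causative = surface_labels(arguments, "食べ", ["させる"])
benefactive = surface_labels(arguments, "食べ", ["もらう"])
```

Both sentences receive the same set of labels apart from the auxiliary, which shows how the surface-based scheme leaves the difference between causer and beneficiary unmarked.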