Morphology
As described in the Introduction, the parts of speech of Japanese are defined by a mapping from UniDic POS tags, because the UniDic guidelines are well established and widely used in Japanese NLP.
The following table defines the mapping from UniDic SUW (short unit word) POS tags to Universal Dependencies POS tags (this table is not finalized yet; any suggestions are welcome).
UD POS | UniDic SUW POS |
---|---|
ADJ | 形容詞(adjective), 連体詞(adnominal), 形状詞(adjectival noun) |
ADV | 副詞(adverb) |
INTJ | 感動詞(interjection) |
NOUN | 名詞-普通名詞(common noun), 接頭辞(prefix), 接尾辞(suffix) |
PROPN | 名詞-固有名詞(proper noun) |
VERB | 動詞(verb) |
ADP | 助詞-格助詞(case particle), 助詞-係助詞(binding particle) |
AUX | 助動詞(auxiliary verb) |
CONJ | 接続詞(conjunction), 助詞-格助詞(case particle) |
DET | 連体詞(adnominal) |
NUM | 名詞-数詞(numeral noun) |
PART | 助詞-副助詞(adverbial particle), 助詞-終助詞(phrase final particle) |
PRON | 代名詞(pronoun) |
SCONJ | 助詞-接続助詞(conjunctive particle), 助詞-準体助詞(nominal particle) |
PUNCT | 補助記号(supplementary symbol) |
SYM | 記号(symbol), 補助記号(supplementary symbol) |
X | 空白(white space) |
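
The table can be read as a lookup from UniDic tags to candidate UD tags. The following Python sketch transcribes it and matches the longest hyphen-separated prefix of a UniDic tag. The names `UD_TO_UNIDIC` and `candidate_upos` are illustrative only and are not part of any UniDic or UD tool. Tags that appear in more than one row (連体詞, 助詞-格助詞, 補助記号) yield several candidates, which cannot be resolved from the tag alone.

```python
from typing import Dict, List

# UD POS -> UniDic SUW POS tags, transcribed row by row from the table above.
UD_TO_UNIDIC: Dict[str, List[str]] = {
    "ADJ": ["形容詞", "連体詞", "形状詞"],
    "ADV": ["副詞"],
    "INTJ": ["感動詞"],
    "NOUN": ["名詞-普通名詞", "接頭辞", "接尾辞"],
    "PROPN": ["名詞-固有名詞"],
    "VERB": ["動詞"],
    "ADP": ["助詞-格助詞", "助詞-係助詞"],
    "AUX": ["助動詞"],
    "CONJ": ["接続詞", "助詞-格助詞"],
    "DET": ["連体詞"],
    "NUM": ["名詞-数詞"],
    "PART": ["助詞-副助詞", "助詞-終助詞"],
    "PRON": ["代名詞"],
    "SCONJ": ["助詞-接続助詞", "助詞-準体助詞"],
    "PUNCT": ["補助記号"],
    "SYM": ["記号", "補助記号"],
    "X": ["空白"],
}

# Invert the table: UniDic tag -> list of candidate UD tags (table order).
UNIDIC_TO_UD: Dict[str, List[str]] = {}
for ud_tag, unidic_tags in UD_TO_UNIDIC.items():
    for unidic_tag in unidic_tags:
        UNIDIC_TO_UD.setdefault(unidic_tag, []).append(ud_tag)


def candidate_upos(unidic_pos: str) -> List[str]:
    """Return the candidate UD tags for a hyphen-separated UniDic tag.

    The longest prefix listed in the table wins, so a finer tag such as
    名詞-普通名詞-サ変可能 falls back to the row for 名詞-普通名詞.
    An empty list means the tag is not covered by the table.
    """
    parts = unidic_pos.split("-")
    for n in range(len(parts), 0, -1):
        key = "-".join(parts[:n])
        if key in UNIDIC_TO_UD:
            return list(UNIDIC_TO_UD[key])
    return []


print(candidate_upos("名詞-普通名詞-サ変可能"))  # ['NOUN']
print(candidate_upos("連体詞"))                  # ['ADJ', 'DET']
```
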
Several UniDic POS tags are mapped to different UD POS tags depending on additional information such as the lemma and/or the syntactic context.
- 連体詞(adnominal): demonstrative determiners (e.g. この/this) are DET, while other adnominals are ADJ.
- 名詞-普通名詞(common noun)
  - 名詞-普通名詞-サ変可能(can be “suru” verbal): tagged as VERB if it is used as a verb (typically accompanied by する/suru); tagged as NOUN otherwise.
  - 名詞-普通名詞-形状詞可能(can be adjectival): tagged as ADJ if it is used as an adjective; tagged as NOUN otherwise.
- 動詞-非自立可能(verb - can be functional), 形容詞-非自立可能(adjective - can be functional): tagged as AUX if used as a function word (typically preceded by another verb/adjective); tagged as VERB/ADJ otherwise.
- 助詞-格助詞(case particle): noun conjunctions (と/to, か/ka) are CONJ, while others are ADP.
- 補助記号(supplementary symbol): periods, commas, and opening/closing brackets are tagged as PUNCT, while others are SYM.
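
The context-dependent rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Token` class and its boolean fields are hypothetical stand-ins for syntactic context that a real converter would have to derive itself, and the word lists contain only a few illustrative entries (of these, only この, と and か are named in this document). How a lemma is written (kana or kanji) depends on the dictionary in use.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative word lists (assumptions, not exhaustive).
DEMONSTRATIVES = {"この", "その", "あの", "どの"}
NOUN_CONJUNCTIONS = {"と", "か"}
PUNCTUATION = {"。", "、", "．", "，", "「", "」", "（", "）"}


@dataclass
class Token:
    """Hypothetical token record; the flags encode syntactic context."""
    lemma: str
    unidic_pos: str
    used_as_verb: bool = False        # e.g. followed by する
    used_as_adjective: bool = False
    functional: bool = False          # e.g. preceded by another verb/adjective
    coordinates_nouns: bool = False   # particle joins two nouns


def resolve_upos(tok: Token) -> Optional[str]:
    """Apply the context-dependent rules; return None if none applies."""
    pos = tok.unidic_pos
    if pos.startswith("連体詞"):
        return "DET" if tok.lemma in DEMONSTRATIVES else "ADJ"
    if pos.startswith("名詞-普通名詞-サ変可能"):
        return "VERB" if tok.used_as_verb else "NOUN"
    if pos.startswith("名詞-普通名詞-形状詞可能"):
        return "ADJ" if tok.used_as_adjective else "NOUN"
    if pos.startswith("動詞-非自立可能"):
        return "AUX" if tok.functional else "VERB"
    if pos.startswith("形容詞-非自立可能"):
        return "AUX" if tok.functional else "ADJ"
    if pos.startswith("助詞-格助詞"):
        is_conj = tok.lemma in NOUN_CONJUNCTIONS and tok.coordinates_nouns
        return "CONJ" if is_conj else "ADP"
    if pos.startswith("補助記号"):
        return "PUNCT" if tok.lemma in PUNCTUATION else "SYM"
    return None  # unambiguous tags are handled by the table alone


print(resolve_upos(Token("この", "連体詞")))  # DET
print(resolve_upos(Token("勉強", "名詞-普通名詞-サ変可能", used_as_verb=True)))  # VERB
```

Requiring both a listed lemma and the `coordinates_nouns` flag for CONJ is one possible reading of the case-particle rule; a converter could equally decide from the syntactic context alone.
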
Currently, we do not use morphological features for Japanese.