This page still pertains to UD version 1.

Tokenization

The tokenization of the UD Korean Treebank follows the tokenization of the Korean data distributed by the SPMRL 2013 shared task, which is a straightforward whitespace-based tokenization with conventional separation of punctuation. Each token can contain one or more morphemes separated by plus (+) signs.
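The two steps described above can be sketched in Python as follows. This is a minimal illustration, not the SPMRL 2013 or UD tooling. It assumes a hypothetical punctuation set (`_PUNCT_CHARS`) that is split off only at the start and end of each whitespace-delimited chunk. The helpers `tokenize` and `morphemes` are hypothetical names. The morpheme string `"프랑스+의"` is an illustrative analysis: in the treebank, the segmentation comes from the annotation and is not derived from the surface form.

```python
import re

# Hypothetical punctuation set; the actual SPMRL 2013 conventions may differ.
_PUNCT_CHARS = r""".,!?;:"'()\[\]"""
# Split a chunk into leading punctuation, a core word, and trailing punctuation.
_CHUNK = re.compile(rf"^([{_PUNCT_CHARS}]*)(.*?)([{_PUNCT_CHARS}]*)$")


def tokenize(sentence: str) -> list[str]:
    """Whitespace tokenization, then split punctuation off each chunk's edges."""
    tokens: list[str] = []
    for chunk in sentence.split():  # split on any run of whitespace
        lead, core, trail = _CHUNK.match(chunk).groups()
        tokens.extend(lead)  # each leading punctuation character is its own token
        if core:
            tokens.append(core)
        tokens.extend(trail)  # each trailing punctuation character is its own token
    return tokens


def morphemes(analysis: str) -> list[str]:
    """Split a token's morpheme analysis on the plus (+) separator."""
    return analysis.split("+")


print(tokenize("프랑스의 대통령이 말했다."))  # ['프랑스의', '대통령이', '말했다', '.']
print(morphemes("프랑스+의"))  # ['프랑스', '의']
```

Limiting punctuation splitting to chunk edges keeps word-internal characters, such as the period in "3.5", inside a single token. A token with only one morpheme has no plus sign and yields a one-element list.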