This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.


Tokenization

The UD Korean Treebank follows the tokenization of the Korean data distributed for the SPMRL 2013 shared task. This is a straightforward whitespace-based tokenization in which punctuation is split off from adjacent words in the conventional way. Each token can contain one or more morphemes, separated by plus (+) signs.
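The Python sketch below illustrates the two conventions just described: splitting a sentence on whitespace while separating punctuation, and splitting a plus-joined morpheme string into its parts. It is not the SPMRL or UD tooling. The helper names `tokenize` and `morphemes` are hypothetical, the punctuation set `PUNCT` is an assumption (the exact inventory used in the SPMRL 2013 Korean data is not given here), and the Korean example sentence and its segmentations are illustrative rather than taken from the treebank.

```python
import re

# Assumption: "conventional separation of punctuation" is approximated by
# splitting off each character in this small, hypothetical punctuation set.
PUNCT = r"[.,!?;:()\"']"


def tokenize(sentence: str) -> list[str]:
    """Whitespace-based tokenization with punctuation split off (sketch)."""
    tokens = []
    for chunk in sentence.split():
        # Surround every punctuation character with spaces, then re-split,
        # so "파리이다." becomes ["파리이다", "."].
        tokens.extend(re.sub(f"({PUNCT})", r" \1 ", chunk).split())
    return tokens


def morphemes(analysis: str) -> list[str]:
    """Split a '+'-joined morpheme string (e.g. "프랑스+의") into morphemes.

    Note: a token that is itself a literal '+' would need special handling,
    which this sketch does not attempt.
    """
    return analysis.split("+")


if __name__ == "__main__":
    print(tokenize("프랑스의 수도는 파리이다."))
    print(morphemes("파리+이+다"))
```

Here `tokenize("프랑스의 수도는 파리이다.")` yields four tokens, with the final period separated from `파리이다`, while `morphemes("파리+이+다")` recovers the three morphemes of that token.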