This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Tokenization

The tokenization of the UD Basque treebank follows that of the Basque Dependency Treebank (BDT): tokens are split on whitespace, and punctuation is separated from adjacent words in the conventional way. The UD Basque treebank does not contain multiword tokens.
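
As a rough illustration of whitespace-based tokenization with punctuation split off, here is a minimal Python sketch. It is not the tool used to build the BDT; the pattern `TOKEN_RE`, the function names, and the example sentence are assumptions made for this illustration, and the pattern splits every punctuation character (including word-internal hyphens) into its own token, which may differ from the treebank's actual conventions.

```python
import re

# Hypothetical pattern (an assumption, not the BDT tokenizer):
# a token is either a run of word characters or one punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text on whitespace and detach punctuation from words."""
    return TOKEN_RE.findall(text)


def number_tokens(tokens):
    """Give each token a single integer ID starting at 1.

    With no multiword tokens, no range IDs (such as 1-2) are needed.
    """
    return list(enumerate(tokens, start=1))


tokens = tokenize("Kaixo, mundua!")
print(tokens)                 # ['Kaixo', ',', 'mundua', '!']
print(number_tokens(tokens))  # [(1, 'Kaixo'), (2, ','), (3, 'mundua'), (4, '!')]
```

Because every surface token corresponds to exactly one syntactic word, the numbering step is a plain enumeration.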