This is part of archived UD v1 documentation. See for the current version.


The tokenization of the UD Basque treebank follows that of the Basque Dependency Treebank (BDT): a straightforward whitespace-based tokenization in which punctuation is split off from adjacent words in the conventional way. The UD Basque treebank contains no multiword tokens, so each token corresponds to exactly one syntactic word.
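As an illustration only, the scheme described above can be approximated by splitting on whitespace and then separating punctuation characters from the words they are attached to. This is a minimal sketch, not the BDT tokenizer. It assumes that "punctuation" means any character that is neither a word character nor whitespace, and the function names `tokenize` and `to_conllu_lines` are hypothetical. Because the treebank has no multiword tokens, every token receives a plain integer ID and no `N-M` range lines are produced.

```python
import re

# Assumption: treat any single non-word, non-space character as a separate
# punctuation token. The actual BDT punctuation rules may differ.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace, then split punctuation off each chunk."""
    tokens: list[str] = []
    for chunk in sentence.split():
        tokens.extend(_TOKEN_RE.findall(chunk))
    return tokens


def to_conllu_lines(tokens: list[str]) -> list[str]:
    """Emit one CoNLL-U line per token with integer IDs starting at 1.

    No multiword-token range lines (e.g. "1-2") are generated, since every
    token is assumed to be exactly one syntactic word. The eight columns
    after FORM are left as "_" placeholders.
    """
    return [
        f"{i}\t{form}\t" + "\t".join(["_"] * 8)
        for i, form in enumerate(tokens, start=1)
    ]


if __name__ == "__main__":
    toks = tokenize("Kaixo, mundua!")
    print(toks)
    for line in to_conllu_lines(toks):
        print(line)
```

Note that this regular-expression approximation also splits strings such as decimal numbers written with a comma ("3,5") or hyphenated forms, which a real tokenizer would likely handle with additional rules.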