This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Tokenization

The tokenization of the UD Finnish treebank follows that of the Turku Dependency Treebank (TDT), with only minor modifications. TDT uses a straightforward whitespace-based tokenization with conventional separation of punctuation. The UD Finnish treebank does not contain multiword tokens.
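
As a rough illustration of what whitespace-based tokenization with punctuation separation can look like, here is a minimal Python sketch. It is not the tokenizer used to build TDT or the UD Finnish treebank, and it does not reproduce the minor modifications mentioned above. The function name `tokenize`, the `PUNCT` set, and the rule of detaching only leading and trailing punctuation from each whitespace-delimited chunk are all assumptions made for this example.

```python
import string

# Hypothetical punctuation inventory for this sketch: ASCII punctuation plus
# a few typographic marks. The real treebank's inventory may differ.
PUNCT = set(string.punctuation) | {"“", "”", "’", "–", "…"}


def tokenize(text):
    """Split on whitespace, then detach leading and trailing punctuation.

    Punctuation inside a chunk is left attached. This is an assumption of
    the sketch, not a documented rule of the treebank.
    """
    tokens = []
    for chunk in text.split():
        start, end = 0, len(chunk)

        # Emit each leading punctuation character as its own token.
        while start < end and chunk[start] in PUNCT:
            tokens.append(chunk[start])
            start += 1

        # Collect trailing punctuation from right to left.
        trailing = []
        while end > start and chunk[end - 1] in PUNCT:
            trailing.append(chunk[end - 1])
            end -= 1

        # Emit the remaining word, if any, then the trailing punctuation
        # restored to its original left-to-right order.
        if start < end:
            tokens.append(chunk[start:end])
        tokens.extend(reversed(trailing))
    return tokens


print(tokenize("Tämä on esimerkki, eikö olekin?"))
# ['Tämä', 'on', 'esimerkki', ',', 'eikö', 'olekin', '?']
```

Because the treebank contains no multiword tokens, each token produced by a scheme like this corresponds to exactly one syntactic word, so no further splitting step is needed.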