
This page still pertains to UD version 1.

Tokenization

The tokenization of the UD Finnish treebank follows that of the Turku Dependency Treebank (TDT) with only minor modifications. TDT uses a straightforward whitespace-based tokenization in which punctuation is separated into tokens of its own in the conventional way. The Finnish UD treebank does not contain multiword tokens.
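As a rough illustration, the following Python sketch splits text on whitespace and then separates leading and trailing punctuation from each chunk. It is not the TDT tokenizer, and it does not reproduce the unspecified "minor modifications". The function names `is_punct` and `tokenize` are hypothetical. Keeping word-internal punctuation (so that a form such as `EU:n` stays whole) is an assumption made for this sketch. Because every output token is a single word form, the sketch never produces multiword tokens.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    # Unicode general categories beginning with "P" (Po, Ps, Pe, Pd, ...) are punctuation.
    return unicodedata.category(ch).startswith("P")


def tokenize(text: str) -> list[str]:
    """Hypothetical sketch: whitespace splitting plus separation of edge punctuation."""
    tokens: list[str] = []
    for chunk in text.split():  # str.split() with no argument splits on any whitespace
        start, end = 0, len(chunk)
        leading: list[str] = []
        # Peel off punctuation characters at the start of the chunk, one token each.
        while start < end and is_punct(chunk[start]):
            leading.append(chunk[start])
            start += 1
        trailing: list[str] = []
        # Peel off punctuation characters at the end of the chunk, one token each.
        while end > start and is_punct(chunk[end - 1]):
            trailing.append(chunk[end - 1])
            end -= 1
        tokens.extend(leading)
        if start < end:
            # Word-internal punctuation (assumed here, e.g. "EU:n") is kept in the word.
            tokens.append(chunk[start:end])
        # Trailing punctuation was collected right-to-left, so restore its order.
        tokens.extend(reversed(trailing))
    return tokens


print(tokenize('Hän sanoi: "EU:n päätös on hyvä."'))
# ['Hän', 'sanoi', ':', '"', 'EU:n', 'päätös', 'on', 'hyvä', '.', '"']
```

A rule this simple would also split the final period off abbreviations such as `esim.`. A real tokenizer needs extra handling for such cases, and this sketch leaves them out.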