This page still pertains to UD version 1.

Tokenization

White space always indicates a token boundary, and punctuation marks constitute separate tokens, except:

The treebank does not contain multiword tokens.
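
As a rough illustration of the two statements above, the following Python sketch splits text on white space and separates each punctuation character into its own token, and checks a CoNLL-U file for multiword tokens. It is a minimal sketch, not the tokenizer used to build the treebank: the function names `naive_tokenize` and `has_multiword_tokens` are hypothetical, the exceptions to the punctuation rule are not handled because they are not listed here, and the check assumes the standard CoNLL-U convention that a multiword token is marked by a range ID such as `1-2` in the first column.

```python
import re

# Hypothetical pattern: a run of word characters, or a single character
# that is neither a word character nor white space (i.e. punctuation).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(text):
    """Split on white space and make every punctuation character a token.

    Assumption: no exceptions to the punctuation rule are applied.
    """
    return _TOKEN_RE.findall(text)


def has_multiword_tokens(conllu_lines):
    """Return True if any token line has a range ID such as '1-2'.

    Assumption: input lines follow the CoNLL-U layout, with the ID in the
    first tab-separated column and comment lines starting with '#'.
    """
    for line in conllu_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            return True
    return False
```

For example, `naive_tokenize("Hello, world!")` yields the four tokens `Hello`, `,`, `world`, and `!`.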