This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

Tokenization

Whitespace always indicates a token boundary, and punctuation marks constitute separate tokens, except:

The treebank does not contain multiword tokens.
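
As a rough illustration of the general rule above, the following minimal sketch splits text on whitespace and emits each punctuation character as its own token. The name `naive_tokenize` and the regular expression are assumptions made for this example and are not part of the treebank's tooling. The sketch does not implement any exceptions to the rule.

```python
import re

# Hypothetical pattern for illustration only: a run of word characters
# forms one token, and every other non-whitespace character (punctuation)
# forms a token by itself. Whitespace is never part of a token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(text):
    """Return the tokens of `text` under the general rule, ignoring exceptions."""
    return TOKEN_RE.findall(text)


print(naive_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Because each punctuation character is matched on its own, a sequence such as "..." comes out as three tokens in this sketch; whether the treebank treats such sequences differently is not covered here.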