Tokenization
Whitespace always indicates a token boundary, and punctuation marks constitute separate tokens, except in:
- numbers with periods, commas or colons, e.g. 1.3, 0,6, 10:13
- abbreviations, e.g. f.eks., Carl J. Hambro
- URLs, e.g. http://www.ifi.uio.no
The treebank does not contain multiword tokens.
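
The rules above can be illustrated with a minimal Python sketch. This is not the tool used to tokenize the treebank; the function name `tokenize`, the abbreviation list `ABBREVIATIONS`, and the regular expressions are all assumptions made for illustration. In particular, a real tokenizer would need a much larger abbreviation lexicon, and the URL pattern only covers `http://` and `https://` addresses.

```python
import re

# Hypothetical abbreviation list; only the two cases from the examples above.
ABBREVIATIONS = ["f.eks.", "J."]

# Longest abbreviations first, so a longer entry wins over a shorter prefix.
_abbrev = "|".join(
    re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True)
)

TOKEN_RE = re.compile(
    rf"""
      https?://\S+?(?=[.,;:!?)]*(?:\s|$))  # URL, without trailing punctuation
    | (?<!\w)(?:{_abbrev})                 # listed abbreviations keep their periods
    | \d+(?:[.,:]\d+)+                     # numbers with periods, commas or colons
    | \w+                                  # ordinary words and plain numbers
    | [^\w\s]                              # any other punctuation mark, one per token
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens; whitespace is always a token boundary."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Carl J. Hambro skrev f.eks. 1,3 sider (se http://www.ifi.uio.no)."))
```

The alternatives are tried in order, so the exceptions (URLs, abbreviations, numbers) are matched before the general word and punctuation patterns get a chance to split them.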