Tokenization
The tokenization in the Hungarian UD treebank follows the principles of the Szeged Dependency Treebank (Vincze et al. 2010). The treebank does not contain multiword tokens.
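The absence of multiword tokens can be checked mechanically. In the CoNLL-U format, a multiword token is written on a line whose ID is a range such as `3-4`, so a file without such lines contains none. The following is a minimal sketch under that assumption; the function name `find_multiword_tokens` and the sample sentences are illustrative only and are not taken from the treebank.

```python
import re

# In CoNLL-U, a multiword token line carries a range ID such as "3-4".
# Empty-node IDs such as "3.1" are deliberately not matched.
MWT_ID = re.compile(r"^\d+-\d+$")


def find_multiword_tokens(conllu_text):
    """Return (line_number, id, form) for every multiword token line."""
    hits = []
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if MWT_ID.match(cols[0]):
            form = cols[1] if len(cols) > 1 else ""
            hits.append((lineno, cols[0], form))
    return hits


# Hypothetical sample in CoNLL-U layout (10 tab-separated columns),
# with all annotation columns left as "_".
blank = "\t".join(["_"] * 8)
sample_plain = "\n".join([
    "# text = Ez egy mondat.",
    f"1\tEz\t{blank}",
    f"2\tegy\t{blank}",
    f"3\tmondat\t{blank}",
    f"4\t.\t{blank}",
])

# A made-up sentence that does use a range ID, to show what is detected.
sample_mwt = "\n".join([
    f"1-2\tdel\t{blank}",
    f"1\tde\t{blank}",
    f"2\tel\t{blank}",
])

print(find_multiword_tokens(sample_plain))
print(find_multiword_tokens(sample_mwt))
```

Run over a treebank file read as text, an empty result would be consistent with the statement above.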
References
Vincze, Veronika; Szauter, Dóra; Almási, Attila; Móra, György; Alexin, Zoltán; Csirik, János 2010: Hungarian Dependency Treebank. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta.