This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.


Tokenization

The tokenization in the Hungarian UD treebank follows the principles of the Szeged Dependency Treebank (Vincze et al. 2010). The treebank does not contain any multiword tokens.
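Whether a UD file contains multiword tokens can be checked mechanically. In the CoNLL-U format, a multiword token is a line whose ID field is an integer range such as 1-2. The Python sketch below collects such range IDs. The function name `multiword_token_ids` is hypothetical. The sample fragment is a hand-made illustration and is not taken from the treebank. If the statement above holds, running the function over the Hungarian treebank files should return an empty list.

```python
def multiword_token_ids(conllu_text: str) -> list[str]:
    """Return the ID fields of multiword token lines (range IDs such as '1-2')."""
    ranges = []
    for line in conllu_text.splitlines():
        line = line.strip()
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        # In CoNLL-U, a multiword token line carries an integer range as its ID.
        if "-" in token_id:
            ranges.append(token_id)
    return ranges


# Hand-made illustrative fragment (not from the treebank): one token per word.
sample = (
    "# text = A kutya ugat.\n"
    "1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tkutya\tkutya\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tugat\tugat\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

print(multiword_token_ids(sample))  # → []
```

To check a real file, read it as UTF-8 text (for example with `open(path, encoding="utf-8").read()`) and pass the contents to the same function.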

References

Vincze, Veronika; Szauter, Dóra; Almási, Attila; Móra, György; Alexin, Zoltán; Csirik, János 2010: Hungarian Dependency Treebank. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta.