The Finnish UD treebank is based on the Turku Dependency Treebank (TDT), created at the University of Turku. The treebank consists of 15,000 sentences (200,000 tokens) and covers 10 different genres ranging from news to fiction and blog entries.
The morphological and syntactic annotation of the Finnish UD treebank is produced by converting the TDT data, and much of the Finnish UD documentation draws directly on the TDT annotation guidelines (Haverinen et al. 2013).
We wish to thank all of the contributors to the original TDT annotation effort, including Katri Haverinen, who led the annotation, Jenna Kanerva, Filip Ginter, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, and Tapio Salakoski. We also thank the University of Turku, the Turku Centre for Computer Science, the Finnish Academy, and the Turku University Foundation for supporting that effort.
The University of Helsinki provides a different Finnish treebank, converted to the UD notation from a newly revised FinnTreeBank 1 (ftb1-2014.zip, beta). The 19,089 sentences and fragments originate as grammatical examples in the VISK Finnish grammar reference (161,906 tokens, sentence lengths from 1 to 72 tokens with quartiles 5, 7, 11). This treebank is distributed as
- Sampo Pyysalo, Jenna Kanerva, Anna Missilä, Veronika Laippala, and Filip Ginter. 2015. Universal Dependencies for Finnish. In Proceedings of Nodalida 2015.
- Katri Haverinen, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski, and Filip Ginter. 2013. Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation. Volume 48, Issue 3, pp. 493-531.
- Katri Haverinen. 2013. Syntax Annotation Guidelines for the Turku Dependency Treebank - 2nd edition, revised for the treebank release of July 2013. Technical report 1034, Turku Centre for Computer Science.