
This page pertains to UD version 2.

UD Swedish Talbanken

Language: Swedish (code: sv)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Joakim Nivre, Aaron Smith.

Repository: UD_Swedish-Talbanken
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.5

License: CC BY-SA 4.0

Genre: news, nonfiction

Questions, comments? General annotation questions (either Swedish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [joakim • nivre (æt) lingfil • uu • se]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: assigned by a program, with some manual corrections, but not a full manual verification
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: annotated manually
Features: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Relations: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

The Swedish-Talbanken treebank is based on Talbanken, a treebank developed at Lund University in the 1970s.

The Swedish-Talbanken treebank is a conversion of the Prose section of Talbanken (Einarsson, 1976), originally annotated by a team led by Ulf Teleman at Lund University according to the MAMBA annotation scheme (Teleman, 1974). It consists of roughly 6,000 sentences and 95,000 tokens taken from a variety of informative text genres, including textbooks, information brochures, and newspaper articles. The syntactic annotation is converted directly from the original MAMBA annotation, while the morphological annotation is based on the reannotation performed when incorporating Talbanken into the Swedish Treebank (Nivre and Megyesi, 2007). Tokenization mostly follows the standard of the Stockholm-Umeå Corpus, Version 2.0 (2006), and lemmatization is based on Saldo (Borin et al., 2008).
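The sentence and token counts quoted above can be checked against any release of the treebank by counting directly in the CoNLL-U files. The following is a minimal sketch, assuming the standard ten-column CoNLL-U layout; the function name and the two-sentence sample are made up for illustration and are not taken from Talbanken.

```python
def count_sentences_and_words(conllu_text):
    """Count sentences and syntactic words in CoNLL-U formatted text."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        token_id = line.split("\t", 1)[0]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 3.1).
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


def make_conllu(sentences):
    """Build tab-separated CoNLL-U text from space-separated rows (illustration helper)."""
    return "\n\n".join(
        "\n".join("\t".join(row.split()) for row in sent) for sent in sentences
    ) + "\n"


# Hypothetical sample, NOT taken from the treebank.
SAMPLE = make_conllu([
    [
        "1 Hon hon PRON _ _ 2 nsubj _ _",
        "2 läser läsa VERB _ _ 0 root _ _",
        "3 boken bok NOUN _ _ 2 obj _ _",
        "4 . . PUNCT _ _ 2 punct _ _",
    ],
    [
        "1 Barnen barn NOUN _ _ 2 nsubj _ _",
        "2 sover sova VERB _ _ 0 root _ _",
        "3 . . PUNCT _ _ 2 punct _ _",
    ],
])

if __name__ == "__main__":
    print(count_sentences_and_words(SAMPLE))
```

On the made-up sample this yields 2 sentences and 7 words. To apply it to the real data, read each of the treebank's .conllu files as UTF-8 text, pass the contents to the function, and sum the results.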

Acknowledgments

The new conversion has been performed by Joakim Nivre and Aaron Smith at Uppsala University. We thank everyone who has been involved in previous conversion efforts at Växjö University and Uppsala University, including Bengt Dahlqvist, Sofia Gustafson-Capkova, Johan Hall, Anna Sågvall Hein, Beáta Megyesi, Jens Nilsson, and Filip Salomonsson. Special thanks also to Lars Borin and Markus Forsberg at Språkbanken for help with the lemmatization. Finally, we owe a huge debt to the team who produced the original treebank in the 1970s.

References

Statistics of UD Swedish Talbanken

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB

Features

Abbr, Case, Definite, Degree, Foreign, Gender, Mood, Number, NumType, Polarity, Poss, PronType, Tense, VerbForm, Voice

Relations

acl, acl:cleft, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, compound:prt, conj, cop, csubj, csubj:pass, det, discourse, dislocated, expl, fixed, flat:name, iobj, list, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
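A restriction of this kind can be sketched as a filter over CoNLL-U rows: keep a dependency only when the parent's UPOS is VERB and the child's UPOS is nominal. The sketch below is an illustration, not the script used to produce the statistics. It treats NOUN and PRON as the child tags; whether PROPN is also included is not stated here, so that is left as a parameter. The function name and sample sentences are hypothetical.

```python
from collections import Counter

# Assumption: "nouns or pronouns" is read as UPOS NOUN and PRON.
# Add "PROPN" to this set if proper nouns should be included.
NOMINAL_UPOS = frozenset({"NOUN", "PRON"})


def verb_nominal_relations(conllu_text, child_upos=NOMINAL_UPOS):
    """Count deprels on edges whose parent is a VERB and whose child is nominal."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Keep only regular words: ten columns and a plain integer ID
            # (this skips multiword token ranges and empty nodes).
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            rows[cols[0]] = cols
        for cols in rows.values():
            upos, head, deprel = cols[3], cols[6], cols[7]
            parent = rows.get(head)  # None for the root (HEAD = 0)
            if parent is not None and parent[3] == "VERB" and upos in child_upos:
                counts[deprel] += 1
    return counts


def make_conllu(sentences):
    """Build tab-separated CoNLL-U text from space-separated rows (illustration helper)."""
    return "\n\n".join(
        "\n".join("\t".join(row.split()) for row in sent) for sent in sentences
    ) + "\n"


# Hypothetical sample, NOT taken from the treebank.
SAMPLE = make_conllu([
    [
        "1 Hon hon PRON _ _ 2 nsubj _ _",
        "2 läser läsa VERB _ _ 0 root _ _",
        "3 boken bok NOUN _ _ 2 obj _ _",
        "4 . . PUNCT _ _ 2 punct _ _",
    ],
    [
        "1 Barnen barn NOUN _ _ 2 nsubj _ _",
        "2 sover sova VERB _ _ 0 root _ _",
        "3 . . PUNCT _ _ 2 punct _ _",
    ],
])

if __name__ == "__main__":
    for deprel, n in verb_nominal_relations(SAMPLE).most_common():
        print(deprel, n)
```

On the made-up sample, the punctuation and root edges are filtered out, leaving two nsubj edges and one obj edge.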

Relations Overview