
This page pertains to UD version 2.

UD for Swiss German

Introduction

This page is a copy of the current German documentation (UD for German), which we generally follow for Swiss German. This introduction explains the most important differences that affect the annotation.

Please check the README and GitHub repository of the GSW treebank for further and more up-to-date information.

Differences from the German UD Guidelines

As in German, words are generally delimited by whitespace. However, Swiss German allows much more freedom in merging words together, and such merged forms usually cannot be split easily. We therefore use the German tokenization and introduce a separate tag for merged words (see the meta tag TAG+ described further down).

The POS annotations are generally based on the German guidelines, namely the Stuttgart-Tübingen-TagSet (STTS) with some changes according to the TIGER annotation scheme. In addition, Swiss German requires one further POS tag, PTKINF, which is not part of STTS, as well as the “meta tag” TAG+.
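The tag inventory described above could be checked with a small helper along the following lines. This is only a minimal sketch: the function name `split_tag` and the constants are hypothetical, the STTS set shown is a small illustrative subset rather than the full inventory, and the reading of TAG+ as "an ordinary tag followed by a plus sign" is an assumption based on the notation used here, not a statement about the treebank's actual tooling.

```python
from typing import Tuple

# Illustrative subset of STTS tags (assumption: the real inventory is larger).
STTS_SUBSET = {"NN", "NE", "ADJA", "ADV", "ART", "APPR", "PPER", "VVFIN", "VAFIN", "PTKZU"}

# Additional tag named in the text for Swiss German.
EXTRA_TAGS = {"PTKINF"}


def split_tag(tag: str) -> Tuple[str, bool]:
    """Return (base_tag, is_merged) for a language-specific POS tag.

    Assumption: the meta tag TAG+ is written as a base tag with a
    trailing '+', marking a token that merges several words.
    """
    is_merged = tag.endswith("+")
    base = tag[:-1] if is_merged else tag
    if base not in STTS_SUBSET | EXTRA_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return base, is_merged
```

For example, `split_tag("VVFIN+")` yields the base tag `VVFIN` flagged as merged, while `split_tag("PTKINF")` yields `PTKINF` flagged as not merged.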

The language-specific tags are converted to Universal Dependencies POS (UPOS) tags according to the mapping provided by the Universal Dependencies project. Additionally:

UD for German

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Core Arguments, Oblique Arguments and Adjuncts

Non-verbal Clauses

Relations Overview

Treebanks (Swiss German)

There is one Swiss German UD treebank: