
This page pertains to UD version 2.

UD for Alemannic (Swiss German and Alsatian)

Introduction

We mostly follow the documentation for German. This page describes the most important differences between Alemannic dialects and Standard German that affect the annotation.

Alemannic is a widespread dialect group spanning several countries (Switzerland, France’s Alsace region, Germany’s Baden-Württemberg state, and Liechtenstein), so there is a high degree of variation between the dialects, which can affect the annotation. Consequently, different treebanks correspond to different dialect varieties.

Alemannic-UZH (Zurich Swiss German)

As in German, words are generally delimited by whitespace. However, speakers have much more freedom to merge words together, and such merged forms usually cannot be split easily. We therefore use the German tokenization and introduce a separate tag for merged words (see the meta tag TAG+ described further down).

The POS annotations are generally based on the German guidelines, namely the Stuttgart-Tübingen-TagSet (STTS), with some changes according to the TIGER annotation scheme. In addition, Swiss German requires an additional POS tag, PTKINF, which is not present in the STTS tagset, as well as the “meta tag” TAG+.

The Universal Dependencies POS (UPOS) tags are converted according to the mapping provided by Universal Dependencies. Additionally:

Please check the README or GitHub repository of the treebank for further and more current information.
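As a rough illustration of the STTS-to-UPOS conversion mentioned above, the following Python sketch looks up a language-specific tag in a small table. It is not the treebank's conversion script: the table is deliberately partial, the name to_upos is hypothetical, the UPOS value chosen for PTKINF is an assumption, and so is the treatment of the meta tag TAG+ (here assumed to be a base tag followed by "+", which is stripped before the lookup). The treebank repository remains the authority on the actual rules.

```python
from typing import Tuple

# Partial, illustrative STTS -> UPOS table (not the official mapping).
STTS_TO_UPOS = {
    "NN": "NOUN",
    "NE": "PROPN",
    "VVFIN": "VERB",
    "VAFIN": "AUX",
    "ART": "DET",
    "ADJA": "ADJ",
    "APPR": "ADP",
    "ADV": "ADV",
    "KON": "CCONJ",
    "PTKINF": "PART",  # assumption: the Swiss German particle tag maps to PART
}


def to_upos(xpos: str) -> Tuple[str, bool]:
    """Return (UPOS, is_merged) for a language-specific tag.

    Assumption: a merged word carries its base tag plus a trailing "+"
    (the "TAG+" meta tag), and the base tag alone decides the UPOS.
    Tags missing from the table fall back to "X".
    """
    merged = xpos.endswith("+")
    base = xpos[:-1] if merged else xpos
    return STTS_TO_UPOS.get(base, "X"), merged
```

Returning the merged flag separately keeps the information that a token fuses several words, so that it could be recorded elsewhere in the annotation instead of being lost in the UPOS column.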

Alemannic-DIVITAL (Alsatian)

Only the main differences are introduced below. Please check the annotation guidelines for more in-depth information:

Syntax

Features

MISC attributes

UD for German

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Core Arguments, Oblique Arguments and Adjuncts

Non-verbal Clauses

Relations Overview

Treebanks

There are 2 Alemannic UD treebanks: