UD for Low Saxon
Introduction
Part of this documentation is copied from the current German documentation (UD for German), most of which applies to Low Saxon as well.
Since there is no official interregional spelling for Low Saxon, the examples here use the interregional spelling used, for example, by the Dutch Low Saxon Wikipedia (Nysassiske Skryvwyse, described in more detail at https://skryvwyse.eu/, available only in Low Saxon). Lemma forms are given in both the Nysassiske Skryvwyse and normalised Middle Low Saxon, following the Mittelniederdeutsches Handwörterbuch by Agathe Lasch et al.
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. Exceptions to this rule are described below.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We usually tokenize them as separate tokens (words).
- Low Saxon compounds are written as one word and we do not split them.
- There are classes of multi-word tokens, such as contractions of a preposition and a definite article, and contractions of a verb and a (clitic) pronoun. Examples: hek = hev + ik “I have”, im = in + dem “in the”.
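The multi-word token convention above can be sketched in Python as follows. This is a minimal illustration, not part of any official tooling: the `CONTRACTIONS` table holds only the two examples from this section, and the function name `to_conllu_rows` is hypothetical. It emits the CoNLL-U range ID (such as `1-2`) for the surface token, followed by one row per syntactic word.

```python
# Hypothetical lookup table: only the two contractions named in this section.
CONTRACTIONS = {
    "hek": ["hev", "ik"],   # verb + clitic pronoun
    "im": ["in", "dem"],    # preposition + definite article
}


def to_conllu_rows(tokens):
    """Return (ID, FORM) pairs, expanding known contractions into a
    multi-word token range row plus one row per syntactic word."""
    rows = []
    next_id = 1
    for token in tokens:
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            # Ordinary token: one surface token, one syntactic word.
            rows.append((str(next_id), token))
            next_id += 1
        else:
            last_id = next_id + len(parts) - 1
            # The range row keeps the surface form as written.
            rows.append((f"{next_id}-{last_id}", token))
            for part in parts:
                rows.append((str(next_id), part))
                next_id += 1
    return rows


for word_id, form in to_conllu_rows(["hek"]):
    print(f"{word_id}\t{form}")
```

A real tokenizer would need a fuller, dialect-aware contraction list; the table lookup here only shows the shape of the output.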
Morphology
Tags
- Low Saxon uses all 17 universal POS categories, including particles (PART).
- The following words are particles in Low Saxon: nich/nicht¹ “not”, and the infinitive marker to/tô⁵ “to”.
- The pronoun (PRON) vs. determiner (DET) distinction is based on word lists because the traditional grammar does not define determiners. In general, words that inflect for gender in agreement with a modified noun are tagged DET, even if they act independently in a given sentence; this includes possessives.
- Low Saxon auxiliary verbs (AUX) are:
  - weasen/wēsen² for the perfect tenses of some verbs (ik bün koamen “I have come”) and as copula (hee is old “he is old”)
  - hebben for the perfect tenses of the remaining verbs (ik hev eaten “I have eaten”)
  - werden/wērden¹ for the passive (dat wardt eaten “it is (being) eaten”)
  - sköälen/schȫlen¹, willen/willen¹ and werden/wērden¹ for the future tense (ik skal binnenkört dår weasen “I will arrive soon”)
  - modal verbs dörven “may”, künnen “can”, möägen/mȫgen “may, want”, möten/mö̂ten² “must”, sköälen/schȫlen¹ “shall”, willen/willen¹ “want”
  - doon/dôn¹, willen/willen¹ and werden/wērden¹ for a periphrastic conditional (see dea em lever besöken “she would prefer to visit him”)
- The verbs weasen/wēsen², hebben, doon/dôn¹ and werden/wērden¹ can also occur as normal verbs (VERB), meaning “be, have, do, become”.
- There are four main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Features
Nominal Features
- Nominal words (NOUN, PROPN and PRON) have an inherent Gender feature with one of two or three values: Masc, Fem or Neut. Most dialects preserve three genders, while in some, Masc and Fem have merged.
- The two main values of the Number feature are Sing and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, ADJ, DET, VERB, AUX (finite and participles).
- The number of values for Case depends on the dialect. Few dialects have preserved four cases: Nom, Gen, Dat, Acc. Most dialects do not distinguish dative and accusative anymore and thus only know three cases: Nom, Gen, Acc. Some dialects have also merged the nominative and accusative and therefore only two cases remain: Nom, Gen. Case occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET. However, case forms of nouns are extremely ambiguous and most of the time the case is distinguished only by the form of the article.
- Definite has 2 values: Ind, Def. It is used to distinguish the indefinite and definite articles (DET).
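The nominal feature inventories above could be checked against the FEATS column roughly as in the following Python sketch. The keys of `CASE_INVENTORIES` (`four_case`, `three_case`, `two_case`) are hypothetical labels for the three dialect types described above, not official identifiers, and the function names are illustrative. Only the two main Number values are listed. The `Feature=Value|Feature=Value` layout is the standard CoNLL-U encoding.

```python
# Value sets taken from the feature descriptions above.
GENDERS = {"Masc", "Fem", "Neut"}
NUMBERS = {"Sing", "Plur"}          # only the two main values
DEFINITE = {"Ind", "Def"}

# Hypothetical labels for the three dialect types.
CASE_INVENTORIES = {
    "four_case": {"Nom", "Gen", "Dat", "Acc"},
    "three_case": {"Nom", "Gen", "Acc"},
    "two_case": {"Nom", "Gen"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def invalid_nominal_feats(feats, dialect_type):
    """Return the names of nominal features whose value falls outside
    the inventory assumed for the given dialect type."""
    allowed = {
        "Gender": GENDERS,
        "Number": NUMBERS,
        "Definite": DEFINITE,
        "Case": CASE_INVENTORIES[dialect_type],
    }
    parsed = parse_feats(feats)
    return sorted(
        name for name, value in parsed.items()
        if name in allowed and value not in allowed[name]
    )


# A dative form is flagged in a dialect assumed to have merged Dat and Acc.
print(invalid_nominal_feats("Case=Dat|Gender=Masc|Number=Sing", "three_case"))
```

Features not listed in `allowed` are ignored, so the check stays silent about anything this section does not describe.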
Degree and Polarity
- Degree applies to adjectives (ADJ) and adverbs (ADV) and has one of three possible values: Pos, Cmp, Sup.
- Polarity is used to mark the negative particle nich/nicht¹, i.e., only the Neg value is used.
Verbal Features
- Finite verbs always have one of two values of Mood: Ind or Imp. Some dialects have preserved separate forms for Sub (called konjunktiv in Low Saxon).
- Indicative and subjunctive verbs always have one of two values of Tense: Past, Pres.
  - In the subjunctive mood, the tense feature is used to distinguish konjunktiv I (Pres) and konjunktiv II (Past).
  - Imperative forms do not have the Tense feature.
  - The Tense feature is also used to distinguish present and past participles (singen(d) “singing” vs. sungen “sung”).
- In the plural, verbs do not commonly distinguish person and consequently are only tagged for Plur. Some dialects may show occasional exceptions to this rule in particular verbs (possibly due to influence from German or Dutch), in which case the person should be tagged.
- The features Aspect and Voice are not used in Low Saxon because both the perfect aspect and the passive voice are expressed periphrastically.
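The Mood and Tense constraints above can be expressed as a small consistency check. The following Python sketch assumes its input is the feature dict of a finite verb; participles, which also carry Tense, are outside its scope. The function names `check_finite_verb_feats` and `konjunktiv_label` are hypothetical.

```python
def check_finite_verb_feats(feats):
    """Return a list of problems for the feature dict of a finite verb,
    following the constraints described in this section."""
    problems = []
    mood = feats.get("Mood")
    tense = feats.get("Tense")
    if mood not in {"Ind", "Imp", "Sub"}:
        problems.append("finite verbs need Mood=Ind, Imp or Sub")
    if mood == "Imp" and tense is not None:
        problems.append("imperatives do not take Tense")
    if mood in {"Ind", "Sub"} and tense not in {"Past", "Pres"}:
        problems.append("indicative and subjunctive verbs need Tense=Past or Pres")
    if "Aspect" in feats or "Voice" in feats:
        problems.append("Aspect and Voice are not used")
    return problems


def konjunktiv_label(feats):
    """Map subjunctive Tense values to the traditional konjunktiv names."""
    if feats.get("Mood") != "Sub":
        return None
    return {"Pres": "konjunktiv I", "Past": "konjunktiv II"}.get(feats.get("Tense"))


print(check_finite_verb_feats({"Mood": "Imp", "Tense": "Pres"}))
print(konjunktiv_label({"Mood": "Sub", "Tense": "Past"}))
```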
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV).
- NumType is used with numerals (NUM), adjectives (ADJ) and determiners (DET).
- The Poss feature marks possessive personal determiners (e.g. myn/mîn² “my”).
- The Reflex feature is always used together with PronType=Prs and it marks reflexive pronouns (my/mî, dy/dî, sik/sik¹, uns/uns², ju/iü²). Note that their forms in the first and second person are ambiguous with irreflexive accusative forms, and the Reflex feature must be decided by context.
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. With personal possessive determiners (DET), the feature actually encodes the person of the possessor. Person is not marked on other types of pronouns and on nouns, although they can almost always be interpreted as the 3rd person.
- The Polite feature distinguishes informal second-person pronouns (du/dû¹, jy/gî², Polite=Infm) from the formal Jy/gî² and See/sê² (Polite=Form). The formal pronoun Jy/gî² is phonologically equivalent in all its case forms to the second-person plural Jy/gî², and the formal pronoun See/sê² is phonologically equivalent in part of its case forms to the third-person plural see/sê², but they are distinguished in orthography by the capital letters J and S. We tag the formal pronoun See/sê² as second person (because this is its meaning) and we do not tag formal pronouns for number (because they are used both for singular and plural addressees), despite the fact that they combine with plural verbs. The parser must learn that a Person=2|Polite=Form subject attaches to Number=Plur verbs, while a Number=Sing|Person=2|Polite=Infm subject attaches to Number=Sing|Person=2 verbs.
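The subject-verb pairing that the parser is expected to learn for second-person pronouns can be written down as an explicit rule. The Python sketch below covers only the two configurations stated above and returns `None` for anything else (for example the informal plural). The function name `second_person_agreement` is hypothetical.

```python
def second_person_agreement(subject, verb):
    """Check the stated pairing between a second-person subject pronoun
    and its verb. Both arguments are feature dicts.

    Returns True or False when the rule applies, None when it does not."""
    if subject.get("Person") != "2":
        return None
    if subject.get("Polite") == "Form":
        # Formal pronouns carry no Number but combine with plural verbs.
        return verb.get("Number") == "Plur"
    if subject.get("Polite") == "Infm" and subject.get("Number") == "Sing":
        return verb.get("Number") == "Sing" and verb.get("Person") == "2"
    return None


formal = {"Person": "2", "Polite": "Form"}
informal = {"Number": "Sing", "Person": "2", "Polite": "Infm"}
print(second_person_agreement(formal, {"Number": "Plur"}))
print(second_person_agreement(informal, {"Number": "Sing", "Person": "2"}))
```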
Other Features
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- A nominal subject (nsubj) is a noun phrase in the nominative case without a preposition.
  - A finite subordinate clause may serve as the subject and is labeled csubj.
  - If a verb is to serve as the subject, it becomes a verbal noun (its form resembles the infinitive, or especially in older variants of the language the present participle, but it gets the neuter singular nominative article), thus it is labeled nsubj.
- Objects defined in the Low Saxon grammar may be bare noun phrases in accusative, and in dialects which have preserved the dative-accusative distinction, a dative object is possible as well. Bare genitive phrases do not generally occur as objects in the modern language anymore. Prepositional phrases in accusative (or also dative, in some dialects) can function as objects as well. For the purpose of UD the objects are divided into core objects, labeled obj or iobj, and oblique objects, labeled obl:arg.
  - Bare accusative (and dative) objects are considered core.
  - All prepositional objects are considered oblique.
  - Accusative objects of some verbs alternate with finite clausal complements, which are labeled ccomp.
  - If a verb subcategorizes for the infinitive (e.g. phasal verbs or verbs of control), the infinitival complement is labeled xcomp.
  - If a verb subcategorizes for two core objects, one of them accusative (or ccomp) and the other non-accusative, then the non-accusative object is labeled iobj. Core nominal objects in other situations are labeled just obj.
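The labeling rules for objects and complements above amount to a short decision procedure, sketched here in Python. The function name `object_label`, its parameters and the `kind` values are hypothetical names chosen for illustration. Whether a phrase is an object at all (rather than an adjunct) is assumed to have been decided beforehand.

```python
def object_label(kind, case=None, has_acc_or_ccomp_coargument=False):
    """Pick a relation label for a verbal complement.

    kind: 'nominal' (bare noun phrase), 'prepositional',
          'finite_clause' or 'infinitive'.
    case: Case value of a bare nominal object, e.g. 'Acc' or 'Dat'.
    has_acc_or_ccomp_coargument: True if the same verb also takes an
          accusative object or a ccomp.
    """
    if kind == "prepositional":
        return "obl:arg"      # all prepositional objects are oblique
    if kind == "finite_clause":
        return "ccomp"
    if kind == "infinitive":
        return "xcomp"
    if kind == "nominal":
        if case != "Acc" and has_acc_or_ccomp_coargument:
            return "iobj"     # non-accusative object next to an accusative one
        return "obj"
    raise ValueError(f"unknown complement kind: {kind!r}")


print(object_label("nominal", case="Dat", has_acc_or_ccomp_coargument=True))
print(object_label("prepositional"))
```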
- Adjuncts (or, following the German grammar, adverbial modifiers realized as noun phrases) are usually prepositional phrases, but they can be bare noun phrases as well. They are labeled obl:
  - Temporal modifiers realized as accusative noun phrases: ik werke elken dag “I work every day.”
  - All prepositional phrases that are not prepositional objects (i.e., their role and form is not defined lexically by the predicate) are adjuncts.
- Extra attention has to be paid to the reflexive pronoun sik. It can function as:
  - A core object (obj or iobj): hee sügt sik in’n spegel “he sees himself in the mirror.”
  - A reciprocal core object (obj or iobj): see küsset sik “they are kissing each other.”
  - Inherently reflexive verbs cannot exist without the reflexive clitic, and the clitic cannot be substituted by an irreflexive pronoun or a noun phrase. In accordance with the current UD guidelines, we label the relation between the verb and the clitic as expl:pv, not compound. Example: wy mussen uns spoden “we had to hurry.”
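As an illustration of the expl:pv relation, the example wy mussen uns spoden could be written out in CoNLL-U roughly as in the Python sketch below. Only the expl:pv label on uns comes from this section. The rest of the analysis (mussen as AUX attached with aux, wy as nsubj, spoden as root) is an assumption based on the modal verb list above. LEMMA, XPOS and FEATS are left as `_` rather than guessed, and the helper name `to_conllu` is hypothetical.

```python
# (ID, FORM, UPOS, HEAD, DEPREL). Everything except expl:pv is an assumed analysis.
SENTENCE = [
    (1, "wy", "PRON", 4, "nsubj"),
    (2, "mussen", "AUX", 4, "aux"),
    (3, "uns", "PRON", 4, "expl:pv"),   # reflexive clitic of an inherently reflexive verb
    (4, "spoden", "VERB", 0, "root"),
]


def to_conllu(rows):
    """Format rows as 10-column CoNLL-U lines, using '_' for unknown columns."""
    lines = []
    for word_id, form, upos, head, deprel in rows:
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(word_id), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines)


print(to_conllu(SENTENCE))
```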
- In passive clauses, the subject is labeled with nsubj:pass or csubj:pass, respectively.
Non-verbal Clauses
- The copula verb weasen/wēsen² (be) is used in existential, equational, attributional, locative, possessive and benefactory nonverbal clauses.
- Existential clauses, especially in dialects from the German side, may also use a different verb, geaven/gēven (give) with an accusative object: dat givt eaten “there is food.”
Relations Overview
- The following relation subtypes are used in Low Saxon:
  - nsubj:pass for nominal subjects of passive verbs
  - csubj:pass for clausal subjects of passive verbs
  - obl:agent for agents of passive verbs
  - obl:arg for prepositional objects
  - expl:pv for reflexive clitics of inherently reflexive verbs
  - aux:pass for passive auxiliaries
  - compound:prt for separable verb prefixes
  - det:poss for possessive determiners
  - nmod:poss for possessive modifier phrases
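The subtype list above can serve as a simple whitelist when checking the DEPREL column. The Python sketch below is a minimal illustration with a hypothetical function name. It accepts any relation without a subtype and does not validate the universal relation names themselves.

```python
# Relation subtypes listed in this section.
LOW_SAXON_SUBTYPES = {
    "nsubj:pass", "csubj:pass", "obl:agent", "obl:arg", "expl:pv",
    "aux:pass", "compound:prt", "det:poss", "nmod:poss",
}


def is_allowed_deprel(deprel):
    """Accept plain relations as-is; accept subtyped relations only if
    they appear in the Low Saxon subtype list."""
    return ":" not in deprel or deprel in LOW_SAXON_SUBTYPES


for label in ("obl:arg", "nsubj", "obl:tmod"):
    print(label, is_allowed_deprel(label))
```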
Treebanks
There are N Low Saxon UD treebanks: