UD for Gothic
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters.
- Punctuation such as commas and periods is not included in the data. Occasionally a punctuation symbol (typically a hyphen) is part of a word, as in at-gaf, þat-ain, ga-seƕi.
- There are no multi-word tokens.
- There are no words with spaces.
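The segmentation rules above can be sketched in a few lines. This is only an illustration, not the treebank's actual preprocessing: the function names are hypothetical, and the input string is simply the three hyphenated forms quoted above, not an attested sentence.

```python
def tokenize(text: str) -> list[str]:
    """Split on whitespace only; hyphens stay inside the word."""
    return text.split()


def violates_segmentation_rules(token_id: str, form: str) -> bool:
    """Flag a CoNLL-U row that looks like a multi-word token or a word with spaces.

    In CoNLL-U, a multi-word token has a range ID such as "3-4".
    """
    is_multiword_range = "-" in token_id
    has_space = any(ch.isspace() for ch in form)
    return is_multiword_range or has_space


words = tokenize("at-gaf þat-ain ga-seƕi")
```

Here `words` holds three tokens, each still containing its hyphen, and a row with ID `3-4` or a form containing a space would be flagged.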
Morphology
Tags
- Gothic uses 14 universal POS categories. There are no particles (PART), punctuation (PUNCT), or other symbols (SYM) in the data.
- The only auxiliary verb (AUX) in Gothic is wisan “to be”. It is used as a copula (ni aiw swa uskunþ was in Israela “it was never so seen in Israel”).
- There are three main (de)verbal forms, distinguished by the value of the VerbForm feature:
Nominal Features
- Nominal words (NOUN, PROPN and PRON) have an inherent Gender feature with one of three values: Masc, Fem or Neut.
- The three values of the Number feature are Sing, Dual, and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, ADJ, DET, NUM, VERB, AUX (finite and participles). The dual number is distinguished for pronouns and verbs (including auxiliary) but not for nouns, adjectives or determiners.
- Case has 5 possible values: Nom, Gen, Dat, Acc, Voc. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM. For verbs (VERB) and auxiliaries (AUX) it occurs with participles (VerbForm=Part).
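As a rough illustration of the nominal feature inventory, the sketch below parses a CoNLL-U FEATS string and reports values that fall outside the lists given above. The function names are hypothetical, and the parser is deliberately simplified: it does not handle comma-separated multi-values.

```python
ALLOWED = {
    "Gender": {"Masc", "Fem", "Neut"},
    "Number": {"Sing", "Dual", "Plur"},
    "Case": {"Nom", "Gen", "Dat", "Acc", "Voc"},
}
# Parts of speech for which the dual is not distinguished.
NO_DUAL_UPOS = {"NOUN", "ADJ", "DET"}


def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_nominal_feats(upos: str, feats: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    parsed = parse_feats(feats)
    problems = []
    for name, allowed in ALLOWED.items():
        value = parsed.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value} is not a listed value")
    if parsed.get("Number") == "Dual" and upos in NO_DUAL_UPOS:
        problems.append(f"Dual is not expected on {upos}")
    return problems
```

For example, `check_nominal_feats("NOUN", "Case=Nom|Number=Dual")` flags the dual, while the same features on a PRON pass.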
Degree and Polarity
- Degree applies to adjectives (ADJ) and adverbs (ADV) and has one of three possible values: Pos, Cmp, Sup.
- Polarity is used to mark the negative adverbs ni, nih, niþ, nis, nibai, nei, i.e., only the Neg value is used.
Verbal Features
- Finite verbs always have one of three values of Mood: Ind, Imp or Opt.
- Indicative verbs always have one of two values of Tense: Past, Pres.
- The feature Aspect is used with perfect participles, i.e., the only value is Perf.
- There are two values of Voice: Act, Pass.
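The constraints on verbal features can be expressed as a small consistency check. This is a sketch under assumptions: the function name is hypothetical, and it assumes finite verbs are marked with the standard UD value VerbForm=Fin, which this page does not spell out.

```python
def check_verbal_feats(feats: dict[str, str]) -> list[str]:
    """Return a list of problems for one verb's features (empty if none found)."""
    problems = []
    # Assumption: finite forms carry VerbForm=Fin, as in general UD practice.
    if feats.get("VerbForm") == "Fin" and feats.get("Mood") not in {"Ind", "Imp", "Opt"}:
        problems.append("finite verb needs Mood=Ind, Imp or Opt")
    if feats.get("Mood") == "Ind" and feats.get("Tense") not in {"Past", "Pres"}:
        problems.append("indicative verb needs Tense=Past or Pres")
    if "Aspect" in feats and feats["Aspect"] != "Perf":
        problems.append("the only Aspect value is Perf")
    if "Voice" in feats and feats["Voice"] not in {"Act", "Pass"}:
        problems.append("Voice must be Act or Pass")
    return problems
```

A finite indicative with no Tense, for instance, yields exactly one problem, while a perfect participle with Aspect=Perf and Voice=Pass yields none.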
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON) and adverbs (ADV): Prs, Rcp, Int, Rel.
- The Poss feature marks possessive personal adjectives (e.g. meins “my”).
- The Reflex feature is always used together with PronType=Prs and it marks reflexive pronouns (sik) and reflexive possessive adjectives (seins).
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. With personal possessive adjectives (ADJ), the feature actually encodes the person of the possessor. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as 3rd person.
Other Features
- There is one language-specific feature:
- Strength with two values, Strong and Weak, is used with adjectives and participles to distinguish forms of the strong vs. weak declension. For example, masculine singular nom-gen-dat-acc strong blinds, blindis, blindamma, blindana “blind”; weak blinda, blindins, blindin, blindan “blind”.
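To make the strong/weak contrast concrete, the sketch below stores the eight masculine singular forms of “blind” quoted above and maps a form back to a FEATS-style string. The table and function names are hypothetical, and rows in the actual treebank may carry additional features.

```python
# Masculine singular forms of "blind", keyed by (Strength, Case).
BLIND_MASC_SING = {
    ("Strong", "Nom"): "blinds",
    ("Strong", "Gen"): "blindis",
    ("Strong", "Dat"): "blindamma",
    ("Strong", "Acc"): "blindana",
    ("Weak", "Nom"): "blinda",
    ("Weak", "Gen"): "blindins",
    ("Weak", "Dat"): "blindin",
    ("Weak", "Acc"): "blindan",
}


def feats_for(form: str) -> list[str]:
    """Return every feature analysis in the table that matches the form."""
    return [
        f"Case={case}|Gender=Masc|Number=Sing|Strength={strength}"
        for (strength, case), candidate in BLIND_MASC_SING.items()
        if candidate == form
    ]
```

With this table, `feats_for("blindamma")` gives a single strong dative analysis and `feats_for("blindin")` a single weak dative one.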
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase in the nominative case, without a preposition.
- A subordinate clause may serve as the subject and is labeled csubj.
- Nominal direct object (obj) is a noun phrase in the accusative case, without a preposition.
- Nominal indirect object (iobj) is a noun phrase in the dative case, without a preposition.
- Other nominal dependents of a predicate are labeled as oblique (obl).
- In passive clauses, a nominal subject is labeled nsubj:pass and a clausal subject is labeled csubj:pass.
- If the demoted agent is present, its relation is labeled obl:agent.
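Read as a rule of thumb, the criteria above map a nominal dependent's case and the presence of a preposition to a relation label. The sketch below encodes only that reading. The function name and parameters are hypothetical, real annotation depends on the verb's argument structure, and obl:agent is not covered because identifying a demoted agent needs information this function does not model.

```python
def nominal_relation(case: str, has_preposition: bool, passive: bool = False) -> str:
    """Guess the relation of a nominal dependent of a predicate."""
    if has_preposition:
        # Core arguments are described as bare noun phrases.
        return "obl"
    if case == "Nom":
        return "nsubj:pass" if passive else "nsubj"
    if case == "Acc":
        return "obj"
    if case == "Dat":
        return "iobj"
    # Any other nominal dependent is treated as oblique.
    return "obl"
```

For example, a bare nominative in a passive clause comes out as nsubj:pass, and any noun phrase with a preposition comes out as obl regardless of case.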
Relations Overview
- The following relation subtypes are used in Gothic:
- nsubj:pass for nominal subjects of passive verbs
- csubj:pass for clausal subjects of passive verbs
- obl:agent for agents of passive verbs
- flat:name for parts of a personal name
- advcl:cmp for adverbial clauses of comparison
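The subtype inventory lends itself to a simple check on the DEPREL column. This is a minimal sketch with a hypothetical function name: it flags any subtyped relation that is not in the list above and leaves plain universal relations alone.

```python
GOTHIC_SUBTYPES = {
    "nsubj:pass",
    "csubj:pass",
    "obl:agent",
    "flat:name",
    "advcl:cmp",
}


def unexpected_subtype(deprel: str) -> bool:
    """True if the relation has a subtype that is not listed for Gothic."""
    return ":" in deprel and deprel not in GOTHIC_SUBTYPES
```

Under this check, `obl:agent` and plain `nsubj` pass, while a subtype such as `nmod:poss` would be flagged.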
Treebanks
There is 1 Gothic UD treebank: