UD for Umbrian
Introduction
Umbrian is an Indo-European language of the Italic branch. As such, it shares a number of characteristics with classical IE languages and especially with Latin. The main similarities between Umbrian and Latin are their declension and conjugation systems. The main difference, besides phonology, is the extensive use of cliticised postpositions in Umbrian where Latin has plain prepositions.
Tokenization and Word Segmentation
The Iguvine tablets use a word separator (: in the Umbrian script and ⋅ in the Latin script). We thus follow the native word segmentation as much as possible. The main exceptions are:
- when the original segmentation itself is erroneous (e.g. pesni:mu for pesnimu on tablet II face b);
- cliticised adpositions that we decided to separate from their host (e.g. the ubiquitous tutaper is analysed as tuta per).
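The two exceptions above can be sketched as a small post-processing step over the native segmentation. This is only an illustration, not the procedure actually used for the treebank: the function name `tokenize` and both lookup tables are hypothetical, and the tables contain nothing beyond the two examples given in the text.

```python
# Hypothetical lookup tables: only these two entries are taken from the text above.
SEGMENTATION_FIXES = {("pesni", "mu"): "pesnimu"}   # erroneous separator inside a word
CLITIC_SPLITS = {"tutaper": ("tuta", "per")}        # host + cliticised adposition


def tokenize(line, separator=":"):
    """Split a transliterated line on the native separator, then apply both exceptions."""
    raw = [w.strip() for w in line.split(separator) if w.strip()]

    # Step 1: rejoin words that the original segmentation wrongly split.
    words = []
    i = 0
    while i < len(raw):
        pair = tuple(raw[i:i + 2])
        if pair in SEGMENTATION_FIXES:
            words.append(SEGMENTATION_FIXES[pair])
            i += 2
        else:
            words.append(raw[i])
            i += 1

    # Step 2: separate cliticised adpositions from their host.
    tokens = []
    for word in words:
        tokens.extend(CLITIC_SPLITS.get(word, (word,)))
    return tokens


print(tokenize("pesni:mu"))  # ['pesnimu']
print(tokenize("tutaper"))   # ['tuta', 'per']
```

Keeping the exceptions in explicit tables, rather than in general rules, reflects that the native segmentation is followed by default and departures from it are listed case by case.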
Morphology
Tags
PUNCT is not used in Umbrian (note that there are word boundaries in the original text but no sentence boundaries).
Features
NOUN, PRON, PROPN, ADJ, and DET are marked with Case and Number, and with Gender when it is known. VERB is marked with VerbForm, and then with Tense, Mood, Person, and Number, or with Case, Gender, and Number, depending on the finiteness of the form.
Note that verbs have a future perfect form which comes from a very reduced periphrastic construction. Until we find a better solution, we have decided to use Aspect=Perf in conjunction with Tense=Fut for these cases.
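As a rough illustration of the future perfect convention, the sketch below builds a CoNLL-U style FEATS string. The helper names `future_perfect` and `format_feats` are hypothetical, the aspect value is assumed to be the standard UD value Perf, and any feature other than Aspect and Tense in the usage lines (here VerbForm=Fin) is an assumption added only for the example.

```python
def future_perfect(feats=None):
    """Return a copy of feats with the future perfect encoded as Aspect=Perf + Tense=Fut."""
    merged = dict(feats or {})
    merged["Aspect"] = "Perf"
    merged["Tense"] = "Fut"
    return merged


def format_feats(feats):
    """Serialise features as a CoNLL-U FEATS string, sorted by feature name."""
    if not feats:
        return "_"  # CoNLL-U placeholder for an empty feature set
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


print(format_feats(future_perfect()))                     # Aspect=Perf|Tense=Fut
print(format_feats(future_perfect({"VerbForm": "Fin"})))  # Aspect=Perf|Tense=Fut|VerbForm=Fin
```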
Syntax
- Core arguments are identified by case (Nom and Acc) and by the absence of a case-triggering adposition (rupinam-e is obl even though rupinam is Acc, because of the adposition e).
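The criterion above can be read as a simple two-part test. The sketch below is one possible reading, not the annotation procedure itself; the function name `is_core_argument` and its parameters are hypothetical.

```python
CORE_CASES = {"Nom", "Acc"}  # cases that can mark core arguments, per the text above


def is_core_argument(case, has_adposition):
    """A nominal counts as core only if it is Nom or Acc and carries no adposition."""
    return case in CORE_CASES and not has_adposition


# rupinam-e: rupinam is Acc, but the postposition e makes it an oblique (obl).
print(is_core_argument("Acc", has_adposition=True))   # False
print(is_core_argument("Acc", has_adposition=False))  # True
```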
Treebanks
There is 1 Umbrian UD treebank: