UD for Manx
Tokenization and Word Segmentation
Generally speaking, tokens in Manx are delimited by whitespace characters and punctuation, with the following exceptions:
- We diverge from the Irish and Scottish Gaelic treebanks in treating so-called “inflected prepositions” as multiword tokens. For example, lhiam “with me” (Irish liom, Scottish Gaelic leam) is split into lesh “with” and mee “me”.
- Some words containing apostrophes are treated as multiword tokens. Here are a few examples:
- ta’n = ta “is” + ’n “the”
- t’eh = t’ “is” + eh “he, it”
- v’ee = v’ “was” + ee “she, it”
- shoh’n = shoh “this” + ’n “the”
- ’sy = ayns “in” + yn “the”
- dt’inneen = dty “your (s.)” + inneen “daughter”
- But the apostrophe is also used in other cases where we choose not to split the word as a multiword token:
- In some emphatic endings: e chree’s (“his heart”)
- When used word-initially, it usually indicates an f dropped by lenition: toan yn ’ockle “tone of the word” (cf. fockle “word”)
- In some orthographic variants: nee’m “I will do” (more often neeym), or bee’m “I will be” (more often beem or beeym), etc.
- Hyphens are treated as internal word characters. This is the only reasonable choice in cases like neu-shickyr, lit. “non-certain”, or h-awin (a mutated form of awin “river”). In other cases, especially noun-noun compounds like magher-etlee “airfield” or shamyr-vrastyl “classroom”, one could argue that we ought to split the word in two at the hyphen (indeed, compounds like these are written in Irish with a space instead of a hyphen). For simplicity’s sake we have not done so, since not all cases are clear-cut.
- Numbers and dates can contain internal punctuation, e.g. 12,000 or 9.7.96.
- Some abbreviations containing periods are treated as single tokens, e.g. R.U. for Reeriaght Unnaneyssit “United Kingdom”, or a.r.e. for as reddyn elley “and other things, etc.”
- There are no words containing spaces.
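The segmentation rules above can be approximated with a regular-expression tokenizer followed by a lexicon lookup for multiword tokens. The Python sketch below is illustrative only and is not the tokenizer used for the treebank: MWT_LEXICON, tokenize, expand and conllu_rows are hypothetical names, the lexicon holds only the examples listed above, and ASCII apostrophes are assumed to be normalized to ’ (U+2019) before matching. The (ID, FORM) pairs follow the CoNLL-U convention of a range row such as 1-2 preceding the syntactic words of a multiword token.

```python
import re

# Hypothetical lexicon of multiword tokens, limited to the examples in the text.
MWT_LEXICON = {
    "lhiam": ["lesh", "mee"],
    "ta’n": ["ta", "’n"],
    "t’eh": ["t’", "eh"],
    "v’ee": ["v’", "ee"],
    "shoh’n": ["shoh", "’n"],
    "’sy": ["ayns", "yn"],
    "dt’inneen": ["dty", "inneen"],
}

LETTER = r"[^\W\d_]"  # any Unicode letter (a word character that is not a digit or underscore)
TOKEN_RE = re.compile(
    rf"(?:{LETTER}\.){{2,}}"               # abbreviations such as R.U. or a.r.e.
    rf"|\d+(?:[.,]\d+)*"                   # numbers and dates such as 12,000 or 9.7.96
    rf"|’?{LETTER}+(?:[-’]{LETTER}+)*’?"   # words, keeping internal hyphens and apostrophes
    r"|[^\w\s]"                            # any other single punctuation character
)


def tokenize(text: str) -> list[str]:
    """Split text into surface tokens, before multiword-token expansion."""
    text = text.replace("'", "’")  # assumption: normalize ASCII apostrophes to U+2019
    return TOKEN_RE.findall(text)


def expand(token: str) -> list[str]:
    """Return the syntactic words of a token (the token itself if it is not in the lexicon)."""
    parts = MWT_LEXICON.get(token.lower())
    if parts is None:
        return [token]
    if token[0].isupper():
        # carry capitalization over to the first syntactic word
        return [parts[0][0].upper() + parts[0][1:]] + parts[1:]
    return list(parts)


def conllu_rows(text: str) -> list[tuple[str, str]]:
    """Return (ID, FORM) pairs, with an N-M range row before each multiword token."""
    rows, next_id = [], 1
    for tok in tokenize(text):
        words = expand(tok)
        if len(words) > 1:
            rows.append((f"{next_id}-{next_id + len(words) - 1}", tok))
        for word in words:
            rows.append((str(next_id), word))
            next_id += 1
    return rows


for row in conllu_rows("Ta'n lioar ass clou."):
    print(*row, sep="\t")
```

Because hyphens and apostrophes are kept inside words, forms such as chree’s, nee’m, neu-shickyr and h-awin stay whole, and only tokens found in the lexicon are split. A fixed table cannot decide every case, so this is a starting point rather than a complete solution.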
Morphology
Tags
- Manx uses the full set of 17 UD part-of-speech tags.
- The AUX tag is used only for the copula she. All other verbs, including the substantive verb bee “to be”, are tagged VERB.
- The PART tag is used for the following words:
- The adverbializer dy: dy moal “slowly”
- The negative verbal particles cha and nagh
- The comparative particle ny: ny syrjey “higher”
- The relativizer dy:
Strooys dy row aggle orroo
Methinks that was fear on-them
mark(row, dy)
ccomp(Strooys, row)
nsubj(row, aggle)
- Verbal nouns (jannoo, cur, etc.) are tagged NOUN and verbal adjectives (e.g. ruggit “born”) are tagged ADJ, following the Irish and Scottish Gaelic treebanks.
- Demonstrative pronouns are tagged PRON, e.g. shen va’n vea aym “that was my life”.
- Demonstrative determiners are tagged DET, e.g. yn lioar shen “that book”.
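Since AUX is reserved for the copula, tagging errors of this kind are easy to detect mechanically. The sketch below is a hypothetical consistency check, not part of the official UD validation tools: it assumes each token is a (form, UPOS) pair and, as a simplification, that the copula appears only in the form she. COPULA_FORMS and aux_violations are illustrative names.

```python
# Assumption: the copula only ever surfaces as "she" (a simplification).
COPULA_FORMS = {"she"}


def aux_violations(tokens: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Return (1-based position, form) for each AUX token that is not the copula."""
    return [
        (i, form)
        for i, (form, upos) in enumerate(tokens, start=1)
        if upos == "AUX" and form.lower() not in COPULA_FORMS
    ]


# "Ta" is a form of the substantive verb bee, so it should be VERB, not AUX.
print(aux_violations([("Ta", "AUX"), ("'n", "DET"), ("lioar", "NOUN")]))     # [(1, 'Ta')]
print(aux_violations([("She", "AUX"), ("yn", "DET"), ("Vritaan", "PROPN")]))  # []
```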
Features
The initial version of the Cadhan Aonair Manx treebank does not specify any morphological features, although we hope to add these to a future version.
Syntax
The basic word order of Manx is VSO, like the other Celtic languages:
Ren my ven yn soo shen jea
Made my wife the jam this yesterday
det(ven, my)
nsubj(Ren, ven)
det(soo, yn)
obj(Ren, soo)
det(soo, shen)
advmod(Ren, jea)
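A relation list like the one above maps directly onto the HEAD and DEPREL columns of CoNLL-U. The following sketch assumes each word form occurs only once in the sentence (true of this example) and adds the root relation, which the list leaves implicit: in CoNLL-U the root has HEAD 0 and DEPREL root. The helpers to_conllu and clause_order are hypothetical names; clause_order reads the verb-subject-object order off the tree.

```python
# The example sentence and its relations, encoded as (deprel, head, dependent).
WORDS = ["Ren", "my", "ven", "yn", "soo", "shen", "jea"]
RELS = [
    ("det", "ven", "my"),
    ("nsubj", "Ren", "ven"),
    ("det", "soo", "yn"),
    ("obj", "Ren", "soo"),
    ("det", "soo", "shen"),
    ("advmod", "Ren", "jea"),
]


def to_conllu(words, rels):
    """Build 10-column CoNLL-U lines; the word with no head becomes the root (HEAD 0)."""
    ids = {w: i for i, w in enumerate(words, start=1)}
    heads = {dep: (head, rel) for rel, head, dep in rels}
    lines = []
    for i, w in enumerate(words, start=1):
        head, rel = heads.get(w, (None, "root"))
        head_id = ids[head] if head is not None else 0
        # LEMMA, UPOS, XPOS, FEATS, DEPS and MISC are left empty ("_")
        lines.append("\t".join([str(i), w, "_", "_", "_", "_", str(head_id), rel, "_", "_"]))
    return lines


def clause_order(words, rels):
    """Order of the root verb (V), its nsubj (S) and its obj (O), e.g. 'VSO'."""
    dependents = {dep for _, _, dep in rels}
    root = next(w for w in words if w not in dependents)
    slots = {"V": words.index(root)}
    for rel, head, dep in rels:
        if head == root and rel in ("nsubj", "obj"):
            slots["S" if rel == "nsubj" else "O"] = words.index(dep)
    return "".join(sorted(slots, key=slots.get))


print("\n".join(to_conllu(WORDS, RELS)))
print(clause_order(WORDS, RELS))  # VSO
```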
The copula she is annotated as follows:
She yn Vritaan y lieh-innys smoo 'sy Rank
It-is the Brittany the peninsula biggest in-the France
cop(Vritaan, She)
det(Vritaan, yn)
det(lieh-innys, y)
nsubj(Vritaan, lieh-innys)
amod(lieh-innys, smoo)
case(Rank, 'sy)
nmod(lieh-innys, Rank)
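In this analysis the predicate nominal (Vritaan), not the copula, heads the clause, and she attaches to it as cop. The sketch below encodes the relations above as (deprel, head, dependent) triples, assumes word forms are unique within the sentence, and finds the root as the only word that is never a dependent; root_of and dependents_of are hypothetical helper names.

```python
# The copula example and its relations, encoded as (deprel, head, dependent).
WORDS = ["She", "yn", "Vritaan", "y", "lieh-innys", "smoo", "'sy", "Rank"]
RELS = [
    ("cop", "Vritaan", "She"),
    ("det", "Vritaan", "yn"),
    ("det", "lieh-innys", "y"),
    ("nsubj", "Vritaan", "lieh-innys"),
    ("amod", "lieh-innys", "smoo"),
    ("case", "Rank", "'sy"),
    ("nmod", "lieh-innys", "Rank"),
]


def root_of(words, rels):
    """The root is the only word that never appears as a dependent."""
    dependents = {dep for _, _, dep in rels}
    roots = [w for w in words if w not in dependents]
    assert len(roots) == 1, f"expected exactly one root, found {roots}"
    return roots[0]


def dependents_of(head, rels):
    """All (deprel, dependent) pairs attached to the given head, in list order."""
    return [(rel, dep) for rel, h, dep in rels if h == head]


root = root_of(WORDS, RELS)
print(root)                        # Vritaan
print(dependents_of(root, RELS))   # [('cop', 'She'), ('det', 'yn'), ('nsubj', 'lieh-innys')]
print(dependents_of("She", RELS))  # [] : the copula takes no dependents
```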
Verbal nouns play an important role in Manx grammar, and they are annotated following the guidelines for Irish and Scottish Gaelic. As noted above, they are always given the POS tag NOUN and are very often labeled as xcomp of some higher verb:
T' ad faagail bee ec oaieyn ny merriu
are they leaving food at graves the dead
nsubj(T', ad)
xcomp(T', faagail)
obj(faagail, bee)
case(oaieyn, ec)
obl(faagail, oaieyn)
det(merriu, ny)
nmod(oaieyn, merriu)
Note that the object follows the verbal noun in the case above; in other constructions it precedes the verbal noun:
Nee eh yn thie y lhieggal
Will-do he the house to knock-down
nsubj(Nee, eh)
det(thie, yn)
obj(lhieggal, thie)
mark(lhieggal, y)
xcomp(Nee, lhieggal)
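The two word orders can be read off the trees by comparing the position of each xcomp verbal noun with that of its obj dependent. A minimal sketch, assuming unique word forms within each sentence and encoding the relations of the two examples above as (deprel, head, dependent) triples; LEAVING, KNOCK_DOWN and object_position are hypothetical names.

```python
# The two verbal-noun examples: (word list, relations as (deprel, head, dependent)).
LEAVING = (
    ["T'", "ad", "faagail", "bee", "ec", "oaieyn", "ny", "merriu"],
    [("nsubj", "T'", "ad"), ("xcomp", "T'", "faagail"), ("obj", "faagail", "bee"),
     ("case", "oaieyn", "ec"), ("obl", "faagail", "oaieyn"), ("det", "merriu", "ny"),
     ("nmod", "oaieyn", "merriu")],
)
KNOCK_DOWN = (
    ["Nee", "eh", "yn", "thie", "y", "lhieggal"],
    [("nsubj", "Nee", "eh"), ("det", "thie", "yn"), ("obj", "lhieggal", "thie"),
     ("mark", "lhieggal", "y"), ("xcomp", "Nee", "lhieggal")],
)


def object_position(words, rels):
    """Map each xcomp verbal noun that has an obj to 'before' or 'after' (object position)."""
    result = {}
    verbal_nouns = [dep for rel, _, dep in rels if rel == "xcomp"]
    for vn in verbal_nouns:
        for rel, head, dep in rels:
            if rel == "obj" and head == vn:
                result[vn] = "before" if words.index(dep) < words.index(vn) else "after"
    return result


print(object_position(*LEAVING))     # {'faagail': 'after'}
print(object_position(*KNOCK_DOWN))  # {'lhieggal': 'before'}
```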
The substantive verb bee “to be” (Irish bí, Scottish Gaelic bi) can have predicate complements in the form of adverbial, adjectival, or prepositional phrases; these are distinct from copular constructions in Manx. Following the Irish model, we label these complements with the extended tag xcomp:pred:
Ta 'n lioar ass clou
is the book out-of print
det(lioar, 'n)
nsubj(Ta, lioar)
case(clou, ass)
xcomp:pred(Ta, clou)
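Because the substantive verb and the copula receive different analyses, the two can be told apart by the labels attached to the root: the copula construction has a cop dependent on the predicate, while bee takes an xcomp:pred complement. The rough heuristic below is a sketch under that assumption; split_deprel and clause_type are hypothetical helpers, applied to the relations of the example above and to a subset of those of the copula example.

```python
def split_deprel(label):
    """Split a label such as 'xcomp:pred' into (universal relation, subtype or None)."""
    base, _, subtype = label.partition(":")
    return base, subtype or None


def clause_type(rels):
    """Classify a clause by the labels on its root (rough heuristic, single clause assumed)."""
    dependents = {dep for _, _, dep in rels}
    root = next(head for _, head, _ in rels if head not in dependents)
    root_labels = {rel for rel, head, _ in rels if head == root}
    if "cop" in root_labels:
        return "copular"
    if "xcomp:pred" in root_labels:
        return "substantive"
    return "other"


# Relations as (deprel, head, dependent).
TA_LIOAR = [("det", "lioar", "'n"), ("nsubj", "Ta", "lioar"),
            ("case", "clou", "ass"), ("xcomp:pred", "Ta", "clou")]
SHE_VRITAAN = [("cop", "Vritaan", "She"), ("det", "Vritaan", "yn"),
               ("nsubj", "Vritaan", "lieh-innys")]

print(split_deprel("xcomp:pred"))  # ('xcomp', 'pred')
print(clause_type(TA_LIOAR))       # substantive
print(clause_type(SHE_VRITAAN))    # copular
```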
The Manx treebank uses 31 of the 37 dependency relations in v2 of the UD guidelines (all but expl, dislocated, aux, clf, list, and goeswith).
In addition, seven subtype relations are used; all but flat:foreign are also used in the Irish treebank:
- acl:relcl for relative clauses
- case:voc for vocative particles
- csubj:cleft for cleft subjects
- csubj:cop for copular clausal subjects
- flat:foreign for non-first words in quoted foreign phrases
- obl:tmod for temporal modifiers
- xcomp:pred for predicates of the substantive verb “to be”
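The relation counts in this section can be checked with simple set arithmetic. The sketch below hard-codes the 37 universal relations of UD v2 together with the exclusions and subtypes named above; it checks the arithmetic only and does not read the treebank files. The variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2.
UD_V2_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}
UNUSED_IN_MANX = {"expl", "dislocated", "aux", "clf", "list", "goeswith"}
MANX_SUBTYPES = {"acl:relcl", "case:voc", "csubj:cleft", "csubj:cop",
                 "flat:foreign", "obl:tmod", "xcomp:pred"}

manx_relations = UD_V2_RELATIONS - UNUSED_IN_MANX
print(len(UD_V2_RELATIONS), len(manx_relations))  # 37 31

# Every subtype should extend a universal relation that the treebank actually uses.
assert all(s.split(":")[0] in manx_relations for s in MANX_SUBTYPES)
```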
Treebanks
There is one Manx UD treebank: