
This page pertains to UD version 2.

UD for Irish

Tokenization and Word Segmentation

In general, Irish words are delimited by whitespace characters. A description of the exceptions follows:

The Irish POS tagger used in the Irish Dependency Treebank retains these as single tokens, so they must be mapped accordingly as the treebanks develop concurrently.

Morphology

POS Tags

| Open class words | Closed class words | Other |
|---|---|---|
| ADJ | ADP | PUNCT |
| ADV | AUX | SYM |
| INTJ | CCONJ | X |
| NOUN | DET | |
| PROPN | NUM | |
| VERB | PART | |
| | PRON | |
| | SCONJ | |

The UD part-of-speech (POS) tagset is an extension of the Google Universal POS tagset (Petrov et al., 2012) and contains 17 POS tags. The tags for the Irish Dependency Treebank are based on the PAROLE Morphosyntactic Tagset (ITÉ, 2002).
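As a quick illustration, the 17 tags and their three classes can be written down as plain Python lists. This is only a sketch for checking the inventory; the names `OPEN_CLASS`, `CLOSED_CLASS`, `OTHER` and `upos_class` are chosen here for illustration and do not come from any UD tool.

```python
# Universal POS tags, grouped into the three classes listed on this page.
OPEN_CLASS = ["ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"]
CLOSED_CLASS = ["ADP", "AUX", "CCONJ", "DET", "NUM", "PART", "PRON", "SCONJ"]
OTHER = ["PUNCT", "SYM", "X"]

UPOS_TAGS = OPEN_CLASS + CLOSED_CLASS + OTHER


def upos_class(tag: str) -> str:
    """Return 'open', 'closed' or 'other' for a UPOS tag (hypothetical helper)."""
    if tag in OPEN_CLASS:
        return "open"
    if tag in CLOSED_CLASS:
        return "closed"
    if tag in OTHER:
        return "other"
    raise ValueError(f"not a UD POS tag: {tag!r}")
```

The three lists add up to the 17 tags mentioned above, so `len(UPOS_TAGS)` can serve as a simple sanity check when a mapping from another tagset is being written.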

A mapping from this tagset to the UD tagset for use in the IUDT is given in: Lynn, Teresa and Jennifer Foster, Universal Dependencies for Irish. In Proceedings of the 2nd Celtic Language Technology Workshop, 2016, Paris, France.

The following is a summary of some specific or unintuitive choices made so that the Irish data conforms to the Universal POS tags for UD v2:


Features

Here we summarise the morphological features of Irish which can be categorised into inflectional and lexical features.

Lexical features
Abbr
Dialect
Foreign
NumType
PartType
Poss
PrepForm
PronType
Reflex

Inflectional features

| Nominal | Verbal |
|---|---|
| Case | Mood |
| Definite | Person |
| Degree | Polarity |
| Gender | Tense |
| Number | VerbForm |
| Form | Voice |
| NounType | |

Inflection in Irish mainly occurs through suffixation, but initial mutation through lenition and eclipsis is also common. Lenition is a phonological change that softens or weakens the articulation of a consonant. Eclipsis renders voiced segments nasalised and voiceless segments voiced (Stenson, 1981, p.18). A prominent feature of Irish that influences inflection is the existence of two sets of consonants, referred to as “broad” and “slender” consonants. Consonants can be slenderised by accompanying the consonant with a slender vowel, either e or i. Broadening occurs through the use of the broad vowels a, o or u. In general, there needs to be vowel harmony (slender or broad) between stem endings and the initial vowel in a suffix.
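The vowel harmony requirement can be sketched as a small check on the last vowel of a stem and the first vowel of a suffix. This is a simplified illustration, not part of any treebank tooling: the function names are hypothetical, and treating the long vowels á, é, í, ó, ú like their short counterparts is an assumption added here.

```python
import unicodedata

SLENDER = set("ei")
BROAD = set("aou")


def strip_accent(ch: str) -> str:
    """Map a long vowel such as 'é' to its base letter 'e'."""
    return unicodedata.normalize("NFD", ch)[0]


def vowel_quality(ch: str):
    """Return 'slender', 'broad', or None if ch is not a vowel."""
    base = strip_accent(ch.lower())
    if base in SLENDER:
        return "slender"
    if base in BROAD:
        return "broad"
    return None


def harmonises(stem: str, suffix: str) -> bool:
    """True if the last vowel of the stem and the first vowel of the
    suffix are both slender or both broad."""
    stem_q = next((q for q in map(vowel_quality, reversed(stem)) if q), None)
    suffix_q = next((q for q in map(vowel_quality, suffix) if q), None)
    return stem_q is not None and stem_q == suffix_q
```

For example, `harmonises("mol", "ann")` and `harmonises("bris", "eann")` both hold (compare molann and briseann), while `harmonises("bris", "ann")` does not. The stems and suffixes are illustrative and are not taken from the treebank.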

Verbs

Verbs inflect for number and person, as well as mood and tense. Verbs can incorporate their subject, inflecting for person and number through suffixation. Such forms are referred to as synthetic verb forms. Most verbs tend to incorporate a subject when it is first person singular or plural. These synthetic forms are generally restricted to the Present Tense, Imperfect Tense, Conditional Mood and Imperative Mood.

However, second person singular and plural subjects are incorporated in some verb tenses and moods:

Tense is also marked by lenition on some verb forms:

Lenition occurs after the negative particle ní:

Eclipsis (initial mutation) occurs following clitics such as interrogative particles (an, nach); complementisers (go, nach); and relativisers (a, nach) (Stenson, 1981, pp. 21-26).
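In writing, the two initial mutations are visible on the first letters of a word, which makes them easy to sketch in code. The functions below are a minimal illustration under stated assumptions: they rely on standard orthographic conventions that this page does not spell out (lenition written as h after the initial consonant, eclipsis written as a prefixed consonant), they ignore vowel-initial words, and the names `lenite` and `eclipse` are hypothetical.

```python
# Consonants that can be lenited (assumption: standard orthography).
LENITABLE = set("bcdfgmpst")

# Letters prefixed to an eclipsed consonant (assumption: standard orthography).
ECLIPSIS_PREFIX = {"b": "m", "c": "g", "d": "n", "f": "bh",
                   "g": "n", "p": "b", "t": "d"}


def lenite(word: str) -> str:
    """Insert 'h' after a lenitable initial consonant."""
    if not word:
        return word
    first = word[0].lower()
    if first not in LENITABLE:
        return word
    nxt = word[1:2].lower()
    if nxt == "h":
        return word  # already lenited
    if first == "s" and nxt in set("cfmpt"):
        return word  # s is not lenited before c, f, m, p, t
    return word[0] + "h" + word[1:]


def eclipse(word: str) -> str:
    """Prefix the eclipsing letter(s) to an eclipsable initial consonant."""
    if not word:
        return word
    prefix = ECLIPSIS_PREFIX.get(word[0].lower())
    return prefix + word if prefix else word
```

With these definitions, `lenite("páipéar")` gives `pháipéar` and `eclipse("cuireann")` gives `gcuireann`, the form that would follow a particle such as an.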

Nouns

Modern Irish uses three cases: Nominative, Genitive and Vocative. The nominative form is sometimes regarded as the “common form” as it is now also used for accusative and dative forms (See Case for a description of ‘Case=NomAcc’). Nouns in Irish are divided into five classes, or declensions, depending on the manner in which the genitive case is formed. In addition, there are two grammatical genders in Irish - masculine and feminine. Case, declension and gender are expressed through noun inflection. For example, páipéar “paper” is a masculine noun in the first declension. Both lenition and slenderisation are used to form the genitive singular form: pháipéir.
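In CoNLL-U files, features such as Case, Gender and Number are stored in a single FEATS string of `Name=Value` pairs joined by `|`. The sketch below parses and re-serialises such a string. The helper names are hypothetical, and the example analysis of pháipéir as `Case=Gen|Gender=Masc|Number=Sing` is an illustrative assumption based on the description above rather than a line copied from the treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialise a feature dict, sorted by feature name (case-insensitive)."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


# Illustrative analysis of the genitive singular form "pháipéir".
paipeir = parse_feats("Case=Gen|Gender=Masc|Number=Sing")
```

A noun in the common form would carry `Case=NomAcc` instead of `Case=Gen`.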

In addition, possessive determiners cause nominal inflection through lenition, eclipsis and prefixation.

Adjectives

In general, adjectives follow nouns and agree in number, gender and case. Depending on the noun they modify, adjectives can also inflect. The Christian Brothers (1988, p.63) note eight main declensions of adjectives. They can decline for genitive singular masculine, genitive singular feminine and nominative plural.

Comparative adjectives are also formed through inflection:

Prepositions

Irish has simple prepositions (e.g. ar “on”) and compound prepositions (e.g. in aghaidh “against”). Most of the simple prepositions can inflect for a pronominal object that indicates person and number (known as prepositional pronouns or pronominal prepositions), thus including a nominal element. Compare le and leis:

These forms are used quite frequently, not only with regular prepositional attachment where pronominal prepositions operate as arguments of verbs or modifiers of nouns and verbs, but also in idiomatic use where they express emotions and states.
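Because each inflected preposition encodes the person and number (and, in the third person singular, the gender) of its pronominal object, the paradigm can be represented as a lookup table. The sketch below does this for le “with”. The table layout and the function name `inflect_le` are hypothetical, and the keys borrow UD-style values (`Sing`, `Plur`, `Masc`, `Fem`) purely for illustration.

```python
# (person, number, gender) -> inflected form of the preposition "le".
# Gender is only distinguished in the third person singular.
LE_FORMS = {
    ("1", "Sing", None): "liom",
    ("2", "Sing", None): "leat",
    ("3", "Sing", "Masc"): "leis",
    ("3", "Sing", "Fem"): "léi",
    ("1", "Plur", None): "linn",
    ("2", "Plur", None): "libh",
    ("3", "Plur", None): "leo",
}


def inflect_le(person: str, number: str, gender=None) -> str:
    """Return the prepositional pronoun for 'le' with the given object."""
    return LE_FORMS[(person, number, gender)]
```

For instance, `inflect_le("3", "Sing", "Masc")` returns `leis` (“with him”), the form contrasted with the bare preposition le above.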


Syntax


| | Nominals | Clauses | Modifier words | Function Words |
|---|---|---|---|---|
| Core arguments | nsubj, nsubj:pass, csubj:pass, obj, iobj | csubj, csubj:cleft, csubj:cop, ccomp, xcomp, xcomp:pred | | |
| Non-core dependents | obl, obl:tmod, obl:prep, vocative, expl, dislocated | advcl | advmod, discourse | aux, aux:pass, cop, mark |
| Nominal dependents | nmod, nmod:poss, appos, nummod | acl, acl:relcl | amod | det, case |

| Coordination | MWE | Loose | Special | Other |
|---|---|---|---|---|
| conj, cc | fixed, flat, flat:foreign, flat:name, compound, compound:prt | list, parataxis | orphan, goeswith, reparandum | punct, root, dep |

Here we summarise some of the distinctive features of Irish as a Celtic language. These features commonly occur in standard Irish use and therefore require discussion in the context of treebank development. Irish theoretical syntax is relatively under-researched and, as this summary shows, even the limited work carried out so far leaves many disagreements unresolved. In general, Irish dependency treebank development follows the work of Stenson (1981).

VSO clause structure

Both main clauses and subordinate clauses follow a VSO structure in Irish.

An element can appear between the verb and the subject only in a couple of exceptional circumstances (see example below). Various elements, such as prepositional phrases and adverbs, may occur between the subject and the object, but the verb-subject-object order itself is strict (McCloskey, 1983, pp. 10-11).

Irish sentences using bí, the Substantive Verb “to be”, follow the VSO structure. However, copular constructions using the Copula is follow a Copula-Predicate-Subject order. This is explained in more detail in cop.
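The VSO pattern can be checked mechanically on a dependency analysis by comparing the positions of the root verb, its nsubj and its obj. The sketch below uses simplified `(id, form, head, deprel)` tuples rather than full CoNLL-U lines. The function name `is_vso` and the example sentence Cheannaigh Seán leabhar “Seán bought a book” are illustrative assumptions, and the check applies only to verbal clauses, not to copular constructions.

```python
from typing import List, Tuple

Token = Tuple[int, str, int, str]  # (id, form, head, deprel)


def is_vso(tokens: List[Token]) -> bool:
    """True if the root precedes its nsubj, which precedes its obj."""
    root = next(t for t in tokens if t[3] == "root")
    subjects = [t for t in tokens if t[2] == root[0] and t[3] == "nsubj"]
    objects = [t for t in tokens if t[2] == root[0] and t[3] == "obj"]
    if not subjects or not objects:
        return False  # the pattern needs both a subject and an object
    return root[0] < subjects[0][0] < objects[0][0]


# Illustrative analysis: verb, then subject, then object.
sentence = [
    (1, "Cheannaigh", 0, "root"),
    (2, "Seán", 1, "nsubj"),
    (3, "leabhar", 1, "obj"),
]
```

Here `is_vso(sentence)` holds because the verb is token 1, the subject token 2 and the object token 3.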

Core Arguments, Oblique Arguments and Adjuncts

A nominal subject (nsubj) is a noun phrase in the nominative case, without preposition.

An infinitive verb may serve as the subject and is labeled as clausal subject, ‘csubj’. On the other hand, verbal nouns as subjects are just nsubj.

A finite subordinate clause may serve as the subject and is labeled ‘csubj:cop’.

‘csubj:cop’ is used when the clause is a subject of a copular phrase. These are copular constructions that follow the Copula-Predicate-Subject order.

On the other hand, ‘csubj:cleft’ is used when the clause is the subject of a clefted sentence (which also follows the Copula-Predicate-Subject order).

There are idiomatic phrases in which translations would suggest that the Irish subject is actually the object.

For example:

There is no passive construction in Irish, and therefore ‘nsubj:pass’ and ‘csubj:pass’ are not used in the Irish treebank. What often translates into English as passive is the autonomous verb form. These verbs, labelled with the feature ‘Voice=Auto’ (see Voice), have an “understood”/implicit subject and are usually followed directly by the object.
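Since the two passive subject relations are not used for Irish, a simple consistency check can flag them if they ever appear in an annotated sentence. This is a minimal sketch with hypothetical names (`UNUSED_IN_IRISH`, `find_unused_relations`); it takes a list of dependency labels and is not part of any official validation script.

```python
# Relations that, per the description above, are not used for Irish.
UNUSED_IN_IRISH = {"nsubj:pass", "csubj:pass"}


def find_unused_relations(deprels):
    """Return (position, label) pairs for labels that should not occur."""
    return [(i, rel) for i, rel in enumerate(deprels) if rel in UNUSED_IN_IRISH]
```

An autonomous verb followed directly by its object would typically yield labels such as `["root", "det", "obj"]`, for which the function returns an empty list.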

Objects (‘obj’) in Irish may be bare noun phrases in common form (NomAcc) or prepositional phrases in common form (NomAcc). For the purposes of UD, objects are divided into core objects, labeled obj, and oblique objects, labeled obl.

There are no indirect objects in Irish.

Oblique ‘obl’. Adjuncts are usually prepositional phrases, but they can be bare noun phrases as well. They are labeled obl:

Foilsíodh an chéad chuid den sraith cartún sa bhliain 1983 “The first cartoon series was published in the year 1983”

In the dative alternation, the prepositional construction gets an analysis similar to that of the double object construction.

Nouns can be objects of clausal complements, which are labeled xcomp.

If a verb subcategorizes for two core objects, one of them accusative (or ccomp) and the other non-accusative, then the non-accusative object is labeled iobj. Core nominal objects in other situations are labeled just obj.

Oblique agents of verbal adjectives are labelled as ‘obl’.

All prepositional phrases that are not prepositional objects (i.e., their role and form is not defined lexically by the predicate) are adjuncts (‘nmod’).

Clefting / Fronting

Clefting or fronting is a commonly used structure in the Irish language and is described in more detail in csubj:cleft. Elements are fronted to predicate position to create emphasis or focus. Irish clefts differ from clefts in English in that there is more freedom with regard to the type of sentence element that can be fronted (Stenson, 1981, p.99). In Irish, the structure is as follows: Copula, followed by the fronted element (Predicate), followed by the rest of the sentence (Relative Clause). The predicate can take the form of a noun phrase (headed by a pronoun, noun or verbal noun), or an adjectival, prepositional or adverbial phrase.

Adverbial Fronting:

Pronoun Fronting:

Prepositional fronting:

Note that in UD, the cleft particle a is indistinguishable from the relative particle a. Both are labelled ‘mark:prt’ (see mark:prt).

Stenson (1981, p.111) describes the cleft construction as being similar to copular identity structures with the order of elements as Copula, Predicate, Subject. According to Stenson, the a is a relative particle which forms part of the relative clause. However, there is no surface head noun in the relative clause: it is missing an NP. Stenson refers to these structures as having an “understood” nominal head such as an rud “the thing” or an té “the person/the one”, e.g. Is ise [an té] a chonaic siad inné. When the nominal head is present, it becomes a copular identity construction: She is the one who they saw yesterday. In the absence of a head noun, the verb is labelled as the head of the clause.
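The cleft analysis described above can be sketched for Is ise a chonaic siad inné as a small dependency structure. The annotation below is an illustrative assumption that follows the description on this page (copula attached as cop to the fronted predicate, the verb heading the clause and attached as csubj:cleft, the particle a as mark:prt). It has not been copied from the treebank, and the label chosen for inné in particular is a guess. The function name `cleft_parts` is hypothetical.

```python
# (id, form, head, deprel); an assumed analysis, not a treebank excerpt.
cleft = [
    (1, "Is", 2, "cop"),
    (2, "ise", 0, "root"),
    (3, "a", 4, "mark:prt"),
    (4, "chonaic", 2, "csubj:cleft"),
    (5, "siad", 4, "nsubj"),
    (6, "inné", 4, "advmod"),  # label assumed for illustration
]


def cleft_parts(tokens):
    """Return (copula, predicate, clause head) forms of a cleft sentence."""
    pred = next(t for t in tokens if t[3] == "root")
    cop = next(t for t in tokens if t[2] == pred[0] and t[3] == "cop")
    clause = next(t for t in tokens
                  if t[2] == pred[0] and t[3] == "csubj:cleft")
    return cop[1], pred[1], clause[1]
```

Calling `cleft_parts(cleft)` returns the copula, the fronted predicate and the verb heading the clause, in the Copula-Predicate-Subject order described by Stenson.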

Language specific labels

The Irish UD treebank uses 26 of the UD dependency labels. A further 10 language specific labels were introduced to deal with certain linguistic phenomena in Irish:

References

Christian Brothers, 1988. New Irish Grammar, Dublin: C J Fallon.

Lynn, Teresa, Ozlem Cetinoglu, Jennifer Foster, Elaine Uí Dhonnchadha, Mark Dras and Josef van Genabith, [Irish Treebanking and Parsing: A Preliminary Evaluation](http://www.lrec-conf.org/proceedings/lrec2012/pdf/378_Paper.pdf), LREC 2012, Istanbul, May 2012.

Lynn, Teresa, Jennifer Foster, Mark Dras and Elaine Uí Dhonnchadha, [Active Learning and the Irish Treebank](http://www.alta.asn.au/events/alta2012/proceedings/pdf/U12-1005.pdf), ALTA 2012, Dunedin, NZ, December 2012.

Lynn, Teresa, Jennifer Foster, Mark Dras and Josef van Genabith, [Working with a small dataset - semi-supervised dependency parsing for Irish](http://www.nclt.dcu.ie/~tlynn/spmrl.pdf), SPMRL 2013, Seattle, USA, October 2013.

Lynn, Teresa, Jennifer Foster, Mark Dras and Lamia Tounsi, [Cross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study](http://www.nclt.dcu.ie/~tlynn/CLTW.pdf), CLTW 2014, Dublin, Ireland, August 2014.

Lynn, Teresa, [Irish Dependency Treebanking and Parsing](http://www.nclt.dcu.ie/~tlynn/Teresa_PhDThesis_final.pdf), PhD Thesis, Dublin City University, Ireland and Macquarie University, Sydney, Australia, 2016.

Lynn, Teresa and Jennifer Foster, [Universal Dependencies for Irish](http://www.nclt.dcu.ie/~tlynn/Lynn_CLTW2016.pdf), CLTW 2016, Paris, France, July 2016.

Stenson, N, 1981. Studies in Irish Syntax, Tübingen: Gunter Narr Verlag.

The Christian Brothers, New Irish Grammar, Dublin, Ireland: C.J. Fallon, March 1994.

Uí Dhonnchadha, E. 2002. An Analyser and Generator for Irish Inflectional Morphology using Finite State Transducers, School of Computing, Dublin City University: Unpublished MSc Thesis.

Uí Dhonnchadha, E. 2009. Part-of-Speech Tagging and Partial Parsing for Irish using Finite-State Transducers and Constraint Grammar (PhD thesis).


Treebanks

There is one Irish UD treebank: