UD Old Irish DipSGG
Language: Old Irish (code: sga)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.12 release.
The following people have contributed to making this treebank part of UD: Adrian Doyle.
Repository: UD_Old_Irish-DipSGG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-NC-SA 4.0
Genre: academic, grammar-examples, nonfiction, poetry
Questions, comments? General annotation questions (either Old Irish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [adrianodughaill (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
A Universal Dependencies treebank for the Old Irish glosses of St. Gall.
The Diplomatic St. Gall Glosses Treebank (DipSGG) has been compiled as part of a PhD research project by Adrian Doyle at the National University of Ireland, Galway. These glosses were written around the middle of the 9th century in Latin and Old Irish. Only those glosses which contain some Irish text are collected here; many of these, however, include code-mixing between Irish and Latin. The subject of these glosses is the Latin Grammar of Priscianus Caesariensis.
The Old Irish text has been drawn from Bernhard Bauer’s work on these glosses, which was originally carried out for his project, A Dictionary of the Old Irish Priscian Glosses, in 2015, and has been supplemented with Latin text from Rijcklof Hofman’s The Sankt Gall Priscian Commentary (1996). The POS tags have, for the most part, been converted from Bauer’s morphological analysis of this Irish text. Both of these resources are available through the St Gall Glosses Database (www.stgallpriscian.ie), produced and hosted by Pádraic Moran.
Adrian Doyle carried out the conversion from Bauer’s annotation scheme to the UD annotation scheme, and in the process revised the analysis, manuscript reading, and translation of some glosses. Glosses in the Ogam script were produced specifically for this treebank by Doyle, as all earlier editions render them transliterated into the Roman alphabet.
The collection, in total, contains 3,471 glosses, though many of these have not yet been annotated with dependency information. Because of the rarity of Old Irish text surviving in manuscripts from the period, and because annotation has been completed on only a handful of these to date, only a test set is yet available.
Acknowledgments
I wish to thank Pádraic Moran for making the contents of the St. Gall Glosses database available to me for this project, as well as Bernhard Bauer and Rijcklof Hofman for allowing their work to be altered and reproduced in this manner.
This research has been supported by NUIG through the Digital Arts and Humanities scholarship, as well as by the Irish Research Council.
References
Bauer, Bernhard, Rijcklof Hofman, Pádraic Moran. St Gall Priscian Glosses, version 2.0 (2017) (www.stgallpriscian.ie)
Bauer, Bernhard. (2015). A dictionary of the Old Irish Priscian Glosses. (http://www.univie.ac.at/indogermanistik/priscian/)
Doyle, Adrian, John Philip McCrae and Clodagh Downey. A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles. CLTW 2019, Dublin, Ireland, August 2019. (https://www.aclweb.org/anthology/W19-6910/)
McCone, Kim. (1997). The Early Irish Verb - Second Edition Revised with Index. An Sagart, Maynooth.
Stifter, David. (2006). Sengoidelc. Syracuse University Press, New York.
Stokes, Whitley, and John Strachan (eds.). (1902). Thesaurus Palaeohibernicus Vol. II. Cambridge University Press.
Thurneysen, Rudolf. (1946). A Grammar of Old Irish. Binchy, D. A. and Bergin, Osborn (tr.), Reprinted 2010, Dublin Institute for Advanced Studies.
Statistics of UD Old Irish DipSGG
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Abbr – AdpType – Aspect – Case – Definite – Degree – Foreign – Gender – Mood – Number – NumType – PartType – Person – Polarity – Poss – Prefix – PronClass – PronType – Tense – Typo – VerbType – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – case – case:voc – cc – ccomp – compound:prt – conj – cop – det – discourse – dislocated – flat – flat:foreign – mark – nmod – nmod:poss – nmod:pre – nsubj – nummod – obj – obj:infx – obl – obl:agent – obl:prep – obl:tmod – parataxis – punct – root – vocative
Tokenization and Word Segmentation
- This corpus contains 64 sentences and 418 tokens.
- This corpus contains 114 tokens (27%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: .i., .c, .d.
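Counts like the ones above can be recomputed from a CoNLL-U file. The following is a minimal sketch, assuming the standard 10-column CoNLL-U layout with `SpaceAfter=No` in the MISC column; the helper names (`conllu_row`, `parse_conllu`, `tokenization_stats`) and the toy input are hypothetical and not part of the treebank or of any UD tool. It counts only rows with plain integer IDs, so it may differ from the official statistics script if the file contains multiword-token ranges.

```python
def conllu_row(idx, form, upos, head, deprel, misc="_"):
    """Build one 10-column CoNLL-U line (LEMMA, XPOS, FEATS, DEPS left as '_')."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", misc]
    )


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of column lists."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences


def tokenization_stats(sentences):
    """Count sentences, words, and words not followed by a space."""
    # Only plain integer IDs are counted; multiword-token ranges ("1-2")
    # and empty nodes ("1.1") are skipped in this sketch.
    words = [row for sent in sentences for row in sent if row[0].isdigit()]
    no_space = [row for row in words if "SpaceAfter=No" in row[9].split("|")]
    percent = round(100 * len(no_space) / len(words)) if words else 0
    return {
        "sentences": len(sentences),
        "words": len(words),
        "no_space": len(no_space),
        "percent_no_space": percent,
    }


# Toy input with placeholder forms; NOT taken from the treebank.
TOY = "\n".join([
    "# sent_id = toy-1",
    conllu_row(1, "aa", "ADV", 2, "advmod"),
    conllu_row(2, "bb", "NOUN", 0, "root", "SpaceAfter=No"),
    conllu_row(3, ".", "PUNCT", 2, "punct"),
    "",
    "# sent_id = toy-2",
    conllu_row(1, "cc", "AUX", 2, "cop"),
    conllu_row(2, "dd", "ADJ", 0, "root"),
    "",
])

stats = tokenization_stats(parse_conllu(TOY))
print(stats)
```

With the same rounding, the reported 114 of 418 tokens gives 27%.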
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 18 word types tagged as particles (PART): Do, a, at, fo, fu, hí, in, nad, ni, no, ní, ro, sa, sin, so, th, ǽr, ᚄᚑ
- This corpus contains 11 lemmas tagged as pronouns (PRON): a, do, m, mo, mé, sa, so, som, sí, t_1, tú
- This corpus contains 4 lemmas tagged as determiners (DET): a, cach, in, nach
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: a
- This corpus contains 1 lemma tagged as an auxiliary (AUX): is
- This corpus does not use the VerbForm feature.
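The tag inventory above can be checked with a simple set difference against the 17 universal POS tags. This is a small sketch; the function name `upos_inventory` is hypothetical, and the list of used tags is copied from the list above.

```python
# The 17 universal POS tags defined by Universal Dependencies.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def upos_inventory(tags):
    """Return (sorted used tags, sorted unused tags) for a collection of UPOS tags."""
    used = set(tags) & UD_UPOS
    return sorted(used), sorted(UD_UPOS - used)


# Tags reported as used in this corpus (copied from the list above).
REPORTED = (
    "ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, "
    "PRON, PROPN, PUNCT, SCONJ, VERB, X"
).split(", ")

used, unused = upos_inventory(REPORTED)
print(len(used), "of", len(UD_UPOS), "tags used; unused:", unused)
```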
Nominal Features
- Gender
- Fem
- ADJ: acher, bec, mar, mass
- DET: in, ind, inna, nd, sin, ᚔᚅ, ṅ
- PRON: sí
- Masc
- ADJ: gann, glass, línech, naue
- DET: in, inna, ind, inḍt, naib
- Masc,Neut
- ADP: and, de, foir, oco
- PRON: a, som, ᚐ
- Neut
- ADJ: haill, minn, tana
- DET: a, in, naib
- SCONJ: nach
- Number
- Dual
- DET: ṅ
- NOUN: rainn
- Plur
- ADJ: libardaib
- ADP: dv́n, friu
- DET: inna, naib
- NOUN: bachal, comroicniu, dindgnaib, doss, déainmmnichdechaib, fidbaidae, grec, laitnori, ṅén, ṡianach
- PRON: a
- VERB: ecmoṅgat, seichetar
- Sing
- ADJ: gann, acher, bec, cáin, dorchæ, glass, haill, lainn, línech, mall
- ADP: dom, and, dait, de, foir, frimm, lat, oco
- AUX: is, d, ní, bid, bith, mba
- DET: in, a, ind, cach, inna, inḍt, na, nd, sin, ᚔᚅ
- NOUN: ᚉᚑᚉᚐᚏᚈ, dia, dias, ingen, ainm, airdircus, aite, aithne, anmmain, bendacht
- PRON: m, a, mo, do, mei, mm, siu, som, sse, sv
- PROPN: brigtae, choirbbre, dongus, donngvs, ferguso, finguine, lothlind, maddoc, mail, máel
- SCONJ: nach
- VERB: chain, Gaib, braigim, cél, cóima, epur, farcai, fuasna, giuil, llega
- Case
- Acc
- ADJ: haill
- DET: in, na, ṅ
- NOUN: chluim, chuil, colcaid, comroicniu, doidṅgi, dul, hési, laitnori, ndead, rainn
- PROPN: máel
- Dat
- ADJ: glass, lainn, libardaib
- DET: cach, in, naib, nd, sin
- NOUN: dia, anmmain, buith, ceniul, charcair, coimthecht, comṡuidiguth, dindgnaib, déainmmnichdechaib, inis
- PROPN: lothlind, maddoc
- Gen
- ADJ: minn
- DET: inna, ind
- NOUN: bachal, chasc, con, cói, denmo, dodcaid, doss, ecni, fairggae, fidbaidae
- PROPN: brigtae, ferguso, patric, ᚋᚐᚏᚈᚐᚔᚅ
- Nom
- ADJ: gann, acher, bec, cáin, dorchæ, línech, mall, mar, mass, mmall
- DET: in, a, ind, inḍt, ᚔᚅ
- NOUN: ᚉᚑᚉᚐᚏᚈ, dias, ainm, airdircus, aite, aithne, bendacht, bruach, cenéle, chliab
- PROPN: choirbbre, dongus, donngvs, finguine, ruadri
- Voc
- NOUN: ingen
- PROPN: mail, máelecán
- Definite
- Def
- ADP: do, ar, di, i
- DET: cach
- Ind
- ADP: hi, do, dom, ar, de, huas, i, and, dait, di
- DET: na
Degree and Polarity
- Degree
- Cmp
- ADJ: má
- Pos
- ADJ: ferr, gann, mar, nóib, acher, bec, cáin, dorchæ, droch, find
- Polarity
- Neg
- AUX: ní
- CCONJ: na
- PART: ni, ní, nad
- SCONJ: na, nach
- Pos
- AUX: is, d, bid, bith, mba
Verbal Features
- Aspect
- Hab
- VERB: ṁbís
- Mood
- Imp
- VERB: Gaib
- Ind
- AUX: is, ní, bith
- VERB: chain, braigim, cél, ecmoṅgat, epur, farcai, fuasna, giuil, llega, maraith
- Sub
- AUX: d, bid, mba
- VERB: cóima, roib, samlar
- Tense
- Fut
- AUX: bith
- VERB: cél, róis, tiach
- Past
- AUX: bid
- VERB: giuil, roscribad, roscríbad, rosechestar
- Pres
- AUX: is, d, ní, mba
- VERB: chain, Gaib, braigim, cóima, ecmoṅgat, epur, farcai, fuasna, llega, maraith
- Voice
- Act
- VERB: chain, Gaib, braigim, cél, cóima, ecmoṅgat, epur, farcai, fuasna, giuil
- Pass
- VERB: roscribad, roscríbad
Pronouns, Determiners, Quantifiers
- PronType
- Art
- DET: in, a, ind, inna, naib, inḍt, nd, sin, ᚔᚅ, ṅ
- Dem
- PART: sin, so, sa, ᚄᚑ
- Emp
- PRON: siu, som, sse, sv
- Prs
- ADP: dom, and, dait, de, dv́n, foir, frimm, friu, lat, oco
- PRON: a, m, mo, do, mei, mm, sí, t, thv, ᚐ
- SCONJ: nach
- Rel
- AUX: mba
- NumType
- Card
- NUM: di
- Ord
- NUM: tris
- Poss
- Yes
- PRON: a, mo, do, ᚐ
- Person
- 1
- ADP: dom, dv́n, frimm
- PRON: m, mo, mei, mm, sse
- VERB: braigim, cél, epur, samlar, scríbaimm, tiach, ágor
- 2
- ADP: dait, lat
- PRON: do, siu, sv, t, thv
- VERB: Gaib, róis
- 3
- ADP: and, de, foir, friu, oco
- AUX: is, d, ní, bid, bith, mba
- PRON: a, som, sí, ᚐ
- SCONJ: nach
- VERB: chain, cóima, ecmoṅgat, farcai, fuasna, giuil, llega, maraith, mardda, roib
Other Features
- Abbr
- Yes
- ADV: .i.
- AdpType
- Prep
- ADP: do, hi, ar, dom, i, de, di, huas, and, dait
- Foreign
- Yes
- ADJ: displosa
- ADV: amen, nam, quantum
- CCONJ: et, ⁊
- NOUN: animalis, femininum, nomen, pedo, sona, vesíca, ᚃᚓᚏᚔᚐ, ᚆᚑᚇᚔᚓ, accentus
- PROPN: isidorus
- SCONJ: ut
- VERB: adest, dicit, fit, pepedi
- X: .c, .d.
- PartType
- Aug
- PART: ro
- VERB: roib, roscribad, roscríbad, rosechestar
- Dct
- PART: hí, ní
- Rel
- PART: nad
- Vb
- PART: fo, Do, a, at, fu, in, no, th
- Voc
- PART: a
- Prefix
- Yes
- ADJ: nóib, droch, find, mar, menn, ᚋᚔᚅ
- NOUN: sam
- PART: ǽr
- PronClass
- A
- PRON: m, mm, t
- Typo
- Yes
- DET: inḍt
- NOUN: ᚉᚑᚉᚐᚏᚈ, accentus, ᚙᚑᚉᚐᚏᚈ
- VerbType
- Cop
- AUX: is, d, ní, bid, bith, mba
- SCONJ: nach
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: is.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (2)
- VERB--NOUN-Nom (9)
- obj
- VERB--NOUN (1)
- VERB--NOUN-Acc (5)
- VERB--NOUN-Nom (3)
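The parent/child patterns above pair the UPOS of a verbal head with the UPOS of its dependent, plus the dependent's Case feature when present. The following is a minimal sketch of how such counts could be gathered, assuming standard CoNLL-U columns; the helper names and the toy sentences are hypothetical, the placeholder forms are not treebank data, and restricting children to NOUN and PRON is an assumption about how the statistics were produced.

```python
from collections import Counter


def conllu_row(idx, form, upos, feats, head, deprel):
    """Build one 10-column CoNLL-U line (LEMMA, XPOS, DEPS, MISC left as '_')."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", feats, str(head), deprel, "_", "_"]
    )


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def argument_patterns(sentences, relations=("nsubj", "obj")):
    """Count (relation, 'VERB--CHILD[-Case]') patterns for verbal parents."""
    counts = Counter()
    for sent in sentences:
        by_id = {row[0]: row for row in sent}
        for row in sent:
            parent = by_id.get(row[6])  # HEAD "0" (root) has no parent row
            if row[7] not in relations or parent is None:
                continue
            # Assumption: only VERB parents with NOUN or PRON children count.
            if parent[3] != "VERB" or row[3] not in ("NOUN", "PRON"):
                continue
            pattern = f"{parent[3]}--{row[3]}"
            case = parse_feats(row[5]).get("Case")
            if case:
                pattern += f"-{case}"
            counts[(row[7], pattern)] += 1
    return counts


# Toy sentences with placeholder forms; NOT taken from the treebank.
toy_sentences = [
    [
        conllu_row(1, "aa", "VERB", "_", 0, "root").split("\t"),
        conllu_row(2, "bb", "NOUN", "Case=Nom|Number=Sing", 1, "nsubj").split("\t"),
        conllu_row(3, "cc", "NOUN", "Case=Acc", 1, "obj").split("\t"),
    ],
    [
        conllu_row(1, "dd", "VERB", "_", 0, "root").split("\t"),
        conllu_row(2, "ee", "NOUN", "_", 1, "nsubj").split("\t"),
        conllu_row(3, "ff", "ADJ", "_", 2, "amod").split("\t"),
    ],
]

patterns = argument_patterns(toy_sentences)
for (relation, pattern), n in sorted(patterns.items()):
    print(relation, pattern, n)
```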
Relations Overview
- This corpus uses 10 relation subtypes: acl:relcl, case:voc, compound:prt, flat:foreign, nmod:poss, nmod:pre, obj:infx, obl:agent, obl:prep, obl:tmod
- The following main type is never used alone; it is always subtyped: compound
- The following 13 relation types are not used in this corpus at all: iobj, csubj, xcomp, expl, aux, appos, clf, fixed, list, orphan, goeswith, reparandum, dep
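The three statements above follow from the relation inventory listed earlier on this page. The sketch below recomputes them, taking the 37 universal relations of UD v2 as the reference set; the variable names are hypothetical, and the list of used relations is copied from the Relations list above.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relations used in this corpus (copied from the Relations list above).
USED = (
    "acl acl:relcl advcl advmod amod case case:voc cc ccomp compound:prt "
    "conj cop det discourse dislocated flat flat:foreign mark nmod nmod:poss "
    "nmod:pre nsubj nummod obj obj:infx obl obl:agent obl:prep obl:tmod "
    "parataxis punct root vocative".split()
)

# A subtype is any label of the form main:sub.
subtypes = sorted(rel for rel in USED if ":" in rel)

# Main types that appear only with a subtype, never on their own.
plain = {rel for rel in USED if ":" not in rel}
always_subtyped = sorted({rel.split(":")[0] for rel in subtypes} - plain)

# Universal relations that never occur, with or without a subtype.
used_main = {rel.split(":")[0] for rel in USED}
unused = sorted(UNIVERSAL - used_main)

print(len(subtypes), "subtypes:", subtypes)
print("always subtyped:", always_subtyped)
print(len(unused), "unused:", unused)
```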