UD Old Irish DipWBG
Language: Old Irish (code: sga)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.12 release.
The following people have contributed to making this treebank part of UD: Adrian Doyle.
Repository: UD_Old_Irish-DipWBG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: bible, grammar-examples, nonfiction
Questions, comments? General annotation questions (either Old Irish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [adrianodughaill (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
A Universal Dependencies treebank for the Old Irish Würzburg glosses.
The Diplomatic Würzburg Glosses Treebank (DipWBG) was compiled as part of a PhD research project by Adrian Doyle at the National University of Ireland, Galway. The glosses were written during the 8th century in Latin and Old Irish. Only glosses that contain some Irish text are collected here; however, many of these include code-mixing between Irish and Latin. The glosses relate to the Pauline Epistles and commentaries thereon.
The Old Irish text used in this treebank has been drawn from the Würzburg Irish Glosses website (www.wurzburg.ie). The treebank currently contains 42 glosses, and more will be added in future updates as annotation continues. Because little Old Irish text survives in manuscripts from the period, and because annotation has been completed for only a handful of glosses to date, only a test set is available so far.
Acknowledgments
This research has been supported by NUIG through the Digital Arts and Humanities scholarship, as well as by the Irish Research Council.
References
- Doyle, Adrian. (2018). Würzburg Irish Glosses. www.wuerzburg.ie [accessed 14 March 2023]
- Doyle, Adrian, John Philip McCrae and Clodagh Downey. A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles. CLTW 2019, Dublin, Ireland, August 2019. (https://www.aclweb.org/anthology/W19-6910/)
- Kavanagh, Séamus, and Dagmar S. Wodtko. (Eds.). (2001). A Lexicon of the Old Irish Glosses in the Würzburg Manuscript of the Epistles of St. Paul. Mitteilungen der Prähistorischen Kommission 45, Vienna: Verlag der Österreichischen Akademie der Wissenschaften.
- McCone, Kim. (1997). The Early Irish Verb - Second Edition Revised with Index. An Sagart, Maynooth.
- Stifter, David. (2006). Sengoidelc. Syracuse University Press, New York.
- Stokes, Whitley, and John Strachan (eds.). (1902). Thesaurus Palaeohibernicus Vol. II. Cambridge University Press.
- Thurneysen, Rudolf. (1946). A Grammar of Old Irish. Binchy, D. A. and Bergin, Osborn (tr.), Reprinted 2010, Dublin Institute for Advanced Studies.
Statistics of UD Old Irish DipWBG
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – PART – PRON – PROPN – SCONJ – VERB
Features
Abbr – AdpType – Aspect – Case – Definite – Degree – Foreign – Gender – Mood – Number – PartType – Person – Polarity – Poss – Prefix – PronClass – PronType – Tense – VerbType – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – case – cc – ccomp – compound:prt – conj – cop – csubj – det – dislocated – mark – mark:int – nmod – nmod:poss – nsubj – nsubj:outer – obj – obj:infx – obl – obl:prep – parataxis – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 34 sentences and 438 tokens.
- This corpus contains 168 tokens (38%) that are not followed by a space.
- This corpus contains 2 types of words with spaces. Examples: ṁ bed, ṅ dǽ
- This corpus contains 1 type of word that contains both letters and punctuation. Example: .i.
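Counts like the ones above can be approximated directly from a CoNLL-U file. The following is a minimal sketch, assuming that words are the lines with a plain integer ID and that a missing following space is marked with `SpaceAfter=No` in the MISC column. The sample rows and the helper names `make_row` and `tokenization_stats` are hypothetical and are not taken from the treebank; the exact counting procedure used for the figures on this page may differ.

```python
def make_row(idx, form, upos, head, deprel, misc="_"):
    """Build one 10-column CoNLL-U line (LEMMA copies FORM; XPOS, FEATS, DEPS left empty)."""
    return "\t".join([str(idx), form, form, upos, "_", "_", str(head), deprel, "_", misc])


# Hypothetical two-sentence sample, invented for illustration only.
# Heads and relations are placeholders, not real annotation.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    make_row(1, ".i.", "ADV", 2, "advmod"),
    make_row(2, "maith", "ADJ", 0, "root", "SpaceAfter=No"),
    "",
    "# sent_id = toy-2",
    make_row(1, "ṁ bed", "AUX", 2, "cop", "SpaceAfter=No"),
    make_row(2, "trom", "ADJ", 0, "root"),
    "",
])


def tokenization_stats(conllu):
    """Count sentences, words, words without a following space, and forms containing a space."""
    stats = {"sentences": 0, "words": 0, "no_space_after": 0, "forms_with_space": set()}
    for block in conllu.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        if not rows:
            continue
        stats["sentences"] += 1
        for cols in rows:
            if not cols[0].isdigit():
                # Skip multiword-token ranges (1-2) and empty nodes (1.1).
                continue
            stats["words"] += 1
            if "SpaceAfter=No" in cols[9].split("|"):
                stats["no_space_after"] += 1
            if " " in cols[1]:
                stats["forms_with_space"].add(cols[1])
    return stats


stats = tokenization_stats(SAMPLE)
print(stats["sentences"], stats["words"], stats["no_space_after"])
```

Run against the real test file instead of `SAMPLE`, the same function should give figures comparable to those listed above, subject to the assumptions stated.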
Morphology
Tags
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, PART, PRON, PROPN, SCONJ, VERB
- This corpus does not use the following tags: NUM, INTJ, SYM, PUNCT, X
- This corpus contains 22 word types tagged as particles (PART): a, as, at, con, d, do, eter, hí, in, int, nas, ni, nno, no, ní, r, ro, se, sin, siu, so, to
- This corpus contains 26 lemmas tagged as pronouns (PRON): a, a_2, ar, b, cani, cote, cách, do, dob, ed, far, id_1, m, mo, n, ni, ní, s_2, sa, si, side, so, som, t_1, tú, é_1
- This corpus contains 2 lemmas tagged as determiners (DET): a, in
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: a
- This corpus contains 1 lemma tagged as an auxiliary (AUX): is
- This corpus does not use the VerbForm feature.
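The tag coverage reported above is a set difference against the 17 universal POS tags. A minimal sketch, with the used tags copied from this page and the function name `unused_upos` chosen only for illustration:

```python
# The 17 universal POS tags defined by Universal Dependencies.
UD_UPOS = (
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
)

# Tags listed on this page as used in the corpus.
USED_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "NOUN",
    "PART", "PRON", "PROPN", "SCONJ", "VERB",
}


def unused_upos(used):
    """Return the universal tags that never occur, in the order of UD_UPOS."""
    return [tag for tag in UD_UPOS if tag not in used]


print(len(USED_UPOS), "of", len(UD_UPOS), "tags used")
print("unused:", ", ".join(unused_upos(USED_UPOS)))
```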
Nominal Features
- Gender
  - Fem
    - ADP: tree
    - DET: inna
  - Masc
    - ADJ: adthramli, diadi, macthi
    - DET: in, ind, na
    - PRON: hé, d
  - Masc,Neut
    - PRON: a, som
  - Neut
    - ADJ: chotarsne, domunde, essamin, foirbthi, gáitha, inse, mílsi, suaignid, thoirsech, trom
    - DET: a, inna
    - PRON: hed, a, són
- Number
  - Plur
    - ADJ: adthramli, diadi, domunde, foirbthi, gáitha, macthi, mílsi, uili, æcni
    - ADP: dúib, frib, fuirib, indib, lib, linn
    - AUX: bimmi, bed, nirubtar
    - DET: inna, na
    - NOUN: belre, biada, béssu, comairli, dánu, fochidi, gnímu, mban, soscéli, tol
    - PRON: si, for, ar, far, ni, b, n, ndob, s
    - PROPN: maccidóndu
    - VERB: amlid, chretsit, fulsam, gaibid, gessam, gigeste, nducaid, nropridchissem, riat, rocomalnisid
  - Sing
    - ADJ: chotarsne, domunde, dían, essamin, frecṅdircc, inse, loingthech, maith, suaignid, thoirsech
    - ADP: de, dét, lim, limm, tree, uáit, ṅduit
    - AUX: is, d, ni, ba, p, am, as, bes, i, naba
    - DET: a, in, ind
    - NOUN: cenn, nem, precept, precepte, airli, carcair, chenél, chomalnad, chorp, chuimriug
    - PRON: sa, mo, a, m, se, t, hed, hé, cáich, d
    - PROPN: abracham, crist, moysi
    - VERB: tá, ail, anicc, arim, beir, beo, bera, bered, berinn, bia
- Case
  - Acc
    - ADJ: diadi, domunde, macthi, mílsi
    - DET: inna, a, na
    - NOUN: cenn, biada, béssu, chomalnad, chumang, comarbus, dia, dánu, etargne, etiuth
    - PRON: són
    - PROPN: maccidóndu
  - Dat
    - ADJ: frecṅdircc
    - NOUN: precept, carcair, chorp, chuimriug, eícndarcus, formut, irbáig, nem, nícc, nóinur
  - Gen
    - ADJ: domunde
    - DET: ind, inna
    - NOUN: precepte, soscéli, belre, firinne, hirisse, mban, nanme, nathar, sosceli, tol
    - PRON: cáich
    - PROPN: crist
  - Nom
    - ADJ: chotarsne, adthramli, dían, essamin, foirbthi, gáitha, inse, loingthech, maith, suaignid
    - DET: in, a
    - NOUN: airli, chenél, comairli, fochricc, foirbthetu, labrad, machthad, temel, thorbe, threte
    - PROPN: abracham, moysi
- Definite
  - Ind
    - ADP: ar, dúib, oc, dar, i, de, di, fri, as, dochum
Degree and Polarity
- Degree
  - Cmp
    - ADJ: lia
  - Pos
    - ADJ: chotarsne, domunde, adthramli, diadi, dían, essamin, foirbthi, frecṅdircc, gáitha, il
- Polarity
  - Neg
    - ADV: nacc
    - AUX: ni, naba, nirubtar, ní
    - PART: ni, ní
    - PRON: cain
    - SCONJ: na, ná, nád
  - Pos
    - AUX: is, d, ba, bimmi, p, am, as, bed, bes, i
Verbal Features
- Aspect
  - Hab
    - VERB: biuu
  - Imp
    - VERB: bered, carad, gníthe
  - Perf
    - AUX: nirubtar
    - VERB: nropridchissem, rocomalnisid, ronóibad, rreractid, rucca, rérachtid, árbas, érbarthar
- Mood
  - Imp
    - AUX: bed, naba
    - VERB: amlid, gaibid, léic, mil
  - Ind
    - AUX: is, ni, bimmi, am, as, i, nda, nirubtar, ní
    - VERB: tá, ail, anicc, beir, bered, bia, biur, biuu, carad, chechladar
  - Sub
    - AUX: d, ba, p, bes, ropad
    - VERB: arim, beo, bera, berinn, certa, fulsam, gessam, gessir, labrar, nducaid
- Tense
  - Fut
    - AUX: bimmi
    - VERB: bia, chechladar, creitfess, gigeste, ririu
  - Past
    - AUX: nirubtar, ropad
    - VERB: anicc, bered, berinn, carad, chretsit, gníthe, nropridchissem, rocomalnisid, ronóibad, rreractid
  - Pres
    - AUX: is, d, ni, ba, p, am, as, bes, i, nda
    - VERB: tá, ail, arim, beir, beo, bera, biur, biuu, certa, denim
- Voice
  - Act
    - VERB: tá, ail, amlid, anicc, arim, beir, beo, bera, bered, berinn
  - Pass
    - VERB: etar, gníthe, ronóibad, árbas, érbarthar
Pronouns, Determiners, Quantifiers
- PronType
  - Ana
    - PRON: són
  - Art
    - DET: inna, a, in, ind, na
  - Dem
    - PART: so, a, se, sin, siu
  - Emp
    - PRON: sa, si, se, ni, so, som
  - Ind
    - PRON: cách, níi, cáich
  - Int
    - PART: in
    - PRON: cote, cain
  - Prs
    - ADP: dúib, de, dét, frib, fuirib, indib, lib, lim, limm, linn
    - PRON: mo, a, for, m, ar, t, far, hed, hé, b
  - Rel
    - AUX: as, bes
- Poss
  - Yes
    - PRON: mo, for, a, ar, far, do, m, mm
- Person
  - 1
    - ADP: lim, limm, linn
    - AUX: bimmi, am, nda
    - PRON: sa, mo, m, ar, se, ni, mm, n
    - VERB: arim, beo, berinn, biur, biuu, denim, fulsam, gessam, gníu, guidimm
  - 2
    - ADP: dúib, dét, frib, fuirib, indib, lib, uáit, ṅduit
    - AUX: ba, bed, naba
    - PRON: si, for, t, far, b, do, ndob, so, tú
    - VERB: amlid, gaibid, gigeste, léic, mil, nducaid, ngeiss, rocomalnisid, rreractid, rérachtid
  - 3
    - ADP: de, tree
    - AUX: is, d, ni, p, as, ba, bes, i, nirubtar, ní
    - PRON: a, hed, hé, d, s, som
    - VERB: tá, ail, anicc, beir, bera, bered, bia, carad, certa, chechladar
Other Features
- Abbr
  - Yes
    - ADV: .i.
- AdpType
  - Prep
    - ADP: ar, dúib, oc, dar, i, de, di, fri, as, dochum
- Foreign
  - Yes
    - CCONJ: et
    - NOUN: gloria, legis
- PartType
  - Dct
    - PART: hí, ni
  - Rel
    - PART: a
  - Vb
    - PART: no, do, as, nas, a, at, con, d, eter, int
- Prefix
  - Yes
    - ADJ: il
- PronClass
  - A
    - PRON: m, t, a, b, n, s
  - C
    - PRON: d, ndob
- VerbType
  - Cop
    - AUX: is, d, ni, ba, bimmi, p, am, as, bed, bes
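Feature listings of this kind (feature, then value, then UPOS, then word forms) can be rebuilt by indexing the FEATS column of a CoNLL-U file. The sketch below is one possible way to do it; the three sample rows are hypothetical lines assembled from form, tag and feature combinations that appear on this page, with placeholder heads and relations, and `feature_index` is an illustrative name rather than part of any UD tool.

```python
from collections import defaultdict

# Hypothetical rows: (FORM, UPOS, FEATS). Not actual treebank lines.
TOY_WORDS = [
    ("som", "PRON", "Gender=Masc,Neut"),
    ("lia", "ADJ", "Degree=Cmp"),
    ("inna", "DET", "Gender=Fem|PronType=Art"),
]

# Assemble 10-column CoNLL-U lines; HEAD and DEPREL are placeholders.
TOY_CONLLU = "\n".join(
    "\t".join([str(i), form, form, upos, "_", feats, "0", "root", "_", "_"])
    for i, (form, upos, feats) in enumerate(TOY_WORDS, start=1)
)


def feature_index(conllu):
    """Map feature -> value -> UPOS -> set of lowercased forms."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for line in conllu.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, ranges, empty nodes
        for pair in cols[5].split("|"):
            if "=" not in pair:
                continue  # "_" means no features
            name, value = pair.split("=", 1)
            # Multi-valued features such as "Masc,Neut" are kept as one key,
            # matching how they are listed on this page.
            index[name][value][cols[3]].add(cols[1].lower())
    return index


index = feature_index(TOY_CONLLU)
for name in sorted(index):
    for value in sorted(index[name]):
        for upos in sorted(index[name][value]):
            print(name, value, upos, sorted(index[name][value][upos]))
```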
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): is.
- This corpus does not contain auxiliaries attached with the aux relation.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN-Acc (1)
  - VERB--NOUN-Nom (5)
  - VERB--PRON (1)
- obj
  - VERB--NOUN (1)
  - VERB--NOUN-Acc (13)
  - VERB--PRON (2)
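Pattern counts of the form `VERB--NOUN-Acc` can be derived by pairing each nsubj or obj dependent with its head. The sketch below assumes that relation subtypes are folded into their main type and that the pattern suffix is the dependent's Case feature, if any; both are assumptions about how the listing was produced. The sample uses placeholder forms (`w1`, `w2`, ...) and is not treebank data.

```python
from collections import Counter


def make_word(idx, upos, feats, head, deprel):
    """One 10-column CoNLL-U line with a placeholder form and lemma."""
    form = f"w{idx}"
    return "\t".join([str(idx), form, form, upos, "_", feats, str(head), deprel, "_", "_"])


# Hypothetical sample: two invented sentences.
TOY = "\n".join([
    make_word(1, "VERB", "_", 0, "root"),
    make_word(2, "NOUN", "Case=Nom", 1, "nsubj"),
    make_word(3, "NOUN", "Case=Acc", 1, "obj"),
    "",
    make_word(1, "VERB", "_", 0, "root"),
    make_word(2, "PRON", "_", 1, "obj"),
])


def core_argument_patterns(conllu):
    """Count (relation, 'VERB--UPOS[-Case]') pairs for nsubj and obj dependents of verbs."""
    counts = Counter()
    for block in conllu.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            relation = cols[7].split(":")[0]  # fold subtypes into the main type
            head = words.get(cols[6])
            if relation not in ("nsubj", "obj") or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            feats = dict(f.split("=", 1) for f in cols[5].split("|") if "=" in f)
            pattern = f"VERB--{cols[3]}"
            if "Case" in feats:
                pattern += f"-{feats['Case']}"
            counts[(relation, pattern)] += 1
    return counts


for (relation, pattern), n in sorted(core_argument_patterns(TOY).items()):
    print(relation, pattern, n)
```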
Relations Overview
- This corpus uses 7 relation subtypes: acl:relcl, compound:prt, mark:int, nmod:poss, nsubj:outer, obj:infx, obl:prep
- The following main type is never used alone; it is always subtyped: compound
- The following 16 relation types are not used in this corpus at all: iobj, vocative, expl, discourse, aux, appos, nummod, clf, fixed, flat, list, orphan, goeswith, reparandum, punct, dep
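The three statements above follow from the relation inventory listed earlier on this page and the 37 universal relations of UD v2. A short sketch that recomputes them (the function name `relation_overview` is illustrative):

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = """
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split()

# Relations listed on this page as used in the corpus.
USED_RELATIONS = """
acl acl:relcl advcl advmod amod case cc ccomp compound:prt conj cop csubj det
dislocated mark mark:int nmod nmod:poss nsubj nsubj:outer obj obj:infx obl
obl:prep parataxis root xcomp
""".split()


def relation_overview(used):
    """Return (subtypes, main types only ever subtyped, universal relations never used)."""
    subtypes = sorted(r for r in used if ":" in r)
    bare = {r for r in used if ":" not in r}
    always_subtyped = sorted({r.split(":")[0] for r in subtypes} - bare)
    main_used = {r.split(":")[0] for r in used}
    unused = [r for r in UNIVERSAL_RELATIONS if r not in main_used]
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED_RELATIONS)
print(len(subtypes), "subtypes:", ", ".join(subtypes))
print("always subtyped:", ", ".join(always_subtyped))
print(len(unused), "unused:", ", ".join(unused))
```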