UD Old English Cairo
Language: Old English (code: ang)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Lauren Levine, Junghyun Min, Amir Zeldes.
Repository: UD_Old_English-Cairo
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Old English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [amir • zeldes (æt) georgetown • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | annotated manually |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
Old English Cairo sentences with UD annotations and additional annotations relevant to the language's history, such as hyperlemma, Indo-European root, and gloss.
Acknowledgments
We would like to thank the students in the course Corpus Approaches to Historical Linguistics at Georgetown University for participating in the course and in the creation of the dataset:
Abdullah Alasman, Cynthia Li, Dan DeGenaro, Devika Tiwari, Eamon Maloney, Elli Ahn, Erin Tirpak, Junghyun Min, Kate Whipple, Lauren Levine, Nola Goodwin, Wesley Scivetti, Wyatt Roder
… and other annotators who wish to remain anonymous.
References
Please refer to the following article for more information on the dataset and its creation, or for citation purposes.
@misc{levine2025building,
  title={Building UD Cairo for Old English in the Classroom},
  author={Lauren Levine and Junghyun Min and Amir Zeldes},
  year={2025},
  eprint={2504.18718},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.18718},
}
Statistics of UD Old English Cairo
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Case – Degree – ExtPos – Gender – Mood – Number – Person – Tense – VerbForm
Relations
acl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – conj – cop – csubj – det – expl – fixed – flat – iobj – mark – nmod:poss – nsubj – nsubj:pass – obj – obl – obl:unmarked – orphan – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 20 sentences and 171 tokens.
- This corpus contains 25 tokens (15%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 8 types of words that contain both letters and punctuation. Examples: ærend-ƿrit, Franc-landes, be-tƿeonen, efen-ȝemacca, heafod-burh, hƿeol-bearƿe, neah-ȝebur, ȝierstan-dæg
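
Counts like the ones above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: it assumes standard 10-column CoNLL-U, counts only syntactic words (multiword-token ranges and empty nodes are skipped), and runs on a toy English fragment that is not taken from the treebank. The names `read_sentences`, `segmentation_stats`, and `SAMPLE` are hypothetical.

```python
import unicodedata


def read_sentences(conllu_text):
    """Return sentences as lists of 10-column token rows.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "1.1") are skipped, so only syntactic words remain.
    """
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                current.append(cols)
    if current:
        sentences.append(current)
    return sentences


def has_letter_and_punct(form):
    """True if the word form mixes letters with punctuation (e.g. a hyphen)."""
    has_letter = any(ch.isalpha() for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct


def segmentation_stats(conllu_text):
    sentences = read_sentences(conllu_text)
    rows = [row for sent in sentences for row in sent]
    # SpaceAfter=No lives in the MISC column (index 9).
    no_space = sum("SpaceAfter=No" in row[9].split("|") for row in rows)
    mixed = sorted({row[1] for row in rows if has_letter_and_punct(row[1])})
    return {
        "sentences": len(sentences),
        "tokens": len(rows),
        "no_space_after": no_space,
        "mixed_types": mixed,
    }


# Toy fragment for illustration only (NOT from UD_Old_English-Cairo).
SAMPLE = "\n".join([
    "# text = Well-known, yes.",
    "\t".join(["1", "Well-known", "well-known", "ADJ", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["2", ",", ",", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
    "\t".join(["3", "yes", "yes", "INTJ", "_", "_", "1", "discourse", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
    "",
])

stats = segmentation_stats(SAMPLE)
print(stats)
```

To apply the same function to the treebank, read the `.conllu` file from the repository into a string and pass it to `segmentation_stats`.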
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: NUM, INTJ, SYM, X
- This corpus contains 1 word type tagged as a particle (PART): to
- This corpus contains 13 lemmas tagged as pronouns (PRON): he, heo, hie, hit, hƿa, hƿæt, ic, min, se, seo, sƿa, þu, þīn
- This corpus contains 4 lemmas tagged as determiners (DET): _, se, sum, þis
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: se
- This corpus contains 6 lemmas tagged as auxiliaries (AUX): beon, habban, magan, nyllan, ƿesan, ƿillan
- There are 3 (de)verbal forms:
  - Fin
    - AUX: is, hæfð, meaht, mihton, nolde, ƿæs, Ƿilt
    - VERB: dyde, ƿrat, beclypton, bohte, forletone, locast, nabbaþ, ontyne, rann, reȝnaþ
  - Inf
    - AUX: beon
    - VERB: cuman, don, drincan, smican, ȝesƿicanne
  - Part
    - VERB: ȝebroht, onȝemet, ȝecorene
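
The tag inventory figures in this section follow from simple set arithmetic. The sketch below is illustrative only: it copies the tag and lemma lists stated above into Python sets (the variable names are hypothetical) and derives the unused UPOS tags and the lemma shared between PRON and DET.

```python
# The 17 universal POS tags defined by UD.
UNIVERSAL_UPOS = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ SYM VERB X".split()
)

# Tags and lemmas as listed in this section.
USED_UPOS = set("ADJ ADP ADV AUX CCONJ DET NOUN PART PRON PROPN PUNCT SCONJ VERB".split())
PRON_LEMMAS = set("he heo hie hit hƿa hƿæt ic min se seo sƿa þu þīn".split())
DET_LEMMAS = set("_ se sum þis".split())

unused_upos = sorted(UNIVERSAL_UPOS - USED_UPOS)
pron_and_det = sorted(PRON_LEMMAS & DET_LEMMAS)

print(f"{len(USED_UPOS)} of {len(UNIVERSAL_UPOS)} tags used; unused: {unused_upos}")
print(f"lemmas tagged both PRON and DET: {pron_and_det}")
```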
Nominal Features
- Gender=Fem
  - DET: sume, þære
  - NOUN: heafod-burh, heoþe, ƿisan
  - PRON: heo, hyre, hire, seo
  - PROPN: Maria, Brun, Iohanna
- Gender=Masc
  - ADJ: lytel, micel, readne
  - DET: nanne, þone
  - NOUN: broþor, efen-ȝemacca, eodor, freonde, fæder, hƿeol-bearƿe, neah-ȝebur, rice, ȝeþoht, ȝierstan-dæg
  - PRON: he, his
  - PROPN: Petrus, Iguazu, Petere, Peteres, Sam, Smiþ
- Gender=Neut
  - DET: þæt, Þis
  - NOUN: cræt, ærend-ƿrit, eagþyrel, gold, her, mæden, seolfor, ær
  - PRON: hit
  - PROPN: Franc-landes
- Number=Plur
  - AUX-Fin: meaht, mihton
  - PRON: Hie, heom
  - VERB-Fin: beclypton, forletone, nabbaþ
- Number=Sing
  - ADJ: lytel, micel, readne
  - AUX: is, hæfð, mihte, nolde, ƿæs, Ƿilt
  - AUX-Fin: is, hæfð, nolde, ƿæs, Ƿilt
  - DET: þæt, nanne, sume, Þis, þa, þone, þære
  - NOUN: cræt, ærend-ƿrit, broþor, eagþyrel, efen-ȝemacca, eodor, freonde, fæder, gold, heafod-burh
  - PRON: hit, þu, he, heo, Ic, hyre, Hƿæt, Min, hire, his
  - PROPN: Maria, Petrus, Paris, Brun, Franc-landes, Iguazu, Iohanna, Petere, Peteres, Sam
  - VERB: dyde, gan, ƿrat, bohte, drincan, locast, ontyne, rann, reȝnaþ, smican
  - VERB-Fin: dyde, ƿrat, bohte, locast, ontyne, rann, reȝnaþ, þence, þenctst, ȝebyrede
  - VERB-Inf: drincan, smican, ȝesƿicanne
- Case=Acc
  - ADJ: readne
  - DET: þæt, nanne, sume, þone
  - NOUN: cræt, eagþyrel, eodor, gold, her, hƿeol-bearƿe, seolfor, ær, ærend-ƿrit, ƿisan
  - PRON: Hƿæt, hit
  - PROPN: Brun, Maria, Petrus, Smiþ
- Case=Dat
  - DET: þa
  - NOUN: freonde, heoþe
  - PRON: heom
  - PROPN: Petere
- Case=Gen
  - DET: þære
  - PRON: hyre, Min, hire, his, þin
  - PROPN: Franc-landes, Peteres
- Case=Nom
  - ADJ: lytel, micel
  - DET: Þis, Þæt
  - NOUN: broþor, efen-ȝemacca, fæder, mæden, neah-ȝebur, rice, ærend-ƿrit
  - PRON: þu, he, heo, hit, Ic, Hie, hƿa, seo
  - PROPN: Maria, Petrus, Iguazu, Iohanna, Sam
Degree and Polarity
- Degree=Cmp
  - ADJ: mara
Verbal Features
- Mood=Imp
  - VERB-Fin: ontyne
- Mood=Ind
  - AUX-Fin: is, hæfð, meaht, mihton, nolde, ƿæs, Ƿilt
  - VERB-Fin: dyde, ƿrat, beclypton, bohte, forletone, locast, nabbaþ, rann, reȝnaþ, þence
- Tense=Past
  - AUX: meaht, mihte, mihton, nolde, ƿæs
  - AUX-Fin: meaht, mihton, nolde, ƿæs
  - VERB: dyde, ƿrat, beclypton, bohte, forletone, onȝemet, rann, ƿeox, ȝebyrede, ȝecorene
  - VERB-Fin: dyde, ƿrat, beclypton, bohte, forletone, rann, ȝebyrede, ȝesohte, ȝeƿann, ȝeƿeasce
  - VERB-Part: onȝemet, ȝecorene
- Tense=Pres
  - AUX-Fin: is, hæfð, Ƿilt
  - VERB-Fin: locast, nabbaþ, þence, þenctst
Pronouns, Determiners, Quantifiers
- Person=1
  - AUX: mihte
  - PRON: Ic, Min
  - VERB-Fin: þence
- Person=2
  - AUX-Fin: Ƿilt
  - PRON: þu, þin
  - VERB-Fin: locast, ontyne, þenctst
- Person=3
  - AUX-Fin: is, hæfð, meaht, mihton, nolde, ƿæs
  - PRON: heo, hit, he, heom, hire, his
  - VERB: dyde, ƿrat, beclypton, bohte, forletone, rann, reȝnaþ, ƿeox, ȝebyrede, ȝesohte
  - VERB-Fin: dyde, ƿrat, beclypton, bohte, forletone, rann, reȝnaþ, ȝebyrede, ȝesohte, ȝeƿann
Other Features
- ExtPos=PRON
  - PRON: heom
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): beon.
- This corpus uses 4 lemmas as auxiliaries (aux). Examples: magan, habban, nyllan, ƿillan.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: beon, ƿesan.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--PRON-Nom (3)
  - VERB-Fin--NOUN-Nom (2)
  - VERB-Fin--PRON-Nom (11)
  - VERB-Inf--PRON-Nom (2)
  - VERB-Part--NOUN-Nom (1)
- obj
  - VERB-Fin--NOUN-Acc (7)
  - VERB-Fin--NOUN-Dat (1)
  - VERB-Fin--PRON-Acc (1)
  - VERB-Fin--PRON-Dat (1)
  - VERB-Inf--PRON (1)
  - VERB-Part--NOUN-Acc (1)
- iobj
  - VERB-Fin--NOUN-Dat (1)
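
Pattern labels such as `VERB-Fin--NOUN-Acc` pair a verbal parent with a nominal child. The sketch below shows one plausible way to derive them; it is an assumption about how the labels are built (parent UPOS plus `VerbForm`, child UPOS plus `Case`, each feature omitted when absent), not the generator used for this page. The rows are a toy English sentence, not treebank data, and the function names are hypothetical.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string like 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def label(upos, feats, feature):
    """UPOS plus the value of one feature, e.g. 'VERB-Fin', or bare UPOS."""
    value = parse_feats(feats).get(feature)
    return f"{upos}-{value}" if value else upos


def argument_patterns(sentence):
    """Count (relation, 'PARENT--CHILD') patterns for verb-headed nominals."""
    by_id = {row[0]: row for row in sentence}
    counts = Counter()
    for row in sentence:
        deprel, head = row[7], by_id.get(row[6])
        if deprel not in CORE_RELATIONS or head is None:
            continue
        if head[3] != "VERB" or row[3] not in {"NOUN", "PRON"}:
            continue
        parent = label(head[3], head[5], "VerbForm")
        child = label(row[3], row[5], "Case")
        counts[(deprel, f"{parent}--{child}")] += 1
    return counts


# Toy sentence for illustration only (NOT from the treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
TOY_SENTENCE = [
    ["1", "She", "she", "PRON", "_", "Case=Nom|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "reads", "read", "VERB", "_", "Mood=Ind|VerbForm=Fin", "0", "root", "_", "_"],
    ["3", "books", "book", "NOUN", "_", "Case=Acc|Number=Plur", "2", "obj", "_", "SpaceAfter=No"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]

patterns = argument_patterns(TOY_SENTENCE)
for (relation, pattern), n in sorted(patterns.items()):
    print(f"{relation}: {pattern} ({n})")
```

Relations are matched exactly, so subtyped relations such as `nsubj:pass` are not counted under `nsubj` in this sketch.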
Relations Overview
- This corpus uses 5 relation subtypes: aux:pass, cc:preconj, nmod:poss, nsubj:pass, obl:unmarked
- The following 1 main type is not used alone; it is always subtyped: nmod
- The following 10 relation types are not used in this corpus at all: dislocated, discourse, nummod, clf, compound, list, parataxis, goeswith, reparandum, dep
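
The three figures in this overview can be derived from the relation inventory listed under Relations. A minimal sketch, with the 37 universal UD relations and the treebank's relation list copied into Python by hand (variable names are hypothetical):

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = set(
    """acl advcl advmod amod appos aux case cc ccomp clf compound conj cop
    csubj dep det discourse dislocated expl fixed flat goeswith iobj list
    mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root
    vocative xcomp""".split()
)

# Relations used in this treebank, as listed on this page.
USED_RELATIONS = (
    "acl advcl advmod amod appos aux aux:pass case cc cc:preconj ccomp conj "
    "cop csubj det expl fixed flat iobj mark nmod:poss nsubj nsubj:pass obj "
    "obl obl:unmarked orphan punct root vocative xcomp"
).split()

subtypes = sorted(r for r in USED_RELATIONS if ":" in r)
bare = {r for r in USED_RELATIONS if ":" not in r}
subtyped_main = {r.split(":")[0] for r in subtypes}

# Main types that never occur without a subtype.
only_subtyped = sorted(subtyped_main - bare)
# Universal relations that occur neither bare nor subtyped.
unused = sorted(UNIVERSAL_RELATIONS - bare - subtyped_main)

print(f"{len(subtypes)} subtypes: {subtypes}")
print(f"always subtyped: {only_subtyped}")
print(f"{len(unused)} unused: {unused}")
```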