
This page pertains to UD version 2.

UD Egyptian PC

Language: Egyptian (code: egy)
Family: Afro-Asiatic

This treebank has been part of Universal Dependencies since the UD v2.14 release.

The following people have contributed to making this treebank part of UD: Roberto Antonio Díaz Hernández, Bruno Guillaume, Daniel Zeman.

Repository: UD_Egyptian-PC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: bible, fiction, nonfiction, government

Questions, comments? General annotation questions (either Egyptian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [radiaz (æt) ujaen • es]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

Egyptian-PC is the first dependency treebank created for the morphosyntactic annotation of pre-Coptic Egyptian. It is developed at the University of Jaén. Its current state (UD v2.18) consists of 3,089 sentences and 34,234 tokens manually annotated from the Pyramid Texts.

The Egyptian-PC treebank (henceforth EPC; originally released as Egyptian-UJaen) contains a corpus of Egyptian texts manually annotated using the Tübingen transcription system (see below). It aims to contribute to the Universal Dependencies (UD) project and to the PARSEME corpora of multiword expressions, so that Egyptian morphosyntactic features can be compared with those of other languages. The EPC treebank was first released in UD release 2.14 with 707 sentences and 5,515 words, comprising Old Egyptian multiword expressions and sentences from the Pyramid Texts (see the list of sources below). Unas’s Pyramid Texts were annotated for UD release 2.15, and Teti’s Pyramid Texts for UD release 2.16. Annotation of Pepi I’s Pyramid Texts began in UD release 2.17, and in UD release 2.18 the main witnesses of the Pyramid Texts were annotated up to utterance 514. The latest version of the EPC treebank has been used to develop a Stanza-based parser for Earlier Egyptian.

The Pyramid Texts data can be explored using GrewPT.

The treebank is planned to contain texts from various historical stages: Old Egyptian, Middle Egyptian, Late Egyptian and Demotic. For an overall description of these linguistic stages, see the Language Page for Egyptian and the bibliography below.

Acknowledgments

This work received support from the CA21167 COST action UniDive, funded by COST (European Cooperation in Science and Technology). I thank Agata Savary (UniDive/PARSEME), Daniel Zeman (UniDive/UD) and Marco Carlo Passarotti (CIRCSE) for introducing me to computational linguistics.

Statistics of UD Egyptian PC

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

AdvType, Aspect, Case, Conjug, ExtPos, Foreign, Gender, Mood, NameType, Nisba, Nominal, Number, NumType, PartType, Person, Polarity, Poss, Prefix, PronClass, PronType, Reflex, StatPrep, SubForm, Tense, Typo, VerbClass, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advcl:consec, advcl:purp, advcl:tcl, advmod, advmod:neg, advmod:q, amod, appos, aux, case, cc, ccomp, ccomp:obj, ccomp:speech, compound, compound:a, conj, cop, csubj, csubj:outer, csubj:pass, dep, det, discourse, dislocated, dislocated:agent, dislocated:ccomp, dislocated:csubj, dislocated:nsubj, dislocated:obj, dislocated:obl, expl, expl:pv, fixed, flat:foreign, flat:name, list, mark, nmod, nmod:nisba, nmod:poss, nmod:unmarked, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:agent, obl:arg, obl:nisba, obl:unmarked, orphan, parataxis, punct, reparandum, root, vocative, xcomp
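The sentence and token counts quoted in the description, and the UPOS, feature and relation inventories listed above, can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal Python sketch, assuming standard CoNLL-U input (10 tab-separated columns, a blank line after each sentence) and files downloaded from the UD_Egyptian-PC repository. The script and function names are hypothetical and are not part of the treebank distribution. Because this page does not say whether its figures count surface tokens or syntactic words, the sketch reports both: a multiword-token range line such as "3-4" counts as one surface token, and the words it covers are counted only as words.

```python
import sys
from collections import Counter


def conllu_stats(lines):
    """Hypothetical helper: count sentences, syntactic words and surface
    tokens, and collect UPOS, feature-name and relation inventories."""
    sentences = words = tokens = 0
    upos, feats, deprels = Counter(), Counter(), Counter()
    in_sentence = False
    covered_until = 0  # last word id covered by a multiword-token range
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # a blank line closes the current sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments (sent_id, text, ...)
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        in_sentence = True
        tok_id = cols[0]
        if "-" in tok_id:
            # multiword token range such as "3-4": one surface token
            tokens += 1
            covered_until = int(tok_id.split("-")[1])
            continue
        if "." in tok_id:
            continue  # empty node of the enhanced graph, not a basic word
        words += 1
        if int(tok_id) > covered_until:
            tokens += 1  # this word is also its own surface token
        upos[cols[3]] += 1
        if cols[5] != "_":
            for feat in cols[5].split("|"):
                feats[feat.split("=", 1)[0]] += 1
        deprels[cols[7]] += 1
    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return {"sentences": sentences, "words": words, "tokens": tokens,
            "upos": upos, "feats": feats, "deprels": deprels}


def main(paths):
    """Sum the statistics over several CoNLL-U files and print them."""
    total = {"sentences": 0, "words": 0, "tokens": 0,
             "upos": Counter(), "feats": Counter(), "deprels": Counter()}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            stats = conllu_stats(f)
        for key in total:
            total[key] += stats[key]
    print(f"{total['sentences']} sentences, {total['tokens']} tokens, "
          f"{total['words']} words")
    print("UPOS:", ", ".join(sorted(total["upos"])))
    print("Features:", ", ".join(sorted(total["feats"])))
    print("Relations:", ", ".join(sorted(total["deprels"])))


if __name__ == "__main__":
    # usage (hypothetical script name): python conllu_stats.py FILE.conllu ...
    main(sys.argv[1:])
```

Each file is processed separately and the results are summed, so a file that lacks a final blank line cannot merge its last sentence with the first sentence of the next file.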

Tokenization and Word Segmentation

Morphology

Tags