UD Amharic ATT
Language: Amharic (code: am)
Family: Afro-Asiatic
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Binyam Ephrem, Gashaw Arutie, Tsegay Woldemariam, Juan Ignacio Navarro Horñiacek.
Repository: UD_Amharic-ATT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples, fiction, nonfiction, bible, news
Questions, comments? General annotation questions (either Amharic-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [binephrem (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD_Amharic-ATT is a manually developed treebank for Amharic. Sentences were collected from grammar books, fiction, biographies, religious texts and news.
The treebank is manually annotated for POS tags, morphological information and dependency relations. Because Amharic is a morphologically rich, pro-drop language with clitic doubling, clitics have been segmented manually.
Acknowledgments
The treebank was developed by Binyam Ephrem, Gashaw Arutie, and Tsegay Woldemariam. The syntactic annotation was checked and corrected manually by Binyam Ephrem.
References
- Binyam Ephrem Seyoum, Yusuke Miyao and Baye Yimam Mekonnen. 2018. Universal Dependencies for Amharic. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 2216–2222, Miyazaki, Japan: European Language Resources Association (ELRA).
Statistics of UD Amharic ATT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Case – Gender – Mood – Number – NumType – Person – Polarity – Poss – PronType – Tense – Typo – VerbForm – Voice
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – clf – compound – compound:svc – conj – cop – csubj – csubj:pass – dep – det – discourse – expl – fixed – flat – goeswith – iobj – mark – nmod – nsubj – nsubj:pass – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 1074 sentences, 5245 tokens and 10010 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 5 types of words that contain both letters and punctuation. Examples: ዬ_ው, ኣ_ት, ኣለ_ቅኔ, እየ_ዞርክ, ኧ_ሁ
- This corpus contains 2672 multi-word tokens. On average, one multi-word token consists of 2.78 syntactic words.
- There are 1857 types of multi-word tokens. Examples: ነው, ልጁ, ሄደ, አለ, ልጆቹ, ብሎ, ልብሱን, ልጁን, መጽሐፉን, መጣ, ቢሆን, ናት, አስተማሪው, እንደሆነ, ሲል, አልማዝን, ይሻላል, ቤቱን, ይመስላል, ሆነ, ምሳውን, በሩን, ብዬ, አለች, ከሆነ, ይሄዳል, ለምን, ለአልማዝ, መንገዱ, በላ, ናቸው, ኖሮ, አላውቅም, አይደለም, ከእናቱ, ይህን, ይሆን, ሄደች, ለመሄድ, ለአስቴር, መሬቱ, ሞተ, ሥራውን, በሄድኩ, በመኪና, በድንገት, ብትሆን, ተኮሰ, አለበት, አልመጣም.
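The counts above follow the CoNLL-U distinction between surface tokens and syntactic words: a multi-word token appears as a range line (for example `1-3`) followed by one line per word it covers. The following is a minimal sketch of how such counts could be derived, not the script that generated this page. The helper names (`conllu_counts`, `row`) and the toy data are hypothetical. The last lines check that the reported figures are mutually consistent.

```python
def conllu_counts(text):
    """Return (sentences, tokens, words, mwts, words_in_mwts) for CoNLL-U text."""
    sentences = tokens = words = mwts = words_in_mwts = 0
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        if not rows:
            continue
        sentences += 1
        covered_until = 0  # highest word ID covered by the latest multi-word token
        for cols in rows:
            tok_id = cols[0]
            if "." in tok_id:      # empty node: neither a token nor a word
                continue
            if "-" in tok_id:      # multi-word token range such as "1-3"
                start, end = (int(x) for x in tok_id.split("-"))
                mwts += 1
                tokens += 1
                words_in_mwts += end - start + 1
                covered_until = end
            else:                  # ordinary syntactic word
                words += 1
                if int(tok_id) > covered_until:
                    tokens += 1    # this word is its own surface token
    return sentences, tokens, words, mwts, words_in_mwts


def row(tok_id, form):
    """Build a 10-column CoNLL-U line with all other fields left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 8)


# Hypothetical toy data with placeholder forms, not taken from the treebank.
sample = "\n".join([
    "# sent_id = toy-1",
    row("1-3", "abc"), row("1", "a"), row("2", "b"), row("3", "c"),
    row("4", "d"), row("5", "."),
    "",
    "# sent_id = toy-2",
    row("1", "x"), row("2-3", "yz"), row("2", "y"), row("3", "z"),
])
toy = conllu_counts(sample)

# Consistency check on the figures reported in this section.
reported_tokens, reported_words, reported_mwts = 5245, 10010, 2672
single_word_tokens = reported_tokens - reported_mwts
implied_words_in_mwts = reported_words - single_word_tokens
implied_average = round(implied_words_in_mwts / reported_mwts, 2)
```

With the reported figures, 2573 tokens are single words, so the 2672 multi-word tokens account for the remaining 7437 syntactic words, which matches the stated average of 2.78 words per multi-word token.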
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 17 word types tagged as particles (PART): ም, በ, ን, አለ, አል, አይ, ኣ, ኣለ, ኣል, ኣን, ኣይ, እም, እየ, ከ, ው, የ, ያለ
- This corpus contains 30 lemmas tagged as pronouns (PRON): ሁ, ሁል, ህ, ለምን, መቼ, መች, ማን, ማንም, ምነው, ምን, ምንም, ምንድን, ስንት, አንተ, ኡ, ኣ, ኣንተ, እርሱ, እርስ, እሱ, እኔ, እንዴት, እንግዲያ, ኧው, ው, የት, የትም, የትኛ, የትኛው, ይ
- This corpus contains 21 lemmas tagged as determiners (DET): ምነው, ብዙ, ኡ, ኤ, እነ, እንዲህ, እዚህ, እዚያ, እየ, ዋ, ዋን, ው, ውን, ያ, ዬ, ዬ_ው, ይህ, ይህን, ይቺ, ይች, ይኸው
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: ምነው, ኡ, ው
- This corpus contains 11 lemmas tagged as auxiliaries (AUX): ሆን, ብል, ችል, ነበር, ን, ኖር, አይደል, አድርግ, ኣለ, ኣል, እየ
- Out of the above, 9 lemmas occurred sometimes as AUX and sometimes as VERB: ሆን, ብል, ችል, ነበር, ኖር, አይደል, አድርግ, ኣለ, ኣል
- There are 3 (de)verbal forms:
- Conv
- VERB: ገዝት, ብል, መጥት, በልት, ከፍት, ዘንብ, ይዝ, ሄድ, ሠራርት, ረጥብ
- Fin
- VERB: ሄድ, መጣ, ኣል, ሆን, በላ, ብል, ሰጥ, ቀር, ገዛ, ል
- Vnoun
- NOUN: መሄድ, መምጣት, መሆን, መሥራት, መግደል, መክፈል, መስራት, መቆየት, መታመም, መጻፍ
- VERB: መምጣት, ማለፍ, መሄጃ, መሆን, መሆኛ, መስል, መስረቅ, መኖር, መወሰን, መውደቅ
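The tag coverage and the PRON/DET overlap reported above are plain set operations. The sketch below reproduces them from the lists given on this page, and is only an illustration of the arithmetic rather than the tool that produced these statistics. The variable names are hypothetical; the 17-tag inventory is the standard UD UPOS set.

```python
# The 17 universal POS tags defined by UD.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed for this corpus.
corpus_tags = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split()
)
unused_tags = sorted(UD_UPOS - corpus_tags)

# Lemma lists copied from the PRON and DET bullets above.
pron_lemmas = (
    "ሁ, ሁል, ህ, ለምን, መቼ, መች, ማን, ማንም, ምነው, ምን, ምንም, ምንድን, ስንት, አንተ, ኡ, ኣ, "
    "ኣንተ, እርሱ, እርስ, እሱ, እኔ, እንዴት, እንግዲያ, ኧው, ው, የት, የትም, የትኛ, የትኛው, ይ"
).split(", ")
det_lemmas = (
    "ምነው, ብዙ, ኡ, ኤ, እነ, እንዲህ, እዚህ, እዚያ, እየ, ዋ, ዋን, ው, ውን, ያ, ዬ, ዬ_ው, "
    "ይህ, ይህን, ይቺ, ይች, ይኸው"
).split(", ")

# Lemmas that occur under both tags.
pron_and_det = set(pron_lemmas) & set(det_lemmas)

print(unused_tags)
print(len(pron_lemmas), len(det_lemmas), len(pron_and_det))
```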
Nominal Features
- Fem
- DET: ዋ
- PRON: ኧች, ት, ኣት, ኣ, ዋ, ኢ, ሽ, ኣች, እሷ, ኧሽ
- Masc
- DET: ው
- PRON: ኧ, ይ, ት, ኦ, ው, ኡ, ህ, ኧት, ኧህ, እሱ
- VERB-Fin: ይ
- Neut
- PRON: ኝ, ኧሁ, ሁ, ኣቸው, ን, ኧ, ኡ, ኣሁ, ኧኝ, ዋ
- Plur
- ADJ: ብዙዎች, ተሰዳጆች
- NOUN: ልጆች, ተማሪዎች, ሰዎች, ወታደሮች, ጓደኞች, ልጁች, መጽሐፎች, ሴቶች, በጎች, አይኖች
- NUM: መቶዎች
- PRON: ኡ, ኣቸው, ኣችን, ኧን, ን, እን, ኣችሁ, ኧ, እነሱ, ት
- Sing
- DET: ው
- PRON: ኧ, ይ, ት, ኧው, ኧች, ኦ, ው, ኝ, ኣት, ኡ
- VERB-Fin: ይ
- Abl
- ADP: ለ, ከ
- Ben
- ADP: ል, ለ, በ, ብ
- Ins
- ADP: በ
- Loc
- ADP: በ, ከ, ወደ, እ, ላይ, እሰከ
- Mal
- ADP: ብ, በ, ል
Degree and Polarity
- Neg
- PART: ኣል, አል, ኣለ, ያለ, አለ, ኣይ, ም, አይ
Verbal Features
- Ind
- VERB-Fin: ሄድ, መጣ, ኣል, ሆን, በላ, ብል, ሰጥ, ቀር, ገዛ, ል
- Jus
- VERB-Fin: ሂድ, ብላ, ላክ, መጣ, ሰጥ, ስበር, ተቀመጥ, አሳይ, አውጣ, ውረድ
- Past
- AUX: ነበር
- Cau
- VERB-Fin: አስወሰድ, አስገደል, አስያዝ, አሳርፍ, አሳጠብ, አስመሽ, አስረዘም, አስሸለም, አስሸከም, አስቀመስ
- Pass
- VERB-Conv: ተብል, ተቸግር, ተይዝ, ታስር
- VERB-Fin: ተሻል, ተሰረቅ, ተቀመጥ, ተደሰት, ተለወጥ, ተመለስ, ተመኝ, ተሸለም, ተበደር, ተገነዘብ
- Rcp
- VERB-Fin: አጋደል, ሰባበር, ተለዋወጥ, ተነጋገር, ተናነቅ, ተንከባከብ, ተወራውር, ተደባደብ, ተገዳደል, ተጋደል
- Trans
- VERB-Fin: አለቀስ, አመጥ, አነሥ, አነበብ, አደናቀፍ, አገነፍ, አገኘ, አግዝ, አጠብ
Pronouns, Determiners, Quantifiers
- Prs
- PRON: ኧ, ይ, ት, ኡ, ኧው, ኧች, ኦ, ው, ኝ, ኣት
- Card
- NUM: አንድ, ሁለት, ሦስት, ብዙ, አስር, 1.85, ስምንት, ሶስት, ሺህ, አምስት
- Yes
- PRON: ኡ, ዋ, ኤ, ህ, ኣችን, ኣቸው, ው, ዬ, ሽ, ኣችሁ
- 1
- PRON: ኝ, ኤ, እ, ሁ, ኧሁ, ኩ, እኔ, ኧኝ, ኧን, ን
- 2
- PRON: ህ, ኧህ, ት, ሽ, ኢ, ክ, ኣችን, አንተ, ኧ, ኣችሁ
- 3
- DET: ው
- PRON: ኧ, ይ, ኡ, ት, ኧው, ኧች, ኦ, ው, ኣት, ኣ
- VERB-Fin: ይ
Other Features
- Typo
- Yes
- CCONJ: ነገር
- INTJ: እንዲያው
- PRON: ምንድ, ኧ, ምን
- VERB-Fin: መጥ
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: ን.
- This corpus uses 11 lemmas as auxiliaries (aux). Examples: ኣል, ን, ነበር, ሆን, ችል, ኖር, እየ, ብል, አይደል, አድርግ, ኣለ.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--PRON (1)
- VERB-Conv--NOUN (4)
- VERB-Conv--NOUN-ADP(ን) (1)
- VERB-Conv--PRON (34)
- VERB-Fin--NOUN (238)
- VERB-Fin--NOUN-ADP(ን) (3)
- VERB-Fin--NOUN-ADP(የ) (1)
- VERB-Fin--PRON (870)
- VERB-Vnoun--PRON (9)
- obj
- VERB-Conv--NOUN (9)
- VERB-Conv--NOUN-ADP(ን) (11)
- VERB-Conv--PRON-ADP(ን) (1)
- VERB-Fin--NOUN (201)
- VERB-Fin--NOUN-ADP(ለ) (1)
- VERB-Fin--NOUN-ADP(መ) (1)
- VERB-Fin--NOUN-ADP(ን) (206)
- VERB-Fin--NOUN-ADP(ከ)-ADP(በስተቀር) (1)
- VERB-Fin--NOUN-ADP(ወደ) (2)
- VERB-Fin--NOUN-ADP(የ)-ADP(ን) (1)
- VERB-Fin--PRON (81)
- VERB-Fin--PRON-ADP(ን) (9)
- VERB-Vnoun--NOUN (1)
- VERB-Vnoun--NOUN-ADP(ለ) (1)
- VERB-Vnoun--NOUN-ADP(ን) (3)
- iobj
- VERB-Fin--NOUN (5)
- VERB-Fin--NOUN-ADP(ን) (7)
- VERB-Fin--PRON (12)
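The pattern labels above pair a verbal parent (with its VerbForm, where present) with a nominal child and any adpositions attached to that child. One way such patterns could be counted is sketched below. It assumes that the `ADP(...)` part lists the lemmas of ADP-tagged dependents of the child, which is an interpretation of the labels and not a documented rule. The function names and the toy sentence (placeholder lemmas, apart from the marker ን) are hypothetical.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'VerbForm=Fin|Mood=Ind' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def argument_patterns(rows):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    by_id = {r["id"]: r for r in rows}
    counts = Counter()
    for child in rows:
        if child["deprel"] not in CORE_RELATIONS:
            continue
        if child["upos"] not in {"NOUN", "PRON"}:
            continue
        parent = by_id.get(child["head"])
        if parent is None or parent["upos"] != "VERB":
            continue
        verbform = parse_feats(parent["feats"]).get("VerbForm")
        left = "VERB" + (f"-{verbform}" if verbform else "")
        # Assumption: adpositions are the ADP-tagged dependents of the child.
        adps = [r["lemma"] for r in rows
                if r["head"] == child["id"] and r["upos"] == "ADP"]
        right = child["upos"] + "".join(f"-ADP({a})" for a in adps)
        counts[(child["deprel"], f"{left}--{right}")] += 1
    return counts


# Hypothetical toy sentence, not taken from the treebank.
toy_sentence = [
    {"id": 1, "lemma": "n1", "upos": "NOUN", "feats": "_", "head": 4, "deprel": "nsubj"},
    {"id": 2, "lemma": "n2", "upos": "NOUN", "feats": "_", "head": 4, "deprel": "obj"},
    {"id": 3, "lemma": "ን", "upos": "ADP", "feats": "_", "head": 2, "deprel": "case"},
    {"id": 4, "lemma": "v1", "upos": "VERB", "feats": "VerbForm=Fin", "head": 0, "deprel": "root"},
]
patterns = argument_patterns(toy_sentence)
```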
Relations Overview
- This corpus uses 3 relation subtypes: compound:svc, csubj:pass, nsubj:pass
- The following 6 relation types are not used in this corpus at all: vocative, dislocated, appos, list, orphan, reparandum
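Both figures in this overview can be recomputed from the relation list given earlier on this page: subtypes are the labels containing a colon, and unused types are the universal relations whose base label never appears. The sketch below shows the computation; the variable names are hypothetical, and the 37-label inventory is the standard UD v2 relation set.

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed for this corpus, including language-specific subtypes.
corpus_relations = """
acl advcl advmod amod aux case cc ccomp clf compound compound:svc conj cop
csubj csubj:pass dep det discourse expl fixed flat goeswith iobj mark nmod
nsubj nsubj:pass nummod obj obl parataxis punct root xcomp
""".split()

# Subtypes carry a colon; the part before it is the universal base relation.
subtypes = sorted(r for r in corpus_relations if ":" in r)
base_types = {r.split(":")[0] for r in corpus_relations}
unused = sorted(UD_RELATIONS - base_types)

print(subtypes)
print(unused)
```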