UD Soi AHA
Language: Soi (code: soj)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.7 release.
The following people have contributed to making this treebank part of UD: AmirHossein Mojiri Foroushani, Hamid Aghaei, Amir Ahmadi.
Repository: UD_Soi-AHA
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples, spoken
Questions, comments? General annotation questions (either Soi-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [amojiry (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | annotated manually |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
The AHA Soi Treebank is a small treebank for contemporary Soi. The corpus was collected and annotated manually, based on interviews with Soi speakers.
The Soi treebank consists of 8 sentences at this stage, and we are working to expand it. AHA is a small group that analyzes Iranian languages to find their similarities and differences.
Acknowledgments
These sentences were prepared with the help of the people of Delijan, whom the AHA group thanks. Ms. Hanieh Mashayekhi also kindly helped us translate the sentences. As a starting point, we used the sentences suggested by APLL (Academy of Persian Language and Literature) for collecting data on Iranian languages. This is a research project by AmirHossein, Hamid and Amir (AHA).
To cite this project, use the following reference:
- Mojiri Foroushani, AmirHossein; Aghaei, Hamid; Ahmadi, Amir (2020): “AHA Soi dependency treebank”, Universal Dependencies (universaldependencies.org)
Statistics of UD Soi AHA
POS Tags
ADP – ADV – AUX – NOUN – NUM – PRON – PUNCT – VERB
Features
Case – Mood – Number – NumType – Person – Polarity – PronType – Tense – VerbForm
Relations
advcl – advmod – aux – case – ccomp – compound:lvc – flat – nmod – nmod:poss – nsubj – nummod – obj – obl – punct – root
Tokenization and Word Segmentation
- This corpus contains 8 sentences and 55 tokens.
- This corpus contains 8 tokens (15%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
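The figures above can be recomputed from the treebank's CoNLL-U file; for instance, 8 of 55 tokens is about 14.5%, which rounds to the reported 15%. The following is a minimal sketch, not the script that generated this page. It assumes the standard 10-column CoNLL-U layout, counts only rows with plain integer IDs (which equals the token count when a treebank has no multiword tokens), and uses an invented English sample rather than real Soi data. The name `token_stats` is hypothetical.

```python
def token_stats(conllu_text):
    """Return (sentences, words, words_not_followed_by_space) for CoNLL-U text."""
    sentences = words = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            in_sentence = True
            sentences += 1
        words += 1
        # The MISC column (10th) holds SpaceAfter=No when no space follows.
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return sentences, words, no_space


# Invented two-sentence sample (English placeholders, not from the treebank).
SAMPLE = "\n".join([
    "# text = I go.",
    "\t".join(["1", "I", "I", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "go", "go", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
    "# text = Go!",
    "\t".join(["1", "Go", "go", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["2", "!", "!", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
    "",
])

sentences, words, no_space = token_stats(SAMPLE)
print(sentences, words, no_space, f"{no_space / words:.0%}")
```

To apply it to the real data, read the treebank's `.conllu` file as UTF-8 text and pass the contents to `token_stats`.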
Morphology
Tags
- This corpus uses 8 UPOS tags out of 17 possible: ADP, ADV, AUX, NOUN, NUM, PRON, PUNCT, VERB
- This corpus does not use the following tags: PROPN, ADJ, DET, SCONJ, CCONJ, PART, INTJ, SYM, X
- This corpus contains 3 lemmas tagged as pronouns (PRON): اُن, م, مِن
- This corpus contains no lemmas tagged as determiners (DET).
- This corpus contains 1 lemma tagged as an auxiliary (AUX): دار
- There is 1 (de)verbal form:
  - Part
    - VERB: بَشتُن, واجَن
Nominal Features
- Number
  - Plur
    - VERB-Part: واجَن
  - Sing
    - ADV: رویی
    - AUX: دارُن
    - NOUN: اُوِ, بار, برنج, بِرا, سات, سال, صبا, عبدولو, علی, لباس
    - PRON: مِن, ِم, اُن, م
    - VERB: اشی, اَپوشُن, اَکَرَ, بَشتُن, دِ, ناشی, نَدییَ, هاگِت
    - VERB-Part: بَشتُن
- Case
  - Tem
    - ADV: الُن, هِزِ
Degree and Polarity
- Polarity
  - Neg
    - VERB: ناشی, نَدییَ
Verbal Features
- Mood
  - Imp
    - VERB: دِ
- Tense
  - Fut
    - AUX: دارُن
  - Past
    - VERB: هاگِت
  - Pres
    - VERB: اشی, اَپوشُن, اَکَرَ, ناشی, نَدییَ
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - PRON: مِن
  - Prs
    - PRON: ِم, اُن, م, مِن
- NumType
  - Card
    - NUM: دِ, ئی, اَجی
- Person
  - 1
    - AUX: دارُن
    - PRON: ِم, م, مِن
    - VERB: اَپوشُن, بَشتُن, نَدییَ, هاگِت
    - VERB-Part: بَشتُن
  - 2
    - VERB: دِ
  - 3
    - PRON: اُن
    - VERB: اشی, اَکَرَ, ناشی, واجَن
    - VERB-Part: واجَن
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 1 lemma as an auxiliary (aux). Example: دار.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (2)
  - VERB--PRON (3)
- obj
  - VERB--NOUN (3)
  - VERB--PRON (1)
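Parent-child counts of this kind can be derived from the HEAD, DEPREL and UPOS columns of a CoNLL-U file. The sketch below is an illustration under assumptions, not the tool behind this page: it matches relation labels exactly (so a subtype such as `nsubj:pass` would not be counted under `nsubj`), the function name `core_argument_pairs` is hypothetical, and the sample sentence is an invented English placeholder.

```python
from collections import Counter


def core_argument_pairs(conllu_text, relations=("nsubj", "obj")):
    """Count (relation, parent UPOS, child UPOS) with a VERB parent and NOUN/PRON child."""
    counts = Counter()
    sentence = {}  # word ID -> (UPOS, HEAD, DEPREL) for the current sentence

    def flush():
        # Resolve each word's head within the sentence, then tally matches.
        for upos, head, deprel in sentence.values():
            parent = sentence.get(head)
            if (deprel in relations and parent is not None
                    and parent[0] == "VERB" and upos in ("NOUN", "PRON")):
                counts[(deprel, parent[0], upos)] += 1
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence[cols[0]] = (cols[3], cols[6], cols[7])
    flush()  # handle a final sentence with no trailing blank line
    return counts


# Invented one-sentence sample (English placeholders, not from the treebank).
SAMPLE = "\n".join([
    "# text = I read books.",
    "\t".join(["1", "I", "I", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "read", "read", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "books", "book", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
])

for (deprel, parent, child), n in sorted(core_argument_pairs(SAMPLE).items()):
    print(f"{deprel}: {parent}--{child} ({n})")
```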
Relations Overview
- This corpus uses 2 relation subtypes: compound:lvc, nmod:poss
- The following main type is never used alone; it is always subtyped: compound
- The following 23 relation types are not used in this corpus at all: iobj, csubj, xcomp, vocative, expl, dislocated, discourse, cop, mark, appos, acl, amod, det, clf, conj, cc, fixed, list, parataxis, orphan, goeswith, reparandum, dep
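The three statements above follow from comparing the relations listed on this page with the 37 universal relations of UD v2. The sketch below reproduces that comparison; the relation list `USED` is copied from the Relations line of this page, while the function name `summarize_relations` is a hypothetical helper introduced only for illustration.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations listed on this page for UD Soi AHA.
USED = [
    "advcl", "advmod", "aux", "case", "ccomp", "compound:lvc", "flat", "nmod",
    "nmod:poss", "nsubj", "nummod", "obj", "obl", "punct", "root",
]


def summarize_relations(used, universal=UNIVERSAL_RELATIONS):
    """Return (subtypes, main types that only occur subtyped, unused universal types)."""
    used = set(used)
    main_types = {rel.split(":")[0] for rel in used}
    subtypes = sorted(rel for rel in used if ":" in rel)
    # A main type is "always subtyped" if it appears only with a subtype suffix.
    always_subtyped = sorted(
        {rel.split(":")[0] for rel in subtypes} - used
    )
    unused = sorted(universal - main_types)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = summarize_relations(USED)
print(len(subtypes), subtypes)
print(len(always_subtyped), always_subtyped)
print(len(unused), unused)
```

With the list from this page, the sketch yields 2 subtypes, 1 main type that only occurs subtyped (compound), and 23 unused relation types, matching the figures reported above.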