UD Yupik SLI
Language: Yupik (code: ess)
Family: Eskimo-Aleut
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Hyunji Hayley Park, Lane Schwartz, Francis Tyers.
Repository: UD_Yupik-SLI
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Yupik-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [hpark129 (æt) illinois • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD_Yupik-SLI is a treebank of St. Lawrence Island Yupik (ISO 639-3: ess) that has been manually annotated at the morpheme level, based on the finite-state morphological analyzer of Chen et al. (2020). The word-level annotation, which merges multiword expressions, is provided in not-to-release/ess_sli-ud-test.merged.conllu. More information about the treebank can be found in our publication (Park et al., 2021, presented at AmericasNLP).
The current version contains dependency annotations for end-of-chapter exercises in A practical grammar of the St. Lawrence Island/Siberian Yupik Eskimo language (Jacobson, 2001).
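Because the annotation is at the morpheme level, a surface word in the CoNLL-U file typically appears as a multi-word token (a range line such as `1-2`) followed by one line per morpheme. The following is a minimal sketch of how such a file could be counted, assuming standard CoNLL-U conventions. The sample rows use placeholder forms and are not real Yupik data, and the function name `count_units` is hypothetical.

```python
# Hypothetical CoNLL-U rows with placeholder forms (NOT real Yupik data).
# Columns are written space-separated here and joined with tabs below.
ROWS = [
    "1-2 abc _ _ _ _ _ _ _ _",
    "1 ab ab NOUN _ _ 3 nsubj _ _",
    "2 c c X _ _ 1 dep:infl _ _",
    "3-4 de _ _ _ _ _ _ _ SpaceAfter=No",
    "3 d d VERB _ _ 0 root _ _",
    "4 e e X _ _ 3 dep:infl _ _",
    "5 . . PUNCT _ _ 3 punct _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS) + "\n"


def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest range line
    for line in conllu_text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        word_id = line.split("\t")[0]
        if "-" in word_id:  # multi-word token range, e.g. "1-2"
            covered_until = int(word_id.split("-")[1])
            counts["mwts"] += 1
            counts["tokens"] += 1
        elif "." in word_id:  # empty node: neither a token nor a word
            continue
        else:  # ordinary syntactic word
            counts["words"] += 1
            if int(word_id) > covered_until:
                counts["tokens"] += 1  # word that is its own surface token
    return counts


print(count_units(SAMPLE))
```

On the placeholder sample this counts one sentence, three surface tokens, five syntactic words and two multi-word tokens.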
Acknowledgments
…
References
@inproceedings{park-etal-2021-expanding,
title = "Expanding Universal Dependencies for Polysynthetic Languages: A Case of St.~Lawrence Island Yupik",
author = "Park, Hyunji Hayley and
Schwartz, Lane and
Tyers, Francis M.",
booktitle = "Proceedings of the 1st Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics"
}
@inproceedings{chen-etal-2020-improved,
title = "Improved Finite-State Morphological Analysis for {S}t. {L}awrence {I}sland {Y}upik Using Paradigm Function Morphology",
author = "Chen, Emily and
Park, Hyunji Hayley and
Schwartz, Lane",
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://www.aclweb.org/anthology/2020.lrec-1.326",
pages = "2676--2684",
language = "English",
ISBN = "979-10-95546-34-4",
}
@book{jacobsonPracticalGrammarSt2001,
title = {A Practical Grammar of the {{St}}. {{Lawrence Island}}/{{Siberian Yupik Eskimo}} Language},
author = {Jacobson, Steven A.},
year = {2001},
edition = {2. ed},
publisher = {{Alaska Native Language Center, College of Liberal Arts, University of Alaska}},
address = {{Fairbanks}},
isbn = {978-1-55500-077-6},
language = {en}
}
Statistics of UD Yupik SLI
POS Tags
ADV – CCONJ – DET – NOUN – NUM – PART – PRON – PUNCT – VERB – X
Features
Aspect – Case – Mood – Number – Number[obj] – Number[psor] – Number[subj] – Person – Person[obj] – Person[psor] – Person[subj] – Polarity – PronType – Reflex – Reflex[obj] – Reflex[subj] – Subcat – Tense
Relations
acl – advcl – advmod – appos – cc – conj – dep:ana – dep:aux – dep:cop – dep:emo – dep:infl – dep:mark – dep:pos – det – mark – nmod – nmod:arg – nsubj – nummod – obj – obl – obl:mod – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 309 sentences, 1221 tokens and 2568 syntactic words.
- This corpus contains 310 tokens (25%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 4 types of words that contain both letters and punctuation. Examples: -emun, an'gani, an'gigh, un'gani
- This corpus contains 773 multi-word tokens. On average, one multi-word token consists of 2.74 syntactic words.
- There are 650 types of multi-word tokens. Examples: aqelqat, yugem, qergesek, Kitum, esghaatunga, iflaak, mangteghameng, naagpek, taghnughhaat, ungipaataanga, yuget, Sivuqaghmeng, Tagitiki, aatgha, aatghit, apeghtughistem, ighneghten, ighneqa, kemekraga, mekelghiighet, mekestaaghhaaguq, pagunghaghmeng, qikmima, quyaaq, Aanaqukung, Aghnalqwaaghem, Kaamgek, Mekelghiighem, Naliita, Qafsinaneng, Quyillget, Sangaawa, Siivanlleghet, Teghikusat, Tengegkayuget, Ungazimi, Yupigestun, aanaqut, alquutat, anipameng, apaka, atanga, atughnaqunga, eflugameng, eglluk, eslallugughteghngaan, eslami, eslamun, gaaghpenaan, guutigu.
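The counts above can be checked against each other with a little arithmetic. The sketch below assumes that every surface token is either a multi-word token or exactly one syntactic word; this is an assumption about how the statistics are counted, not something stated on this page.

```python
# Figures copied from the list above.
tokens = 1221
syntactic_words = 2568
multiword_tokens = 773
tokens_without_space = 310

# Assumption: a token that is not a multi-word token is one syntactic word.
single_word_tokens = tokens - multiword_tokens
words_inside_mwts = syntactic_words - single_word_tokens

average_mwt_length = words_inside_mwts / multiword_tokens
no_space_share = 100 * tokens_without_space / tokens

print(single_word_tokens, words_inside_mwts)
print(round(average_mwt_length, 2), round(no_space_share))
```

Under that assumption the derived average (2120 words over 773 multi-word tokens, about 2.74) and the share of tokens not followed by a space (about 25%) both agree with the reported figures.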
Morphology
Tags
- This corpus uses 10 UPOS tags out of 17 possible: ADV, CCONJ, DET, NOUN, NUM, PART, PRON, PUNCT, VERB, X
- This corpus does not use the following tags: PROPN, ADJ, AUX, ADP, SCONJ, INTJ, SYM
- This corpus contains 5 word types tagged as particles (PART): elngaatall, ighivgaq, qayughllak, quunpeng, unaami
- This corpus contains 16 lemmas tagged as pronouns (PRON): elpenun, iigna, iingku, ingku, kaanyuq, kina, kinku, kitu, kitumun, m, paamna, pagna, piku, qaamna, sameng, whangkunnun
- This corpus contains 2 lemmas tagged as determiners (DET): nali, qafsina
- This corpus contains no lemmas tagged as auxiliaries (AUX).
- This corpus does not use the VerbForm feature.
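The split between used and unused tags can be reproduced as a set difference against the 17 universal POS tags defined by UD. This is only an illustrative check; the variable names are hypothetical.

```python
# The 17 universal POS tags of UD v2.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags reported as used in this corpus (copied from the list above).
used = {"ADV", "CCONJ", "DET", "NOUN", "NUM", "PART", "PRON", "PUNCT", "VERB", "X"}

unused = UPOS - used
print(len(used), "of", len(UPOS), "tags used")
print("unused:", ", ".join(sorted(unused)))
```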
Nominal Features
- Number
- Dual
- X: k, ek, gka, egka, gnun, kek, egken, egn, egni, gneng
- Plur
- PRON: whangkunnun
- X: et, t, at, ten, neng, it, ma, meng, anka, enka
- Sing
- NOUN: lghii, Taghnughhaq, keneq, Amaa, Laluramka, Nanevgaq, Ukaziq, kufi, nefkuuraq, nguq
- PRON: paamna, Iigna, Kaanyuq, Qaamna, elpenun, m, pagna
- X: meng, m, mun, em, mi, ka, a, n, qa, nga
- Case
- Abl
- ADV: aagken, Pikegken, paamken
- X: meng, neng, aneng, gneng
- Abs
- NOUN: lghii, Taghnughhaq, keneq, Amaa, Laluramka, Nanevgaq, Ukaziq, kufi, nefkuuraq, nguq
- PRON: paamna, Iigna, Qaamna, pagna
- X: et, t, ka, a, k, n, qa, nga, at, ten
- Abs,Erg
- X: k, t
- All
- ADV: kiwavek, pagavek, sakmavek, whavek
- PRON: elpenun, whangkunnun
- X: mun, anun, gnun, -emun, minun, nun
- Equ
- X: estun, stun
- Erg
- PRON: m
- X: m, em, ma, gpek, ghpek, ita, am, an, et, um
- Gen
- NOUN: sikwaan
- X: em, m, ma, gpek, mi, t, an, at, et, ghpek
- Loc
- ADV: Awani, Ingani, Qagani, an'gani, imani, maani, pamani, pikani, un'gani, whani
- X: mi, ni, egn, egni
- Per
- ADV: paaggun
- X: kun, ngakun, teggun
- Voc
- PRON: Kaanyuq
Degree and Polarity
- Polarity
- Neg
- VERB: nghit, ghpe, igat, neghin, neghit, nghil, nghite, nneghi, nneghit, ngigal
- X: fqaa, yaquna
Verbal Features
- Aspect
- Hab
- VERB: aq
- Prog
- VERB: aq, iq, gaq
- Mood
- Cn1
- X: ya, sa
- Cn2
- X: aqnga, gaqnga, iqnga
- Cnc
- X: ghngaagh, ghnga, nga
- Cnd
- X: k, kw, q, gk
- Ctm
- X: negh, ngh
- Ind
- X: u, a, tu, i, gu
- Int
- X: sin, si, a, tsi, estek, ta, zin, st, zi, awa
- Opt
- X: i, fqaa, igu, elt, la, ghha, gu, ilt, lgha, lla
- Prc
- X: fagilga
- Ptc
- X: ka, lghii, ke
- Sbr
- X: lu, na, llu
- Tense
- Fut
- VERB: lleq, naq, nnaq, lleqe, naqe
- X: ghha, lgha, nake, yaquna
- Past
- VERB: uma, ma, ama
- Pres
- VERB: aq, iq, igat, gaq, ngigal
- X: i, fqaa, igu, elt, la, gu, ilt, lla, lt, ult
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- PRON: ingku, paamna, piku, Iigna, Kaanyuq, Qaamna, m, pagna
- Int
- ADV: Qakun, Sangavek, Qavngaq, Navek, Sangan, Naken, Nani, Sangami, naten, Sangama
- DET: Nali, Qafsina
- PRON: Kitu, Kina, Kinku, Sameng, Kitumun
- Reflex
- Yes
- X: ni, meng, mi, minun
- Person
- 1
- PRON: whangkunnun
- 2
- PRON: elpenun
- 3
- X: meng, an, mi
- Number[psor]
- Dual
- X: gpung, mtung
- Plur
- X: it, ita, ngat, taghnughhiit
- Sing
- NOUN: nengyaa, nulaa, qikmii, sikwaan
- X: ka, a, n, ma, qa, nga, gpek, ten, ni, an
Other Features
- Number[obj]
- Dual
- X: kek, gka, k, fkek, gkenka, kung
- Plur
- X: ki, i, inkut, it, ngi, nka
- Sing
- X: qa, gu, a, nga, anga, igu, ku, n, an, mken
- Number[subj]
- Dual
- X: k, kung, estek, ung, agkenka, tek, yek
- Dual,Plur,Sing
- X: mken, n, an, gkenka, ma
- Plur
- X: t, a, tsi, kut, meng, akut, ata, it, si, teng
- Sing
- X: q, nga, qa, gu, sin, ki, a, i, lghii, n
- Person[obj]
- 1
- X: anga, nga, inkut, aghminga, kung, ma, penga, uvnga, vnga
- 2
- X: mken, ten
- 3
- X: ki, qa, gu, a, i, igu, kek, ku, n, an
- Person[psor]
- 1
- X: ka, ma, qa, gka, anka, egka, enka, nka, gpung, mtung
- 2
- X: n, gpek, ten, ghpek, an, en, egken, pek
- 3
- NOUN: nengyaa, nulaa, qikmii, sikwaan
- X: a, nga, it, ni, an, aneng, anun, i, ita, kek
- Person[subj]
- 1
- X: nga, qa, kung, a, kut, ma, mken, ung, akut, gka
- 2
- X: sin, ki, gu, igu, uvek, kek, n, tsi, estek, nga
- 3
- X: q, t, k, a, gu, lghii, anga, i, an, n
- Reflex[obj]
- Yes
- X: tni, vgu
- Reflex[subj]
- Yes
- X: ni, meng, mi, teng, aghminga, migu, tek, uni
- Subcat
- Intr
- X: u, tu, a, sin, gu, k, lu, ya, i, tsi
- Tran
- X: a, i, si, lu, igu, ya, fqaa, ka, ta, la
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (126)
- VERB--NOUN-Abs (30)
- VERB--PRON (4)
- VERB--PRON-Abs (1)
- VERB--PRON-All (1)
- VERB--PRON-Erg (1)
- VERB--PRON-Voc (1)
- obj
- VERB--NOUN (92)
- VERB--NOUN-Abs (25)
- VERB--PRON (2)
Relations Overview
- This corpus uses 9 relation subtypes: dep:ana, dep:aux, dep:cop, dep:emo, dep:infl, dep:mark, dep:pos, nmod:arg, obl:mod
- The following main type is never used alone; it is always subtyped: dep
- The following 20 relation types are not used in this corpus at all: iobj, csubj, ccomp, vocative, expl, dislocated, discourse, aux, cop, amod, clf, case, fixed, flat, compound, list, parataxis, orphan, goeswith, reparandum
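The three figures in this overview can be derived from the relation inventory listed under "Relations" earlier on this page, together with the 37 universal relations of UD v2. The sketch below is illustrative only; the variable names are hypothetical.

```python
# Relation labels used in this corpus (copied from the "Relations" list).
USED = (
    "acl advcl advmod appos cc conj dep:ana dep:aux dep:cop dep:emo "
    "dep:infl dep:mark dep:pos det mark nmod nmod:arg nsubj nummod obj "
    "obl obl:mod punct root xcomp"
).split()

# The 37 universal relations of UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

subtypes = sorted(r for r in USED if ":" in r)
bare = {r for r in USED if ":" not in r}
subtyped_main = {r.split(":")[0] for r in subtypes}

# Main types that only ever occur with a subtype attached.
always_subtyped = sorted(subtyped_main - bare)
# Universal relations that occur neither bare nor subtyped.
unused = sorted(UNIVERSAL - bare - subtyped_main)

print(len(subtypes), "subtypes:", ", ".join(subtypes))
print("always subtyped:", ", ".join(always_subtyped))
print(len(unused), "unused:", ", ".join(unused))
```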