
This page pertains to UD version 2.


UD Yupik SLI

Language: Yupik (code: ess)
Family: Eskimo-Aleut

This treebank has been part of Universal Dependencies since the UD v2.8 release.

The following people have contributed to making this treebank part of UD: Hyunji Hayley Park, Lane Schwartz, Francis Tyers.

Repository: UD_Yupik-SLI
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: grammar-examples

Questions, comments? General annotation questions (either Yupik-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [hpark129 (æt) illinois • edu]. Development of the treebank happens outside the UD repository, so bugs must be fixed in either the original data source or the conversion procedure. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	annotated manually
UPOS	annotated manually, natively in UD style
XPOS	not available
Features	annotated manually, natively in UD style
Relations	annotated manually, natively in UD style

Description

UD_Yupik-SLI is a treebank of St. Lawrence Island Yupik (ISO 639-3: ess) that has been manually annotated at the morpheme level, based on the finite-state morphological analyzer of Chen et al. (2020). A word-level version of the annotation, in which multiword expressions are merged, is provided in not-to-release/ess_sli-ud-test.merged.conllu. More information about the treebank can be found in our publication (Park et al., 2021, AmericasNLP).

The current version contains dependency annotations for end-of-chapter exercises in A practical grammar of the St. Lawrence Island/Siberian Yupik Eskimo language (Jacobson, 2001).
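
In CoNLL-U, a morpheme-level analysis of this kind is normally encoded as a multi-word token line (an ID range such as 1-2) followed by one line per syntactic word. The following is a minimal Python sketch of grouping those morpheme lines under their surface token. The sample sentence is an invented placeholder, not data from the treebank, and `group_morphemes` and `row` are hypothetical helper names.

```python
def group_morphemes(conllu_text):
    """Return a list of (surface_form, [syntactic word forms]) pairs."""
    tokens = []
    range_end = 0  # last word ID covered by the most recent range line
    for line in conllu_text.splitlines():
        if not line.strip():
            range_end = 0  # IDs restart at 1 in the next sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            # Multi-word token line: opens a group of syntactic words.
            range_end = int(tok_id.split("-")[1])
            tokens.append((form, []))
        elif "." in tok_id:
            continue  # empty node, not a surface token
        elif int(tok_id) <= range_end:
            tokens[-1][1].append(form)  # morpheme inside the open range
        else:
            tokens.append((form, [form]))  # plain single-word token
    return tokens


def row(tok_id, form):
    """Build a 10-column CoNLL-U line with all other fields left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 8)


# Toy input (assumption): one surface token split into two morphemes.
SAMPLE = "\n".join([
    "# text = stemsuf word.",
    row("1-2", "stemsuf"),
    row("1", "stem"),
    row("2", "suf"),
    row("3", "word"),
    row("4", "."),
])

print(group_morphemes(SAMPLE))
# [('stemsuf', ['stem', 'suf']), ('word', ['word']), ('.', ['.'])]
```

Taking only the first element of each pair gives the surface tokens, which roughly corresponds to a word-level view of the same sentence.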

Acknowledgments

…

References

@inproceedings{park-etal-2021-expanding,
title = "Expanding Universal Dependencies for Polysynthetic Languages: A Case of St.~Lawrence Island Yupik",
author = "Park, Hyunji Hayley and
Schwartz, Lane and
Tyers, Francis M.",
booktitle = "Proceedings of the 1st Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics"
}

@inproceedings{chen-etal-2020-improved,
title = "Improved Finite-State Morphological Analysis for {S}t. {L}awrence {I}sland {Y}upik Using Paradigm Function Morphology",
author = "Chen, Emily and
Park, Hyunji Hayley and
Schwartz, Lane",
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://www.aclweb.org/anthology/2020.lrec-1.326",
pages = "2676--2684",
language = "English",
ISBN = "979-10-95546-34-4",
}

@book{jacobsonPracticalGrammarSt2001,
title = {A Practical Grammar of the {{St}}. {{Lawrence Island}}/{{Siberian Yupik Eskimo}} Language},
author = {Jacobson, Steven A.},
year = {2001},
edition = {2. ed},
publisher = {{Alaska Native Language Center, College of Liberal Arts, University of Alaska}},
address = {{Fairbanks}},
isbn = {978-1-55500-077-6},
language = {en}
}

Statistics of UD Yupik SLI

POS Tags

ADV – CCONJ – DET – NOUN – NUM – PART – PRON – PUNCT – VERB – X

Features

Aspect – Case – Mood – Number – Number[obj] – Number[psor] – Number[subj] – Person – Person[obj] – Person[psor] – Person[subj] – Polarity – PronType – Reflex – Reflex[obj] – Reflex[subj] – Subcat – Tense

Relations

acl – advcl – advmod – appos – cc – conj – dep:ana – dep:aux – dep:cop – dep:emo – dep:infl – dep:mark – dep:pos – det – mark – nmod – nmod:arg – nsubj – nummod – obj – obl – obl:mod – punct – root – xcomp

Tokenization and Word Segmentation

This corpus contains 309 sentences, 1221 tokens and 2568 syntactic words.

This corpus contains 310 tokens (25%) that are not followed by a space.

This corpus does not contain words with spaces.

This corpus contains 4 types of words that contain both letters and punctuation. Examples: -emun, an'gani, an'gigh, un'gani

This corpus contains 773 multi-word tokens. On average, one multi-word token consists of 2.74 syntactic words.
There are 650 types of multi-word tokens. Examples: aqelqat, yugem, qergesek, Kitum, esghaatunga, iflaak, mangteghameng, naagpek, taghnughhaat, ungipaataanga, yuget, Sivuqaghmeng, Tagitiki, aatgha, aatghit, apeghtughistem, ighneghten, ighneqa, kemekraga, mekelghiighet, mekestaaghhaaguq, pagunghaghmeng, qikmima, quyaaq, Aanaqukung, Aghnalqwaaghem, Kaamgek, Mekelghiighem, Naliita, Qafsinaneng, Quyillget, Sangaawa, Siivanlleghet, Teghikusat, Tengegkayuget, Ungazimi, Yupigestun, aanaqut, alquutat, anipameng, apaka, atanga, atughnaqunga, eflugameng, eglluk, eslallugughteghngaan, eslami, eslamun, gaaghpenaan, guutigu.
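
The counts above can be cross-checked against each other. The short Python sketch below assumes the usual CoNLL-U convention that every token that is not a multi-word token corresponds to exactly one syntactic word; the variable names are illustrative only.

```python
# Figures as reported in this section.
tokens = 1221
syntactic_words = 2568
multiword_tokens = 773
no_space_after = 310

# Tokens that are not multi-word tokens contribute one syntactic word each.
plain_tokens = tokens - multiword_tokens

# The remaining syntactic words must belong to multi-word tokens.
words_in_mwts = syntactic_words - plain_tokens

avg_words_per_mwt = words_in_mwts / multiword_tokens
share_no_space = 100 * no_space_after / tokens

print(plain_tokens, words_in_mwts)   # 448 2120
print(round(avg_words_per_mwt, 2))   # 2.74
print(round(share_no_space))         # 25
```

Under that assumption, the derived average and percentage agree with the reported 2.74 and 25%.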

Morphology

Nominal Features

Number

Dual
- X: k, ek, gka, egka, gnun, kek, egken, egn, egni, gneng

Plur
- PRON: whangkunnun
- X: et, t, at, ten, neng, it, ma, meng, anka, enka

Sing
- NOUN: lghii, Taghnughhaq, keneq, Amaa, Laluramka, Nanevgaq, Ukaziq, kufi, nefkuuraq, nguq
- PRON: paamna, Iigna, Kaanyuq, Qaamna, elpenun, m, pagna
- X: meng, m, mun, em, mi, ka, a, n, qa, nga

Case

Abl
- ADV: aagken, Pikegken, paamken
- X: meng, neng, aneng, gneng

Abs
- NOUN: lghii, Taghnughhaq, keneq, Amaa, Laluramka, Nanevgaq, Ukaziq, kufi, nefkuuraq, nguq
- PRON: paamna, Iigna, Qaamna, pagna
- X: et, t, ka, a, k, n, qa, nga, at, ten

Abs,Erg
- X: k, t

All
- ADV: kiwavek, pagavek, sakmavek, whavek
- PRON: elpenun, whangkunnun
- X: mun, anun, gnun, -emun, minun, nun

Equ
- X: estun, stun

Erg
- PRON: m
- X: m, em, ma, gpek, ghpek, ita, am, an, et, um

Gen
- NOUN: sikwaan
- X: em, m, ma, gpek, mi, t, an, at, et, ghpek

Loc
- ADV: Awani, Ingani, Qagani, an'gani, imani, maani, pamani, pikani, un'gani, whani
- X: mi, ni, egn, egni

Per
- ADV: paaggun
- X: kun, ngakun, teggun

Voc
- PRON: Kaanyuq

Degree and Polarity

Polarity

Neg
- VERB: nghit, ghpe, igat, neghin, neghit, nghil, nghite, nneghi, nneghit, ngigal
- X: fqaa, yaquna

Verbal Features

Aspect

Hab
- VERB: aq

Prog
- VERB: aq, iq, gaq

Mood

Cn1
- X: ya, sa

Cn2
- X: aqnga, gaqnga, iqnga

Cnc
- X: ghngaagh, ghnga, nga

Cnd
- X: k, kw, q, gk

Ctm
- X: negh, ngh

Ind
- X: u, a, tu, i, gu

Int
- X: sin, si, a, tsi, estek, ta, zin, st, zi, awa

Opt
- X: i, fqaa, igu, elt, la, ghha, gu, ilt, lgha, lla

Prc
- X: fagilga

Ptc
- X: ka, lghii, ke

Sbr
- X: lu, na, llu

Tense

Fut
- VERB: lleq, naq, nnaq, lleqe, naqe
- X: ghha, lgha, nake, yaquna

Past
- VERB: uma, ma, ama

Pres
- VERB: aq, iq, igat, gaq, ngigal
- X: i, fqaa, igu, elt, la, gu, ilt, lla, lt, ult

Pronouns, Determiners, Quantifiers

PronType

Dem
- PRON: ingku, paamna, piku, Iigna, Kaanyuq, Qaamna, m, pagna

Int
- ADV: Qakun, Sangavek, Qavngaq, Navek, Sangan, Naken, Nani, Sangami, naten, Sangama
- DET: Nali, Qafsina
- PRON: Kitu, Kina, Kinku, Sameng, Kitumun

Reflex

Yes
- X: ni, meng, mi, minun

Person

1
- PRON: whangkunnun

2
- PRON: elpenun

3
- X: meng, an, mi

Number[psor]

Dual
- X: gpung, mtung

Plur
- X: it, ita, ngat, taghnughhiit

Sing
- NOUN: nengyaa, nulaa, qikmii, sikwaan
- X: ka, a, n, ma, qa, nga, gpek, ten, ni, an

Other Features

Number[obj]
- Dual
  - X: kek, gka, k, fkek, gkenka, kung
- Plur
  - X: ki, i, inkut, it, ngi, nka
- Sing
  - X: qa, gu, a, nga, anga, igu, ku, n, an, mken

Number[subj]
- Dual
  - X: k, kung, estek, ung, agkenka, tek, yek
- Dual,Plur,Sing
  - X: mken, n, an, gkenka, ma
- Plur
  - X: t, a, tsi, kut, meng, akut, ata, it, si, teng
- Sing
  - X: q, nga, qa, gu, sin, ki, a, i, lghii, n

Person[obj]
- 1
  - X: anga, nga, inkut, aghminga, kung, ma, penga, uvnga, vnga
- 2
  - X: mken, ten
- 3
  - X: ki, qa, gu, a, i, igu, kek, ku, n, an

Person[psor]
- 1
  - X: ka, ma, qa, gka, anka, egka, enka, nka, gpung, mtung
- 2
  - X: n, gpek, ten, ghpek, an, en, egken, pek
- 3
  - NOUN: nengyaa, nulaa, qikmii, sikwaan
  - X: a, nga, it, ni, an, aneng, anun, i, ita, kek

Person[subj]
- 1
  - X: nga, qa, kung, a, kut, ma, mken, ung, akut, gka
- 2
  - X: sin, ki, gu, igu, uvek, kek, n, tsi, estek, nga
- 3
  - X: q, t, k, a, gu, lghii, anga, i, an, n

Reflex[obj]
- Yes
  - X: tni, vgu

Reflex[subj]
- Yes
  - X: ni, meng, mi, teng, aghminga, migu, tek, uni

Subcat
- Intr
  - X: u, tu, a, sin, gu, k, lu, ya, i, tsi
- Tran
  - X: a, i, si, lu, igu, ya, fqaa, ka, ta, la
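
Several features listed above are layered (for example Number[psor], Person[obj]) or carry more than one value (for example Abs,Erg). A minimal Python sketch of reading such a CoNLL-U FEATS string follows. The helper names `parse_feats` and `split_layer` are hypothetical, and the example string is assembled from feature names and values listed on this page for illustration, not copied from a treebank sentence.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into {feature_name: [values]}."""
    if feats == "_":
        return {}  # "_" marks an empty feature set
    parsed = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        parsed[name] = value.split(",")  # "Abs,Erg" -> ["Abs", "Erg"]
    return parsed


def split_layer(name):
    """Split 'Number[psor]' into ('Number', 'psor'); layer is None if absent."""
    if name.endswith("]") and "[" in name:
        base, layer = name[:-1].split("[", 1)
        return base, layer
    return name, None


# Illustrative FEATS value (assumption, see the note above).
example = "Case=Abs,Erg|Number=Plur|Number[psor]=Sing|Person[psor]=1"

for name, values in parse_feats(example).items():
    print(split_layer(name), values)
# ('Case', None) ['Abs', 'Erg']
# ('Number', None) ['Plur']
# ('Number', 'psor') ['Sing']
# ('Person', 'psor') ['1']
```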

Syntax

Auxiliary Verbs and Copula

This corpus does not contain copulas.

This corpus does not contain auxiliaries.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (126)
- VERB--NOUN-Abs (30)
- VERB--PRON (4)
- VERB--PRON-Abs (1)
- VERB--PRON-All (1)
- VERB--PRON-Erg (1)
- VERB--PRON-Voc (1)

obj
- VERB--NOUN (92)
- VERB--NOUN-Abs (25)
- VERB--PRON (2)

iobj
- not used in this corpus

Relations Overview

This corpus uses 9 relation subtypes: dep:ana, dep:aux, dep:cop, dep:emo, dep:infl, dep:mark, dep:pos, nmod:arg, obl:mod
The following main type is never used alone; it is always subtyped: dep
The following 20 relation types are not used in this corpus at all: iobj, csubj, ccomp, vocative, expl, dislocated, discourse, aux, cop, amod, clf, case, fixed, flat, compound, list, parataxis, orphan, goeswith, reparandum
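
The three statements in this overview can be derived from the relation inventory listed under Relations together with the 37 universal relations of UD v2. The Python sketch below shows one way to do so; the variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = """acl advcl advmod amod appos aux case cc ccomp clf compound
conj cop csubj dep det discourse dislocated expl fixed flat goeswith iobj
list mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root
vocative xcomp""".split()

# Relations used in this treebank, as listed on this page.
USED = """acl advcl advmod appos cc conj dep:ana dep:aux dep:cop dep:emo
dep:infl dep:mark dep:pos det mark nmod nmod:arg nsubj nummod obj obl
obl:mod punct root xcomp""".split()

# Subtyped relations contain a colon (main:subtype).
subtypes = sorted(r for r in USED if ":" in r)

# Main types that occur at all, with or without a subtype.
main_used = {r.split(":")[0] for r in USED}

# Main types that occur without any subtype.
bare = {r for r in USED if ":" not in r}

always_subtyped = sorted(main_used - bare)
unused = sorted(set(UNIVERSAL) - main_used)

print(len(subtypes))     # 9
print(always_subtyped)   # ['dep']
print(len(unused))       # 20
```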