UD Apurina UFPA
Language: Apurina (code: apu)
Family: Arawakan
This treebank has been part of Universal Dependencies since the UD v2.7 release.
The following people have contributed to making this treebank part of UD: Marília Fernanda, Sidney Facundes, Bruna Lima Padovani, Jack Rueter, Niko Partanen.
Repository: UD_Apurina-UFPA
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Apurina-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [rueter • jack (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
This is an Apurinã treebank consisting of sentences from a grammatical description of the language by Marília Fernanda.
The initial release contains 70 annotated sentences. This is the first treebank for a language of the Arawak family. The original interlinear glosses are included in the treebank, and their conversion into full UD annotation is ongoing. The sent_id values (e.g. FernandaM2017:Texto-6-19) encode the collector, the year of publication, the text identifier and the number of the sentence within the original text.
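The sent_id convention described above can be unpacked mechanically. The following is a minimal sketch, assuming every identifier follows the shape of the single documented example (collector name, four-digit year, a colon, then a text identifier and a final sentence number separated by a hyphen). The names `SentId` and `parse_sent_id` are hypothetical helpers, not part of any UD tooling.

```python
import re
from typing import NamedTuple


class SentId(NamedTuple):
    collector: str
    year: int
    text_id: str
    sentence_no: int


# Assumed pattern, inferred only from the example FernandaM2017:Texto-6-19.
_SENT_ID = re.compile(
    r"^(?P<collector>[^\d:]+)(?P<year>\d{4}):(?P<text>.+)-(?P<num>\d+)$"
)


def parse_sent_id(sent_id: str) -> SentId:
    """Split a sent_id into collector, year, text identifier and sentence number."""
    match = _SENT_ID.match(sent_id)
    if match is None:
        raise ValueError(f"unexpected sent_id format: {sent_id!r}")
    return SentId(
        collector=match["collector"],
        year=int(match["year"]),
        text_id=match["text"],
        sentence_no=int(match["num"]),
    )


parsed = parse_sent_id("FernandaM2017:Texto-6-19")
print(parsed)
```

The split of `Texto-6-19` into text `Texto-6` and sentence `19` is an interpretation of the description; identifiers that deviate from the example would raise `ValueError` rather than be silently misparsed.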
Acknowledgments
This treebank was developed in collaboration between the Federal University of Pará (UFPA) and the University of Helsinki.
Development repository: https://github.com/rueter/erme-ud-apurina
Finite-State Transducer development: https://github.com/giellalt/lang-apu
References
- Sidney da Silva Facundes, 2000: The Language of the Apurinã People of Brazil (Maipure/Arawak). PhD dissertation, Department of Linguistics, State University of New York at Buffalo.
- Sidney da Silva Facundes; Marília Fernanda Pereira de Freitas; Bruna Fernanda Soares de Lima-Padovani, 2021: Number Expression in Apurinã (Arawák). In: Hämäläinen, Mika; Partanen, Niko; Alnajjar, Khalid (eds.), Multilingual Facilitation. University of Helsinki. (http://hdl.handle.net/10138/327787)
- Marília Fernanda Pereira de Freitas, 2017: A POSSE EM APURINÃ: descrição de construções atributivas e predicativas em comparação com outras línguas Aruák. Belém/PA.
- Bruna Fernanda Soares de Lima Padovani, 2020: ESTUDO DO LÉXICO DA LÍNGUA APURINÃ: UMA PROPOSTA DE MACRO E MICROESTRUTURA PARA O DICIONÁRIO APURINÃ. Belém/PA.
Statistics of UD Apurina UFPA
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
AdvType – Aspect – Case – Derivation – Gender – Gender[obj] – Gender[psor] – Gender[subj] – Number – Number[obj] – Number[psor] – Number[subj] – Person – Person[obj] – Person[psor] – Person[subj] – Polarity – Possessed – PronType – VerbForm – VerbType
Relations
acl – acl:relcl – advcl – advcl:tcl – advmod – advmod:lmod – advmod:tmod – appos – aux – aux:exhort – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – iobj – list – mark – nmod – nmod:poss – nsubj – nsubj:cop – nummod – obj – obj:agent – obl – obl:lmod – obl:tmod – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 161 sentences, 978 tokens and 981 syntactic words.
- This corpus contains 212 tokens (22%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 2 types of words that contain both letters and punctuation. Examples: 'awary, 'amarekatãĩ
- This corpus contains 3 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 3 types of multi-word tokens. Examples: Ykynykaapuku, iputurĩmitekatinhi, ãatsupa.
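The figures in this list are related: each multi-word token is one surface token that expands into several syntactic words. The sketch below shows one way such counts could be derived from CoNLL-U text, and then checks the reported numbers against each other. The function name `conllu_counts` and the toy sentence (placeholder forms `ab`, `a`, `b`, `c`, not Apurinã data) are assumptions made for illustration; this is not the script that generated the statistics.

```python
def conllu_counts(text: str) -> dict:
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space_after": 0}
    covered_until = 0    # last word ID covered by the current multi-word token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():             # a blank line ends a sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):         # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:                # empty node, neither token nor word
            continue
        if "-" in tok_id:                # multi-word token range such as 1-2
            covered_until = int(tok_id.split("-")[1])
            counts["mwt"] += 1
            is_token = True
        else:
            counts["words"] += 1
            is_token = int(tok_id) > covered_until
        if is_token:
            counts["tokens"] += 1
            if "SpaceAfter=No" in misc.split("|"):
                counts["no_space_after"] += 1
    return counts


def row(*cols: str) -> str:
    return "\t".join(cols)


# Toy sentence with made-up placeholder forms (hypothetical, not corpus data).
TOY = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "ab", "_", "_", "_", "_", "_", "_", "_", "_"),
    row("1", "a", "a", "PRON", "_", "_", "3", "nsubj", "_", "_"),
    row("2", "b", "b", "NOUN", "_", "_", "3", "obj", "_", "_"),
    row("3", "c", "c", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    "",
])
toy_counts = conllu_counts(TOY)
print(toy_counts)

# Consistency of the reported figures: 978 tokens, 3 multi-word tokens of
# 2 words each, 212 tokens without a following space.
expected_words = 978 + 3 * (2 - 1)
no_space_percent = round(100 * 212 / 978)
print(expected_words, no_space_percent)
```

Under these assumptions the reported totals agree: 978 tokens plus one extra word for each of the 3 two-word multi-word tokens gives 981 syntactic words, and 212 of 978 tokens rounds to 22%.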
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 2 word types tagged as particles (PART): kene, kuna
- This corpus contains 13 lemmas tagged as pronouns (PRON): Ywã, atha, kerupa, kiripa, nuta, nynuwa, pitha, pithe, uwa, ykyny, ykynyk, ykynypuku, ywa
- This corpus contains 5 lemmas tagged as determiners (DET): ie, ithu, kaiãapuku, kaiãpuku, kaiãũ
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): amu, txa
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: txa
- There are 2 (de)verbal forms:
- Conv
- VERB: sikasaaky, Ĩkanapyryãkasaaky
- Vnoun
- NOUN: iũkatsupatinhi, ysãkirawatinhi
- VERB: makinhi, faltatinhi, fawtatinhi, iatinhi, ivinitinhi, mitekatinhi, nykaminhi, puturikinhi
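The observation that one lemma is tagged both AUX and VERB is the kind of check that can be run over (lemma, UPOS) pairs. Here is a small sketch, assuming the pairs have already been read from the treebank. The helper name `lemmas_with_both_tags` is hypothetical, and the `observed` list contains only the three pairings stated in the list above, not the full corpus.

```python
from collections import defaultdict


def lemmas_with_both_tags(pairs, tag_a="AUX", tag_b="VERB"):
    """Return the sorted lemmas that occur with both tag_a and tag_b."""
    tags_by_lemma = defaultdict(set)
    for lemma, upos in pairs:
        tags_by_lemma[lemma].add(upos)
    return sorted(
        lemma for lemma, tags in tags_by_lemma.items() if {tag_a, tag_b} <= tags
    )


# Only the lemma/tag pairings named in the listing above; a real run would
# iterate over every word line of the CoNLL-U file.
observed = [("amu", "AUX"), ("txa", "AUX"), ("txa", "VERB")]
ambiguous = lemmas_with_both_tags(observed)
print(ambiguous)
```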
Nominal Features
- Gender
- Fem
- NOUN: maky, nynyru, pynyru, Sytuwakuru, amarute, sytu, syture, ytanuru
- NUM: Hãtu
- PRON: uwa, Kerupa
- PROPN: Kamĩkiu, Kanaiapa
- Masc
- DET: iia, iie, kaiãapukury
- NOUN: ximaky, aapuku, awiri, yky, kyky, kãkity, aapukutxi, kãkiti, ãkiti, ũty
- NUM: ãty
- PRON: ywa, Kiripa, Ywã, ywamunhi
- PROPN: Kirama, Syrywyny, Tutupary, Txiiakatxi
- VERB-Vnoun: puturikinhi
- Number
- Plur
- NOUN: Pupĩkaryny, Sytuwakuru, amarynyky, amarytane, imiakurykata, ypyrawakury, ũimiakury
- PRON: nynuwa, athamunhi, Atha, Hĩthamunhi
- Sing
- NOUN: ximaky, awiri, yky, aapuku, kyky, aapukutxi, ãkiti, ũty, nynyru, pynyru
- PRON: ywa, uwa, nuta, pitha, Ywã, ywamunhi
- PROPN: Kamĩkiu, Kanaiapa, Kirama, Syrywyny, Tutupary, Txiiakatxi
- Case
- Com
- NOUN: ytãnurukata, imiakurykata, iãkynykata, ũtanyrykata
- Dat
- ADV: apikumunhi
- NOUN: apikumunhi, aapukumunhi, sitatximunhi, ytanurumunhi
- PRON: athamunhi, Hĩthamunhi, ywamunhi
- Loc
- ADV: Ywã
- NOUN: kananeã, aapukutxiã, kawãryã, makiã, nytukarẽã, awinhinã, pawinhiã, ukinhiã, ãawinhinhĩã
- PRON: Ywã
- VERB: atamarakitinhitã, mitharyã
- Nom
- NOUN: ximaky, aapuku, awiri, yky, kyky, aapukutxi, ãkiti, ũty, nynyru, pynyru
- NUM: Ipi, Ãty
- PRON: ywa, nynuwa, uwa, nuta, pitha
- PROPN: Kamĩkiu, Kanaiapa, Kirama, Syrywyny, Tutupary, Txiiakatxi
- VERB-Vnoun: puturikinhi
- Tem
- ADV: Ikanapiriãsaaky
- VERB-Conv: sikasaaky, Ĩkanapyryãkasaaky
Degree and Polarity
- Polarity
- Neg
- PART: kuna, kene
Verbal Features
- Aspect
- Prog
- VERB: nhikanãtary
Pronouns, Determiners, Quantifiers
- PronType
- Int
- ADV: Nanhikiripa, Natukupa
- PRON: Kiripa, Kerupa
- Prs
- PRON: ywa, nynuwa, uwa, nuta, pitha, athamunhi, Atha, Ywã, ywamunhi
- Person
- 1
- PRON: nuta, athamunhi, Atha
- 2
- PRON: pitha, Hĩthamunhi
- 3
- PRON: ywa, nynuwa, uwa, Ywã, ywamunhi
- Gender[psor]
- Fem
- NOUN: Ukywyxikeru, Utukuryte, ukinhiã, uparĩka, ũaapuku, ũimiakury, ũtanyrykata, ũãapuku
- Masc
- NOUN: aapuku, iãkynytikinhi, aapukumunhi, aapukutxiã, Yẽrẽkatikinhi, aapukutxi, apy, arẽka, awinhi, iserẽkana
- Number[psor]
- Plur
- NOUN: iserẽkana, ykyynyrytena, ãawinhinhĩã, ãawinhipuku
- Sing
- NOUN: aapuku, nynyru, pynyru, pyry, aapukumunhi, aapukutxiã, nyry, Nhithary, Ukywyxikeru, Utukuryte
Other Features
- AdvType
- Tim
- ADV: Kitxaka, Ywasawaky
- Derivation
- Proprietive
- VERB: kasunakyry, kaxinhiry
- Gender[obj]
- Masc
- CCONJ: txamary
- VERB: awary, amutary, kamary, 'awary, kaiapukury, Makamary, Nymapuruĩtary, aiatary, apukary, atamatary
- Gender[subj]
- Fem
- AUX: utxawa
- VERB: Naãtyru, ukamary, umitikary, usa
- Masc
- ADJ: Kataparaxinery
- ADV: myrykynyty
- AUX: itxa, itxawa
- NOUN: myramanery
- VERB: sary, awa, Kaiãũry, Kuaxary, Kunakamunyry, Naãtyry, apukary, awamaãtary, awana, awapeka
- VERB-Conv: Ĩkanapyryãkasaaky
- Number[obj]
- Plur
- AUX: pitxawa
- Plur,Sing
- VERB: apukary, kamary, umitikary, xinhikapikary
- Sing
- CCONJ: txamary
- VERB: awary, amutary, kamary, 'awary, apukary, atamatary, kaiapukury, pysykanu, Makamary, Nymapuruĩtary
- Number[subj]
- Plur
- VERB: awaika, awana
- Sing
- ADJ: Kataparaxinery
- ADV: myrykynyty, waikirinu
- AUX: itxa, itxawa, nhitxawa, pitxa, pitxawa, utxawa
- NOUN: myramanery
- VERB: sary, awa, pysykanu, Kaiãury, Kaiãũry, Kuaxary, Kunakamunyry, Naãtyru, Naãtyry, Waikai
- VERB-Conv: Ĩkanapyryãkasaaky
- Person[obj]
- 1
- AUX: pitxawa
- VERB: pysykanu
- 3
- CCONJ: txamary
- VERB: awary, amutary, kamary, apukary, 'awary, atamatary, kaiapukury, Makamary, Nymapuruĩtary, aiatary
- Person[psor]
- 1
- NOUN: nynyru, nyry, Nhithary, nhiimatykyry, nywãka, ãawinhinhĩã, ãawinhipuku
- 2
- NOUN: pynyru, pyry, pamiianare, pawinhiã, pytukara, pywãka
- 3
- NOUN: Ukywyxikeru, Utukuryte, Yẽrẽkatikinhi, aapukutxiã, apy, iserẽkana, iãkynykata, iãkynytikinhi, ukinhiã, uky
- Person[subj]
- 1
- ADV: waikirinu
- AUX: nhitxawa
- VERB: awaika, nawa
- 2
- AUX: pitxa, pitxawa
- VERB: pysykanu, Waikai, pyna
- 3
- ADJ: Kataparaxinery
- ADV: myrykynyty
- AUX: itxa, itxawa, utxawa
- NOUN: myramanery
- VERB: sary, awa, Kaiãury, Kaiãũry, Kuaxary, Kunakamunyry, Naãtyru, Naãtyry, apukary, awamaãtary
- VERB-Conv: Ĩkanapyryãkasaaky
- Possessed
- No
- NOUN: ximaky, awiri, kyky, tiwitxi, aapukutxi, mãkatxi, nhipukury, yky, ũty, tiitxi
- PROPN: Kamĩkiu, Kanaiapa, Kirama, Syrywyny, Tutupary, Txiiakatxi
- VERB: nhipukury, puturikinhi
- VERB-Vnoun: puturikinhi
- Yes
- NOUN: aapuku, nynyru, pynyru, pyry, aapukumunhi, nyry, yky, Nhithary, Ukywyxikeru, Utukuryte
- VerbType
- Vido
- ADJ: Kataparaxinery
- ADV: myrykynyty
- NOUN: myramanery
- VERB: sary, Kaiãury, Kuaxary, Kunakamunyry, Naãtyru, Waikai, awaika, kasunakyry, kaxinhiry, mitxiry
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): txa.
- This corpus uses 1 lemma as an auxiliary (aux): txa.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (45)
- VERB--NOUN-Nom (29)
- VERB--PRON (8)
- VERB--PRON-Nom (48)
- VERB-Vnoun--NOUN-Nom (1)
- obj
- VERB--NOUN (46)
- VERB--NOUN-Nom (33)
- VERB--PRON (2)
- VERB--PRON-Nom (1)
- VERB-Conv--NOUN-Nom (1)
- VERB-Vnoun--NOUN (1)
- VERB-Vnoun--NOUN-Nom (1)
- iobj
- VERB--PRON-Nom (1)
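A tally like the one above can be reproduced by pairing each noun or pronoun dependent with its verbal head. The sketch below is one possible approach, assuming labels are built as UPOS plus an optional VerbForm value for the parent and UPOS plus an optional Case value for the child; that labelling scheme is inferred from the listing, not documented. The function names and the toy sentence (placeholder forms `x`, `y`, `z`) are hypothetical.

```python
from collections import Counter


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def label(upos: str, feats: str, key: str) -> str:
    """Append the value of one feature to the UPOS tag when it is present."""
    value = parse_feats(feats).get(key)
    return f"{upos}-{value}" if value else upos


def core_argument_counts(text: str, relations=("nsubj", "obj", "iobj")) -> Counter:
    """Count relation / parent--child label pairs for VERB heads with NOUN or PRON children."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Keep plain word lines only (skips comments, ranges, empty nodes).
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            head = words.get(cols[6])
            if head is None or cols[7] not in relations:
                continue
            if head[3] == "VERB" and cols[3] in ("NOUN", "PRON"):
                pair = f"{label(head[3], head[5], 'VerbForm')}--{label(cols[3], cols[5], 'Case')}"
                counts[(cols[7], pair)] += 1
    return counts


# Toy sentence with placeholder forms (hypothetical, not corpus data).
TOY = "\n".join([
    "# sent_id = toy-1",
    "\t".join(["1", "x", "x", "NOUN", "_", "Case=Nom", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "y", "y", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "z", "z", "NOUN", "_", "_", "2", "obj", "_", "_"]),
])
toy_pairs = core_argument_counts(TOY)
print(toy_pairs)
```

Only exact relation labels are matched here, so subtyped relations such as nsubj:cop would need to be added to `relations` explicitly if they should be counted.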
Relations Overview
- This corpus uses 10 relation subtypes: acl:relcl, advcl:tcl, advmod:lmod, advmod:tmod, aux:exhort, nmod:poss, nsubj:cop, obj:agent, obl:lmod, obl:tmod
- The following 9 relation types are not used in this corpus at all: expl, amod, clf, fixed, flat, parataxis, orphan, goeswith, reparandum
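Both figures in this overview follow from the relation inventory listed under Relations. As a quick sketch, the subtypes are the labels containing a colon, and the unused types are the universal relations whose base label never appears. The `USED` list is copied from that inventory; `UNIVERSAL` is the set of 37 universal dependency relations defined by UD v2. The variable names are illustrative only.

```python
# Relation labels used in this corpus, copied from the Relations list.
USED = (
    "acl acl:relcl advcl advcl:tcl advmod advmod:lmod advmod:tmod appos aux "
    "aux:exhort case cc ccomp compound conj cop csubj dep det discourse "
    "dislocated iobj list mark nmod nmod:poss nsubj nsubj:cop nummod obj "
    "obj:agent obl obl:lmod obl:tmod punct root vocative xcomp"
).split()

# The 37 universal relations of UD v2.
UNIVERSAL = (
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp"
).split()

# Language-specific subtypes carry a colon after the universal base label.
subtypes = sorted(rel for rel in USED if ":" in rel)

# A universal relation counts as used if it appears bare or as a subtype base.
used_bases = {rel.split(":")[0] for rel in USED}
unused = sorted(set(UNIVERSAL) - used_bases)

print(len(subtypes), subtypes)
print(len(unused), unused)
```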