
This page pertains to UD version 2.

UD Kadiweu Unicamp

Language: Kadiweu (code: kbc)
Family: Guaicuruan

This treebank has been part of Universal Dependencies since the UD v2.18 release.

The following people have contributed to making this treebank part of UD: Filomena Spatti Sandalo, Leonel Figueiredo de Alencar, Charlotte Chambelland Galves, Luiz Veronesi, Daniel Zeman.

Repository: UD_Kadiweu-Unicamp
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-NC-SA 4.0

Genre: grammar-examples

Questions, comments? General annotation questions (either Kadiweu-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [sandalo (æt) unicamp • br, leonel • de • alencar (æt) ufc • br]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS annotated manually
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style
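The annotation layers listed above correspond to the standard tab-separated CoNLL-U columns in which UD treebanks are distributed: LEMMA, UPOS, XPOS, FEATS, and HEAD/DEPREL. The following is a minimal sketch of how one token line could be read, assuming well-formed input. The helper names are hypothetical, and the sample line in the usage note uses placeholder forms, not real Kadiwéu data. Layered feature names such as `Person[obj]` are kept as separate keys, so they stay distinct from plain `Person`.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """The ten CoNLL-U columns, in their standard order."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: dict[str, str]
    head: str
    deprel: str
    deps: str
    misc: str


def parse_feats(column: str) -> dict[str, str]:
    """Split a FEATS value such as 'Person=1|Person[obj]=3' into a dict.

    Layered names like 'Person[obj]' are kept verbatim as keys.
    An underscore means the token has no features.
    """
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def parse_token_line(line: str) -> Token:
    """Parse one token line (not a comment or a blank sentence separator)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    cols[5] = parse_feats(cols[5])
    return Token(*cols)
```

For example, `parse_token_line("1\tform1\tlemma1\tVERB\t_\tPerson=1|Person[obj]=3\t0\troot\t_\t_")` yields a token whose `feats` is `{"Person": "1", "Person[obj]": "3"}`.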

Description

UD_Kadiweu-UNICAMP is a treebank for Kadiwéu (ISO-639: kbc), an endangered Indigenous language of Brazil. It consists of isolated sentences produced by native speakers.

Kadiwéu is a polysynthetic language spoken in the state of Mato Grosso do Sul, Brazil. It is severely endangered: among approximately 1,500 Kadiwéu people, fewer than 300 speak the language, as many have shifted to Portuguese (Pires 2022). Kadiwéu is the only representative of the Waikurúan linguistic family in Brazil. This family includes four additional languages: Toba, Pilagá, and Mocoví, mostly spoken in Argentina, and Abipón, formerly spoken in Argentina but now extinct (Sandalo 1995).

UD_Kadiweu-UNICAMP is the first treebank for a Waikurúan language in the UD collection, contributing to the documentation and computational modeling of a poorly documented and under-resourced language family. It is an ongoing project, currently consisting of isolated sentences produced by native speakers, most of which are translations of Portuguese sentences. Future versions will also include narratives and other genres.

Acknowledgments

The construction of this treebank has been funded by the São Paulo Research Foundation (FAPESP) through the DACILAT project (grant No. 22/09158-5). It is part of the postdoctoral research of Leonel Figueiredo de Alencar at the Department of Linguistics of the State University of Campinas (UNICAMP), under the supervision of Filomena Spatti Sandalo, coordinator of the DACILAT project, and in collaboration with Charlotte Chambelland Galves.

We are much indebted to the speakers of Kadiwéu for sharing their knowledge of their language and for providing translations and acceptability judgements on constructed sentences.

References

Statistics of UD Kadiweu Unicamp

POS Tags

ADJ, ADV, AUX, DET, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB

Features

AdvType, Aspect, Degree, Deixis, Gender, Gender[obj], Mood, Number, Number[obj], Number[psor], Person, Person[erg], Person[obj], Person[psor], Polarity, Poss, PronType, VerbForm, Voice

Relations

acl:relcl, advcl, advmod, aux, det, dislocated, mark, nmod:poss, nsubj, obj, punct, root
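Inventories like the tag, feature, and relation lists above can be derived directly from the treebank's CoNLL-U files. Below is a minimal sketch of such a tally, assuming the standard ten-column layout. It is not the script that generates this page, and the function name is hypothetical. It skips comment lines, multiword-token ranges (IDs like `1-2`), and empty nodes (IDs like `1.1`). It counts layered feature names such as `Person[obj]` separately from plain `Person`, and it keeps relation subtypes such as `acl:relcl` intact.

```python
from collections import Counter


def tally_inventories(conllu_text: str) -> tuple[Counter, Counter, Counter]:
    """Count UPOS tags, feature names and DEPREL labels in CoNLL-U text."""
    upos, feats, deprels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank sentence separator or comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        upos[cols[3]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                # Keep the full name, so 'Person[obj]' is counted apart from 'Person'.
                feats[pair.split("=", 1)[0]] += 1
        deprels[cols[7]] += 1  # subtypes such as 'nmod:poss' stay intact
    return upos, feats, deprels
```

Sorting the keys of each counter, for example `sorted(upos)`, gives an alphabetical inventory in the same style as the lists above.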

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
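The restriction stated above means that only edges with a VERB head and a NOUN or PRON dependent are counted. The following is a minimal sketch of that filter over CoNLL-U text, assuming sentences separated by blank lines and the standard column order. The function name is hypothetical, and this is not necessarily how the statistics on this page are produced.

```python
from collections import Counter


def verb_nominal_relations(conllu_text: str) -> Counter:
    """Count DEPRELs on edges whose head is a VERB and dependent is a NOUN or PRON."""
    counts: Counter = Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        rows = [
            line.split("\t")
            for line in block.splitlines()
            if line and not line.startswith("#")
        ]
        # Keep only basic word nodes: skip ranges like '1-2' and empty nodes like '1.1'.
        words = [r for r in rows if r[0].isdigit()]
        upos_by_id = {r[0]: r[3] for r in words}
        for r in words:
            head_upos = upos_by_id.get(r[6])  # None when HEAD is 0 (the root)
            if head_upos == "VERB" and r[3] in {"NOUN", "PRON"}:
                counts[r[7]] += 1
    return counts
```

Heads are looked up within each sentence because CoNLL-U IDs restart at 1 in every sentence.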

Relations Overview