UD Gorontalo BungoLoLombi
Language: Gorontalo (code: gor)
Family: Austronesian
This treebank has been part of Universal Dependencies since the UD v2.18 release.
The following people have contributed to making this treebank part of UD: Andrew Thomas Dyer, Colleen Alena O’Brien.
Repository: UD_Gorontalo-BungoLoLombi
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Gorontalo-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [andrew • dyer (æt) uni-saarland • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
Bungo lo Lombi is a Universal Dependencies parsed corpus of modern spoken Gorontalo as spoken in Gorontalo City, Gorontalo Province, Indonesia. It comprises fieldwork samples obtained by Colleen Alena O’Brien.
The complete data contains elicited examples, monologue, and dialogue. At the moment, only the elicited examples have been parsed.
The parsed data differs from the treebanks of other Austronesian languages in Universal Dependencies in the following ways:
- Dependency relations for core arguments carry semantic sublabels in all verb phrases with voice marking, e.g. nsubj:agent, obj:patient, obj:agent. In this way, no voice is treated as the default.
- Some feature values are replaced, e.g. Voice=Pat for patient voice instead of Voice=Pass. We refer to the paper in the README for more details. In practice, these new values can be losslessly mapped back to pre-existing ones in order to share labels with other corpora.
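The mapping back to pre-existing values can be sketched roughly as below. This is a minimal illustration, not the treebank's own tooling: only the Voice=Pat to Voice=Pass pair is taken from the description above, the names `TO_STANDARD`, `TO_TREEBANK`, and `remap_feats` are hypothetical, and the remaining pairs would have to be filled in from the paper.

```python
# Hypothetical sketch: remap treebank-specific feature values in a CoNLL-U
# FEATS string. Only Voice=Pat -> Voice=Pass is stated in the description;
# any further pairs must be taken from the paper.
TO_STANDARD = {"Voice=Pat": "Voice=Pass"}

# The reverse table. The round trip is only lossless if the target values
# are not otherwise used in the corpus (an assumption of this sketch).
TO_TREEBANK = {new: old for old, new in TO_STANDARD.items()}


def remap_feats(feats: str, table: dict) -> str:
    """Replace each Feature=Value pair that has an entry in `table`."""
    if feats == "_":  # CoNLL-U placeholder for "no features"
        return feats
    return "|".join(table.get(pair, pair) for pair in feats.split("|"))


# Illustrative FEATS string, not copied from the treebank files.
original = "Mood=Ind|Voice=Pat"
shared = remap_feats(original, TO_STANDARD)
restored = remap_feats(shared, TO_TREEBANK)
```

Because the feature name stays the same and only the value changes, the alphabetical ordering of features within the FEATS column is unaffected.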
The name Bungo lo Lombi means “banana tree” in Gorontalo: a very useful, very versatile tree that provides a valuable fruit.
Acknowledgments
- Key elicitation examples and explanations provided by Novi Usu.
References
Cite as:
@inproceedings{dyer-obrien-2025-towards,
title = "Towards better annotation practices for symmetrical voice in {U}niversal {D}ependencies",
author = "Dyer, Andrew Thomas and
O{'}Brien, Colleen Alena",
editor = {Bouma, Gosse and
{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i}},
booktitle = "Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)",
month = aug,
year = "2025",
address = "Ljubljana, Slovenia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.udw-1.15/",
pages = "137--142",
ISBN = "979-8-89176-292-3",
abstract = "Austronesian languages exhibit features that are challenging for Universal Dependencies: most notably, the symmetric voice system, whereby agent, patient, and instrumental arguments (among others) can be the pivot of a transitive structure {--} complicating the usual assumption that subjects of transitive sentences are semantic agents, and objects semantic patients. To showcase our ideas of how to address the representation of such systems in Universal Dependencies, we introduce a small treebank of sentences from texts and elicitation sessions in Gorontalo, an Austronesian language of Sulawesi (Indonesia), which exhibits a Philippine-type voice system. We discuss the annotation guidelines for this language, and the extensions of the Universal Dependencies guidelines that are needed to accommodate this and other Austronesian languages."
}
Andrew Thomas Dyer and Colleen Alena O’Brien. 2025. Towards better annotation practices for symmetrical voice in Universal Dependencies. In Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), pages 137–142, Ljubljana, Slovenia. Association for Computational Linguistics.
Statistics of UD Gorontalo BungoLoLombi
POS Tags
ADJ – ADP – ADV – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – VERB
Features
Aspect – Case – Gender – Mood – Number – Person – PronType – Voice
Relations
advmod – amod – case – cc – clf – compound:redup – conj – dep – det – iobj:instrument – iobj:patient – nmod – nmod:poss – nsubj – nsubj:agent – nsubj:instrument – nsubj:patient – nummod – obj:agent – obj:patient – obl – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 39 sentences and 205 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 7 types of words that contain both letters and punctuation. Examples: hiyo-hiyongo, lo-pomulo, lo-tubu, mo’opotala, pilo-pomulo, pilo-tubu, pilotubu'u
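One way to detect word types that mix letters and punctuation is to look at Unicode general categories. The sketch below is an assumption about how such a check could be written, not the script that generated these statistics; `has_letters_and_punct` is a hypothetical helper name.

```python
import unicodedata


def has_letters_and_punct(form: str) -> bool:
    """True if the form contains at least one letter and one punctuation mark."""
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


# The seven word types listed above; hyphens and both apostrophe variants
# fall under Unicode punctuation categories.
examples = [
    "hiyo-hiyongo", "lo-pomulo", "lo-tubu", "mo’opotala",
    "pilo-pomulo", "pilo-tubu", "pilotubu'u",
]
flags = [has_letters_and_punct(form) for form in examples]
```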
Morphology
Tags
- This corpus uses 11 UPOS tags out of 17 possible: ADJ, ADP, ADV, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, VERB
- This corpus does not use the following tags: AUX, SCONJ, INTJ, SYM, PUNCT, X
- This corpus contains 1 word type tagged as a particle (PART): mayi
- This corpus contains 4 lemmas tagged as pronouns (PRON): ami, liyo, tiyo, wau
- This corpus contains 1 lemma tagged as a determiner (DET): boyito
- This corpus contains no lemmas tagged as auxiliaries (AUX).
- This corpus does not use the VerbForm feature.
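The tag counts above can be cross-checked with a simple set difference against the 17 universal POS tags. The variable names are illustrative only.

```python
# The 17 universal POS tags defined by UD.
UPOS_ALL = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags reported as used in this corpus.
upos_used = {
    "ADJ", "ADP", "ADV", "CCONJ", "DET", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "VERB",
}

# Tags that never occur in the corpus, in alphabetical order.
upos_unused = sorted(UPOS_ALL - upos_used)
```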
Nominal Features
- Gender
  - Fem
    - ADP: li, Ti
  - Masc
    - ADP: Te, le
- Number
  - Plur
    - NOUN: mongololai
    - PRON: Ami
  - Sing
    - PRON: wau, Tiyo, liyo
- Case
  - Gen
    - ADP: lo, le, li
    - PRON: liyo
  - Nom
    - PRON: Ami, Tiyo
  - Npiv
    - ADP: lo, li, le
  - Piv
    - ADP: Ti, Te
Degree and Polarity
Verbal Features
- Aspect
  - Prog
    - VERB: healipo, hemohutu, hemomiyaato, hemongalipa, hepohutu, hepongalipo
- Mood
  - Ind
    - VERB: hiyo-hiyongo, lohama, tilubu, bilindao, hilama, lo-pomulo, lo-tubu, lodehu, lodungoge, lolangi
  - Irr
    - ADJ: mololo, mo’opotala
    - VERB: mobuka
- Voice
  - Act
    - VERB: lohama, hemohutu, hemomiyaato, hemongalipa, lo-pomulo, lo-tubu, lodehu, lodungoge, lolangi, lomindao
  - Ivoc
    - VERB: hepongalipo, pilo-pomulo, pilo-tubu, pilohama, pilotubu'u
  - Pat
    - VERB: tilubu, bilindao, hepohutu, hilama, piliyaato, pilomulo, yilohiu
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - DET: boyito
  - Prs
    - PRON: wau, Tiyo, Ami, liyo
- Person
  - 1
    - PRON: wau, Ami
  - 3
    - PRON: Tiyo
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
Relations Overview
- This corpus uses 9 relation subtypes: compound:redup, iobj:instrument, iobj:patient, nmod:poss, nsubj:agent, nsubj:instrument, nsubj:patient, obj:agent, obj:patient
- The following 3 main types are never used alone; they are always subtyped: compound, iobj, obj
- The following 20 relation types are not used in this corpus at all: csubj, ccomp, vocative, expl, dislocated, advcl, discourse, aux, cop, mark, appos, acl, fixed, flat, list, parataxis, orphan, goeswith, reparandum, punct
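The three counts in this overview can be derived from the relation list given earlier on this page. The sketch below assumes the 37 universal relations of UD v2 as the reference inventory; the variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations attested in this corpus, as listed under "Relations".
corpus_relations = [
    "advmod", "amod", "case", "cc", "clf", "compound:redup", "conj", "dep",
    "det", "iobj:instrument", "iobj:patient", "nmod", "nmod:poss", "nsubj",
    "nsubj:agent", "nsubj:instrument", "nsubj:patient", "nummod",
    "obj:agent", "obj:patient", "obl", "root", "xcomp",
]

# Subtyped relations carry a colon-separated sublabel.
subtypes = sorted(rel for rel in corpus_relations if ":" in rel)

# Main types that occur bare, and main types that occur with a sublabel.
used_alone = {rel for rel in corpus_relations if ":" not in rel}
subtyped_mains = {rel.split(":")[0] for rel in subtypes}

# Main types that only ever appear with a sublabel.
always_subtyped = sorted(subtyped_mains - used_alone)

# Universal relations that do not occur in any form.
unused = sorted(UD_RELATIONS - used_alone - subtyped_mains)
```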