This page pertains to UD version 2.

UD Indonesian CSUI

Language: Indonesian (code: id)
Family: Austronesian, Malayo-Sumbawan

This treebank has been part of Universal Dependencies since the UD v2.7 release.

The following people have contributed to making this treebank part of UD: Ika Alfina, Jessica Naraiswari Arwidarasti, Muhammad Yudistira Hanifmuti, Arawinda Dinakaramani, Ruli Manurung, Fam Rashel, Andry Luthfi.

Repository: UD_Indonesian-CSUI
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Indonesian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ika • alfina (æt) cs • ui • ac • id]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: assigned by a program, with some manual corrections, but not a full manual verification
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Features: assigned by a program, with some manual corrections, but not a full manual verification
Relations: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

UD Indonesian-CSUI was converted automatically from Kethu, an Indonesian constituency treebank in the Penn Treebank format. Kethu was itself converted from a constituency treebank built by Dinakaramani et al. (2015). The treebank is named Indonesian-CSUI because all three versions were built at the Faculty of Computer Science, Universitas Indonesia.

Other characteristics of the treebank:

Acknowledgments

References

Statistics of UD Indonesian CSUI

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Clusivity, Definite, Degree, Foreign, Mood, Number, NumType, Person, Polarity, Polite, PronType, Reflex, Voice

Relations

acl, acl:relcl, advcl, advmod, advmod:emph, amod, appos, aux, case, case:adv, cc, cc:preconj, ccomp, clf, compound:a, conj, cop, csubj, dep, det, discourse, dislocated, fixed, flat, flat:foreign, flat:name, iobj, mark, nmod, nmod:lmod, nmod:poss, nmod:tmod, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, obl:tmod, orphan, parataxis, punct, root, xcomp
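
The tag, feature, and relation inventories listed here can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch of one way to do that, not the script used to generate this page. It relies only on the standard ten-column CoNLL-U layout (UPOS in column 4, FEATS in column 6, DEPREL in column 8). The function name `inventories` and the sentence in `TOY_SENTENCE` are hypothetical and made up for illustration; the sentence is not taken from the treebank.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U layout (not from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
_ROWS = [
    "1 Saya _ PRON _ Number=Sing|Person=1|PronType=Prs 2 nsubj _ _",
    "2 membaca _ VERB _ Voice=Act 0 root _ _",
    "3 buku _ NOUN _ _ 2 obj _ _",
    "4 . _ PUNCT _ _ 2 punct _ _",
]
# CoNLL-U fields are tab-separated; the rows above use spaces for readability.
TOY_SENTENCE = "\n".join("\t".join(row.split()) for row in _ROWS)


def inventories(conllu_text):
    """Return three Counters: UPOS tags, feature names, and relation labels."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2), empty nodes (1.1).
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        upos[cols[3]] += 1
        rels[cols[7]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                # Keep only the feature name, e.g. "Person" from "Person=1".
                feats[pair.split("=", 1)[0]] += 1
    return upos, feats, rels


upos, feats, rels = inventories(TOY_SENTENCE)
print(sorted(upos))
print(sorted(feats))
print(sorted(rels))
```

To run this on real data, the text of a `.conllu` file from the treebank repository would be passed in place of `TOY_SENTENCE`.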

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
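
The restriction described above (verb parent, noun or pronoun child) can be sketched as a filter over CoNLL-U dependency edges. This is an illustrative sketch, not the code behind these statistics. The function name `verb_nominal_relations` and the toy sentence are hypothetical, and treating "nouns or pronouns" as exactly the UPOS tags NOUN and PRON (excluding PROPN) is an assumption; the `child_upos` parameter can be changed if proper nouns should be included.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U layout (not from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
_ROWS = [
    "1 Saya _ PRON _ _ 2 nsubj _ _",
    "2 membaca _ VERB _ _ 0 root _ _",
    "3 buku _ NOUN _ _ 2 obj _ _",
    "4 . _ PUNCT _ _ 2 punct _ _",
]
# CoNLL-U fields are tab-separated; the rows above use spaces for readability.
TOY_SENTENCE = "\n".join("\t".join(row.split()) for row in _ROWS)


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count relation labels on edges with a VERB head and a nominal child."""
    counts = Counter()
    # Sentences in CoNLL-U are separated by blank lines.
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}
        for line in block.splitlines():
            cols = line.split("\t")
            # Skip comments, multiword ranges (1-2), and empty nodes (1.1).
            if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
                continue
            tokens[cols[0]] = cols
        for cols in tokens.values():
            head = tokens.get(cols[6])  # None for the root (HEAD = 0)
            if head is not None and head[3] == "VERB" and cols[3] in child_upos:
                counts[cols[7]] += 1
    return counts


print(verb_nominal_relations(TOY_SENTENCE))
```

In the toy sentence, the punctuation edge is dropped because its child is not a noun or pronoun, and the root token is dropped because it has no parent.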

Verbs with Reflexive Core Objects

Relations Overview