This page pertains to UD version 2.

UD Portuguese PetroGold

Language: Portuguese (code: pt)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.11 release.

The following people have contributed to making this treebank part of UD: Elvis de Souza, Cláudia Freitas, Aline Silveira, Tatiana Cavalcanti, Maria Clara Castro, Wograine Evelyn.

Repository: UD_Portuguese-PetroGold
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.14

License: CC BY-SA 4.0

Genre: academic

Questions, comments? General annotation questions (either Portuguese-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [elvis • desouza99 (æt) gmail • com]. Development of the treebank happens in the UD repository, but not directly in the final CoNLL-U files. You may submit bug fixes as pull requests against the dev branch, but the fixes must be applied to the source files in the folder called not-to-release. Contact the treebank maintainers if in doubt.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style


UD_Portuguese-PetroGold is a fully revised treebank of academic texts from the oil & gas domain in Brazilian Portuguese. The texts were processed in full: only elements such as summaries, abstracts, appendices, and bibliographic references were excluded, along with figures, graphs, formulas, and tables. The annotation was generated automatically and then manually revised by a team of linguists from PUC-Rio (Brazil).

The corpus was created as part of the Petrolês Project (http://petroles.puc-rio.ai), a partnership between the Petrobras Research and Development Center (CENPES) and the Applied Computational Intelligence Lab (PUC-Rio/ICA). Petrolês aims to promote research initiatives related to Natural Language Processing and Computational Linguistics for the Portuguese language.


We want to thank everyone from ICA/PUC-Rio who assisted in extracting the text from the original PDF files. We also want to thank the Petrobras researchers and geoscientists for making the Petrolês corpus publicly available, and for their technical assistance and funding.

How to contribute

Changes should be made via pull request directly to not-to-release/petrogold.conllu in the dev branch.
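
Before opening a pull request, it can help to confirm that an edited file still has the basic CoNLL-U shape. The following is a minimal sketch, not part of the treebank's tooling: the function name find_malformed_lines and the three-token sample sentence are hypothetical, and the only property checked is that every non-comment, non-blank line has the 10 tab-separated fields required by the CoNLL-U format. It does not replace the official UD validation.

```python
def find_malformed_lines(lines):
    """Return (line_number, field_count) for token lines without 10 tab-separated fields.

    Comment lines (starting with '#') and blank sentence separators are skipped.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            problems.append((number, len(fields)))
    return problems


# Hypothetical sample, invented for illustration (not taken from the treebank).
# The last token line is deliberately missing its final field.
sample = [
    "# text = O poço produz\n",
    "1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n",
    "2\tpoço\tpoço\tNOUN\t_\tGender=Masc|Number=Sing\t3\tnsubj\t_\t_\n",
    "3\tproduz\tproduzir\tVERB\t_\tMood=Ind\t0\troot\t_\n",
    "\n",
]

problems = find_malformed_lines(sample)
```

To check the real source file, the same function can be given an open file handle, for example open("not-to-release/petrogold.conllu", encoding="utf-8"), since it only iterates over lines.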

How to cite

title={Polishing the gold--how much revision do we need in treebanks?},
author={de{ }Souza, Elvis and Freitas, Cl{\'a}udia},
booktitle={Proceedings of the Universal Dependencies Brazilian Festival},


Statistics of UD Portuguese PetroGold

POS Tags






Tokenization and Word Segmentation



Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features


Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
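
The restriction above can be sketched in code as follows. This is an illustrative reading, not the script used to produce the statistics: the function name verb_argument_relations and the sample sentence are hypothetical, and treating "nouns or pronouns" as the UPOS tags NOUN and PRON (without PROPN) is an assumption that can be changed through the child_upos parameter.

```python
from collections import Counter


def verb_argument_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count deprels of tokens tagged with child_upos whose head is tagged VERB."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}  # token ID -> (UPOS, HEAD, DEPREL)
        for line in block.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if not cols[0].isdigit():
                continue
            tokens[cols[0]] = (cols[3], cols[6], cols[7])
        for upos, head, deprel in tokens.values():
            # HEAD "0" (the root) is never a key in tokens, so it is ignored here.
            if upos in child_upos and head in tokens and tokens[head][0] == "VERB":
                counts[deprel] += 1
    return counts


# Hypothetical sentence, invented for illustration (not taken from the treebank).
sample_sentence = "\n".join([
    "# text = O poço produz óleo.",
    "1\tO\to\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tpoço\tpoço\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tproduz\tproduzir\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tóleo\tóleo\tNOUN\t_\t_\t3\tobj\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

relation_counts = verb_argument_relations(sample_sentence)
```

In the sample, the determiner and the punctuation mark are excluded because their own tags are not in child_upos (the determiner's parent is also a noun rather than a verb), so only the two noun dependents of the verb are counted.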

Reflexive Verbs

Reflexive Passive

Relations Overview