UD Catalan AnCora

Language: Catalan (code: ca)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v1.3 release.

The following people have contributed to making this treebank part of UD: Héctor Martínez Alonso, Elena Pascual, Daniel Zeman.

License: CC BY 4.0

Genre: news

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS not available
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD
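
As a rough illustration of what the list above means for users of the data, the sketch below splits one token line into the ten standard CoNLL-U columns. It assumes the treebank is distributed in CoNLL-U, as UD treebanks are, and that the unavailable XPOS column holds the underscore placeholder. The sample line and the helper names (parse_token_line, parse_feats) are hypothetical and were made up for this example; they are not taken from the treebank or from any UD tool.

```python
# The ten CoNLL-U columns, in their fixed order.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_token_line(line):
    """Split one tab-separated CoNLL-U token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":  # underscore marks an empty column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical token line, invented for illustration (not copied from the treebank).
sample = "1\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_"

token = parse_token_line(sample)
features = parse_feats(token["FEATS"])
```

Code that relies on language-specific tags should check for the underscore in XPOS and fall back to UPOS and FEATS.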


Catalan data from the AnCora corpus.

The original annotation was done in a constituency framework as part of the AnCora project at the University of Barcelona. It was then converted to dependencies and used in the CoNLL 2009 shared task. The CoNLL 2009 version was later converted to HamleDT and to Universal Dependencies.

The license is inherited from the original dataset, which was downloaded from the AnCora website. Any license-related questions should be directed to the original data providers at the University of Barcelona, not to the UD contact address listed at the end of this README file.


The following paper must be cited when using this corpus:

In addition, the following paper must be cited if coreference information (attributes entity, coreftype, corefsubtype, homophoricDD or entityref) is used:

Additionally, the following paper must be cited when argumental attributes in “sn” or “grup.nom” (attributes func, arg, tem or lexicalid) are used:

Statistics of UD Catalan AnCora

