This document is a placeholder for the language-specific overview of guidelines for part-of-speech tags and features.
Token-level morphological features and lemmas were added automatically using the parsers/taggers of Bohnet et al. (2013), trained on the AnCora treebank (Taulé et al., 2008), and then converted automatically to UD standards.
Various heuristics have been added to improve the output of the tagger, fix obvious errors and add features that the tagger did not supply. The changes for v1.2 (November 2015) were done by Miguel Ballesteros, Dan Zeman, and Héctor Martínez Alonso.
The Spanish UD treebank generally conforms to the UD guidelines, with some exceptions.
- Bohnet, Bernd, et al. “Joint morphological and syntactic analysis for richly inflected languages.” Transactions of the Association for Computational Linguistics 1 (2013): 415-428.
- Taulé, Mariona, Maria Antònia Martí, and Marta Recasens. “AnCora: Multilevel Annotated Corpora for Catalan and Spanish.” LREC. 2008.