Bug in text files (sent 2017-03-13 21:11 CET)

Dear shared task participants,

we have found a bug in the script used to extract raw text from the CoNLL-U files (thanks to Tianze Shi for reporting it!). As a result, a few .txt files in the UD 2.0 release contain errors. The main files in the CoNLL-U format are not affected by the bug. You can download the fixed data from the following URLs:

  • http://ufal.mff.cuni.cz/~zeman/soubory/ud-treebanks-conll2017.tgz
  • http://ufal.mff.cuni.cz/~zeman/soubory/ud-treebanks-v2.0.tgz

The fixed script can be downloaded from https://github.com/UniversalDependencies/tools/blob/master/conllu_to_text.pl

Sorry for the inconvenience.

Best regards,

Dan Zeman

on behalf of the costocom :) (CoNLL shared task organizing committee) http://universaldependencies.org/conll17/