eSCAPE

eSCAPE is the largest freely available Synthetic Corpus for Automatic Post-Editing. It consists of millions of (source, MT, post-edit) triplets in which the MT element has been obtained by translating the source side of publicly available parallel corpora, and the target side of those corpora serves as an artificial human post-edit. Translations are obtained with both phrase-based and neural models.
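The triplet construction described above can be illustrated with a minimal Python sketch. It is not the code used to build eSCAPE: the names `Triplet`, `build_triplets` and `toy_translate` are hypothetical, and `toy_translate` is only a stand-in for a real phrase-based or neural MT system.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Triplet(NamedTuple):
    source: str     # source-language sentence from the parallel corpus
    mt: str         # machine translation of the source sentence
    post_edit: str  # target side of the corpus, used as an artificial post-edit


def build_triplets(
    parallel_pairs: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Triplet]:
    """Turn (source, target) pairs into (source, MT, post-edit) triplets."""
    return [Triplet(src, translate(src), tgt) for src, tgt in parallel_pairs]


def toy_translate(sentence: str) -> str:
    # Hypothetical placeholder for an MT system; it only upper-cases the input.
    return sentence.upper()


pairs = [("a house", "ein Haus")]
triplets = build_triplets(pairs, toy_translate)
```

Running `build_triplets` once per MT system over the same parallel data would give one set of triplets per paradigm, which is how the per-paradigm counts on this page should be read.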

For each MT paradigm, eSCAPE contains 7.2 million triplets for English–German and 3.3 million for English–Italian, giving totals of 14.4 and 6.6 million instances respectively. In addition, version 2 contains an English–Russian section with 7.7 million triplets.

If you use the corpus, please cite the following paper:

@inproceedings{negri-etal-2018-escape,
  title = "{ESCAPE}: a Large-scale Synthetic Corpus for Automatic Post-Editing",
  author = "Negri, Matteo and
    Turchi, Marco and
    Chatterjee, Rajen and
    Bertoldi, Nicola",
  booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
  month = may,
  year = "2018",
  address = "Miyazaki, Japan",
  publisher = "European Language Resources Association (ELRA)",
  url = "https://www.aclweb.org/anthology/L18-1004",
}

How to obtain eSCAPE:

Contact us: turchi[at]fbk.eu