eSCAPE is the largest freely available Synthetic Corpus for Automatic Post-Editing. It consists of millions of training triplets in which the MT element is obtained by translating the source side of publicly available parallel corpora, and the target side serves as an artificial human post-edit. Translations are produced with both phrase-based and neural models.
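
The construction described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build eSCAPE: the names `Triplet`, `build_triplets` and `toy_translate` are hypothetical, and the word-by-word lookup merely stands in for a real phrase-based or neural MT system.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class Triplet(NamedTuple):
    """One synthetic APE training instance (hypothetical layout)."""
    src: str  # source sentence from the parallel corpus
    mt: str   # machine translation of the source sentence
    pe: str   # target side of the parallel corpus, used as an artificial post-edit


def build_triplets(
    parallel: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> Iterator[Triplet]:
    """Turn (source, target) sentence pairs into (src, mt, pe) triplets."""
    for src, tgt in parallel:
        yield Triplet(src=src, mt=translate(src), pe=tgt)


def toy_translate(sentence: str) -> str:
    """Placeholder MT system: a word-by-word lookup, for illustration only."""
    lexicon = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


pairs = [("The house is small", "Das Haus ist klein")]
triplets = list(build_triplets(pairs, toy_translate))
print(triplets[0])
```

Running the same loop once with a phrase-based system and once with a neural system as `translate` would give two sets of triplets over the same parallel data, one per MT paradigm.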

For each MT paradigm, eSCAPE contains 7.2 million triplets for English–German and 3.3 million for English–Italian, for a total of 14.4 and 6.6 million instances respectively. In addition, version 2 also contains an English–Russian section with 7.7 million triplets.

If you use the corpus, please cite the above paper.

How to obtain eSCAPE:


