Machine Translation - Resources


Software
  • Moses – A statistical machine translation system
  • IRSTLM – A toolkit featuring algorithms and data structures to store and access very large n-gram language models
  • MGIZA++ – An extension of MGIZA++ that makes it possible to align sentence pairs in an online mode
  • AQET – Adaptive Quality Estimation tool for Machine Translation
  • ModernMT – A neural adaptive machine translation system that adapts to context and learns from corrections


Datasets
  • CLTE Benchmark – Cross-Lingual Textual Entailment Dataset
  • RTE3-derived CLTE dataset – A cross-lingual entailment corpus, obtained by translating the RTE-3 dataset
  • BinQE – A Machine Translation Dataset Annotated with Binary Quality Judgements
  • TOSCA-MP Speech Ground Truth – A multilingual dataset of news and talk show transcriptions and translations
  • BitterCorpus – An English-Italian corpus with annotated bilingual terms in the IT domain
  • WIT3 – A ready-to-use version of the multilingual transcriptions of TED talks, prepared for MT research
  • WAGS – English-Italian Word Alignment Gold Standard
  • eSCAPE – A Large-scale Synthetic Corpus for Automatic Post-Editing