Machine Translation - Resources

Software

  • Moses – A statistical machine translation system
  • IRSTLM – A toolkit featuring algorithms and data structures to store and access very large n-gram language models
  • Online MGIZA++ – An extension of MGIZA++ that aligns sentence pairs in an online mode
  • AQET – Adaptive Quality Estimation tool for Machine Translation
  • ModernMT – A neural adaptive machine translation system that adapts to context and learns from corrections

Corpora

  • CLTE Benchmark – Cross-Lingual Textual Entailment Dataset
  • RTE3-derived CLTE dataset – A cross-lingual entailment corpus, obtained by translating the RTE-3 dataset
  • BinQE – A Machine Translation Dataset Annotated with Binary Quality Judgements
  • TOSCA-MP Speech Ground Truth – A multilingual dataset of news and talk show transcriptions and translations
  • BitterCorpus – An English-Italian corpus with annotated bilingual terms in the IT domain
  • WIT3 – A ready-to-use version of the multilingual transcriptions of TED talks, prepared for MT research
  • WAGS – English-Italian Word Alignment Gold Standard