End-to-end Spoken Language Translation in Rich Data Conditions
The goal of this project is to advance the state of the art in end-to-end spoken language translation (SLT) along two orthogonal dimensions: architecture and data. On the architecture side, new solutions will be deployed to improve translation quality by supporting SLT encoders with better input representations, fully exploiting prosodic cues in speech, factorising signal and content information, and meeting external stylistic constraints. On the data side, the focus will be on exploiting large, multilingual data featuring high speaker and content diversity. The proposed solutions will be evaluated on several translation directions using MuST-C, the largest corpus currently available for SLT research.