- CAV3D (Co-located Audio-Visual streams with 3D tracks) dataset here.
- TLT-school: a corpus of non-native children's speech. Click here to download.
- DIRHA data: dataset collected during the DIRHA project; details here.
- A MATLAB implementation of AV3T, a tool for audio-visual tracking of multiple targets, is available here: AV3T MATLAB code
- ConflictNet: an end-to-end CNN-LSTM architecture with an attention mechanism that estimates the level of verbal conflict from raw speech signals (git repo with code for the SSPNet Conflict Corpus)
- On-line supervised speaker diarization using an extension of UIS-RNN based on the sample mean loss: git repo
Check out my video presentation: https://www.youtube.com/watch?v=N6fpRrt1lgo&t=11s