Alessio Brutti - Research Interests
On this page you can find more details on my current and former research activities in audio signal processing.
- Localization of acoustic sources in multi-microphone environments:
Compact microphone arrays and distributed microphone networks.
Sound source localization and tracking: acoustic maps such as the Global Coherence Field (GCF), multi-source scenarios, and generative Bayesian approaches.
Estimation of source orientation based on the Oriented Global Coherence Field (OGCF).
BSS-based tracking of multiple sources.
Environment-aware processing: position and orientation estimation, and characterization of the emission pattern.
Check out some demos from our YouTube channel:
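To give an idea of how GCF-style acoustic maps work: for every candidate source position, the GCC-PHAT cross-correlations of all microphone pairs are sampled at the time delays that position would produce, and summed; the map peaks where the delays are consistent across pairs. The following is a minimal sketch of that idea (not the actual implementation), with a toy free-field delay model and no reverberation:

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau):
    """GCC-PHAT cross-correlation between two microphone signals.
    Returns correlation values for lags in [-max_shift, +max_shift] samples."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    # rearrange so index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return cc, max_shift

def gcf_map(signals, mic_pos, grid, fs, c=343.0):
    """Global Coherence Field sketch: for each grid point, sum the GCC-PHAT
    of every microphone pair at the delay that point would induce."""
    pairs = [(i, j) for i in range(len(mic_pos)) for j in range(i + 1, len(mic_pos))]
    aperture = max(np.linalg.norm(mic_pos[i] - mic_pos[j]) for i, j in pairs)
    max_tau = aperture / c                   # largest physically possible TDOA
    gcf = np.zeros(len(grid))
    for i, j in pairs:
        cc, max_shift = gcc_phat(signals[i], signals[j], fs, max_tau)
        for g, p in enumerate(grid):
            tau = (np.linalg.norm(p - mic_pos[i]) - np.linalg.norm(p - mic_pos[j])) / c
            idx = int(np.round(tau * fs)) + max_shift
            idx = min(max(idx, 0), len(cc) - 1)
            gcf[g] += cc[idx]
    return gcf / len(pairs)
```

With noise signals delayed according to a simulated source position, the map is maximal at the grid point matching the true position; real systems refine this with frame-wise analysis, tracking, and (for OGCF) orientation-dependent variants of the map.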
- Speaker identification and verification:
I have addressed the speaker recognition problem, targeting in particular reverberant distant speech. The effects of reverberation are mitigated through model adaptation and by combining multiple distributed microphones. Currently, I am working on speaker diarization, also on telephone speech, implementing the most advanced state-of-the-art approaches based on deep learning (e.g. speaker embeddings, speaker2vec).
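As a rough illustration of the embedding-based paradigm: once a neural network maps an utterance to a fixed-dimensional speaker embedding, verification reduces to comparing embeddings, typically with a cosine score against an enrollment model. A minimal sketch (the embedding extractor itself is abstracted away, and the threshold is an arbitrary illustrative value):

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

def verify(enroll_embs, test_emb, threshold=0.5):
    """Average the enrollment embeddings into a speaker model and accept
    the trial if the cosine score exceeds the (illustrative) threshold."""
    model = np.mean(enroll_embs, axis=0)
    return cosine_score(model, test_emb) >= threshold
```

In practice the score is usually post-processed (e.g. with PLDA or score normalization), and for diarization the same embeddings are clustered over short segments rather than verified against a single model.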
- Audio-Video people tracking:
This activity was conducted in cooperation with the TEV research unit. The goal was to track the positions and head poses of multiple subjects in an environment equipped with multiple distributed microphones and cameras.
Audio and video information is combined at the likelihood level in a generative Bayesian framework, substantially improving robustness over the individual modalities.
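Likelihood-level fusion can be sketched as a single measurement update of a particle filter in which the audio and video likelihoods are multiplied before reweighting (a toy one-dimensional example with Gaussian likelihoods standing in for the real observation models):

```python
import numpy as np

def fuse_and_update(particles, weights, audio_lik, video_lik):
    """One particle-filter measurement update with likelihood-level fusion:
    each particle is scored by the product of the audio and video
    likelihoods, then the weights are renormalized."""
    lik = audio_lik(particles) * video_lik(particles)
    weights = weights * lik
    return weights / weights.sum()
```

Because the modalities enter as independent likelihoods, either one can be dropped (e.g. when the target is out of view or silent) without changing the framework, which is what makes the fusion robust.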
A couple of video clips are available on our YouTube channel:
Currently, in collaboration with Queen Mary University of London, we are investigating similar paradigms to achieve 3D localization of multiple targets using a single camera co-located with a compact microphone array.
The CAV3D dataset is now available for download.
- Audio-Video Person Identification:
Recently, in collaboration with Queen Mary University of London, I have been investigating the person recognition problem with multi-modal (audio and video) sensors in person-centered scenarios (i.e. using wearable devices). The main focus was on unsupervised on-line adaptation of the target models.
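One common way to realize unsupervised online adaptation, shown here only as an illustrative sketch (the confidence threshold and update rate are arbitrary values, not those of the actual system), is to fold a new observation into the target model only when the recognition score is confident enough, so that errors do not corrupt the model:

```python
import numpy as np

def online_adapt(model, new_emb, score, conf_threshold=0.7, alpha=0.1):
    """Unsupervised online adaptation with confidence gating: update the
    target model with an exponential moving average only when the
    verification score is above the threshold, then renormalize."""
    if score >= conf_threshold:
        model = (1 - alpha) * model + alpha * new_emb
        model = model / np.linalg.norm(model)
    return model
```

Low-confidence observations leave the model untouched, which trades adaptation speed for protection against drift when the unsupervised labels are wrong.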
- Speech and audio digital signal processing:
Covering a wide variety of topics, in particular: activity detection, speech enhancement for ASR, and event classification.
- DIRHA: development of the multi-room, multi-microphone front-end and demonstrator
- SCENIC: environment aware localization of multiple sources and estimation of the source emission pattern
- DICIT: source localization
- Visiting researcher at Queen Mary University of London during summer 2015
- PhD committee at Vrije Universiteit Brussel
- PhD committee at Tampere University of Technology
- PhD committee at University of Alcalá