Alessio Brutti - Research Interests

This page provides more details on my current and former research activities in audio signal processing and multimedia processing.

    • Speaker identification and verification:
      I have addressed the speaker recognition problem, targeting reverberant distant speech in particular: model adaptation and effective combinations of multiple distributed microphones mitigate the effects of reverberation. Currently, I am focusing on speaker diarization, including telephone speech, implementing advanced state-of-the-art approaches based on deep learning (e.g. speaker embeddings).
    • Audio-Video Person Identification:
      Recently, in collaboration with Queen Mary University of London, I have been investigating the person recognition problem with multi-modal (audio and video) sensors in person-centered scenarios (e.g. using wearable devices). The main focus has been on unsupervised on-line adaptation of the target models.
      Visit the page of the audio-visual processing joint project with QMUL for more details.
    • Audio-Video people tracking:
      Initially, this activity was conducted in cooperation with Oswald Lanz of the TEV research unit. The goal was to track the positions and head poses of multiple subjects in an environment equipped with multiple distributed microphones and cameras.
      We implemented a generative Bayesian framework that combines audio and video information at the likelihood level, substantially improving robustness over the single modalities.
      A couple of video clips are available on our YouTube channel.
      Currently, in collaboration with Queen Mary University of London, we are investigating similar paradigms to achieve 3D localization of multiple targets using a single camera co-located with a compact microphone array.
      The CAV3D dataset is now available for download.
    • Speech and audio digital signal processing:
      Covering a wide variety of topics, in particular activity detection, speech enhancement for ASR, and event classification.
    • Localization of acoustic sources in multi-microphone environments:
      Compact microphone arrays and distributed microphone networks.
      Sound source localization and tracking: acoustic maps such as the Global Coherence Field (GCF), multi-source scenarios, and generative Bayesian approaches.
      Estimation of source orientation based on Oriented Global Coherence Field (OGCF).
      BSS-based tracking of multiple sources.
      Environment-aware processing: position and orientation estimation and characterization of the source emission pattern.
      Check out some demos on our YouTube channel.
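
The embedding-based diarization mentioned above typically extracts a fixed-dimensional speaker embedding per speech segment and then clusters the embeddings, e.g. by cosine similarity. The following is a minimal numpy-only sketch of such a clustering step, not the actual system; the greedy centroid strategy and the similarity threshold are illustrative assumptions:

```python
import numpy as np

def cluster_embeddings(emb, threshold=0.5):
    """Greedy clustering of per-segment speaker embeddings: each segment
    joins the cluster whose centroid is most cosine-similar, or starts a
    new cluster if no similarity exceeds the threshold (an assumed value)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize
    centroids, labels = [], []
    for e in emb:
        if centroids:
            sims = [float(e @ (c / np.linalg.norm(c))) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels.append(best)
                centroids[best] = centroids[best] + e  # running centroid sum
                continue
        centroids.append(e.copy())          # open a new speaker cluster
        labels.append(len(centroids) - 1)
    return labels
```

Real diarization pipelines usually replace the greedy pass with agglomerative or spectral clustering, but the cosine-similarity criterion on embeddings is the same.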
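
The likelihood-level audio-video fusion used in the tracking work above can be sketched as a bootstrap particle filter whose weights multiply the per-modality observation likelihoods. This is an illustrative 1-D toy, not the actual tracker; the random-walk motion model, the function names, and the Gaussian likelihoods in the usage below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def av_filter_step(particles, weights, audio_lik, video_lik, motion_std=0.05):
    """One bootstrap particle-filter step with likelihood-level fusion:
    w ∝ w · p(z_audio | x) · p(z_video | x)."""
    # Propagate each particle with a simple random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Likelihood-level fusion: multiply the per-modality likelihoods
    weights = weights * audio_lik(particles) * video_lik(particles)
    weights = weights / weights.sum()
    # Multinomial resampling when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```

Because the fusion happens in the likelihood, a modality that is momentarily uninformative (e.g. a silent speaker or an occluded face) simply contributes a flat factor, while the other modality keeps the posterior peaked, which is what makes the combination more robust than either modality alone.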
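
The Global Coherence Field mentioned above is built from PHAT-weighted generalized cross-correlations (GCC-PHAT) between microphone pairs. As a minimal sketch under standard assumptions (not the exact implementation used in these works), the core GCC-PHAT time-delay estimate between two microphone signals is:

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival (TDOA) between signals x
    and y via PHAT-weighted generalized cross-correlation."""
    n = len(x) + len(y)                     # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R = R / (np.abs(R) + 1e-12)             # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:                 # optionally bound the search range
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs
```

A GCF acoustic map is then obtained by evaluating, for each candidate position on a spatial grid, the sum of the GCC-PHAT values at the delays that position implies for every microphone pair; the maximum of the map yields the source location estimate.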


    Past Projects

    • DIRHA: development of the multi-room, multi-microphone front-end and demonstrator
    • SCENIC: environment aware localization of multiple sources and estimation of the source emission pattern
    • DICIT: source localization

    Other Activities

    • Visiting researcher at Queen Mary University of London during summer 2015
    • PhD committee at Vrije Universiteit Brussel
    • PhD committee at Tampere University of Technology
    • PhD committee at University of Alcalá