The SpeechTeK research unit works on automatic speech recognition (ASR) technologies, automatic speaker recognition, and the integration of multimodal (audio and video) sources of information. Application areas include automatic transcription of audio streams, human-machine interaction systems, tools to support language learning (especially of a second language), and systems for biometric identification or verification.
Contributions to Smart Cities and Communities
The SpeechTeK research unit contributes to the objectives of smart cities and communities mainly in the context of open government, schools, and city sensing. Automatic transcription of spoken content, possibly tailored to the specific needs of users, can improve the quality of services provided to citizens while reducing their cost. The research carried out in SpeechTeK also has important applications in education, particularly through the development of tools both to support language learning and to automatically evaluate the proficiency reached by students. The unit's expertise in multimodality, i.e. the integration of information coming from multiple sensors, is useful in the context of city sensing, particularly for indoor and outdoor surveillance and security applications.
Contributions to Artificial Intelligence
Research on ASR mainly focuses on acoustic modeling, specifically with the aim of developing multilingual models and improving performance on children's speech. In the area of language learning, methods are investigated for automatically estimating the proficiency of primary, secondary, and high school students, mainly in second language acquisition. Research on multimodality addresses techniques for both learning and interpolating multimodal probability distributions. Approaches based on Deep Neural Networks (DNNs) and related algorithms will underpin the research in all application areas.
Key projects and results
- EIT CONVERSATIONAL BANKING – The project aims to develop conversational agents that interact, through both voice and text messages, with users asking for financial information. To this end, the SpeechTeK unit will develop ASR systems, in both English and Hungarian, capable of handling language models that are dynamically activated in the context of a human-machine dialogue.
- IPRASE – The goal of this project is to automatically estimate the proficiency, in both English and German, of native Italian-speaking students in the Trentino region during two-year evaluation campaigns.
- PERVOICE-LAN – This project, funded by the Italian company PerVoice (www.pervoice.com), has among its objectives the development of ASR systems for transcribing TV news in Russian, Chinese, and Japanese.
- PERVOICE-SD – This project, partially funded by PerVoice, aims to develop a speaker diarization module based on DNN-derived embedded representations of speaker identities.
- AUDIO VISUAL SCENE ANALYSIS – This project is part of a wider cooperation between FBK-ICT and the Centre for Intelligent Sensing of Queen Mary University of London. It consists of two joint PhD grants aimed at developing advanced solutions for audio-visual person identification and scene analysis using heterogeneous devices in challenging, unconstrained environments.