The SpeechTeK research unit works on automatic speech recognition (ASR) technologies, automatic speaker recognition, and the integration of multimodal (audio and video) sources of information. Application areas include automatic transcription of audio streams, human-machine interaction systems, tools to support language learning (especially of a second language), and systems for biometric identification or verification.

Contributions to Smart Cities and Communities

The SpeechTeK research unit contributes to the objectives of smart cities and communities mainly in the context of open government, school, and city sensing. Automatic transcription of spoken content, possibly tuned to the specific needs of users, can improve the quality of services provided to citizens while reducing their cost. The research carried out in SpeechTeK also has an important application in education, particularly through the development of tools that support language learning and automatically evaluate the proficiency reached by students. The unit's competences in multimodality, i.e. the integration of information coming from multiple sensors, are useful in the context of city sensing, particularly for indoor and outdoor surveillance and security applications.

Contributions to Artificial Intelligence

Research on ASR mainly focuses on acoustic modeling, specifically with the aim of developing multilingual models and improving performance on children's speech. In the area of language learning, methods are investigated for automatically estimating the proficiency of students in primary, secondary, and high school, mainly in second language acquisition. Research on multimodality addresses techniques for both learning and interpolating multimodal probability distributions. Approaches based on Deep Neural Networks (DNNs) and related algorithms will form the basis of the research in all application areas.

Head of Unit



