Seminar by Aghilas Sini (LIUM)
Date: 19/01/2024
Time: 10:15
Location: IC2, Boardroom
Speaker: Aghilas Sini
Title: When neural speech technologies encounter non-conventional data: A discussion on speech recognition and speech synthesis.
Most neural speech technologies are developed using dedicated data recorded under favorable acoustic conditions. Such data is instrumental in setting up the underlying models and enables fair, straightforward comparisons between systems, making it possible to establish benchmarks. However, it is worth analyzing and quantifying the ability of neural speech systems to leverage non-dedicated data and real-world conditions, whether during the training or inference stages.
To address these questions and related issues, I will discuss the impact of non-conventional data on state-of-the-art speech technology through two practical examples: pronunciation assessment of children’s speech in a noisy classroom, and the development of a fair speech synthesis system for French built from amateur recordings. I will then explore two speech synthesis techniques, voice conversion and voice cloning, to investigate speaker identity and assess data quality.
Furthermore, I will share ongoing and future work on multimodal and multilingual data, particularly in the context of deepfake speech detection and speech-to-speech translation. In conclusion, I will present reflections on the data qualification process, with the aim of estimating and anticipating the performance of a given system.