PhD defence, Kevin Vythelingum

Date : 10/12/2019
Time : 1:30 pm
Place : Board room, IC2 building, LIUM, Le Mans Université

Title : Rapid, efficient and joint construction of speech recognition and synthesis systems for new languages

Jury members :
– Martine ADDA-DECKER (LPP, Université Paris 3 Sorbonne)
– Sylvain MEIGNIER (LIUM, Le Mans Université)
– Jean-François BONASTRE (LIA, Université d’Avignon)
– Damien LOLIVE (IRISA, Enssat Lannion)
Supervisor : Yannick ESTÈVE (LIA, Université d’Avignon)
Co-supervisors :
– Olivier ROSEC (Voxygen)
– Anthony LARCHER (LIUM, Le Mans Université)

Abstract :

In this thesis, we study the joint construction of speech recognition and synthesis systems for new languages, with the goals of accuracy and quick development. The rapid development of voice technologies for new languages is driving scientific ambitions and is now considered strategic by industrial players. However, research on developing new languages is carried out by only a few research centers, each working on a limited number of languages, even though these technologies share many common points. Our study focuses on building and sharing tools between systems for creating lexicons, learning phonetic rules and taking advantage of imperfect data.
Our contributions focus on the selection of relevant data for learning acoustic models, the joint development of phonetizers and pronunciation lexicons for speech recognition and synthesis, and the use of neural models for phonetic transcription from text and speech signal. In addition, we present an approach for the automatic detection of phonetic transcription errors in annotated speech signal databases. This study shows that it is possible to significantly reduce the amount of annotated data needed to develop new text-to-speech systems, which in turn reduces data collection time when creating new systems. Finally, we study an application case by jointly building a speech recognition and synthesis system for a new language.

Keywords :
Automatic speech recognition, Text-to-speech synthesis, Phonemic transcription, Automatic error detection, New languages development, Imperfect data exploitation