PhD offer: Active learning, interpretation and control for neural synthesis of expressive speech – Laboratoire d'Informatique de l'Université du Mans

Open PhD position: Active learning, interpretation and control for neural synthesis of expressive speech

Host laboratory: Laboratoire d’Informatique de l’Université du Mans (LIUM), LST team
PhD director: Anthony Larcher (LIUM, Le Mans Université)
Co-supervisors: Yannick Estève (LIA, Avignon Université), Marie Tahon (LIUM, Le Mans Université)
Contact: firstname.name@univ-lemans.fr (LIUM) or firstname.name@univ-avignon.fr (LIA)
Starting date: September 2021
Deadline: 10 July 2021

Project context:
The thesis will take place at the Laboratoire d’Informatique de l’Université du Mans (LIUM), in the LST (Language and Speech Technology) team, and at the Laboratoire d’Informatique d’Avignon (LIA). The candidate will be based in Le Mans, with regular stays planned in Avignon. The LIA is a partner in the European project SELMA, which aims to produce a technological platform able to process massive, continuous streams of video documents in several languages for broadcasting purposes. A highly exploratory part of this European project aims to develop an expressive speech generation tool for broadcasting audio documents in target languages.

Candidate profile:
The candidate should be motivated to work on written and spoken language, and show an interest in speech synthesis. He/she should have a Master’s degree in Computer Science, and experience in machine learning would be appreciated.

Objectives:
The main objective of the project is to propose, develop and validate methods that 1) generate expressive speech from a user-given instruction, using either text-to-speech or voice conversion systems; and 2) let the user interact with the system during learning and inference to correct its audio outputs. First, we will study the visualization and interpretation of the latent representations learned by a state-of-the-art neural model (Tacotron + WaveNet) in terms of prosody, speaker, expressiveness and pronunciation.
User control elements, such as annotations, will then need to be defined so that they can be integrated into the learning corpus using techniques such as acoustic parameter adaptation, embeddings, attention mechanisms, or intermediate model learning. In parallel, neural architectures compatible with active learning (model reinforcement or domain adaptation) will be proposed, and the most relevant active learning strategies will be determined. Finally, an important part of the work will consist of evaluating the synthesized speech in the context of audiobooks or journalistic content.

Application: a CV and cover letter should be sent before 10 July 2021 to:

Anthony Larcher (firstname.name@univ-lemans.fr) – LIUM, Le Mans Université
Yannick Estève (firstname.name@univ-avignon.fr) – LIA, Avignon Université
Marie Tahon (firstname.name@univ-lemans.fr) – LIUM, Le Mans Université