Seminar by Kévin Vythelingum, Voxygen

 

Date: 02/10/2020
Time: 11:00
Location: IC2, boardroom
Speaker: Kévin Vythelingum
 

 

Recent advances in neural text-to-speech have led to systems with human-level performance in speech synthesis. However, single-speaker synthesis still requires large amounts of data from a voice talent to reproduce their voice, and a new model must be trained for each new voice to be developed. One way to reduce the amount of new data required is to use existing recordings of other speakers. We will present a multi-speaker text-to-speech model and discuss its ability to generalize to unseen speakers.