Seminar by Kévin Vythelingum, Voxygen
Location: IC2, boardroom
Speaker: Kévin Vythelingum
Recent advances in neural text-to-speech have led to systems with human-level performance in speech synthesis. However, single-speaker synthesis still requires large amounts of data from a voice talent to reproduce their voice, so a new model must be trained for each new voice. One way to reduce the amount of new data needed is to leverage existing recordings of other speakers. We will present a multi-speaker text-to-speech model and discuss its generalization capabilities to unseen speakers.
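A common way to build such a multi-speaker model (not necessarily the specific approach presented in the talk) is to condition the synthesizer on a learned per-speaker embedding; unseen speakers can then be handled by inferring a new embedding from a few recordings. A minimal sketch of the conditioning step, with all dimensions and names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
n_phonemes, d_text, d_spk = 12, 64, 16

# Speaker lookup table: one learned embedding per training speaker.
speaker_table = rng.standard_normal((10, d_spk))

def condition_on_speaker(text_states, speaker_id):
    """Broadcast a speaker embedding over every text frame and concatenate,
    so the acoustic decoder sees both linguistic and speaker information."""
    spk = speaker_table[speaker_id]                      # (d_spk,)
    spk_tiled = np.tile(spk, (text_states.shape[0], 1))  # (T, d_spk)
    return np.concatenate([text_states, spk_tiled], axis=1)

# Toy text-encoder output: one hidden state per input phoneme.
text_states = rng.standard_normal((n_phonemes, d_text))
decoder_in = condition_on_speaker(text_states, speaker_id=3)
print(decoder_in.shape)  # (12, 80)
```

The decoder then consumes these speaker-conditioned states; at inference, swapping the embedding changes the synthesized voice without retraining the whole model.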