PhD defence: Script optimization for TTS voice corpus design in audio-book generation


PhD candidate: Meysam Shamsi
Date: 16/10/2020
Time: 14h00
Location: IC2, Boardroom
Supervisor: Damien Lolive (Senior lecturer, ENSSAT)

Mr. Meysam SHAMSI will defend his PhD thesis in computer science carried out at ENSSAT under the supervision of Damien Lolive (Senior Lecturer, HDR – thesis director).


The objective of this thesis is to generate a high-quality expressive audio-book from natural and synthetic speech signals at minimal recording cost.

The strategy consists in selecting a part of the book and recording a reading of it to build a voice corpus. This voice corpus is then used to synthesize the rest of the book with a Text-to-Speech (TTS) system. Several selection strategies are proposed in succession: an a posteriori approach using voice reduction methods, a convolutional neural network (CNN) auto-encoder focusing on linguistic information, and finally the selection of the shortest utterances. These approaches are evaluated both objectively and perceptually.

Finally, the quality of an audio-book mixing natural and synthetic speech signals is evaluated. The evaluations show that the mixture of synthetic and natural signals is preferred over fully synthetic signals produced by a unit-selection-based TTS system.

Keywords: script selection, expressive audio-book generation, voice reduction, synthetic speech quality evaluation, hybrid TTS systems, linguistic and acoustic embeddings