Florent Desnous – Laboratoire d'Informatique de l'Université du Mans

The aim of this thesis is to develop scalable, variable-context speaker models that integrate the phonetic content of the speaker's speech. These models will be trained on a substantial amount of enrollment data (> 30 s) and will adapt to the test data to ensure the best possible comparison given the phonetic context recognized in the test sample. They are expected to improve the performance of recognition systems and broaden the range of applications of speaker recognition.

Acoustic modeling of short-duration samples is directly relevant to speaker segmentation. The developed models will therefore also be evaluated on this task.

Such models could be extended to take into account different acoustic environments or spoken languages, or to model the same speaker in different contexts of vocal production (e.g. Lombard effect or whispering).

Variable context modeling for speaker recognition