PhD defense, Natalia TOMASHENKO

 

Title of PhD defense: Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems.

Composition of the jury:

  • Reviewers:
    1. Jean-François BONASTRE, Professeur, Université d’Avignon et des Pays de Vaucluse
    2. Denis JOUVET, Directeur de Recherche, LORIA-INRIA
  • Examiners:
    1. Alexey KARPOV, Professor, ITMO University
    2. Lori LAMEL, Directrice de Recherche, LIMSI-CNRS
  • Advisor: Yannick ESTEVE, Professeur, Le Mans Université
  • Co-advisor: Yuri MATVEEV, Professor, ITMO University
  • Co-advisor: Anthony LARCHER, Lecturer, Le Mans Université

Abstract:

Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs, and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved significant advances and outperformed GMM-HMM models on various ASR tasks. However, speaker adaptation remains very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for the efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. This technique of processing features for DNNs makes it possible to use GMM adaptation algorithms for neural network AMs. The proposed technique provides a general framework for transferring adaptation algorithms developed for GMMs to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.
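To make the idea of GMM-derived features concrete, here is a minimal illustrative sketch (not the thesis implementation): each acoustic frame is mapped to a vector of per-component Gaussian log-likelihoods under a trained GMM, and that vector is what the DNN receives as input. Adapting the GMM (e.g. with MAP or fMLLR-style transforms) then changes the DNN's input features without touching the network itself. The function name and the diagonal-covariance assumption are illustrative choices, not taken from the thesis.

```python
import numpy as np

def gmm_derived_features(frames, means, covars, weights):
    """Illustrative GMM-derived features: for each acoustic frame,
    the vector of per-component weighted Gaussian log-likelihoods.

    frames:  (T, D) acoustic feature frames
    means:   (K, D) GMM component means
    covars:  (K, D) diagonal covariances of each component
    weights: (K,)   GMM mixture weights

    Returns a (T, K) matrix usable as DNN input features.
    """
    T, D = frames.shape
    K = means.shape[0]
    feats = np.empty((T, K))
    for k in range(K):
        diff = frames - means[k]                      # (T, D)
        log_det = np.sum(np.log(covars[k]))           # log |Sigma_k| (diagonal)
        quad = np.sum(diff * diff / covars[k], axis=1)
        # log( w_k * N(x; mu_k, Sigma_k) ) per frame
        feats[:, k] = (np.log(weights[k])
                       - 0.5 * (D * np.log(2 * np.pi) + log_det + quad))
    return feats
```

In this sketch, speaker adaptation amounts to replacing `means`/`covars` with speaker-adapted GMM parameters before extracting the features, so any GMM adaptation algorithm carries over to the DNN front end unchanged.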

Key words:

speaker adaptation, speaker adaptive training, deep neural network (DNN), Gaussian mixture model (GMM), GMM-derived features, automatic speech recognition (ASR), acoustic models, deep learning.