Robustness of speaker recognition systems: additive noise, reverberation, and linguistic content variabilities – Laboratoire d'Informatique de l'Université du Mans

Seminar by Aran Mohammadamini, postdoctoral researcher at LIUM

Date: 05/04/2024
Time: 10:15
Location: IC2, Salle des conseils
Speaker: Mohammad Mohammadamini

Speaker recognition systems authenticate speakers from their speech utterances. Authenticating a claimed identity requires a fixed-length, compact, speaker-discriminant representation of a variable-length speech utterance, known as a speaker embedding. Current speaker recognition systems use deep neural networks (DNNs) to extract speaker embeddings. Although DNN-based speaker recognition systems are relatively robust, their performance degrades in the presence of acoustic variabilities such as additive noise and reverberation.
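To make the role of fixed-length embeddings concrete, here is a minimal sketch of a verification trial. It assumes the embeddings have already been extracted and that trials are scored with cosine similarity against a threshold; the abstract does not name a scoring backend, so cosine scoring, the function names, and the threshold are illustrative assumptions.

```python
import math


def cosine_score(enrol_emb, test_emb):
    """Cosine similarity between two fixed-length embeddings (assumed backend)."""
    if len(enrol_emb) != len(test_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enrol_emb, test_emb))
    norm_enrol = math.sqrt(sum(a * a for a in enrol_emb))
    norm_test = math.sqrt(sum(b * b for b in test_emb))
    if norm_enrol == 0.0 or norm_test == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_enrol * norm_test)


def verify(enrol_emb, test_emb, threshold):
    """Accept the claimed identity when the score reaches the threshold.

    The threshold is hypothetical; in practice it would be tuned on held-out trials.
    """
    return cosine_score(enrol_emb, test_emb) >= threshold
```

Because both utterances are mapped to vectors of the same dimension, the comparison does not depend on how long either recording was.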

Three main groups of variabilities reduce the performance of speaker recognition systems: internal (e.g. age, emotion, and stress), external (e.g. noise, reverberation, and distance), and content (e.g. language and accent). The main theme of this presentation is the robustness of DNN-based text-independent speaker recognition systems against additive noise and reverberation. The impact of these variabilities can be addressed at the signal level, at the feature level, in the speaker embedding extractor, on the speaker embeddings, or at scoring time with score adaptation techniques. In this seminar, I will discuss part of my previous work at the signal, speaker embedding extractor, and speaker embedding levels to make speaker recognition systems robust against additive noise and reverberation.
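For readers unfamiliar with the two external distortions discussed here, the following sketch simulates them at the signal level: noise is scaled to a target signal-to-noise ratio before being added, and reverberation is modelled as convolution with a room impulse response. This is a generic illustration of the distortions, not the speaker's method; the function names and the plain-list signal representation are assumptions.

```python
import math


def mean_power(signal):
    """Average power of a signal given as a list of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must have non-zero power")
    # Noise power needed to reach the requested SNR, then the matching gain.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def reverberate(speech, impulse_response):
    """Convolve speech with a room impulse response, truncated to len(speech)."""
    out = [0.0] * len(speech)
    for i, s in enumerate(speech):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += s * h
    return out
```

A lower `snr_db` gives a noisier signal, and a longer impulse response smears each sample over more of the following samples.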

The last part of my presentation will concentrate on linguistic content variability in speaker recognition systems. In this part, I will discuss a phrase- and language-independent utterance verification system. The objective of an utterance verification system is to confirm whether two speech utterances have the same linguistic content. A common application of utterance verification is speaker recognition. In text-dependent speaker recognition systems, it is used in two ways. First, it can filter out trials containing unacceptable linguistic content. This keeps linguistic variability under control, which leads to higher performance in severe acoustic conditions. Second, its output can serve as a supplementary score alongside speaker characteristics, increasing the robustness of speaker recognition systems.
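The two uses of utterance verification described above can be sketched as follows. The sketch assumes each trial already has a speaker score and a content-match score; the thresholds, the linear fusion, and the weight are hypothetical choices made for illustration and are not taken from the talk.

```python
def gated_decision(speaker_score, content_score, content_threshold, speaker_threshold):
    """Use 1: reject trials whose linguistic content does not match.

    Only trials that pass the content check are judged on the speaker score.
    """
    if content_score < content_threshold:
        return False
    return speaker_score >= speaker_threshold


def fused_score(speaker_score, content_score, weight=0.5):
    """Use 2: combine both scores into one (assumed linear fusion)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * speaker_score + (1.0 - weight) * content_score
```

The gate makes a hard decision on content before the speaker comparison, while fusion lets a strong content match partly compensate for a speaker score degraded by noise.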