Seminar by Théo Mariotte, lecturer at LIUM

 

Date: 24/02/2025
Time: 10:30
Place: IC2, Boardroom
Speaker: Théo Mariotte
 
 

Towards interpretable representations for audio and speech processing

 

This seminar is divided into two main parts. The first part reviews my previous research, while the second explores future research directions.

In the first part, I will briefly introduce the methods developed during my thesis before going into more detail on my postdoctoral work. Specifically, I will present Annealed Multiple Choice Learning, a general training framework with applications to source separation. This method trains multiple hypotheses so that ambiguous tasks can be handled effectively. I will also discuss the use of neural clustering to jointly perform source separation and speaker diarization in long-form meeting recordings.

The second part of the seminar will focus on speaker segmentation in the multi-microphone scenario. The proposed method, which is still a work in progress, combines spatial filtering, source localization, and voice activity detection to predict speaker activity. This approach aims to be more interpretable and to require fewer trainable parameters. I will also discuss the challenges of simulating training data and share the difficulties I have encountered. Finally, I will introduce other research directions, including disentangled self-supervised representation learning and large-scale source separation.