Seminar by Mathieu Fontaine, researcher at Télécom Paris
Date: 27/11/2023
Time: 14h00
Location: Saint Denis D’Orque
Speaker: Mathieu Fontaine
Overview of speech-related topics at ADASP (Télécom Paris) and overseas
The first project, with Thomas SERRE (PhD student), focuses on personalized speech enhancement. We aim to isolate a target speaker from a mixture containing their speech, interfering speech, and background noise. While TEA-PSE3.0, a dual-stage personalized speech enhancement system, performs exceptionally well, it is complex and not easily adapted to lightweight devices. We propose adapting a state-of-the-art lightweight dual-stage speech enhancement system (DeepFilterNet2) to personalized speech enhancement. Our results show improved performance over the original DeepFilterNet2. The adapted system still falls short of TEA-PSE3.0, but it has significantly fewer parameters.
The second project, with Elio GRUTTADAURIA (PhD student), focuses on online speech separation-guided speaker diarization for meeting conversations. Leveraging insights from recent online approaches, we use a front-end speech separation system to provide the speech activity used to update the clustering step of speaker diarization. Our approach outperforms the state of the art, particularly on overlapped-speech segments, and we conduct ablation studies to identify which source separation algorithms work best.
The third project, conducted with the SSU team, aims to improve Automatic Speech Recognition (ASR) systems under real conditions using an augmented reality headset. Our algorithm includes a back-end stage with an interpretable speaker separation system (FastMNMF) and a front-end stage using a DNN beamformer for refinement. This combined approach provides a reliable source for the ASR system, which is crucial in real noisy conditions where both the headset user (wearing a HoloLens 2) and the speakers are in motion. The system is currently under development and is expected to be available around 2024 or 2025.