Martin Lebourdais – Laboratoire d'Informatique de l'Université du Mans

PhD defence, Martin Lebourdais

Date: 17/10/2023
Time: 9h00
Location: Le Mans Université, IC2 building auditorium

Title: Speaker interactions: from overlapped speech to interruption detection.

Jury members:

Romain Serizel, MCF HDR, Loria, Reviewer
Ricard Marxer, Professor, LIS, Reviewer
Martine Adda-Decker, Research Director, CNRS, Examiner
Hervé Bredin, Chargé de recherche, CNRS, Examiner
Slim Essid, Professor, Telecom Paris, Examiner
Laetitia Biscarrat, Maîtresse de conférence, LERASS, Invited
Marie Tahon, Professor, Le Mans Université LIUM, Supervisor
Antoine Laurent, Professor, Le Mans Université LIUM, Co-Supervisor
Sylvain Meignier, Professor, Le Mans Université LIUM, Director of thesis

Abstract:

The ANR GEM project, initiated by the National Audiovisual Institute, aims to study the differences in treatment and representation between women and men in the media. This project encourages collaboration between research in media and language sciences and research in computer science. One of the project’s objectives is to promote the creation of automated tools to generalize and facilitate social sciences and humanities studies on large corpora.

In this thesis, we focus on signal processing tools that facilitate the characterization of speaker representations. Specifically, we propose methods to automatically detect and characterize interruptions during conversations from television debate programs.

Interruption is a subjective concept with no consensus on its definition. In our field of automatic processing, this task is new, lacks a framework, and has limited resources. We first propose to narrow the definition of interruption to the specific case of overlapped speech, following the literature in sociology and language sciences. A tool for detecting the presence of single and multiple speakers’ vocal activity has been developed in this context. Developing such a tool raises questions beyond quantitative evaluation, so several studies have been conducted on the duration and linguistic content of the multi-speaker segments.

Subsequently, we specifically focused on interruption detection. Training dedicated neural models required the collection and annotation of a corpus. By guiding the annotators, we arrived at an example-based definition of interruption. Creating such a corpus enabled the development of a binary interruption classification model to characterize the previously detected multi-speaker segments.

Keywords:

Overlapped speech, Speech processing, Artificial intelligence, Interruption.