PhD defence, Martin Lebourdais

Date: 17/10/2023
Time: 9:00 a.m.
Location: Le Mans Université, IC2 building auditorium
 

Title: Speaker interactions: from overlapped speech to interruption detection
 

Jury members:

  • Romain Serizel, Associate Professor (HDR), Loria, Reviewer
  • Ricard Marxer, Professor, LIS, Reviewer
  • Martine Adda-Decker, Research Director, CNRS, Examiner
  • Hervé Bredin, Research Scientist, CNRS, Examiner
  • Slim Essid, Professor, Telecom Paris, Examiner
  • Laetitia Biscarrat, Associate Professor, LERASS, Invited
  • Marie Tahon, Professor, Le Mans Université LIUM, Supervisor
  • Antoine Laurent, Professor, Le Mans Université LIUM, Co-Supervisor
  • Sylvain Meignier, Professor, Le Mans Université LIUM, Thesis Director

 

Abstract:

The ANR GEM project, initiated by the National Audiovisual Institute, aims to study the differences in treatment and representation between women and men in the media. This project encourages collaboration between research in media and language sciences and research in computer science. One of the project’s objectives is to promote the creation of automated tools that facilitate studies in the social sciences and humanities and extend them to large corpora.

In this thesis, we focus on signal processing tools that facilitate the characterization of speaker representations. Specifically, we propose methods to automatically detect and characterize interruptions during conversations from television debate programs.

Interruption is a subjective concept with no consensus on its definition. In our field of automatic processing, this task is new, lacks a framework, and has limited resources. We first propose to narrow the definition of interruptions to the specific case of overlapped speech, following the literature in sociology and language sciences. In this context, we developed a tool that detects the presence of vocal activity from a single speaker or from multiple speakers. Developing such a tool raises questions that go beyond quantitative evaluation, so we conducted several studies on the duration and linguistic content of the multi-speaker segments.

We then focused specifically on interruption detection. Training dedicated neural models required the collection and annotation of a corpus. By guiding the annotators, we arrived at an example-based definition of interruption. This corpus enabled the development of a binary interruption classification model to characterize the previously detected multi-speaker segments.

 

Keywords:

Overlapped speech, Speech processing, Artificial intelligence, Interruption.