PhD defence, Valentin Pelloin

Date: 24/01/2024
Time: 9:30 AM
Location: Le Mans Université, IC2 building, Auditorium

Title: Spoken language understanding in human-computer dialogue systems in the era of pre-trained models

Jury members:

  • Christophe CERISARA, Researcher, LORIA, Nancy, Reviewer
  • Benoit FAVRE, Professor, LIS, Marseille, Reviewer
  • Géraldine DAMNATI, Research Engineer, Orange Labs, Lannion, Examiner
  • Richard DUFOUR, Assistant Professor, LIA, Avignon, Examiner
  • Sophie ROSSET, Research Director, LISN, Orsay, Examiner
  • Sylvain MEIGNIER, Professor, LIUM, Le Mans Université, Director of thesis
  • Nathalie CAMELIN, Assistant Professor, LIUM, Le Mans Université, Supervisor
  • Antoine LAURENT, Professor, LIUM, Le Mans Université, Supervisor



In this thesis, spoken language understanding (SLU) is studied in the application context of goal-oriented telephone dialogues (hotel reservations, for example).

Historically, SLU was performed through a cascade of systems: a first system would transcribe the speech into words, and a natural language understanding system would then map those words to a semantic annotation. The development of deep neural methods has led to the emergence of end-to-end architectures, in which a single system, applied directly to the speech signal, extracts the semantic annotation.

Recently, models pre-trained with so-called self-supervised learning (SSL) have brought new advances in natural language processing (NLP). Trained in a generic way on very large datasets, they can then be adapted to other applications. To date, the best SLU results have been obtained with pipeline systems incorporating SSL models.

However, neither architecture, pipeline or end-to-end, is perfect. In this thesis, we study these architectures and propose hybrid versions that attempt to combine the advantages of each. After developing a state-of-the-art end-to-end SLU model, we evaluated different hybrid strategies. The advances made by SSL models during the course of this thesis led us to integrate them into our hybrid architecture.



Keywords: spoken language understanding, automatic speech recognition, neural networks, pre-trained models, self-supervised models, semantic concept extraction