Valentin Pelloin – Laboratoire d'Informatique de l'Université du Mans

PhD defence, Valentin Pelloin

Date : 24/01/2024
Time : 9h30
Location : Le Mans Université, IC2 building auditorium

Title: Spoken language understanding in human-computer dialogue systems in the era of pre-trained models

Jury members :

Christophe CERISARA, Researcher, LORIA, Nancy, Reviewer
Benoit FAVRE, Professor, LIS, Marseille, Reviewer
Géraldine DAMNATI, Research Engineer, Orange Labs, Lannion, Examiner
Richard DUFOUR, Assistant Professor, LIA, Avignon, Examiner
Sophie ROSSET, Research Director, LISN, Orsay, Examiner
Sylvain MEIGNIER, Professor, Le Mans Université LIUM, Thesis director
Nathalie CAMELIN, Assistant Professor, Le Mans Université LIUM, Supervisor
Antoine LAURENT, Professor, Le Mans Université LIUM, Supervisor

Abstract:

This thesis studies spoken language understanding (SLU) in the context of goal-oriented telephone dialogues (hotel reservations, for example).

Historically, SLU was performed through a cascade of systems: a first system would transcribe the speech into words, and a natural language understanding system would link those words to a semantic annotation. The development of deep neural methods has led to the emergence of end-to-end architectures, where the understanding task is performed by a single system, applied directly to the speech signal to extract the semantic annotation.

Recently, models pre-trained with self-supervised learning (SSL) have brought new advances in natural language processing (NLP). Trained generically on very large datasets, they can then be adapted to other applications. To date, the best SLU results have been obtained with pipeline (cascade) systems incorporating SSL models.

However, neither architecture, pipeline nor end-to-end, is perfect. In this thesis, we study these architectures and propose hybrid versions that attempt to combine the advantages of each. After developing a state-of-the-art end-to-end SLU model, we evaluate different hybrid strategies. The advances made by SSL models during the course of this thesis led us to integrate them into our hybrid architecture.

Keywords:

spoken language understanding, automatic speech recognition, neural networks, pre-trained models, self-supervised models, semantic concept extraction