Artificial Intelligence for extracting semantic information from speech


Master 1 internship
Supervisors: Nathalie Camelin, Antoine Laurent
Hosting lab: LIUM (Laboratoire d’Informatique de l’Université du Mans)
Place: Le Mans Université
Beginning of internship: April 2024

Application: Send a CV and a covering letter relevant to the proposed subject before December 15, 2023


Context of internship: This internship is in line with the research themes of the Language and Speech Technologies (LST) team of the Computer Science Laboratory of the University of Le Mans (LIUM). It will take place in Le Mans as part of the ANR AISSPER project. The aim of the ANR AISSPER (Artificial Intelligence for Semantically controlled SPEech UndeRstanding) project is to propose new algorithms for the difficult problem of speech understanding. Despite the development and marketing of numerous intelligent personal assistants (Alexa, Google Home, etc.), speech understanding remains an area with many scientific barriers still to be overcome.

AISSPER brings together leading researchers in the fields of artificial intelligence and automatic language processing. The project is coordinated by the LIA (Laboratoire Informatique de l’Université d’Avignon). It began in January 2020 with LIUM and ORKIS as partners. The work is divided into several sub-parts (Work Packages): WP2 focuses on speech understanding at the turn level, while WP3 extends the work of WP2 to understanding at the document level.

Description: AISSPER aims to improve the recognition of semantic concepts using methods derived from artificial intelligence. To achieve this, the AISSPER partners will focus their work on investigating new deep learning methods. The idea is to build on the use of semantics in dedicated attention mechanisms [Vaswani 2017] adapted to different sets of information contexts. AISSPER aims to develop new paradigms that jointly model acoustic and semantic information for the semantic analysis of spoken documents using so-called End2End neural approaches, i.e. going directly from the speech signal to semantic information.
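For reference, the attention mechanism cited above [Vaswani 2017] computes a weighted combination of value vectors, with weights given by a softmax over scaled query-key dot products. A minimal NumPy sketch (the function name and shapes are illustrative, not part of the project's codebase):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  [Vaswani 2017].

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended output and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights
```

In an End2End speech-understanding model, the queries, keys and values would be learned projections of acoustic or linguistic representations rather than raw arrays.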

On the LIUM side, most of the work in WP2 was carried out by a PhD student. Understanding at the speech-turn level was explored in a well-defined application framework (the MEDIA corpus [Bonneau 2006]), with the proposal of original architectures for the direct extraction of complete semantic representations from the signal [Pelloin 2021] and for the joint use of acoustic, linguistic and semantic information. The work of the internship is part of WP3: global understanding of a dialogue based on the extraction of multiple types of semantic information. In particular, it will involve setting up a system for detecting named entities [Caubrière 2018] from speech or text. The trainee will also implement an initial topic modelling system to automatically extract all the topics raised in the dialogue under consideration. Classical methods such as LDA as well as recent neural topic models will be studied [Zhao 2021].
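To illustrate the classical LDA approach mentioned above, here is a compact pure-NumPy sketch of collapsed Gibbs sampling for LDA; the function name, hyperparameters and toy representation (documents as lists of word ids) are illustrative assumptions, not the project's actual tooling:

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids.
    Returns theta, the per-document topic distributions (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = []                                  # topic assignment per token
    for d, doc in enumerate(docs):          # random initialisation
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):                 # resample each token's topic
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional: p(k) ∝ (n_dk + α) (n_kw + β) / (n_k + Vβ)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
```

In practice one would use an established implementation (e.g. gensim or scikit-learn) on real transcripts; neural topic models [Zhao 2021] replace this sampling scheme with learned encoders.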

The trainee will work in collaboration with LIUM researchers and may also collaborate with LIA researchers. The experiments will be conducted on the DECODA corpus [De Mori 2012], which contains RATP call-centre dialogues annotated with 8 general themes.




  • [Caubrière 2018] A. Caubrière, Y. Estève, N. Camelin, E. Simonnet, A. Laurent, and E. Morin. “End-To-End Named Entity And Semantic Concept Extraction From Speech.” In IEEE Spoken Language Technology Workshop (SLT) 2018

  • [Vaswani 2017] A. Vaswani, N. Shazeer, N. Parmar, A. N. Gomez, and Ł. Kaiser. “Attention Is All You Need.” In NIPS 2017

  • [Bonneau 2006] H. Bonneau-Maynard, C. Ayache, F. Béchet, A. Denis, A. Kuhn, F. Lefèvre, D. Mostefa, M. Quignard, S. Rosset, C. Servan, et al. “Results of the French Evalda-Media Evaluation Campaign for Literal Understanding.” In LREC 2006

  • [De Mori 2012] R. De Mori, E. Arbillot, F. Béchet, B. Maza, N. Bigouroux, T. Bazillon, and M. El-Bèze. “DECODA: A Call-Centre Human-Human Spoken Conversation Corpus.” In LREC 2012

  • [Zhao 2021] H. Zhao, et al. “Topic Modelling Meets Deep Neural Networks: A Survey.” arXiv preprint, 2021

  • [Pelloin 2021] V. Pelloin, N. Camelin, A. Laurent, et al. “End2End Acoustic to Semantic Transduction.” In IEEE ICASSP 2021, pp. 7448-7452