PhD defence, Manon Macary

Date: 24/06/2022
Time: 14:00
Location: IC2, Boardroom and online

Title: Massive and real-time data analysis in order to extract semantic and emotional information from speech

Jury members:

  • Ms. Martine ADDA-DECKER, Research Director, LPP – CNRS-Sorbonne Nouvelle, Reviewer
  • Mr. Denis JOUVET, Research Director, INRIA-LORIA – Université de Lorraine, Reviewer
  • Mr. Fabien RINGEVAL, Associate Professor, LIG – Université Grenoble Alpes, CNRS, Examiner
  • Mr. Damien LOLIVE, Director, ENSSAT – Université de Rennes, Examiner
  • Mr. Yannick ESTÈVE, Professor, LIA, Avignon Université, PhD Director
  • Ms. Marie TAHON, Associate Professor, LIUM – Le Mans Université, Co-supervisor
  • Mr. Merouane ATIG, Technical Director, Allo-Média, Invited member



Call centers handle thousands of calls every day, connecting customers with agents. A great deal of information can be extracted from these conversations, including the emotional state of the speakers.

This CIFRE thesis was carried out in collaboration with Allo-Media, a company that specializes in the automatic analysis of call center conversations. In practice, the company produces structured records of different aspects of each conversation, discretizing the information so that the data can be processed automatically. It aims to enrich these annotations with an innovative emotional dimension, relevant to the context of customer relations, in order to flag the difficult moments of a conversation.

This thesis therefore addresses several questions: (i) how to define satisfaction and frustration as emotions in speech, (ii) how to recognize these emotions automatically and continuously throughout a conversation, and (iii) how to evaluate such automatic systems.

The contributions of this thesis are: (i) the construction of a corpus of real call center data, continuously annotated for satisfaction and frustration, (ii) the implementation of several strategies for building automatic recognition systems based on deep neural networks, compared against the state of the art, (iii) an exploration of separating the acoustic and linguistic aspects of conversations in order to improve these recognition systems, and finally (iv) a nuanced evaluation of these systems.



Keywords: Speech Emotion Recognition, New Corpora, Satisfaction and Frustration, Pre-trained Embeddings