Open PhD position: Automatic diagnosis of errors of end-to-end speech transcription systems from the users' perspective

Main laboratory: Laboratoire Informatique d’Avignon (LIA)
Supervisors: Richard Dufour (LIA, Avignon University) and Jane Wottawa (LIUM, Le Mans University)
Start date: September 2021

Project context
This Ph.D. position is part of the French research project DIETS (Automatic diagnosis of errors of end-to-end speech transcription systems from the users' perspective), funded by the ANR (French National Research Agency). The project aims to finely analyze recognition errors by taking into account how humans receive them, and to understand and visualize how these errors manifest themselves in an end-to-end ASR framework. Its main objectives are to propose original automatic approaches and tools to visualize, detect and measure transcription errors from the end users' perspective.

Candidate profile
The applicant must hold a Master's degree in Computer Science. Mastery of at least one common object-oriented programming language (Java, C++…) and one scripting language (Python, Perl…) is mandatory. Experience in automatic language and speech processing, machine learning or data mining is appreciated. Candidates should also show an interest in linguistics and the study of human behavior.

The main objective of the thesis is to finely analyze transcription errors from the point of view of how users receive them. The thesis will comprise three complementary parts:

  1. Approaches for error detection in transcripts of end-to-end ASR systems. This should lead to original confidence measures.
  2. Detailed analysis of transcription errors in French, whether produced by humans or by automatic systems (traditional or end-to-end), in order to understand how errors are perceived from a human perspective. This should shed light on new classes of errors, defined by how difficult (or easy) they are for end users to understand.
  3. Creation of a new corpus of automatic transcriptions in which errors are annotated with precise linguistic information, together with information collected during perceptual tests reflecting how users perceive (and possibly correct) these errors. Several perceptual tests will be carried out, confronting human participants with these transcription errors.

The aim is to lay the foundations of a new, transversal line of research at the crossroads of linguistics, computer science and cognitive sciences, for evaluating automatic systems and understanding NLP systems based on deep architectures. The Ph.D. student will thus have the opportunity to learn about and propose innovative approaches in automatic speech processing for understanding deep neural network architectures, while also gaining exposure to linguistics and skills in designing and running perceptual tests.

Benefits for the candidate:

  • Very favorable and collaborative work environment in an internationally recognized research laboratory in language processing and machine learning.
  • Implementation, analysis and proposal of innovative approaches for different ASR systems (classical and end-to-end frameworks).
  • Development of user-oriented metrics complementary to the word error rate (WER).
  • Transdisciplinary scientific work allowing openness to other disciplines (e.g. linguistics and cognitive sciences).

Applications should be sent to:

  • Richard Dufour – LIA, Avignon University
  • Jane Wottawa – LIUM, Le Mans University

Applications should include:
  • a detailed CV (education and research experiences),
  • a cover letter specifying the candidate’s research interests on this proposed Ph.D. thesis,
  • Bachelor (Licence) and Master grades in detail,
  • at least one reference who could be contacted for a recommendation.

Further information can be found here: