Automatic diagnosis of errors in end-to-end speech transcription systems based on their reception by users (DIETS)
A major issue with language processing evaluation metrics is that they are designed to measure a system's output globally against a reference, with the main objective of comparing systems with one another. Although automatic systems are ultimately intended for end-users, their effect on those users has received little attention: the impact of automatic errors on humans, and the way these errors are perceived at the cognitive level, has never been studied and, consequently, has not been integrated into the evaluation process. The DIETS project proposes to address the diagnosis and evaluation of end-to-end automatic speech recognition (ASR) systems by integrating the human reception of transcription errors from a cognitive point of view. The challenge is twofold: 1) to analyze ASR errors finely from the standpoint of human reception, and 2) to understand and detect how these errors manifest themselves in an end-to-end ASR framework, whose design is inspired by the way the human brain works.
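To illustrate what a global, reference-based metric overlooks, the sketch below assumes the word error rate (WER), the most common ASR metric, which the abstract itself does not name. The sentences and the function name `word_error_rate` are hypothetical. WER counts every substitution, insertion and deletion with the same weight, so an error that barely changes the meaning and one that reverses it receive identical scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


reference = "the patient must not take this drug"
minor = "the patient must not take these drug"    # "this" -> "these": meaning preserved
critical = "the patient must now take this drug"  # "not" -> "now": meaning reversed

# Both hypotheses contain one substitution out of 7 reference words,
# so both evaluate to 1/7 despite their very different impact on a reader.
print(word_error_rate(reference, minor))
print(word_error_rate(reference, critical))
```

A perception-aware evaluation of the kind the project targets would need to distinguish these two cases, which a single error count cannot do.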