Predictive Modeling of Subjective Disagreement in Speech Annotation/Evaluation
Supervisors: Meysam Shamsi and Anthony Larcher
Hosting lab: LIUM (Laboratoire d’Informatique de l’Université du Mans)
Place: Le Mans Université
Contacts: Meysam Shamsi and Anthony Larcher (firstname.name@univ-lemans.fr)
Application: Send a CV, a cover letter relevant to the proposed subject, and your grades for the last two years of study to all the supervisors before January 10, 2024. Letters of recommendation may also be attached.
Subject: Subjective tasks such as speech quality assessment or speech emotion recognition involve diverse opinions, perceptions, and judgments among individuals. When modeling such tasks, defining a ground truth and annotating a training set are therefore central challenges.
The current practice of aggregating all annotations into a single label for modeling a subjective task is neither fair nor efficient [1]. The variability in annotations or evaluations can stem from various factors [2], broadly categorized into those associated with corpus quality and those intrinsic to the samples themselves.
In the first case, the delicate definition of a subjective task makes the annotation process sensitive and prone to errors, especially when the annotation tools and platform lack precision or annotators experience fatigue. In the second case, the inherent ambiguity of a subjective task and differences in perception may lead to varying annotations and disagreement. Developing a predictive model of annotator/evaluator disagreement is crucial for discussing ambiguous samples and refining the definitions of subjective concepts. Furthermore, such a model can serve as a valuable tool for assessing the confidence of automatic evaluations [3,4].
This modeling approach will contribute to the automatic evaluation of corpus annotations, identification of ambiguous samples for reconsideration or re-annotation, automatic assessment of subjective models, and the detection of underrepresented samples and biases in the dataset.
The proposed research will use a speech dataset with multiple annotations per sample for a subjective task, such as MSP-Podcast [5], SOMOS [6], or VoiceMOS [7]. The primary objective is to predict the variation in the labels assigned to each sample, measured through disagreement scores, entropy, or the label distribution.
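As an illustration of the kind of prediction targets mentioned above, the following is a minimal sketch that derives a label distribution, an entropy, and a standard deviation from the ratings given to each sample. The function names, the sample identifiers, and the toy ratings are hypothetical and are not taken from any of the cited datasets; the standard deviation is only one possible choice of disagreement score.

```python
import math
from collections import Counter
from statistics import pstdev


def label_distribution(ratings):
    """Empirical distribution of the labels given to one sample."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: n / total for label, n in sorted(counts.items())}


def label_entropy(ratings):
    """Shannon entropy (in bits) of the empirical label distribution."""
    dist = label_distribution(ratings)
    return sum(-p * math.log2(p) for p in dist.values())


def rating_std(ratings):
    """Population standard deviation, usable for ordinal or numeric ratings."""
    return pstdev(ratings)


# Toy ratings on a 1-5 scale (hypothetical values, for illustration only).
ratings_per_sample = {
    "utt_a": [4, 4, 4, 4],  # full agreement
    "utt_b": [1, 5, 1, 5],  # annotators split between the two extremes
    "utt_c": [3, 4, 3, 4],  # annotators split between neighbouring scores
}

# One possible set of regression/classification targets per sample.
targets = {
    utt: {
        "distribution": label_distribution(r),
        "entropy": label_entropy(r),
        "std": rating_std(r),
    }
    for utt, r in ratings_per_sample.items()
}

for utt, t in targets.items():
    print(utt, t)
```

In this toy example the two split samples have the same entropy (1 bit) but different standard deviations (2.0 versus 0.5), which suggests that the choice of disagreement measure matters when the labels are ordinal.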
Applicant profile: Candidate motivated by artificial intelligence, enrolled in a Master’s degree in Computer Science or a related field.
References
[1] Davani, A. M., Díaz, M., & Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110.
[2] Kreiman, J., Gerratt, B. R., & Ito, M. (2007). When and why listeners disagree in voice quality assessment tasks. The Journal of the Acoustical Society of America, 122(4), 2354-2364.
[3] Wu, W., Chen, W., Zhang, C., & Woodland, P. C. (2023). It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation. arXiv preprint arXiv:2310.00486.
[4] Han, J., Zhang, Z., Schmitt, M., Pantic, M., & Schuller, B. (2017, October). From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty. In Proceedings of the 25th ACM international conference on Multimedia (pp. 890-897).
[5] Lotfian, R., & Busso, C. (2017). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471-483.
[6] Maniati, G., Vioni, A., Ellinas, N., Nikitaras, K., Klapsas, K., Sung, J. S., Jho, G., Chalamandaris, A., & Tsiakoulis, P. (2022). SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. Proc. Interspeech 2022, 2388-2392.
[7] Cooper, E., Huang, W. C., Tsao, Y., Wang, H. M., Toda, T., & Yamagishi, J. (2023). The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains. arXiv preprint arXiv:2310.02640.