Speech Translation System – Low-resourced Languages to High-resourced Languages

Level: Master 2

Supervisors: Aghilas Sini (LIUM), Mohammad Mohammadamini (LIUM)
Host laboratory: Laboratoire d’Informatique de l’Université du Mans (LIUM). The internship will take place on site.
Location: Le Mans Université
Start of internship: between February and April 2025
Contact: Aghilas Sini and Mohammad Mohammadamini (firstname.lastname@univ-lemans.fr)

Application: Send a CV, a cover letter addressing the proposed subject, and your transcripts for the last two years of study to all the supervisors before November 15, 2024. Letters of recommendation may also be attached.

 
Objective of the internship:
The aim of this Master 2 internship is to design a modern speech translation system. Traditional speech translation systems follow a cascaded approach that chains a speech recognition system, a translation system and a speech synthesis system. This approach has drawbacks, notably that errors accumulate from one processing block to the next.

One of the major problems concerns low-resourced languages, for which the text resources required for text-to-text translation are limited or not available in sufficient quantity. Although solutions do exist, particularly in speech-to-text translation, these methods ignore some of the information carried by the speech signal, such as prosody.

End-to-end speech-to-speech translation is a promising solution, as it uses information from the source-language speech directly, without any intermediate steps. The aim of this internship is to explore this type of translation from low-resourced languages into high-resourced languages.
 
Source and target languages:

Source language    Target language
Tamashaq           French
Tasegwalit         French
Kurdish            English
Kabyle             French

 
Tasks:

The candidate will be required to:
– Build and analyse a speech corpus for the languages studied.
– Set up an end-to-end architecture (encoder-decoder) dedicated to speech translation.
– Interpret evaluation metrics and optimise system performance.
– Study representation spaces for speech-to-speech translation.
– Establish a benchmark by comparing different models and approaches.
 
Working environment:
The intern will work in the Language and Speech Technology (LST) team at the Laboratoire d’Informatique de l’Université du Mans (LIUM), which specialises in automatic speech and language processing.
The internship is part of the TV2M-E project, which aims to develop a framework dedicated to multilingual and multimodal speech translation. The intern will be supervised by:
– Aghilas Sini (aghilas.sini(at)univ-lemans.fr), Senior Lecturer, expert in expressive speech synthesis and speech identification.
– Mohammad Mohammadamini (mohammad.mohammadamini(at)univ-lemans.fr), Postdoctoral researcher, specialist in machine translation and automatic speech recognition.
 
Required profile:
– Skills in automatic speech processing.
– Solid knowledge of Python programming.
– Proficiency in deep learning libraries (in particular PyTorch).
– Ability to work independently and propose innovative solutions.
 
References:

[1] – Barrault, L., Chung, Y. A., Meglioli, M. C., Dale, D., Dong, N., Duppenthaler, M., … & Williamson, M. (2023). Seamless: Multilingual Expressive and Streaming Speech Translation. arXiv preprint arXiv:2312.05187.
[2] – Huang, Z., Ye, R., Ko, T., Dong, Q., Cheng, S., Wang, M., & Li, H. (2023). Speech translation with large language models: An industrial practice. arXiv preprint arXiv:2312.13585.
[3] – Lee, A., Chen, P. J., Wang, C., Gu, J., Popuri, S., Ma, X., … & Hsu, W. N. (2021). Direct speech-to-speech translation with discrete units. arXiv preprint arXiv:2107.05604.
[4] – Lee, A., Gong, H., Duquenne, P. A., Schwenk, H., Chen, P. J., Wang, C., … & Hsu, W. N. (2021). Textless speech-to-speech translation on real data. arXiv preprint arXiv:2112.08352.