Investigating Audio-Visual DeepFake Detection
Level: Master 1
Supervisors: Aghilas Sini, Meysam Shamsi
Host Laboratory: Laboratoire d’Informatique de l’Université du Mans (LIUM), Team LST
Location: Le Mans
Beginning of internship: May 2025
Contact: Aghilas Sini, Meysam Shamsi (firstname.lastname@univ-lemans.fr)
Application: Send a CV, a cover letter relevant to the proposed subject, and your grades for the last two years of study (letters of recommendation may also be attached) to Aghilas Sini and Meysam Shamsi before February 28, 2025.
Introduction
Deepfakes represent a growing concern in the era of digital media, with the potential to undermine trust by creating convincingly altered audiovisual content. Developing effective methods to identify manipulated media is critical for combating misinformation and ensuring the integrity of digital communication. This internship focuses on deepfake detection techniques that leverage audiovisual data and state-of-the-art methodologies.
Objectives
The ultimate goal is to train a model capable of predicting whether a given video/speech segment is genuine or manipulated. The project is divided into two main phases:
Literature Review (4 weeks):
- Study existing datasets, with a focus on the characteristics of the AV-Deepfake1M dataset [1], which contains video/speech segments labeled as real or fake.
- Analyze state-of-the-art model architectures for deepfake detection, including approaches that integrate audio and visual modalities [2,3,4] (a minimal fusion example is sketched after this list).
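To make the audio-visual fusion idea concrete, the sketch below shows a minimal late-fusion classifier in PyTorch. It is illustrative only and not the architecture of any cited work: the embedding dimensions, layer sizes, and the assumption that audio and visual features are pre-extracted by separate encoders are placeholders chosen for this example.

    # Minimal, hypothetical late-fusion sketch: audio and visual embeddings are
    # assumed to be pre-extracted (e.g. by a speech encoder and a face encoder);
    # each stream is projected, concatenated, and classified as real vs. fake.
    import torch
    import torch.nn as nn

    class AVFusionClassifier(nn.Module):
        def __init__(self, audio_dim=768, visual_dim=512, hidden_dim=256):
            super().__init__()
            self.audio_proj = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
            self.visual_proj = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
            self.classifier = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),  # single logit: fake vs. real
            )

        def forward(self, audio_emb, visual_emb):
            # audio_emb: (batch, audio_dim), visual_emb: (batch, visual_dim)
            fused = torch.cat([self.audio_proj(audio_emb),
                               self.visual_proj(visual_emb)], dim=-1)
            return self.classifier(fused).squeeze(-1)

    # Usage with random tensors standing in for real encoder outputs.
    model = AVFusionClassifier()
    scores = model(torch.randn(4, 768), torch.randn(4, 512))
    print(torch.sigmoid(scores))  # per-segment probability of being fake

The cited systems [3,4] rely on much richer temporal modelling (e.g. transformer-based temporal forgery localization); this sketch only illustrates how two modalities can be combined for a per-segment real/fake decision.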
Reproducing and Testing Methods (6 weeks):
- Reproduce existing deepfake detection models [3,4] using the AV-Deepfake1M dataset [1].
- Evaluate these methods in terms of accuracy, efficiency, and generalization [5]; a sketch of the segment-level metrics follows this list.
- Identify potential limitations and propose modifications or enhancements for improved performance.
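The evaluation protocol will follow the chosen baselines, but the core segment-level metrics can be computed as in the following sketch (assumptions: per-segment binary labels and detector scores are available; the arrays below are placeholders, and scikit-learn is used for convenience).

    # Hedged sketch: accuracy, ROC-AUC and Equal Error Rate (EER) from per-segment
    # scores. In practice `labels` and `scores` would come from running a detector
    # over AV-Deepfake1M test segments; the values below are placeholders.
    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

    labels = np.array([0, 0, 1, 1, 1, 0])               # 1 = fake, 0 = real
    scores = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2])   # detector outputs in [0, 1]

    accuracy = accuracy_score(labels, scores >= 0.5)     # hard decision at 0.5
    auc = roc_auc_score(labels, scores)                  # threshold-free ranking quality

    # EER: operating point where false-positive and false-negative rates meet.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]

    print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}  EER={eer:.3f}")

Efficiency (e.g. inference time per segment) and generalization (performance on manipulations or datasets not seen during training) would be assessed on top of these basic metrics.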
Perspectives
The outcomes of this internship will contribute to the development of reliable and scalable deepfake detection systems, opening avenues for publications and further research initiatives in this critical domain.
This internship serves as the foundation for a long-term collaboration with the CENATAV lab (Havana, Cuba), a research group with extensive expertise in video-based deepfake detection. The partnership aims to expand beyond the current project, fostering innovation in detecting manipulated media and addressing emerging challenges in multi-modal analysis.
Candidate Profile
Master 1 in Computer Science
References
[1]. Cai, Zhixi, et al. “AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset.” Proceedings of the 32nd ACM International Conference on Multimedia. 2024.
[2]. Audio-Visual Deepfake Detection [https://github.com/qiqitao77/Awesome-Comprehensive-Deepfake-Detection?tab=readme-ov-file#multi-modal-deepfake-detection]
[3]. Zhang, Rui, et al. “UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization.” Proceedings of the 31st ACM International Conference on Multimedia. 2023. [https://github.com/ymhzyj/UMMAFormer]
[4]. Liu, Weifeng, et al. “Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes.” arXiv preprint arXiv:2401.15668 (2024). [https://github.com/AaronComo/LipFD]
[5]. Baseline code for audio-visual deepfake detection [https://github.com/vcbsl/audio-visual-deepfake/tree/main]