PhD thesis: Interpretable and frugal models for acoustic monitoring of ecosystems
Supervisors: Théo Mariotte (LIUM), Marie Tahon (LIUM, dir)
Host Laboratory: Laboratoire d’Informatique de l’Université du Mans (LIUM)
Site: Le Mans
Starting date: October 2026
Contact: Marie Tahon and Théo Mariotte (name.surname@univ-lemans.fr)
How to apply: You can submit your application (CV, motivation letter, ) on the dedicated platform by May 10, 2026: https://amethis.doctorat.org/amethis-client/prd/consulter/offre/3325
Candidate profile
The candidate must have expertise in machine learning (particularly deep learning) and signal processing.
Abstract
The study of ecosystems relies mainly on surveying the species present in a given area and monitoring the evolution of their populations. Ecoacoustics [Stowell and Sueur, 2020] offers an innovative approach by using microphones placed in natural environments to periodically record the soundscape. This data acquisition, which can span from a single day to several years, provides a rich foundation for analyzing biodiversity and ecosystem spatio-temporal dynamics.
Traditionally, experts analyze these recordings by identifying sounds of interest based on temporal patterns and frequency content. More recently, the rise of neural models for species classification has opened new perspectives for interpreting soundscapes [Michaud, 2025]. However, these approaches have two major limitations: conventional methods lack the precision required for fine-grained soundscape analysis, while neural models, despite their efficiency, remain species-centered and inherently difficult for humans to interpret. Moreover, training complex models is hampered by the scarcity of annotated data, which limits how such models can be used in practice. In this context, the objective of this thesis will be to develop interpretable and resource-efficient approaches based on automatic audio signal processing for the improvement and monitoring of ecosystems through acoustics.
Objectives
Initially, the focus will be on analyzing and simulating ecoacoustic data to train segmentation and individual clustering models. These fine-temporal-resolution approaches may draw inspiration from recent work in speaker diarization, such as EEND-VC [Kinoshita et al., 2021]. This will enable the longitudinal prediction of biodiversity indicators such as species richness (the number of different species) and abundance (the number of individuals) [Bradfer-Lawrence et al., 2023].
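As a purely illustrative sketch (not the method planned for the thesis), the two indicators defined above can be derived from the output of a segmentation and individual clustering model. The `Segment` structure, the function name, the species labels, and the convention that individual identifiers are scoped per species are all assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical output unit of a segmentation + clustering model."""
    start: float      # segment onset, in seconds
    end: float        # segment offset, in seconds
    species: str      # predicted species label
    individual: int   # cluster id of the individual, scoped per species (assumption)


def richness_and_abundance(segments):
    """Return (species richness, abundance per species) for one recording."""
    individuals = defaultdict(set)
    for seg in segments:
        # Several segments from the same individual count only once.
        individuals[seg.species].add(seg.individual)
    richness = len(individuals)
    abundance = {sp: len(ids) for sp, ids in individuals.items()}
    return richness, abundance


# Toy example with made-up detections.
segments = [
    Segment(0.0, 1.2, "robin", 0),
    Segment(0.8, 2.0, "blackbird", 0),
    Segment(3.1, 4.0, "robin", 1),
    Segment(5.0, 5.9, "robin", 0),
]
richness, abundance = richness_and_abundance(segments)
```

Applying such a function to each recording period would yield a time series of indicators, which is one simple way to read "longitudinal prediction" here.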
Second, we plan to condition the models on environmental measures that significantly impact species song (temperature, humidity, season, time of day). Recent techniques, developed for instance for disentanglement, may be explored [Wang et al., 2024; Almudevar et al., 2024]. Finally, to improve our understanding of the models, they will need to provide audible explanations in the form of prototypes (sonotypes) [Paissan et al., 2024; Mariotte et al., 2024].



