Exchanges for SPEech ReseArch aNd TechnOlogies (ESPERANTO)
Date: 01/2021 – 12/2025
Funding: EU H2020
Call: H2020-MSCA-RISE-2020
Partners: academic (MS): Université du Mans, Universidad de Zaragoza, The University of Sheffield, Brno University of Technology, Laboratoire national de métrologie et d’essais (LNE), Université Grenoble Alpes, Avignon Université, University of Yaounde, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Chile, Centro […]

Automatic speech processing in meetings using microphone array
Starting: 01/10/2020
PhD Student: Théo Mariotte
Advisor(s): Jean-Hugh Thomas (LAUM), Anthony Larcher (LIUM)
Co-advisor(s): Silvio Montresor (LAUM)
Funding: RFI Le Mans acoustique
The subject is supported by two laboratories of Le Mans Université: the acoustics lab (LAUM) and the computer science lab (LIUM). The aim is to enhance automatic speech processing in […]

Temporal word embeddings: neologisms, gender bias, corpus of French news
Starting: 01/10/2020
PhD Student: Thibault Prouteau
Advisor(s): Sylvain Meignier
Co-advisor(s): Nicolas Dugué
Funding: Research grant from the Ministry of Higher Education (Allocation de recherche du ministère de l’enseignement supérieur)
Thesis context: Television, literary production and the internet provide traces of how we use language [6]. Thanks to the INA, the memory of the […]

Artificial Intelligence for a semantically controlled speech understanding
Starting: 01/10/2020
PhD Student: Valentin Pelloin
Advisor(s): Sylvain Meignier
Co-advisor(s): Nathalie Camelin and Antoine Laurent
Funding: ANR AISSPER
Description: The ANR project AISSPER (Artificial Intelligence for Semantically controlled SPEech UndeRstanding) aims to propose new algorithms to solve the difficult problem of speech understanding. Indeed, despite […]

Extraction of end-to-end semantic information from audio signal
Starting: 01/10/2020
PhD Student: Martin Lebourdais
Advisor(s): Sylvain Meignier
Co-advisor(s): Antoine Laurent, Marie Tahon
Funding: ANR GEM
The GEM project aims to describe the differences in representation and treatment between women and men in the media, based on the automatic analysis of large volumes of French-language data contained in the INA and Deezer […]

PhD defence, Salima Mdhaffar
Date: 01/07/2020
Time: 9:30
Location: Université d’Avignon, videoconference
Title: Speech Recognition in the context of lectures: Evaluation, Progress and Enrichment
Jury members:
Reviewers:
– Prof. Georges Linarès (Professor, Université d’Avignon)
– Dr. Irina Illina (Associate Professor with HDR, Université de Nancy)
Examiners:
– Prof. Sylvain Meignier (Professor, Le Mans Université) […]


Corpus: ArSentimentAnalysis (ArSentimentAnalysis)
GitHub:
Author(s): Amira Barhoumi, Nathalie Camelin, Yannick Estève
The ArSentimentAnalysis package comprises a set of resources for designing and evaluating an opinion analysis system for Arabic. The package contains pre-trained Arabic-specific embedding sets and the polarity lexicon ArSentLex.
1/ Arabic-specific embedding sets: Existing pre-trained embeddings represent a word […]


Corpus: AlloSat (AlloSat)
Licence: Creative Commons
Author(s): Manon Macary, Marie Tahon, Anthony Rousseau, Yannick Estève
The corpus, named AlloSat, is composed of real-life call center conversations in French and is continuously annotated for frustration and satisfaction. It was set up to develop new systems able to model the continuous aspect of semantic and paralinguistic information at the conversation level. […]


Corpus: Multi30k Dataset (Multi30k)
Licence: Attribution-NonCommercial-ShareAlike 4.0 International
GitHub:
Author(s): Loïc Barrault, Ozan Caglayan, Fethi Bougares
The Flickr30K Dataset contains 31,014 images sourced from online photo-sharing websites (Young et al., 2014). Each image is paired with five English descriptions, which were collected from Amazon Mechanical Turk. The dataset contains 145,000 training, 5,070 development, and 5,000 test descriptions. The Multi30K dataset […]


Corpus: Tunisian Sentiment Analysis Corpus (TSAC)
Licence: GNU Lesser General Public License v3.0
GitHub:
Author(s): Fethi Bougares, Salima Mdhaffar, Yannick Estève
About 17k user comments manually annotated with positive and negative polarities. The corpus was collected from Facebook users' comments written on the official pages of Tunisian radio and TV channels, namely Mosaique FM, JawhraFM, Shemes FM, HiwarElttounsi TV and Nessma […]