Sahar Ghannay

PhD defence, Sahar GHANNAY
Title: A study of continuous word representations applied to ASR error detection.
Composition of the jury:
President: Martine Adda-Decker
Reviewers: Sophie Rosset, Frédéric Béchet
Examiners: Benoit Favre, Benjamin Lecouteux
Supervisor: Yannick Estève
Co-supervisor: Nathalie Camelin
Abstract: This thesis concerns a study of continuous […]

Antoine Caubrière

Deep neural networks for oral and written language processing
Starting: 04/09/2017
PhD Student: Antoine Caubrière
Advisor(s): Yannick Estève (LIUM, LST)
Co-advisor(s): Antoine Laurent (LIUM, LST) & Emmanuel Morin (LS2N)
Funding: RAPACE Project
The aim of this thesis is to develop a named entity recognition system in an audio stream that will rely solely on a deep neural network. Until now, this […]

Amira Barhoumi

Towards a hybrid approach for Arabic Sentiment Analysis
Starting: 03/10/2016
PhD Student: Amira Barhoumi
Advisor(s): Yannick Estève (LIUM, LST)
Co-advisor(s): Nathalie Camelin (LIUM, LST) & Lamia Hadrich Belguith (MIRACL, Tunisia)
Funding: Agreement “Cotutelle Convention” (LIUM, LST) & (MIRACL, Tunisia)
Sentiment analysis is a growing field of research and has been the subject of numerous studies. This thesis aims at designing a hybrid […]

TED-LIUM Release 3

Corpus: TED-LIUM Release 3
Licence: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives)
Author(s): François Fernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève
This is the TED-LIUM corpus release 3, licensed under Creative Commons BY-NC-ND 3.0. All talks and text are property of TED Conferences LLC. This new TED-LIUM release was made through a collaboration between the Ubiqus company and the […]

TED-LIUM Release 1

Corpus: TED-LIUM Release 1
Licence: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives)
Author(s): Anthony Rousseau, Paul Deléglise, Yannick Estève
This is the TED-LIUM corpus release 1, licensed under Creative Commons BY-NC-ND 3.0. The TED-LIUM corpus consists of English-language TED talks, with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech. More details are given in this paper: A. […]


Software: NMTPYTORCH
Licence: MIT License
Author(s): Ozan Caglayan, Mercedes García Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault
This is the PyTorch fork of nmtpy, a sequence-to-sequence framework which was originally a fork of dl4mt-tutorial.


Software: SIDEKIT
Licence: LGPL
Author(s): Anthony Larcher, Kong Aik Lee, Sylvain Meignier
Welcome to the SIDEKIT documentation! SIDEKIT is an open source package for speaker and language recognition. The aim of SIDEKIT is to provide an educational and efficient toolkit for speaker/language recognition, including the whole processing chain that goes from the audio data to the analysis […]


Software: SIDEKIT for diarization (s4d)
Licence: LGPL
Author(s): Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Sylvain Meignier
Welcome to the SIDEKIT for diarization documentation! SIDEKIT for diarization (s4d for short) is an open source extension package of SIDEKIT for speaker diarization. The aim of S4D is to provide an educational and efficient toolkit for speaker diarization including the […]

Salima Mdhaffar

Thematic segmentation of automatic transcriptions and enrichment of educational documents in a lecture context
Starting: 23/01/2017
PhD Student: Salima Mdhaffar
Advisor(s): Yannick Estève (LIUM, LST)
Co-advisor(s): Antoine Laurent (LIUM, LST), Nicolas Hernandez (LS2N), Solen Quiniou (LS2N)
Funding: ANR PASTEL Project
This thesis is part of the PASTEL project (Performing Automated Speech Transcription for Enhancing Learning), which aims to explore the […]

Seminar 09/11/2016

An outline of my work on machine learning and thematic segmentation within PASTEL
Date: 09/11/2016
Time: 14:00
Location: Salle de conseil, IC2, Le Mans Université
Speaker(s): Nicolas Dugué (LIUM, LST)
By proposing to use the results of the automatic transcription of a lecture to create SPOC platforms, the PASTEL project brings together […]