Thematic segmentation of automatic transcriptions and enrichment of educational documents in a lecture context

Starting: 23/01/2017
PhD Student: Salima Mdhaffar
Advisor(s): Yannick Estève (LIUM, LST)
Co-advisor(s): Antoine Laurent (LIUM, LST), Nicolas Hernandez (LS2N), Solen Quiniou (LS2N)
Funding: ANR PASTEL Project

This thesis is part of the PASTEL project (Performing Automated Speech Transcription for Enhancing Learning), which aims to explore the potential of real-time automatic transcription for instrumenting blended teaching situations, in which interaction may be in person or remote, synchronous or asynchronous.

More specifically, this thesis will cover automatic adaptation of language models, thematic segmentation, and the enrichment of educational documents. In a lecture context, the goal is to segment the output of the automatic speech recognition system.

This segmentation will be thematic: the task is to detect the boundaries of zones that are homogeneous in content, so that each thematic zone can be linked with pedagogical documents available in an external knowledge base.
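
To make the idea of detecting boundaries between homogeneous zones concrete, here is a minimal sketch of one classic lexical-cohesion approach, in the spirit of TextTiling. It is not the method developed in the thesis. The function names (`bag_of_words`, `cosine`, `segment`) and the parameters `window` and `threshold` are hypothetical, chosen only for illustration, and the input is assumed to be a transcript already split into sentence-like units.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase word tokens over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two Counter word vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Return each index i where a thematic boundary falls before sentences[i].

    Assumption for this sketch: a boundary is declared wherever the word
    overlap between the `window` sentences before position i and the
    `window` sentences from i onward drops below `threshold`.
    """
    boundaries = []
    for i in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, i - window):i])
        right = bag_of_words(sentences[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

Given a list of transcript sentences, `segment` returns the positions at which a new thematic zone is assumed to start. Real speech recognition output contains recognition errors and no punctuation, so a practical system would likely need more robust representations than raw word counts; the fixed threshold here is only a simplification.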