Distributed software

LIUM distributes software and resources such as corpora. Some productions have been deposited with the Programme Protection Agency (APP) via the Technology Transfer Accelerating Society (SATT) Ouest Valorisation. The vast majority are distributed under free licenses of varying restrictiveness (GPL, LGPL, Creative Commons v3, CeCILL).

List of software


Corpus: PASTEL
Author(s): Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nathalie Camelin
URL: https://lium.univ-lemans.fr/en/pastel-2/

The PASTEL corpus is a collection of courses from various computer science fields (natural language processing, introduction to computer science, etc.) taught in the first year of the Bachelor's degree in computer science at the University of Nantes.



Software: TurtleTablet
Author(s): Iza Marfisi, Sébastien George, Marc Leconte
URL: https://turtletablet.univ-lemans.fr/

TurtleTablet is a collaborative game for learning the basics of programming. To encourage real collaboration between players, the game can be played with two physical objects (tangible pieces) that are recognized on the tablet screen.


Corpus: ArSentimentAnalysis
GitHub: https://github.com/amirabaroumi/ArSentimentAnalysis
Author(s): Amira Barhoumi, Nathalie Camelin, Yannick Estève
URL: https://lium.univ-lemans.fr/en/arsentimentanalysis/

The ArSentimentAnalysis package includes a set of resources for designing and evaluating an Arabic sentiment analysis system.
The package contains:
- 1/ Sets of pre-trained Arabic-specific embeddings
- 2/ The ArSentLex polarized lexicon


Corpus: AlloSat
Licence: Creative Commons
Author(s): Manon Macary, Marie Tahon, Anthony Rousseau, Yannick Estève
URL: https://lium.univ-lemans.fr/en/allosat/

The AlloSat corpus is composed of real-life call center conversations in French, annotated continuously for frustration and satisfaction. It was built to support the development of new systems able to model the continuous aspect of semantic and paralinguistic information at the conversation level.


Corpus: Multi30k Dataset
Licence: Attribution-NonCommercial-ShareAlike 4.0 International
GitHub: https://github.com/multi30k
Author(s): Loïc Barrault, Ozan Caglayan, Fethi Bougares
URL: https://lium.univ-lemans.fr/en/multi30k/

The Flickr30K Dataset contains 31,014 images sourced from online photo-sharing websites (Young et al., 2014). The Multi30K dataset extends the Flickr30K dataset with translated and independent German sentences.


Corpus: Tunisian Sentiment Analysis Corpus
Licence: GNU Lesser General Public License v3.0
GitHub: https://github.com/fbougares/TSAC
Author(s): Fethi Bougares, Salima Mdhaffar, Yannick Estève
URL: https://lium.univ-lemans.fr/en/tsac/

About 17k user comments manually annotated with positive and negative polarity. The corpus was collected from Facebook users' comments written on the official pages of Tunisian radio stations and TV channels.


Software: TGRIS-tool
Author(s): Iza Marfisi
URL: https://lium.univ-lemans.fr/en/tgris-tool/

TGRIS is a Virtual Reality tool to simulate emotionally intense interviews.


Corpus: Topic Segmentation
URL: https://hal.archives-ouvertes.fr/hal-01741177

The FrNewsLink package addresses several applied tasks in the domain of topic segmentation and titling. It is composed of a set of resources drawn from a varied corpus of French Broadcast News (BN) and press articles. Due to broadcasting rights, this package does not contain video or audio files.


Corpus: TED-LIUM Release 3
Licence: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives)
Author(s): François Fernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève
URL: https://lium.univ-lemans.fr/en/ted-lium3/

This is the TED-LIUM corpus release 3, licensed under Creative Commons BY-NC-ND 3.0 (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en).


Corpus: TED-LIUM Release 2
Licence: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives)
Author(s): Anthony Rousseau, Paul Deléglise, Yannick Estève
URL: https://lium.univ-lemans.fr/en/ted-lium2/

This is the TED-LIUM corpus release 2, licensed under Creative Commons BY-NC-ND 3.0 (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en).


Corpus: TED-LIUM Release 1
Licence: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives)
Author(s): Anthony Rousseau, Paul Deléglise, Yannick Estève
URL: https://lium.univ-lemans.fr/en/ted-lium1/

This is the TED-LIUM corpus release 1, licensed under Creative Commons BY-NC-ND 3.0 (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en).


Software: NMTPY
Licence: MIT License
GitHub: https://github.com/lium-lst/nmtpy
Author(s): Ozan Caglayan, Mercedes García Martínez, Adrien Bardet, Walid Aransa, Loïc Barrault, Fethi Bougares
URL: https://arxiv.org/abs/1706.00457

nmtpy is a suite of Python tools for training mono- and multimodal neural machine translation systems using Theano.


Software: nmtpytorch
Licence: MIT License
GitHub: https://github.com/lium-lst/nmtpytorch/
Author(s): Ozan Caglayan, Mercedes García Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault
URL: https://arxiv.org/abs/1706.00457

This is the PyTorch fork of nmtpy, a sequence-to-sequence framework which was originally a fork of dl4mt-tutorial.


Software: LIUM Speaker Diarization
Licence: GPL
URL: https://projets-lium.univ-lemans.fr/spkdiarization/

A Java tool for speaker segmentation and clustering (speaker diarization).


Software: SIDEKIT
Licence: LGPL
GitHub: https://git-lium.univ-lemans.fr/Larcher/sidekit
Author(s): Anthony Larcher, Kong Aik Lee, Sylvain Meignier
URL: https://projets-lium.univ-lemans.fr/sidekit/

SIDEKIT is an open source package for Speaker and Language recognition.


Software: Hop3x
URL: http://hop3x.univ-lemans.fr

Hop3x is an environment for learning programming. It allows the teacher to remotely follow the learners' programming activity by providing qualitative information (indicators) on this activity and a real-time view of their productions (the source code of the programs).


Software: Continuous Space Language Model toolkit
GitHub: https://git-lium.univ-lemans.fr/barrault/cslm
Author(s): Holger Schwenk
URL: https://git-lium.univ-lemans.fr/barrault/cslm/-/archive/master/cslm-master.tar.gz

CSLM toolkit is open-source software which implements the so-called continuous space language model.


Software: MANY
Licence: GNU GPL v3
URL: https://code.google.com/archive/p/many/

MANY is a machine translation (MT) system combination software whose architecture is described in …


Software: LEGADEE
Licence: Creative Commons
Author(s): Iza Marfisi
URL: https://lium.univ-lemans.fr/legadee/

LEGADEE (LEarning GAme DEsign Environment) is a free authoring environment that helps game designers and teachers design Learning Games that are fun and educational.