Corpus : Tunisian Sentiment Analysis Corpus (TSAC)

License : GNU Lesser General Public License v3.0
GitHub : https://github.com/fbougares/TSAC


About 17k user comments, manually annotated with positive or negative polarity. The corpus was collected from Facebook user comments written on the official pages of Tunisian radio stations and TV channels, namely Mosaique FM, JawhraFM, Shemes FM, HiwarElttounsi TV and Nessma TV. The comments cover the period from January 2015 to June 2016.

If you use the TSAC corpus, please consider citing the following paper :

Salima Medhaffar, Fethi Bougares, Yannick Estève and Lamia Hadrich-Belguith. Sentiment analysis of Tunisian dialects: Linguistic Ressources and Experiments. WANLP 2017. EACL 2017