Seminar by Simon Guillot and Thibault Prouteau – Laboratoire d'Informatique de l'Université du Mans

Seminar by Simon Guillot and Thibault Prouteau, PhD students at LIUM

Date: 12/05/2023
Time: 11:00
Location: IC2, boardroom
Speakers: Simon Guillot and Thibault Prouteau

Sparser is better: one step closer to word embedding interpretability

Sparse word embedding models (SPINE, SINr) are designed to embed words in interpretable dimensions. A dimension is interpretable when a human can make sense of the semantic (or syntactic) relations between the words that are active on it. These models are useful for critical downstream tasks in natural language processing (e.g. medical or legal NLP) and for digital humanities applications.

The study presented in this seminar aims to extend interpretability to the vector level by integrating psycholinguistic constraints into the definition of representations. One of the key criteria for an interpretable model is sparsity: to be interpretable, a word should not be represented by all the features of the model, especially if humans have to interpret these features and their relations. This raises one question: to what extent can vectors be made sparser without degrading performance?

We thus introduce a sparsification procedure and evaluate its impact on two interpretable methods (SPINE and SINr), as a step towards sustainable vector interpretability. We also introduce stability as a new criterion for interpretability.
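
The abstract does not describe the sparsification procedure itself. To illustrate the general idea, here is a minimal sketch under one assumption: sparsifying means keeping, for each word vector, only its k largest-magnitude components and setting the others to zero. The function names `sparsify_top_k` and `sparsity`, the parameter `k`, and the toy vectors are hypothetical and are not taken from SPINE, SINr, or the speakers' work.

```python
import numpy as np


def sparsify_top_k(embeddings, k):
    """Keep the k largest-magnitude components of each row and zero the rest.

    Assumed illustration only: this is not the procedure from the seminar.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_dims = embeddings.shape[1]
    if not 0 < k <= n_dims:
        raise ValueError("k must be between 1 and the number of dimensions")
    # Column indices of the k largest absolute values in each row.
    top_idx = np.argpartition(np.abs(embeddings), n_dims - k, axis=1)[:, n_dims - k:]
    sparse = np.zeros_like(embeddings)
    rows = np.arange(embeddings.shape[0])[:, None]
    sparse[rows, top_idx] = embeddings[rows, top_idx]
    return sparse


def sparsity(embeddings):
    """Fraction of components that are exactly zero."""
    return float(np.mean(np.asarray(embeddings) == 0))


if __name__ == "__main__":
    # Toy 4-dimensional vectors for two words (made-up values).
    vectors = [[0.9, 0.0, 0.1, -0.5],
               [0.2, 0.3, -0.8, 0.05]]
    sparse_vectors = sparsify_top_k(vectors, k=2)
    print(sparse_vectors)
    print("sparsity:", sparsity(sparse_vectors))
```

Lowering `k` and re-running a downstream evaluation at each setting is one way to explore the trade-off between sparsity and performance that the abstract raises.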