Seminar by Gaëtan Caillaut, postdoctoral researcher at LIUM
MiniBERT: a simple and explainable BERT model
As part of the PolysEmY project, we work with the SNCF (the French national railway company) to produce “polysemy-aware” word embeddings. The documents provided by the SNCF are written in a technical vocabulary specific to the company. It is therefore difficult to reuse models trained on general-purpose corpora (such as Wikipedia), since they cannot account for the specificities of the SNCF’s documents. In particular, many acronyms are used, and a large share of them (more than 40%) are polysemous.
We have two main goals: (1) capturing polysemous information from text while (2) keeping the model simple and explainable. We assume that the meaning of a word can be deduced from its context. This is why we think the attention mechanism is well suited to encoding polysemous words: it assigns a weight to each pair of words according to their relative relevance under a given criterion (here, the semantic influence of one word on another).
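To make the pairwise weighting concrete, here is a minimal sketch of single-head scaled dot-product self-attention, the standard formulation used in BERT-like models. It is not the MiniBERT implementation: the example sentence, the dimensions, and the random (untrained) embeddings and projection matrices are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math
import random


def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x holds one embedding per word; wq, wk, wv are projection matrices.
    Returns the contextualised vectors and the word-by-word weight matrix.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_head = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
    # weights[i][j]: how much word j contributes to the new vector of word i
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens = ["the", "train", "left", "the", "station"]  # illustrative sentence
d_model, d_head = 8, 4                               # assumed sizes

x = random_matrix(len(tokens), d_model, rng)         # untrained embeddings
wq = random_matrix(d_model, d_head, rng)
wk = random_matrix(d_model, d_head, rng)
wv = random_matrix(d_model, d_head, rng)

contextual, weights = self_attention(x, wq, wk, wv)

# Each row of the weight matrix shows how one word attends to the others.
for token, row in zip(tokens, weights):
    print(f"{token:>8}", " ".join(f"{w:.2f}" for w in row))
```

Each row of `weights` sums to 1 and can be read directly as the relative influence of every word in the sentence on the word for that row. With trained parameters, this matrix is what one would inspect to explain how a polysemous word was disambiguated.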
We also try to keep our model as simple as possible, since simple models are naturally easier to explain and understand than models with hundreds of millions or billions of parameters (such as BERT or GPT-3). Furthermore, since we work on a relatively small corpus and focus on a single task (capturing polysemy), we think that a model as powerful as BERT is not required.
During my presentation, I will introduce the PolysEmY project, on which I am working. I will then present the MiniBERT model and our motivations for working on this extreme simplification of the BERT model. You will also see that, despite its simplicity, MiniBERT’s performance is competitive and its outputs are easy to explain.