Evaluating clustering quality using features salience: a promising approach
Location: Salle de conseil, IC2, Le Mans Université
Speaker(s): Nicolas Dugué (LIUM – LST)
This talk addresses optimal model selection in hard clustering. It presents new quality indexes based on feature maximization, an efficient alternative for feature selection in high-dimensional spaces to usual measures such as Chi-square or vector-based measures using Euclidean distance or correlation. The behavior of these feature-maximization-based indexes is compared with a wide range of usual indexes, as well as with alternative indexes, on different kinds of datasets for which ground truth is available. This comparison highlights the better accuracy and stability of the new indexes on these datasets, their efficiency from low- to high-dimensional settings, and their tolerance to noise. Additional experiments are carried out on "real-life" textual data drawn from a bibliographic database for which ground truth is unavailable. These experiments show that the accuracy and stability of the new indexes make it possible to efficiently handle time-based (diachronic) analysis.
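The abstract does not define feature maximization, so the following is only an illustrative sketch. It assumes one formulation found in the literature, in which each feature receives a per-cluster "feature F-measure": the harmonic mean of a feature recall (the share of the feature's total weight that falls in the cluster) and a feature precision (the share of the cluster's total weight carried by the feature). The function name `feature_f_measure` and the toy data are hypothetical, and the indexes presented in the talk may be built differently.

```python
import numpy as np

def feature_f_measure(X, labels):
    """Per-cluster feature F-measure (assumed formulation, not the talk's code).

    X      : (n_points, n_features) array of non-negative feature weights
    labels : (n_points,) hard cluster assignment of each point
    Returns (clusters, ff) where ff[i, f] scores feature f in clusters[i].
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    # S[i, f]: total weight of feature f over the points of cluster i
    S = np.vstack([X[labels == c].sum(axis=0) for c in clusters])

    feature_totals = S.sum(axis=0, keepdims=True)  # weight of f over all clusters
    cluster_totals = S.sum(axis=1, keepdims=True)  # weight of all features in a cluster

    # Feature recall and precision; zero totals yield a score of 0
    recall = np.divide(S, feature_totals,
                       out=np.zeros_like(S), where=feature_totals > 0)
    precision = np.divide(S, cluster_totals,
                          out=np.zeros_like(S), where=cluster_totals > 0)

    # Harmonic mean of recall and precision
    denom = recall + precision
    ff = np.divide(2 * recall * precision, denom,
                   out=np.zeros_like(S), where=denom > 0)
    return clusters, ff

# Toy data: two clusters, each dominated by one feature
X_toy = [[2, 1],
         [1, 2]]
clusters_toy, ff_toy = feature_f_measure(X_toy, [0, 1])
```

Under this assumed formulation, a feature scores high in a cluster only when it is both concentrated in that cluster and prominent within it, which is what makes it usable for describing clusters and, by extension, for judging partition quality.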