AutoClust: A Framework for Automated Clustering Based on Cluster Validity Indices



Automated machine learning (AutoML) aims to minimize human intervention during a machine learning task, for example by automatically selecting an algorithm and configuring it for the data set at hand. Although this research direction has recently attracted much interest in both academia and industry, existing systems and tools mainly target supervised learning. However, unsupervised learning, and clustering in particular, also calls for AutoML solutions, especially because evaluating clustering results is inherently ambiguous. Motivated by this gap, we introduce a framework for automated clustering that comprises two main modules: algorithm selection and hyperparameter tuning. Our approach to algorithm selection relies on meta-learning, using novel meta-features extracted from data sets that attempt to capture similarities in clustering structure. This is coupled with a hyperparameter tuning method based on Bayesian optimization, whose main novelty is an optimization goal that combines different cluster validity indices. We demonstrate the merits of our approach through an empirical evaluation on 24 real-life data sets, which shows promising results compared to existing methods.
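The abstract does not say how the cluster validity indices are combined into one optimization goal. As an illustration only, the Python sketch below assumes an equal-weight average of two well-known indices, the silhouette coefficient (higher is better) and the Davies-Bouldin index (lower is better), after rescaling both so that higher is better. It then turns that average into a loss that a minimizing Bayesian optimizer could evaluate for each candidate configuration. The functions `combined_score` and `bo_loss`, the rescaling, and the equal weights are hypothetical choices, not the paper's formulation.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient (higher is better, range [-1, 1])."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distance matrix via broadcasting.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # Mean intra-cluster distance, excluding the zero self-distance.
        a = D[i, same].sum() / (n_same - 1)
        # Mean distance to the nearest other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better, >= 0)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Average distance of each cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(clusters)
    ])
    worst = []
    for i in range(len(clusters)):
        # Similarity to every other cluster; keep the worst (largest) one.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(clusters)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


def combined_score(X, labels):
    """Hypothetical equal-weight combination; higher is better, in [0, 1]."""
    s = (silhouette(X, labels) + 1.0) / 2.0       # rescale [-1, 1] -> [0, 1]
    db = 1.0 / (1.0 + davies_bouldin(X, labels))  # map [0, inf) -> (0, 1]
    return 0.5 * (s + db)


def bo_loss(X, labels):
    """Loss a minimizing Bayesian optimizer could use for one configuration."""
    return 1.0 - combined_score(X, labels)


# Usage: two well-separated groups, labeled correctly and incorrectly.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
print(bo_loss(X, good) < bo_loss(X, bad))  # True
```

Rescaling each index to a common "higher is better" range before averaging keeps one index from dominating merely because of its scale; in a tuning loop, `labels` would come from running the selected clustering algorithm with the hyperparameters proposed by the optimizer.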

Contributing Authors