Accelerate K-Modes Using the Triangle Inequality

Title: Accelerate K-Modes Using the Triangle Inequality

Authors: Vu-Linh Nguyen, Toan Nguyen-Mau and Van-Nam Huynh

Conference: SUM 2024

Tags: Clustering Categorical Data, Decomposable Dissimilarities, K-Modes and Triangle Inequality

Abstract:

Clustering is an unsupervised machine learning task that aims to discover natural groups in a given dataset. K-Modes algorithms, adaptations of K-means clustering (originally designed for continuous data), are among the most popular algorithms for discovering clusters in categorical data. In this paper, we present initial results on how to accelerate them using the triangle inequality while still computing exactly the same result as the original K-Modes. We also provide empirical evidence to illustrate the potential gains from leveraging the triangle inequality. Finally, we envision future work aimed at providing a comprehensive understanding of the use of the triangle inequality in accelerating (other) clustering algorithms for categorical data.
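
To make the idea concrete, here is a minimal sketch of how the triangle inequality can prune distance computations in a K-Modes assignment step without changing the result. It is not the authors' algorithm: it assumes the simple matching (Hamming) dissimilarity, which is a metric, and borrows the pruning test commonly used to accelerate K-means. The function names `hamming`, `assign_with_pruning`, and `assign_brute_force` are hypothetical. The test rests on one fact: if d(m_best, m_j) >= 2 d(x, m_best), then d(x, m_j) >= d(x, m_best), so mode m_j cannot be strictly closer to x and its distance need not be computed.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_brute_force(
    points: List[Sequence[str]], modes: List[Sequence[str]]
) -> List[int]:
    """Reference assignment: compare every point with every mode."""
    return [
        min(range(len(modes)), key=lambda j: hamming(p, modes[j]))
        for p in points
    ]


def assign_with_pruning(
    points: List[Sequence[str]], modes: List[Sequence[str]]
) -> Tuple[List[int], int]:
    """Assign each point to its nearest mode, skipping provably useless checks.

    Returns the labels and the number of point-to-mode distances skipped.
    Assumption (not from the source): Hamming dissimilarity with the
    inter-mode pruning test described in the lead-in.
    """
    k = len(modes)
    # Mode-to-mode distances are computed once per assignment pass.
    mode_dist = [[hamming(modes[i], modes[j]) for j in range(k)] for i in range(k)]

    labels: List[int] = []
    skipped = 0
    for p in points:
        best, best_d = 0, hamming(p, modes[0])
        for j in range(1, k):
            # Triangle inequality: d(p, m_j) >= d(m_best, m_j) - d(p, m_best).
            # If d(m_best, m_j) >= 2 * d(p, m_best), m_j cannot be strictly closer.
            if mode_dist[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = hamming(p, modes[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

Because a skipped mode is never strictly closer than the current best, and ties are broken in favor of the lower index in both functions, the pruned assignment returns the same labels as the brute-force one. How much work is saved depends on how well separated the modes are.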