GPU Accelerated Mahalanobis-Average Hierarchical Clustering Analysis

Title: GPU Accelerated Mahalanobis-Average Hierarchical Clustering Analysis

Authors: Adam Šmelko, Miroslav Kratochvíl, Martin Kruliš and Tomáš Sieger

Conference: Euro-Par 2021

Tags: clustering, CUDA, GPU, high-dimensional, Mahalanobis distance and parallel

Abstract:

Hierarchical clustering algorithms are common tools for simplifying, exploring, and analyzing datasets in many areas of research. For flow cytometry, a specific variant of agglomerative clustering has been proposed that uses cluster linkage based on the Mahalanobis distance to produce results better suited to the domain. However, the wide applicability of this clustering algorithm is currently limited by its relatively high computational complexity, which prevents it from scaling to common cytometry datasets. This paper proposes an optimized GPU-accelerated version of Mahalanobis-average linked hierarchical clustering, which improves the algorithm's performance by over two orders of magnitude and thus allows it to scale to much larger datasets. It also provides a detailed analysis of the optimizations used and experimental results that may be useful for other hierarchical-clustering problems. We have performed benchmarks on publicly available high-dimensional flow cytometry data, which demonstrate the applicability of our implementation in the target domain.
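
For readers unfamiliar with the linkage named above, the following is a minimal CPU-only sketch in plain Python, not the GPU implementation the paper describes. The abstract does not define the linkage, so the formula used here is an assumption: the dissimilarity of two clusters is taken as the mean of the Mahalanobis distances from each cluster's centroid to the other cluster. All function names are hypothetical, and special cases such as clusters too small to have an invertible covariance matrix are not handled.

```python
from math import sqrt

def mean(points):
    """Centroid of a list of equal-length points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]

def covariance(points):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(points)
    mu = mean(points)
    d = len(mu)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
             for j in range(d)] for i in range(d)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def mahalanobis(x, cluster):
    """Mahalanobis distance from point x to the distribution of a cluster."""
    mu = mean(cluster)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(cluster), diff)  # y = cov^{-1} (x - mu)
    return sqrt(sum(di * yi for di, yi in zip(diff, y)))

def mahalanobis_average(a, b):
    """Assumed linkage: mean of the two centroid-to-cluster distances."""
    return 0.5 * (mahalanobis(mean(a), b) + mahalanobis(mean(b), a))

# Two small 2-D clusters with identical shape, shifted along the x axis.
cluster_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
cluster_b = [(4, 0), (6, 0), (4, 2), (6, 2)]
print(mahalanobis_average(cluster_a, cluster_b))
```

Each linkage evaluation in this sketch recomputes a covariance matrix and solves a linear system, which gives a rough sense of why the naive algorithm becomes expensive as the number of points and dimensions grows.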