Tags: GPU, sparse format, sparse matrix-matrix multiplication
Abstract:
The sparse matrix-matrix multiply kernel (SpMM) has gained significant interest in recent years due to its applications in data science. In 2018, Zhang and Gruenwald [15] proposed bmSparse, a bitmap-based sparse format, and described in detail its SpMM implementation for NVIDIA GPUs. The format is promising in terms of both performance and storage space. In this work, we re-implement the algorithm following the authors' guidelines and add two new stages that can improve performance. Experiments on nine sparse matrices of different sizes show significant speedups over the CSR variant of cuSPARSE.
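The abstract does not spell out how bmSparse lays out its data. As a hedged illustration of what a bitmap-based sparse format can look like, the Python sketch below assumes 8 x 8 tiles whose occupancy is recorded in a single 64-bit bitmap, with each tile's nonzero values packed in row-major bit order. The tile size, the dictionary keyed by tile coordinates, the 4-byte index and value sizes, and the helper names `encode`, `get`, `storage_bytes` and `csr_bytes` are all assumptions made for illustration; this is neither the authors' implementation nor their GPU kernel.

```python
from typing import Dict, List, Tuple

TILE = 8  # assumed tile edge: 8 x 8 = 64 cells, so one 64-bit bitmap per tile

# (bitmap, nonzero values packed in increasing bit order)
Tile = Tuple[int, List[float]]
TileMap = Dict[Tuple[int, int], Tile]


def encode(dense: List[List[float]]) -> TileMap:
    """Split a dense matrix into 8x8 tiles and keep only tiles with nonzeros."""
    n_rows, n_cols = len(dense), len(dense[0])
    tiles: TileMap = {}
    for ti in range(0, n_rows, TILE):
        for tj in range(0, n_cols, TILE):
            bitmap, values = 0, []
            for r in range(TILE):
                for c in range(TILE):
                    i, j = ti + r, tj + c
                    if i < n_rows and j < n_cols and dense[i][j] != 0:
                        bitmap |= 1 << (r * TILE + c)  # row-major bit position
                        values.append(dense[i][j])     # same order as the bits
            if bitmap:
                tiles[(ti // TILE, tj // TILE)] = (bitmap, values)
    return tiles


def get(tiles: TileMap, i: int, j: int) -> float:
    """Read element (i, j): test its bit, then popcount the lower bits."""
    tile = tiles.get((i // TILE, j // TILE))
    if tile is None:
        return 0.0
    bitmap, values = tile
    bit = (i % TILE) * TILE + (j % TILE)
    if not (bitmap >> bit) & 1:
        return 0.0
    # Number of set bits below `bit` is the index of this value in `values`.
    rank = bin(bitmap & ((1 << bit) - 1)).count("1")
    return values[rank]


def storage_bytes(tiles: TileMap) -> int:
    """Assumed layout: 8-byte bitmap + two 4-byte tile coordinates per tile,
    plus one 4-byte (float32) entry per nonzero."""
    nnz = sum(len(values) for _, values in tiles.values())
    return len(tiles) * (8 + 4 + 4) + nnz * 4


def csr_bytes(n_rows: int, nnz: int) -> int:
    """CSR with 4-byte row pointers, column indices and float32 values."""
    return (n_rows + 1) * 4 + nnz * (4 + 4)


if __name__ == "__main__":
    n = 16
    dense = [[0.0] * n for _ in range(n)]
    for i in range(8):
        for j in range(8):
            dense[i][j] = float(i * 8 + j + 1)  # one fully dense tile
    dense[10][13] = 2.5                         # one isolated nonzero
    tiles = encode(dense)
    print(len(tiles), storage_bytes(tiles), csr_bytes(n, 65))
```

Locating a value needs only a bit test and a population count over the lower bits of the tile's bitmap, an operation that GPUs expose directly (for example CUDA's `__popcll`). The storage helpers show how, under these assumptions, a per-tile bitmap can replace per-nonzero column indices when nonzeros cluster into dense tiles; a matrix whose nonzeros are scattered one per tile may instead take more space than CSR.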
Towards an Efficient Sparse Storage Format for the SpMM Kernel in GPUs