The lattice Boltzmann method (LBM) is a promising approach to solving Computational Fluid Dynamics (CFD) problems; however, it is often considered to be memory-bound on modern computer architectures. This paper introduces novel sequential and parallel 3D memory-aware LBM algorithms that optimize its memory access performance. The new algorithms combine the features of single-copy distribution, single sweep, the swap algorithm, prism traversal, and the merging of two time steps. We also design a methodology to guarantee thread safety and reduce synchronization in the parallel LBM algorithm. Finally, we evaluate their performance on three manycore systems and show that the new 3D memory-aware LBM algorithms outperform the state-of-the-art Palabos (which implements the Fuse Swap Prism LBM solver) by up to 89%.
Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems