Contemporary parallel programming approaches often rely on well-established parallel libraries and language extensions that target specific hardware resources, which can lead to mixed parallel programming paradigms within a single application. In contrast to these approaches, AllScale proposes a C++ template-based approach to ease the development of scalable and efficient general-purpose parallel applications. Applications use a pool of parallel primitives and data structures to build solutions to their domain-specific problems. These parallel primitives are designed by HPC experts, who provide high-level, generic operators and data structures for common use cases. The supported set of constructs may range from ordinary parallel loops, through stencil and distributed graph operations and frequently used data structures such as (adaptive) multidimensional grids, trees, and irregular meshes, to combinations of data structures and operations such as entire linear algebra libraries. This set of parallel primitives is implemented in pure C++ and may be freely extended by third-party developers, just like conventional libraries in C++ development projects. A distinguishing feature of AllScale is that its main source of parallelism is nested recursive task parallelism. A sophisticated compiler analysis determines the data required by each task, which is of paramount importance for achieving performance across various parallel architectures. Experimental results are presented for several applications implemented with AllScale.
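To make "nested recursive task parallelism" concrete, the following is a minimal sketch in standard C++, not AllScale's actual API. The names `recursive_pfor` and `recursive_sum` and the `grain` cutoff are hypothetical. Each call splits its index range in half, spawns a child task for the left half with `std::async`, processes the right half itself, and then waits for the child, so parallelism emerges from nested recursive splitting instead of a flat loop. The sketch does not model the compiler-derived data requirements mentioned above, and it uses plain OS threads in place of a dedicated task scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>

// Hypothetical helper (not part of AllScale): applies body(i) to every i in
// [begin, end) by recursively halving the range. Ranges of at most `grain`
// elements are processed sequentially; larger ones spawn a child task.
void recursive_pfor(std::size_t begin, std::size_t end,
                    const std::function<void(std::size_t)>& body,
                    std::size_t grain = 1024) {
    if (end - begin <= grain) {
        for (std::size_t i = begin; i < end; ++i) body(i);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    // Left half runs as a nested child task; this task handles the right half.
    auto left = std::async(std::launch::async,
                           [&] { recursive_pfor(begin, mid, body, grain); });
    recursive_pfor(mid, end, body, grain);
    left.get();  // join the child (and propagate any exception it threw)
}

// Same recursive splitting, but each task returns a partial sum of f(i)
// that its parent combines: a recursive parallel reduction.
long long recursive_sum(std::size_t begin, std::size_t end,
                        const std::function<long long(std::size_t)>& f,
                        std::size_t grain = 1024) {
    if (end - begin <= grain) {
        long long acc = 0;
        for (std::size_t i = begin; i < end; ++i) acc += f(i);
        return acc;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async,
                           [&] { return recursive_sum(begin, mid, f, grain); });
    const long long right = recursive_sum(mid, end, f, grain);
    return left.get() + right;
}
```

Because every task joins its own children before returning, the references captured by the child lambdas stay valid for the child's lifetime. A production runtime would typically map such tasks onto a work-stealing scheduler instead of one thread per split.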
On Using Modern C++ and Nested Recursive Task Parallelism for HPC Applications with AllScale