Tags: CUDA, GPGPU, loop optimization and parallelization methods
Abstract:
A technique is developed that extends GPU capabilities to process data volumes exceeding the capacity of the GPU's internal memory. The technique combines loop tiling with data serialization and can be applied to clusters of several GPUs. An applicability criterion is specified, a transformation scheme is designed, and a semi-automatic proof-of-concept software tool is implemented. An experiment was conducted to demonstrate the feasibility of the proposed approach.
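To illustrate the general idea of combining loop tiling with data serialization, here is a minimal sketch. It is not the authors' transformation scheme or tool. It assumes a simple one-dimensional loop `y[i] = a * x[i] + y[i]` (chosen only for illustration) whose arrays are too large for one device. The iteration space is split into tiles sized to fit device memory. Each tile's slices of `x` and `y` are packed contiguously into the device buffer, the tile is computed, and the result is copied back. `SimulatedDevice`, `saxpy_tile`, and `tiled_saxpy` are hypothetical names. Device memory is emulated with a fixed-capacity host buffer, where real CUDA code would use `cudaMalloc`, `cudaMemcpy`, and a kernel launch. Distributing tiles round-robin over several devices of equal capacity is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU: a fixed-capacity buffer. In real CUDA code
// this would be device memory obtained with cudaMalloc.
struct SimulatedDevice {
    std::vector<float> mem;  // emulated device memory
    explicit SimulatedDevice(std::size_t capacity) : mem(capacity) {}
    std::size_t capacity() const { return mem.size(); }
};

// Stand-in for a kernel: processes one tile resident in "device" memory,
// where the x slice occupies mem[0, n) and the y slice occupies mem[n, 2n).
void saxpy_tile(SimulatedDevice& dev, std::size_t n, float a) {
    const float* x = dev.mem.data();
    float* y = dev.mem.data() + n;
    for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Tiled execution of `for i < N: y[i] = a*x[i] + y[i]` when x and y together
// exceed the memory of a single device. Assumes all devices have equal capacity.
void tiled_saxpy(std::vector<SimulatedDevice>& devices, float a,
                 const std::vector<float>& x, std::vector<float>& y) {
    assert(!devices.empty() && devices.front().capacity() >= 2);
    assert(x.size() == y.size());
    const std::size_t N = x.size();
    // Each tile must hold `tile` elements of x plus `tile` elements of y.
    const std::size_t tile = devices.front().capacity() / 2;
    std::size_t t = 0;  // tile counter, used to pick a device round-robin
    for (std::size_t begin = 0; begin < N; begin += tile, ++t) {
        const std::size_t n = std::min(tile, N - begin);
        SimulatedDevice& dev = devices[t % devices.size()];
        // Serialize the tile: pack x and y slices contiguously (host -> device).
        std::copy(x.begin() + begin, x.begin() + begin + n, dev.mem.begin());
        std::copy(y.begin() + begin, y.begin() + begin + n, dev.mem.begin() + n);
        saxpy_tile(dev, n, a);
        // Copy the updated y slice back (device -> host).
        std::copy(dev.mem.begin() + n, dev.mem.begin() + 2 * n, y.begin() + begin);
    }
}
```

With two emulated devices of capacity 8, a 10-element loop is split into tiles of 4, 4, and 2 iterations that alternate between the devices. In a real multi-GPU setting, the copies and kernel launches for different devices could overlap, for example by using CUDA streams. This sketch runs them sequentially for clarity.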
Large-scale loop parallelization for GPU accelerators