Tags: Heterogeneous Computing, Kernel Fusion and OpenCL
Abstract:
Kernel fusion is a widely applicable optimization for numerical libraries on heterogeneous systems. However, most automated systems capable of performing the optimization require changes to software development practices, through language extensions or constraints on software organization and compilation. This makes such techniques inapplicable to preexisting software written in a language like OpenCL. This work introduces an implementation of kernel fusion that can be deployed entirely within the defined role of the OpenCL library implementation. As a result, programs can benefit from the optimization without explicit programmer intervention, and even precompiled OpenCL applications can use it. Despite requiring no explicit programmer effort, our compiler delivered an average speedup of 12.3% over a range of applicable benchmarks on a target CPU platform.