Application programming for modern heterogeneous systems which comprise multiple accelerators (multi-core CPUs and GPUs) is complex and error-prone. Popular approaches, like OpenCL and CUDA, are low-level and offer no support for the two most complicated issues: 1) programming multiple GPUs within a stand-alone computer, and 2) managing distributed systems that integrate several such computers. In particular, distributed systems require application developers to use a mix of different programming models, e.g., MPI together with OpenCL or CUDA. We propose a uniform approach based on OpenCL for programming both stand-alone and distributed systems with GPUs. The approach implementation is based on two parts: 1) the SkelCL library for high-level application programming on heterogeneous stand-alone computers with multi-core CPUs and multiple GPUs, and 2) the dOpenCL middleware for transparent execution of OpenCL programs on several stand-alone computers connected over a network.
Both SkelCL and dOpenCL are built on top of the OpenCL standard which ensures their high portability across different kinds of processors and GPUs. The dOpenCL middleware extends OpenCL, such that arbitrary computing devices (multi-core CPUs and GPUs) in a distributed system can be used within a single application, with data and program code moved to these devices transparently. The SkelCL library offers a set of pre-implemented patterns (skeletons) of parallel computation and communication which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between a system's main memory and multiple GPUs. This paper describes dOpenCL and SkelCL and illustrates how they are used to simplify programming of heterogeneous distributed systems with accelerators.