Multi-core processors and GPUs with thousands of cores are now ubiquitous. Fully exploiting their resources requires dealing with low-level concepts of parallel programming, which still pose a high barrier to the efficient development of parallel applications. High-level tools for parallel programming are therefore needed.
To assist programmers in developing performant and reliable parallel applications, algorithmic skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel programming patterns, thereby shielding programmers from the low-level aspects of parallel programming.
In this paper, we address the design and implementation of the well-known Farm skeleton. To support heterogeneous computing platforms, we present a multi-tier implementation on top of MPI, OpenMP, and CUDA. Using two benchmark applications, an interacting particle system and a ray tracing application, we illustrate the advantages of both skeletal programming in general and this multi-tier approach in particular.
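
As an illustration of the farm pattern itself, the following is a minimal shared-memory sketch in C++. It is not the implementation presented in this paper: it uses plain `std::thread` workers in place of the MPI, OpenMP, and CUDA tiers, and the name `farm` and its parameters are hypothetical, chosen only for this example. The sketch assumes that the worker function is safe to call concurrently and that its result type is default-constructible.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical farm skeleton (illustrative only, not the paper's API).
// Applies `worker` to every task independently and returns the results
// in task order. Workers pull the next task index from a shared counter,
// so faster workers automatically take on more tasks.
// Assumptions: `worker` is thread-safe, the result type is
// default-constructible, and it is not bool (std::vector<bool> does not
// allow concurrent writes to distinct elements).
template <typename In, typename F>
auto farm(const std::vector<In>& tasks, F worker, unsigned num_workers = 4)
    -> std::vector<std::decay_t<decltype(worker(tasks[0]))>> {
  using Out = std::decay_t<decltype(worker(tasks[0]))>;
  assert(num_workers > 0);

  std::vector<Out> results(tasks.size());
  std::atomic<std::size_t> next{0};  // index of the next unclaimed task

  // Each worker thread repeatedly claims one task until none remain.
  auto loop = [&]() {
    for (std::size_t i = next.fetch_add(1); i < tasks.size();
         i = next.fetch_add(1)) {
      results[i] = worker(tasks[i]);  // each slot is written by one thread
    }
  };

  std::vector<std::thread> pool;
  for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(loop);
  for (auto& t : pool) t.join();
  return results;
}
```

A caller supplies only the task list and the sequential worker function, for example `farm(inputs, [](int x) { return x * x; }, 3)`. Thread creation, task distribution, and result collection stay hidden inside the skeleton, which is the kind of shielding from low-level detail that skeletal programming aims for.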
IOS Press, Inc.