Accelerators such as NVIDIA GPUs and Intel MICs are currently provided as co-processor devices, usable only through a CPU host. For Intel MICs it is planned that this constraint will be lifted in the near future: CPU and accelerator(s) will then form a single, many-core, processor capable of peak performance of several Teraflops with high energy efficiency. In order to exploit the available computational power, the user will be compelled to write a code more “hardware-aware”, in contrast to the common philosophy of hiding hardware details as much as possible. The simple two-sided communication approach often used in message-passing applications introduces synchronization costs that may limit the performance on the next generation machines. PGAS languages, like coarray Fortran and UPC, propose a one-sided approach where a process accesses directly the remote memory of another process without interrupting its execution. In this paper, we propose a CUDA-aware coarray implementation, capable of merging the expressive syntax of coarrays with the computational power of GPUs. We propose a new keyword for the Fortran language, which allows the user to map with a high-level syntax some hardware features of the many-core machines. Our hybrid coarray implementation is based on OpenCoarrays, the coarray transport layer currently adopted by the GNU Fortran compiler.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com