The performance and versatility of today's PCs exceed, many times over, the power of the fastest number crunchers of the 1990s. Yet the computational hunger of many scientific applications has led to the development of GPU and FPGA accelerator cards. This paper presents the programming environment and a performance analysis of a super desktop with a combined GPU/FPGA architecture. A unified roofline model is used to compare the performance of the GPU and the FPGA, taking into account the computational intensity of the algorithm and the resource consumption. The model is validated with two image processing kernels, compiled using OpenCL for the GPU and a C-to-VHDL compiler for the FPGA. It is shown that an FPGA compiler outperforms handwritten code and is highly productive, but also uses more resources. While the GPU and the FPGA each excel in particular applications, both devices suffer from the limited I/O bandwidth to the processor.
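For readers unfamiliar with roofline analysis, the following is a minimal sketch of the classic roofline bound, in which attainable performance is limited either by peak compute throughput or by memory bandwidth multiplied by computational intensity. It does not reproduce the unified model of this paper, which additionally accounts for resource consumption. The function names and all numeric values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gb_s: float,
                      intensity_flop_per_byte: float) -> float:
    """Classic roofline bound: min(peak compute, bandwidth * intensity)."""
    if peak_gflops <= 0 or bandwidth_gb_s <= 0 or intensity_flop_per_byte < 0:
        raise ValueError("peak and bandwidth must be positive, intensity non-negative")
    # GB/s * FLOP/byte = GFLOP/s, so both terms share the same unit.
    return min(peak_gflops, bandwidth_gb_s * intensity_flop_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gb_s: float) -> float:
    """Computational intensity (FLOP/byte) at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gb_s


if __name__ == "__main__":
    # Hypothetical device: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
    peak, bandwidth = 1000.0, 100.0
    for intensity in (0.5, 2.0, 10.0, 20.0):
        bound = attainable_gflops(peak, bandwidth, intensity)
        print(f"intensity {intensity:5.1f} FLOP/byte -> bound {bound:7.1f} GFLOP/s")
    print("ridge point:", ridge_point(peak, bandwidth), "FLOP/byte")
```

Under these assumed numbers, a kernel with an intensity below the ridge point is bandwidth-bound, which is the regime where limited I/O bandwidth dominates.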