Fast Heterogeneous Computing: Principles and CUDA Programming

Kowalik, Janusz; Arłukowicz, Piotr

doi:10.3233/978-1-60750-803-8-65

Abstract

The current rapid development of heterogeneous (CPU+GPU) computing and its applications to scientific, engineering and business problems owes its success to several factors. The first is the maturity of parallel computing after many years of struggle and experimentation with different parallel computer architectures. The second is the relatively low price of processors and our ability to put many of them on a single chip. The third, equally important, factor is the structure of many numerical algorithms, which contain highly parallelizable operations whose processing can be accelerated by massively parallel GPUs and multicore CPUs. In this paper we provide an overview of the field together with simple but realistic examples. The paper is aimed at beginner CUDA users. We show simple source code for vector addition on a GPU. This example does not cover advanced CUDA topics such as shared memory accesses, divergent branches, memory access coalescing or loop unrolling. To illustrate performance, we present results for matrix-matrix multiplication in which some of these optimization techniques were used to obtain a substantial speedup.
