The current rapid development of heterogeneous (CPU+GPU) computing, and its application to scientific, engineering and business problems, owes its success to several factors. The first is the maturity of parallel computing after many years of struggle and experimentation with different parallel computer architectures. The second is the relatively low price of processors and our ability to put many of them on a single chip. The third, equally important, factor is the structure of many numerical algorithms, which contain highly parallelizable operations whose processing can be accelerated on massively parallel GPUs and multicore CPUs. In this paper we provide an overview of the field together with simple but realistic examples. The paper is aimed at beginner CUDA users. We show simple source code for vector addition on the GPU. This example does not cover advanced CUDA topics such as shared memory accesses, divergent branches, memory coalescing or loop unrolling. To illustrate performance, we present results for matrix-matrix multiplication in which some of these optimization techniques were used to obtain a substantial speedup.
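To give a flavor of the vector addition example mentioned above, a minimal sketch in CUDA might look as follows. It is not the paper's own listing: the names `vectorAdd`, `vectorAddOnGpu` and `check`, as well as the block size of 256 threads, are assumptions chosen for illustration, and error handling is kept deliberately simple (device memory is not released if a call fails).

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>
#include <vector>

// Kernel: each thread adds one pair of elements (hypothetical name).
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the last block may be only partly used
        c[i] = a[i] + b[i];
    }
}

// Illustrative helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        throw std::runtime_error(std::string(what) + ": " +
                                 cudaGetErrorString(err));
    }
}

// Host wrapper (hypothetical): copy inputs to the GPU, launch, copy back.
std::vector<float> vectorAddOnGpu(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("input vectors differ in length");
    }
    const int n = static_cast<int>(a.size());
    std::vector<float> c(a.size());
    if (n == 0) return c;

    const size_t bytes = a.size() * sizeof(float);
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc dA");
    check(cudaMalloc(&dB, bytes), "cudaMalloc dB");
    check(cudaMalloc(&dC, bytes), "cudaMalloc dC");

    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    // Assumed launch configuration: 256 threads per block, enough blocks
    // to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost), "copy c");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Under these assumptions, calling `vectorAddOnGpu` on two equal-length host vectors returns their element-wise sum. The sketch uses one thread per element and only global memory, which matches the beginner-level scope described above and leaves out the optimizations applied in the matrix-matrix multiplication experiment.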