Modern computer hardware provides parallelism at various different levels – most obviously, multiple multicore processors allow many independent threads to execute at once. At a finer-grained level, each core contains a vector unit allowing multiple integer or floating point calculations to be performed with a single instruction. Additionally, GPU hardware is highly parallel and performs best when processing large numbers of independent threads. At the same time, tools such as CUDA have become steadily more abundant and mature, allowing more of this parallelism to be exploited.
In this paper we describe the process of optimising a physical modelling sound synthesis code, the Multiplate 3D code, which models the acoustic response of a number of metal plates embedded within a box of air. This code presented a number of challenges and no single optimisation technique was applicable to all of these. However, by exploiting parallelism at several different levels (multithreading, GPU acceleration, and vectorisation), as well as applying other optimisations, it was possible to speed up the simulation very significantly.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com