Mesh deformation is a performance-critical part of many problems in fluid dynamics. Mesh deformation methods based on radial basis function (RBF) interpolation have addressed the increasing complexity for larger data sets. Recently, a domain decomposition method has been introduced that allows these algorithms to be mapped well to distributed memory systems. Because heterogeneous systems have proven to be more time and energy efficient for some applications, the HPC resources available to engineering users and researchers are becoming increasingly heterogeneous.
In this paper, we describe two optimisations performed on an RBF-based interpolation solver for mesh deformation. Motivated by a theoretical performance analysis, we extended the existing MPI distributed model to hybrid parallelisation with OpenMP to achieve better scaling efficiency on systems with hundreds of cores. In addition, we introduced a compile-time auto-tuning step that selects a threshold for code paths; it yields up to twofold performance compared to a constant-parameter approach across all test systems.
Our results indicate scaling efficiency in excess of 50–70 % for fully utilised dual-socket systems of up to 96 cores, which is above the theoretical ideal performance of the baseline code. Utilisation of a single GPU improves time and energy to solution when a single CPU core is used. We also identify constraints of the applied domain decomposition that degrade performance when a single GPU is combined with many CPU cores.
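For readers unfamiliar with the underlying technique, the following minimal sketch illustrates RBF interpolation for mesh deformation in its simplest serial form. It is not the solver described in this paper: the Wendland C2 basis function, the omission of a polynomial term, the dense direct solve, and the names `wendland_c2`, `solve` and `rbf_deform` are all assumptions made for illustration.

```python
import math


def wendland_c2(r, radius):
    """Compactly supported Wendland C2 basis (assumed choice); zero for r >= radius."""
    xi = r / radius
    if xi >= 1.0:
        return 0.0
    return (1.0 - xi) ** 4 * (4.0 * xi + 1.0)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    b holds one right-hand-side column per spatial coordinate.
    """
    n, ncols = len(a), len(b[0])
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + ncols):
                m[row][k] -= f * m[col][k]
    x = [[0.0] * ncols for _ in range(n)]
    for i in reversed(range(n)):  # back substitution
        for d in range(ncols):
            s = m[i][n + d] - sum(m[i][j] * x[j][d] for j in range(i + 1, n))
            x[i][d] = s / m[i][i]
    return x


def rbf_deform(boundary, displacements, interior, radius):
    """Move interior nodes by interpolating the known boundary displacements."""
    n = len(boundary)
    # Interpolation matrix: basis function evaluated between all boundary nodes.
    a = [[wendland_c2(math.dist(boundary[i], boundary[j]), radius)
          for j in range(n)] for i in range(n)]
    weights = solve(a, displacements)
    moved = []
    for p in interior:
        phi = [wendland_c2(math.dist(p, c), radius) for c in boundary]
        moved.append(tuple(
            p[d] + sum(phi[j] * weights[j][d] for j in range(n))
            for d in range(len(p))))
    return moved
```

The dense system grows with the number of boundary nodes, so the solve costs O(n³) in this naive form. That growth is the complexity problem that motivates domain decomposition and parallelisation in a production solver.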