We focus on a design-of-experiments methodology for developing empirical performance models of GPU kernels. Recently, we developed an iterative active learning algorithm that adaptively selects parameter configurations in batches for concurrent evaluation on CPU architectures in order to build performance models over the parameter space. In this paper, we illustrate the adoption of the algorithm when concurrent evaluations are not possible, which is particularly useful in the absence of GPU clusters. We present an empirical study of the algorithm on a diverse set of GPU kernels and hardware. We show that, even when concurrent evaluations are not possible, the default batch mode of the algorithm yields better models, and that the iterative active learning algorithm reduces the overall time required to obtain high-quality empirical performance models for GPU kernels.