Current desktop computers are heterogeneous systems that integrate different types of processors. For example, general-purpose processors and GPUs not only have different characteristics but also adopt different programming models. Despite these differences, data parallelism is exploited on both types of processors through application programming interfaces such as OpenMP and CUDA, respectively. In this work we propose to use all these types of processors collaboratively, thus increasing the amount of data parallelism exploited. In this setup, each processor executes its own optimized implementation of a target application. To achieve this goal, we developed a platform composed of a task scheduler and an algorithm for dynamic load balancing at runtime, which uses online performance models of the different devices. These models are built without relying on any prior assumptions about the target application or system characteristics. The modeling time is negligible for iterative applications or when several instances of a class of applications are executed in sequence. As a case study, a database application is chosen to illustrate the use of the proposed algorithm for building the performance models and achieving dynamic load balancing. Experimental results clearly show the advantage of collaboratively using a quad-core processor along with a GPU. In practice, a performance improvement of about 42% is achieved by applying the proposed techniques and tools to Query Q3 of the TPC-H Decision Support System benchmark.
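The idea of balancing load from performance models built online can be illustrated with a minimal sketch. It is not the algorithm proposed in this work: it assumes a deliberately simple model (one throughput estimate per device, refined by exponential smoothing after each iteration), and all names (`split_work`, `update_model`, `simulate`, the device labels and the rates) are hypothetical.

```python
from typing import Dict


def split_work(total_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split total_items across devices in proportion to estimated throughput."""
    total_rate = sum(throughput.values())
    shares = {dev: int(total_items * rate / total_rate)
              for dev, rate in throughput.items()}
    # Integer truncation may leave a few items unassigned;
    # hand them to the device currently estimated to be fastest.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


def update_model(throughput: Dict[str, float], shares: Dict[str, int],
                 elapsed: Dict[str, float], smoothing: float = 0.5) -> Dict[str, float]:
    """Blend the rate measured in the last iteration into the running estimate."""
    updated = {}
    for dev, old_rate in throughput.items():
        if shares[dev] > 0 and elapsed[dev] > 0:
            measured = shares[dev] / elapsed[dev]  # items per second
            updated[dev] = smoothing * measured + (1 - smoothing) * old_rate
        else:
            updated[dev] = old_rate  # no new measurement for this device
    return updated


def simulate(true_rate: Dict[str, float], total_items: int,
             iterations: int) -> Dict[str, int]:
    """Run the balancing loop against simulated devices with fixed true rates."""
    # Start with no knowledge of the devices: equal estimates.
    estimate = {dev: 1.0 for dev in true_rate}
    shares = split_work(total_items, estimate)
    for _ in range(iterations):
        # In a real system these times would come from timing each device.
        elapsed = {dev: shares[dev] / true_rate[dev] for dev in true_rate}
        estimate = update_model(estimate, shares, elapsed)
        shares = split_work(total_items, estimate)
    return shares


if __name__ == "__main__":
    # Hypothetical rates: the GPU processes items three times faster than the CPU.
    print(simulate({"cpu": 100.0, "gpu": 300.0}, total_items=1000, iterations=10))
```

Starting from equal estimates, the split drifts toward the ratio of the true device rates as measurements accumulate. This is also why the modeling cost is amortized for iterative workloads: each iteration both does useful work and refines the model.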