State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffer performance losses on multicore processors due to their inability to fully exploit thread-level parallelism. At the same time the coarse-grain dataflow model gains popularity as a paradigm for programming multicore architectures. This work looks at implementing classic dense linear algebra workloads, Cholesky factorization and QR factorization, using dynamic data-driven execution. Two emerging approaches to implementing coarse-grain dataflow are examined, the model of nested parallelism, represented by the Cilk framework, and the model of parallelism expressed through an arbitrary Direct Acyclic Graph, represented by the SMP Superscalar framework. Performance and coding effort are analyzed and compared agains code manually parallelized at the thread level.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 email@example.com
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 firstname.lastname@example.org