Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges to find the right processing paradigm which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper first the use of FPGAs as a multiprocessor on a chip or its use as a highly functional coprocessor are compared, and the programming tools for hardware/software codesign are discussed. Next a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. This includes unimodular transformations, data locality improving loop transformations and use of smart buffers. Finally, the use of these techniques on a number of examples is demonstrated. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com