

Plane-wave-based first-principles calculations are the most widely used approach for electronic structure calculations in materials science. In this formulation the electronic wavefunctions are expanded in plane waves (Fourier components) in three-dimensional space, and 3d FFTs are used to construct the charge density in real space. Many other scientific application codes in the areas of fluid mechanics, climate research and accelerator design also require efficient parallel 3d FFTs. Because parallel 3d FFTs require a large amount of communication, the scaling of these application codes on large parallel machines depends critically on having a 3d FFT that scales efficiently to large processor counts. In this paper we compare different implementations of the communications in a 3d FFT to determine the most scalable method to use for our application. We present results on up to 16K cores on the Cray XT4 and IBM Blue Gene/P, and compare our implementations to publicly available 3d FFTs such as P3DFFT and FFTW. In our application our 3d FFTs significantly outperform any publicly available software. Our 3d FFT has been implemented in many different first-principles codes used for research in materials science, nanoscience, energy technologies etc., and also serves as a stand-alone benchmark code used for the procurement of new machines at the Department of Energy NERSC computing center.
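
To illustrate the role the 3d transform plays in this formulation, the following minimal Python sketch builds a real-space wavefunction from plane-wave coefficients by applying a 1D Fourier synthesis along each of the three axes in turn, and then forms the charge density as the squared modulus. It is an illustrative sketch under stated assumptions, not the implementation described in this paper: the function names (`synth1d`, `synth_axis`, `wavefunction_real_space`, `charge_density`) are hypothetical, a single wavefunction on a small cubic grid is assumed, and a naive O(n^2) 1D transform stands in for a 1D FFT so that the example needs only the standard library.

```python
import cmath


def synth1d(c):
    """Naive O(n^2) 1D Fourier synthesis (stand-in for a 1D FFT).

    Returns f[j] = sum_k c[k] * exp(+2*pi*i*j*k/n), with no normalization.
    """
    n = len(c)
    return [
        sum(c[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def synth_axis(a, axis):
    """Apply synth1d along one axis of a 3D nested-list array."""
    shape = (len(a), len(a[0]), len(a[0][0]))
    out = [[[0j] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    o1, o2 = [d for d in range(3) if d != axis]  # the two untouched axes
    for p in range(shape[o1]):
        for q in range(shape[o2]):
            idx = [0, 0, 0]
            idx[o1], idx[o2] = p, q
            # Gather one line of data along `axis`.
            line = []
            for t in range(shape[axis]):
                idx[axis] = t
                line.append(a[idx[0]][idx[1]][idx[2]])
            # Transform the line and scatter it back.
            for t, v in enumerate(synth1d(line)):
                idx[axis] = t
                out[idx[0]][idx[1]][idx[2]] = v
    return out


def wavefunction_real_space(coeffs):
    """3D synthesis as three passes of independent 1D transforms."""
    a = coeffs
    for axis in range(3):
        a = synth_axis(a, axis)
    return a


def charge_density(coeffs):
    """Charge density of one wavefunction: |psi(r)|^2 on the grid."""
    psi = wavefunction_real_space(coeffs)
    return [[[abs(v) ** 2 for v in row] for row in plane] for plane in psi]


# Usage: two plane waves (G = 0 and one step along x) on a 4x4x4 grid.
n = 4
coeffs = [[[0j] * n for _ in range(n)] for _ in range(n)]
coeffs[0][0][0] = 1.0
coeffs[1][0][0] = 1.0
density = charge_density(coeffs)
```

Within each pass the 1D transforms along different lines are independent of one another. In a parallel code the grid is distributed across processors, so the data generally has to be redistributed between passes, and that redistribution is the communication step whose implementation is compared here.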