Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Civil-Comp Proceedings
ISSN 1759-3433 CCP: 101
PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING Edited by:
Paper 14
Parallel Performance of Fast Fourier Transform Routines in PRACE
A. Sunderland¹, C. Moulinec¹ and R. Sandberg²
¹STFC Daresbury Laboratory, Warrington, United Kingdom
A. Sunderland, C. Moulinec, R. Sandberg, "Parallel Performance of Fast Fourier Transform Routines in PRACE", in , (Editors), "Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 14, 2013. doi:10.4203/ccp.101.14
Keywords: fast Fourier transform, FFT, parallel performance, PRACE, Hartree Centre.
Summary
The Fast Fourier Transform (FFT) is one of the most widely used algorithms in engineering and scientific applications, so its performance on large-scale computing platforms matters to a wide range of research fields. In computational fluid dynamics, fast and efficient FFTs enable ever larger direct numerical simulations and large-eddy simulations, in which Reynolds numbers can approach those found in reality. Under the European Community's Seventh Framework Programme, the PRACE [1] 'Tier-0' systems [2,3] have been made available to high-end computing researchers and code developers. Their parallel computing environments deliver a great deal of processing power, either through large numbers of CPU cores or through computational accelerators such as GPUs. Recently, high-end computing resources (IBM Blue Gene/Q) have also been made available to researchers in the UK through the Hartree Centre at STFC Daresbury Laboratory [4]. This paper analyses parallel three-dimensional FFT performance on these high-end resources using routines from the numerical libraries FFTW [5], FFTE [6] and DAFT [7]. The FFT implementations investigated range from pure MPI versions to hybrid MPI-OpenMP approaches that can exploit simultaneous multithreading on multicore architectures. Alternative three-dimensional data distributions, such as slab, pencil and block, are also investigated to assess their impact on parallel performance. The paper extends earlier work by testing the various FFT methods on the large datasets often used in simulations with the High-Performance Solver for Turbulence and Aeroacoustic Research (HiPSTAR), which is developed at the University of Southampton, UK [8]. It presents, compares and analyses performance results from benchmark runs undertaken on the three architectures listed above. The authors conclude that although new implementations and techniques can now extend performance scalability to several thousands of cores, parallel scalability is ultimately limited by the all-to-all nature of the underlying communications.
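To illustrate why a distributed three-dimensional FFT ends up depending on all-to-all communication, the following minimal Python sketch simulates a slab decomposition in a single process. It is not taken from the paper and does not use the FFTW, FFTE or DAFT interfaces. All function names (`dft`, `slab_fft3d`, and so on) are hypothetical, the "ranks" are simply list entries, and a naive O(n²) DFT stands in for an optimized library FFT. The sketch assumes a cubic grid whose side length is divisible by the number of ranks.

```python
import cmath

def dft(v):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized library FFT (assumption)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def local_yz_transform(slab):
    """Transform slab[x_local][y][z] along z, then along y. Needs no communication."""
    out = []
    for plane in slab:
        rows = [dft(row) for row in plane]               # along z -> rows[y][kz]
        cols = [dft(list(col)) for col in zip(*rows)]    # along y -> cols[kz][ky]
        out.append([list(r) for r in zip(*cols)])        # back to [ky][kz]
    return out

def all_to_all_transpose(slabs, n, p):
    """Redistribute x-slabs into y-slabs. Every rank needs a block from every
    other rank, which is the all-to-all exchange a real code does with MPI."""
    chunk = n // p
    return [[[slabs[x // chunk][x % chunk][dst * chunk + yl] for x in range(n)]
             for yl in range(chunk)]
            for dst in range(p)]                         # new[rank][y_local][x][z]

def local_x_transform(slab):
    """Transform slab[y_local][x][z] along x, now that x is fully local."""
    out = []
    for plane in slab:
        cols = [dft(list(col)) for col in zip(*plane)]   # cols[z][kx]
        out.append([list(r) for r in zip(*cols)])        # [kx][z]
    return out

def slab_fft3d(a, p):
    """3D DFT of a[x][y][z] using p simulated ranks and a slab decomposition."""
    n = len(a)
    chunk = n // p
    slabs = [a[r * chunk:(r + 1) * chunk] for r in range(p)]   # split along x
    slabs = [local_yz_transform(s) for s in slabs]
    slabs = all_to_all_transpose(slabs, n, p)
    slabs = [local_x_transform(s) for s in slabs]
    # Gather the distributed result into result[kx][ky][kz] for checking.
    result = [[None] * n for _ in range(n)]
    for r, slab in enumerate(slabs):
        for yl, plane in enumerate(slab):
            for kx, row in enumerate(plane):
                result[kx][r * chunk + yl] = list(row)
    return result

def direct_dft3d(a):
    """Reference: direct triple-sum 3D DFT, only feasible for tiny grids."""
    n = len(a)
    w = lambda m: cmath.exp(-2j * cmath.pi * m / n)
    return [[[sum(a[x][y][z] * w(x * kx + y * ky + z * kz)
                  for x in range(n) for y in range(n) for z in range(n))
              for kz in range(n)] for ky in range(n)] for kx in range(n)]

if __name__ == "__main__":
    n, p = 4, 2
    a = [[[complex(x + 2 * y - z, x - y) for z in range(n)]
          for y in range(n)] for x in range(n)]
    got, ref = slab_fft3d(a, p), direct_dft3d(a)
    err = max(abs(got[i][j][k] - ref[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
    print("max abs difference vs direct DFT:", err)
```

In a slab layout the number of ranks cannot exceed the grid side length, since each rank needs at least one plane. A pencil layout splits two dimensions instead, which allows more ranks at the cost of a second transpose step. In both cases the transposes are the only stages that move data between ranks, consistent with the conclusion that communication is what ultimately limits scalability.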
References