Civil-Comp Proceedings
ISSN 1759-3433
CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by:
Paper 59

Design, Analysis, Implementation and Deployment of a High-Performance, Out-of-Core, Parallel, Dense Direct Linear Solver

B. Lizé¹ and G. Sylvand²

EADS Innovation Works, Applied Mathematics,
¹Suresnes, France, ²Toulouse, France

Full Bibliographic Reference for this paper
B. Lizé, G. Sylvand, "Design, Analysis, Implementation and Deployment of a High-Performance, Out-of-Core, Parallel, Dense Direct Linear Solver", in , (Editors), "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 59, 2011. doi:10.4203/ccp.95.59
Keywords: boundary elements, integral equations, direct solver, out-of-core, parallel solver, scalability, scientific software architecture, high performance computing.

Summary

This solver is designed as a high-performance drop-in replacement for a legacy solver. It scales very well from one to several hundred cores (more than 80% of peak performance) without any constraint on the number of nodes (such as "power of two" or "square"). It is portable and takes into account the new context of HPC by using OpenMP for intra-node parallelism and MPI for inter-node communication, which makes it well suited to many configurations, from workstations to large clusters. It has been tested on up to 512 cores; we strongly believe, and various tests show, that its design can scale very well up to several thousand cores and is less dependent on interconnect speed and latency. It is also designed to take advantage of GPUs in the future.
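The summary does not say how work is distributed across nodes. Purely as an illustration of why a distribution need not require a "power of two" or "square" process count, the sketch below assigns tile columns of a lower-triangular tiled matrix to processes in round-robin order. The function names and the 1D cyclic mapping are assumptions made for this example, not the solver's actual scheme.

```python
def tile_owner(j, n_procs):
    """Hypothetical 1D cyclic mapping: tile column j is owned by process j % n_procs."""
    return j % n_procs


def tiles_per_process(n_tiles, n_procs):
    """Count the lower-triangular tiles (row i >= column j) owned by each process."""
    counts = [0] * n_procs
    for j in range(n_tiles):
        # Column j of the lower triangle holds tiles (j, j) .. (n_tiles - 1, j).
        counts[tile_owner(j, n_procs)] += n_tiles - j
    return counts


if __name__ == "__main__":
    # Works for any process count, e.g. 3, 5 or 7: no grid shape is required.
    for p in (3, 5, 7):
        print(p, tiles_per_process(12, p))
```

The mapping is defined for every positive process count; load balance improves as the number of tile columns grows relative to the number of processes.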

In this article, we provide a detailed analysis of the algorithms and data structures used to achieve a very efficient out-of-core parallel decomposition (LU and LDL^T), including a run-time complexity analysis that validates the design choices. We then describe the implementation, with an emphasis on the software architecture, which decouples the BEM layer from the solver and has allowed the BEM code to evolve and gain new functionality while staying competitive in performance in a changing HPC environment. We present the benchmarks and validation methodology that led to the deployment of this solver in the ASERIS suite, and finally provide insights on why we expect this design to be well suited to current and future advances in HPC technology.
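As a rough illustration of the kind of decomposition mentioned above, the following is a minimal sketch of a tiled (blocked) right-looking LU factorization without pivoting. It is a textbook scheme, not the paper's algorithm: all names are hypothetical, the tiles stay in memory, and the matrix size is assumed to be a multiple of the tile size. In an out-of-core setting, each tile access would correspond to a read from or write to disk, which is why such algorithms are organized tile by tile. The symmetric LDL^T variant is not shown.

```python
def lu_tile(a):
    """In-place LU of one square tile, no pivoting (unit-diagonal L below, U on/above)."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]


def solve_lower_unit(l, b):
    """In place: b <- L^{-1} b, using the unit lower triangle stored in l."""
    for i in range(len(l)):
        for k in range(i):
            for j in range(len(b[0])):
                b[i][j] -= l[i][k] * b[k][j]


def solve_upper_right(u, b):
    """In place: b <- b U^{-1}, using the upper triangle stored in u."""
    for row in b:
        for j in range(len(u)):
            for k in range(j):
                row[j] -= row[k] * u[k][j]
            row[j] /= u[j][j]


def matmul_sub(c, a, b):
    """In place: c <- c - a b for small dense tiles stored as lists of rows."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][k] * b[k][j] for k in range(len(b)))


def tiled_lu(tiles):
    """Factorize in place; tiles[i][j] is the square tile at block row i, block column j."""
    nt = len(tiles)
    for k in range(nt):
        lu_tile(tiles[k][k])                              # factor the diagonal tile
        for j in range(k + 1, nt):
            solve_lower_unit(tiles[k][k], tiles[k][j])    # U_kj = L_kk^{-1} A_kj
        for i in range(k + 1, nt):
            solve_upper_right(tiles[k][k], tiles[i][k])   # L_ik = A_ik U_kk^{-1}
        for i in range(k + 1, nt):
            for j in range(k + 1, nt):
                matmul_sub(tiles[i][j], tiles[i][k], tiles[k][j])  # trailing update


def split(a, b):
    """Copy an n x n matrix into (n/b) x (n/b) tiles of size b x b (n divisible by b)."""
    n = len(a)
    return [[[row[j:j + b] for row in a[i:i + b]] for j in range(0, n, b)]
            for i in range(0, n, b)]


def join(tiles):
    """Reassemble a tiled matrix into a single list of rows."""
    return [sum((t[r] for t in tile_row), [])
            for tile_row in tiles for r in range(len(tile_row[0]))]
```

A typical use is `tiles = split(a, 2); tiled_lu(tiles); f = join(tiles)`, after which the strict lower triangle of `f` holds L (with an implicit unit diagonal) and the upper triangle holds U. Because no pivoting is performed, the sketch is only safe for matrices that do not need it, such as diagonally dominant ones.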
