Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Civil-Comp Proceedings
ISSN 1759-3433 CCP: 107
PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING Edited by:
Paper 30
Mesh Renumbering Methods and Performance with OpenMP/MPI in Code Saturne
P. Trespeuch¹, Y. Fournier², C. Evangelinos³ and P. Vezolle⁴
¹CS Information Systems, Le Plessis Robinson, France
P. Trespeuch, Y. Fournier, C. Evangelinos, P. Vezolle, "Mesh Renumbering Methods and Performance with OpenMP/MPI in Code Saturne", in , (Editors), "Proceedings of the Fourth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 30, 2015. doi:10.4203/ccp.107.30
Keywords: computational fluid dynamics, Code_Saturne, OpenMP, sparse matrix vector product, benchmarks, renumbering algorithms.
Summary
The scale of computational fluid dynamics (CFD) simulation problems is rapidly
increasing as a result of the requirements for higher spatial resolution, varied
turbulence models, and more detailed physics. As is the case with many CFD
Navier-Stokes tools, EDF's Code_Saturne, which is also one of the two CFD
software packages in the PRACE benchmark suite, is parallelized using domain
partitioning and MPI. On large systems with thousands of compute nodes, even for
simulations employing multi-billion-cell meshes, a pure MPI approach will not be able
to take full advantage of the multiple levels of parallelism and of the steady increase
in the number of cores per processor. The most popular way to tackle this problem is
to introduce a hybrid MPI/OpenMP approach. Code_Saturne
implements a general three-dimensional finite volume solver that handles both conformal and
non-conformal meshes. The computation time is dominated by the linear equation
solvers, mainly for the pressure, and to a lesser degree by gradient reconstruction.
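The thread conflicts discussed in this work arise because, in a cell-centred finite volume scheme, many kernels (including the matrix-vector product inside the linear solvers) loop over faces and scatter results to the two adjacent cells. The Python sketch below illustrates this pattern under a simplifying assumption: a symmetric matrix stored as one diagonal coefficient per cell and one off-diagonal coefficient per interior face. The function and array names (`face_based_matvec`, `face_cells`, `diag`, `extra`) are hypothetical and are not taken from Code_Saturne; the loop is serial and only marks where concurrent updates would conflict.

```python
def face_based_matvec(n_cells, face_cells, diag, extra, x):
    """Compute y = A x for a symmetric matrix stored by faces (hypothetical layout).

    face_cells[f] = (i, j): the two cells sharing interior face f.
    diag[i]:  diagonal coefficient A[i][i] of cell i.
    extra[f]: off-diagonal coefficient A[i][j] == A[j][i] of face f.
    """
    # Diagonal part: each cell writes only its own entry, so a loop
    # over cells could be split among threads without conflicts.
    y = [diag[i] * x[i] for i in range(n_cells)]
    # Off-diagonal part: each face updates BOTH adjacent cells through
    # indirect addressing. If two threads handled faces sharing a cell,
    # their read-modify-write updates of y[i] or y[j] could race.
    for f, (i, j) in enumerate(face_cells):
        y[i] += extra[f] * x[j]
        y[j] += extra[f] * x[i]
    return y


# 1D chain of three cells with faces (0,1) and (1,2)
print(face_based_matvec(3, [(0, 1), (1, 2)], [2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 2.0, 3.0]))
# → [0.0, 0.0, 4.0]
```

Storing one coefficient per face halves the off-diagonal storage for a symmetric matrix, but it is exactly this two-sided scatter that makes a naive parallel loop over faces unsafe.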
Thread-level parallelism was mainly applied to computational loops that
iterate over the cells or faces in the cell-centred formulation. A general loop
transformation was implemented that supports a wide range of methods for avoiding
conflicts between threads caused by indirect memory addressing, while minimizing code
changes. In this paper, different mesh renumbering algorithms for distributing work
among threads are presented (a multipass approach with METIS or SCOTCH partitioning or
space-filling Morton curves, and a Cuthill-McKee approach), while still exploiting the
overlap of communication and computation. Performance, scalability and comparison results are presented on an
Intel x86 cluster (with three generations of Intel Xeon processors: Westmere, Ivy
Bridge and Haswell) and on IBM Blue Gene/Q systems. A very significant part of the
total execution time is spent in sparse matrix-vector products. It is shown that this
product can behave like a STREAM benchmark kernel and therefore depends on
memory system performance. Significant per-core performance degradation is observed,
depending on the number of cores used per node. Results for several Intel Xeon
generations are provided, together with a hardware counter analysis.
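One simple way to avoid the indirect-addressing conflicts described above is to split the faces into groups in which no two faces share a cell, so that all faces of a group can be processed concurrently. The sketch below does this with greedy face colouring. It is a generic textbook technique given for illustration, not necessarily the multipass scheme used in Code_Saturne, and the function names `color_faces` and `face_groups` are hypothetical.

```python
def color_faces(face_cells, n_cells):
    """Greedily give each face the smallest colour not yet used by a face
    sharing one of its two cells. Faces of equal colour touch disjoint
    cells, so they can be updated in parallel without write conflicts."""
    colors_at_cell = [set() for _ in range(n_cells)]
    face_color = []
    for i, j in face_cells:
        used = colors_at_cell[i] | colors_at_cell[j]
        c = 0
        while c in used:
            c += 1
        face_color.append(c)
        colors_at_cell[i].add(c)
        colors_at_cell[j].add(c)
    return face_color


def face_groups(face_cells, n_cells):
    """Return lists of face indices, one list per colour (one parallel pass each)."""
    face_color = color_faces(face_cells, n_cells)
    groups = [[] for _ in range(max(face_color, default=-1) + 1)]
    for f, c in enumerate(face_color):
        groups[c].append(f)
    return groups


# Chain of four cells with faces (0,1), (1,2), (2,3)
print(face_groups([(0, 1), (1, 2), (2, 3)], 4))  # → [[0, 2], [1]]
```

Each group would then be divided among threads, with a synchronization between groups. The price is extra passes and typically more scattered memory accesses inside each group, which is one motivation for comparing several renumbering strategies.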
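Cuthill-McKee is one of the renumbering approaches listed above. As a reminder of what such a renumbering does, here is a minimal sketch of the classical algorithm on a cell adjacency graph, together with the matrix bandwidth it aims to reduce. How the resulting order is mapped onto threads is not described in this summary, so the sketch stops at the ordering; `cuthill_mckee` and `bandwidth` are hypothetical helper names.

```python
from collections import deque


def cuthill_mckee(adjacency):
    """Cuthill-McKee ordering of a cell adjacency graph.

    adjacency[i] lists the cells sharing a face with cell i.
    Returns `order`, where order[k] is the old index of the cell
    that receives the new number k.
    """
    n = len(adjacency)
    degree = [len(nbrs) for nbrs in adjacency]
    numbered = [False] * n
    order = []
    # Start each connected component from a minimum-degree cell, a cheap
    # stand-in for the pseudo-peripheral start cell often used in practice.
    for start in sorted(range(n), key=lambda c: degree[c]):
        if numbered[start]:
            continue
        numbered[start] = True
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            order.append(cell)
            # Breadth-first: number unnumbered neighbours by increasing degree.
            for nbr in sorted(adjacency[cell], key=lambda c: degree[c]):
                if not numbered[nbr]:
                    numbered[nbr] = True
                    queue.append(nbr)
    return order


def bandwidth(adjacency, order):
    """Largest |new(i) - new(j)| over pairs of adjacent cells."""
    new = [0] * len(order)
    for k, old in enumerate(order):
        new[old] = k
    return max((abs(new[i] - new[j]) for i, nbrs in enumerate(adjacency) for j in nbrs),
               default=0)


# Cells along a line, badly numbered as 0-3-1-4-2
adjacency = [[3], [3, 4], [4], [0, 1], [1, 2]]
print(bandwidth(adjacency, list(range(5))))  # → 3
order = cuthill_mckee(adjacency)
print(order, bandwidth(adjacency, order))    # → [0, 3, 1, 4, 2] 1
```

Reversing the returned order gives the Reverse Cuthill-McKee variant; the bandwidth is unchanged by the reversal.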
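Space-filling Morton curves are also listed above as a way of building thread work sets. The sketch below, under stated assumptions, orders cells by the Morton (Z-order) key of their centres, quantized onto a grid of `2**bits` points per axis over the bounding box, and then cuts the curve into contiguous chunks, one per thread. The names `morton_key`, `morton_order` and `split_for_threads` are hypothetical. Faces straddling two chunks would still need separate handling (for example an extra pass), which this sketch does not show.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three non-negative integers
    (x in bit 0, y in bit 1, z in bit 2, then repeat) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def morton_order(centers, bits=10):
    """Return cell indices sorted along a Morton curve through the cell centres."""
    lo = [min(c[d] for c in centers) for d in range(3)]
    hi = [max(c[d] for c in centers) for d in range(3)]
    top = (1 << bits) - 1

    def grid_coords(c):
        # Map each coordinate of the bounding box onto integers 0..top.
        return [int((c[d] - lo[d]) / (hi[d] - lo[d]) * top) if hi[d] > lo[d] else 0
                for d in range(3)]

    return sorted(range(len(centers)),
                  key=lambda k: morton_key(*grid_coords(centers[k]), bits=bits))


def split_for_threads(order, n_threads):
    """Cut the curve into contiguous, nearly equal chunks, one per thread."""
    n = len(order)
    return [order[t * n // n_threads:(t + 1) * n // n_threads] for t in range(n_threads)]


centers = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
order = morton_order(centers)
print(order)                        # → [1, 2, 3, 0]
print(split_for_threads(order, 2))  # → [[1, 2], [3, 0]]
```

Because consecutive Morton keys tend to be spatially close, each chunk is a compact region of the mesh, which helps cache reuse and limits the number of faces shared between threads.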
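The statement that the sparse matrix-vector product behaves like a STREAM-type kernel can be checked with a back-of-the-envelope bound. The estimate below assumes the common CSR storage (which may differ from Code_Saturne's own matrix formats), double-precision coefficients (8 B), 32-bit column indices (4 B) and two floating-point operations per nonzero, and it ignores the traffic for the vectors and row pointers, so it is an upper bound on the arithmetic intensity $I$. Here $\beta$ denotes the sustained memory bandwidth of a node.

```latex
I_{\mathrm{SpMV}} \le \frac{2\ \text{flop}}{(8 + 4)\ \text{B}} = \frac{1}{6}\ \text{flop/B},
\qquad
P_{\mathrm{SpMV}} \le I_{\mathrm{SpMV}}\,\beta = \frac{\beta}{6}.
```

For a hypothetical node sustaining $\beta = 60$ GB/s, this caps the product at about 10 GFLOP/s however many cores are used. Once enough cores are active to saturate $\beta$, adding further cores leaves the node's throughput unchanged, so the performance per core falls, which is consistent with the degradation reported above.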