Civil-Comp Proceedings
ISSN 1759-3433 CCP: 100
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: B.H.V. Topping
Paper 7
Adjoining Hybrid Parallel Code
M. Schanen, M. Foerster, J. Lotz, K. Leppkes and U. Naumann
LuFG Informatik 12, Software and Tools for Computational Engineering, RWTH Aachen University, Germany
M. Schanen, M. Foerster, J. Lotz, K. Leppkes, U. Naumann, "Adjoining Hybrid Parallel Code", in B.H.V. Topping, (Editor), "Proceedings of the Eighth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 7, 2012. doi:10.4203/ccp.100.7
Keywords: algorithmic differentiation, adjoint MPI, adjoint OpenMP.
Summary
Numerical simulation software is generally run on multi-core parallel
architectures. This trend implies hybrid parallelization schemes that combine
distributed and shared-memory programming models. The de facto standard for
distributed memory is the message passing interface (MPI) [1].
MPI is used to decompose the workload into large chunks which are distributed
across compute nodes. Each node in turn consists of several cores that
share a common physical memory. Hence, we assume
that the core numerical problem, called the kernel, is distributed among the
nodes through MPI. On each node the kernel is assumed to use OpenMP
[2] for shared-memory parallelization.
Numerical simulation and optimization typically rely on robust and efficient derivative information [3]. The authors prefer the adjoint model, which results from the associativity of the chain rule. Algorithmic differentiation (AD) applies this model semi-automatically by transforming a given original code into a derivative code that computes derivatives in addition to the original values. A potentially tedious implementation of the derivative code by hand is thus avoided.

No existing AD tool is able to generate the derivative code of a hybrid parallel implementation automatically. In this paper this is achieved by combining both categories of AD tools, source transformation and overloading, to implement the adjoint derivative model. At runtime, crucial information for adjoining OpenMP pragmas is missing. Therefore only a source transformation tool (e.g. a compiler) that parses these pragmas is able to adjoin OpenMP code. Moreover, parsing the entire code with an AD tool is so difficult that no tool has ever fully achieved it, or even striven for it, since the additional effort far outweighs the benefits. As MPI resides mostly on a higher layer of an application, this is particularly true for adjoint MPI. Hence an overloading AD tool is used for adjoining MPI.

To motivate and illustrate the authors' approach, a distributed dense matrix multiplication based on the Cannon algorithm [4] is implemented. It serves as an emulation of large-scale simulation codes, covering both the distribution of the input problem using MPI and the local computation of a kernel using OpenMP.
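
The adjoint model and the Cannon data movement described above can be illustrated with a minimal serial sketch. It is not the authors' implementation: it uses neither MPI nor OpenMP, and the names `Tape`, `Active`, `reverse_sweep`, `cannon_multiply` and `run_example` are hypothetical, introduced here only for illustration. The sketch assumes one matrix entry per emulated process, replaces the message exchange by cyclic list shifts, and uses a small overloading-based tape to record the multiplication and propagate adjoints in reverse.

```python
class Tape:
    """Records each elementary operation so it can be replayed in reverse."""

    def __init__(self):
        self.entries = []  # (result_index, [(argument_index, local_partial), ...])
        self.size = 0

    def new_index(self):
        self.size += 1
        return self.size - 1


class Active:
    """Hypothetical overloaded scalar: a value plus its position on the tape."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape
        self.index = tape.new_index()

    def _record(self, value, partials):
        result = Active(value, self.tape)
        self.tape.entries.append((result.index, partials))
        return result

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self.index, other.value), (other.index, self.value)])


def reverse_sweep(tape, seeds):
    """Propagate adjoints from outputs to inputs; seeds maps tape index to adjoint."""
    adjoint = [0.0] * tape.size
    for index, seed in seeds.items():
        adjoint[index] += seed
    for result_index, partials in reversed(tape.entries):
        for argument_index, partial in partials:
            adjoint[argument_index] += partial * adjoint[result_index]
    return adjoint


def cannon_multiply(a, b, zero):
    """Cannon's algorithm on a q x q grid, one matrix entry per emulated process."""
    q = len(a)
    # Initial skew: row i of A moves left by i, column j of B moves up by j.
    a_loc = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[zero() for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Local kernel: every emulated process multiplies the entries it holds.
        for i in range(q):
            for j in range(q):
                c[i][j] = c[i][j] + a_loc[i][j] * b_loc[i][j]
        # Cyclic shift: A one step left, B one step up (stands in for messages).
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c


def run_example():
    tape = Tape()
    a_vals = [[1.0, 2.0], [3.0, 4.0]]
    b_vals = [[5.0, 6.0], [7.0, 8.0]]
    a = [[Active(v, tape) for v in row] for row in a_vals]
    b = [[Active(v, tape) for v in row] for row in b_vals]
    c = cannon_multiply(a, b, lambda: Active(0.0, tape))
    # Seed every output adjoint with 1, i.e. differentiate the sum of all C entries.
    seeds = {entry.index: 1.0 for row in c for entry in row}
    adjoint = reverse_sweep(tape, seeds)
    c_values = [[entry.value for entry in row] for row in c]
    a_bar = [[adjoint[x.index] for x in row] for row in a]
    b_bar = [[adjoint[x.index] for x in row] for row in b]
    return c_values, a_bar, b_bar
```

Calling `run_example()` returns the product together with the adjoints of both inputs. With all output adjoints seeded to one, the adjoint of each entry of A equals the corresponding row sum of B, and the adjoint of each entry of B equals the corresponding column sum of A, which matches the analytic adjoint of a matrix product. In a real hybrid code the cyclic shifts would be message exchanges and the local loop would be the shared-memory kernel; how those are adjoined is the subject of the paper and is not reproduced here.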