Computational Technology Resources - CCP

Keywords: algorithmic differentiation, adjoint MPI, adjoint OpenMP.

Summary

Numerical simulation software generally runs on multi-core parallel architectures. This trend implies hybrid parallelization schemes that combine distributed and shared-memory programming models. The de facto standard for distributed memory is the Message Passing Interface (MPI) [1]. MPI is used to decompose the workload into large chunks, which are distributed across compute nodes. Each node in turn consists of several cores that share a common physical memory. Hence, we assume that the core numerical problem, called the kernel, is distributed among the nodes through MPI. On each node, the kernel is assumed to use OpenMP [2] for shared-memory parallelization.
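The two-level decomposition described above can be illustrated with a minimal single-process sketch. It assumes a matrix-vector product as the kernel, which the paper does not state. The nodes are simulated by a plain loop over row blocks (standing in for MPI ranks) and the cores by a thread pool (standing in for OpenMP threads). The names `split_rows`, `node_kernel` and `hybrid_matvec` are hypothetical and not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_parts):
    """Return contiguous (start, stop) row ranges, as evenly sized as possible."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def node_kernel(a_block, x, n_threads):
    """Kernel on one simulated node: the rows of the local block are shared among threads."""
    def row_dot(row):
        return sum(a * b for a, b in zip(row, x))

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(row_dot, a_block))


def hybrid_matvec(a, x, n_nodes=2, n_threads=2):
    """Outer level: row blocks go to nodes. Inner level: threads work within a node."""
    y = []
    for start, stop in split_rows(len(a), n_nodes):          # stands in for the MPI distribution
        y.extend(node_kernel(a[start:stop], x, n_threads))   # stands in for an OpenMP parallel loop
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, -1.0]
y = hybrid_matvec(A, x)
```

In a real hybrid code the outer loop would not exist: each MPI rank would own one row block and run the inner, thread-parallel kernel on it.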

Numerical simulation and optimization typically rely on robust and efficient derivative information [3]. The authors prefer the adjoint model, which results from the associativity of the chain rule. Algorithmic differentiation (AD) applies this model semi-automatically by transforming a given original code into a derivative code that computes derivatives in addition to the function values. Thus, a potentially tedious implementation of the derivative code by hand is avoided.
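How the adjoint model follows from the associativity of the chain rule can be sketched for a composition of three steps. The decomposition into F_1, F_2, F_3 and the notation x^{(1)} for tangents and x_{(1)} for adjoints are assumptions made for illustration; each Jacobian is understood to be evaluated at the corresponding intermediate value.

```latex
\begin{align*}
  F &= F_3 \circ F_2 \circ F_1, \qquad
  F'(x) = F_3' \, F_2' \, F_1' \\
  \text{tangent:}\quad
  y^{(1)} &= F_3' \bigl( F_2' \bigl( F_1' \, x^{(1)} \bigr) \bigr) \\
  \text{adjoint:}\quad
  x_{(1)} &= (F_1')^{T} \bigl( (F_2')^{T} \bigl( (F_3')^{T} \, y_{(1)} \bigr) \bigr)
\end{align*}
```

Both bracketings evaluate the same Jacobian product, only in opposite order. For a scalar output, a single adjoint sweep with y_{(1)} = 1 yields the whole gradient, whereas the tangent model needs one sweep per input direction.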

No existing AD tool is able to generate the derivative code of a hybrid parallel implementation automatically. In this paper, this is achieved by combining both categories of AD tools, source transformation and overloading, to implement the adjoint model. At runtime, crucial information for adjoining OpenMP pragmas is missing. Therefore, only a source transformation tool (e.g. a compiler) that parses these pragmas is able to adjoin OpenMP code. Moreover, parsing the entire code with an AD tool is so difficult that no tool has ever fully achieved it, or even attempted it, since the additional effort far outweighs the benefits. As MPI resides mostly on a higher layer of an application, this is particularly true for adjoint MPI. Hence, an overloading AD tool is used for adjoining MPI.
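The overloading approach can be sketched in a few lines. This is a generic tape-based illustration, not the authors' tool: the classes `Tape` and `Active` and the function `send_recv` are hypothetical, and the message pass is simulated inside a single process. It relies on the standard observation that, in the reverse sweep, adjoints travel in the opposite direction of the original message and are added to the sender's adjoint.

```python
class Tape:
    """Records, per operation, a closure that propagates adjoints to its inputs."""
    def __init__(self):
        self.ops = []

    def reverse(self):
        # Reverse sweep: replay the recorded operations in opposite order.
        for backprop in reversed(self.ops):
            backprop()


class Active:
    """Overloaded scalar carrying a value and an adjoint."""
    def __init__(self, value, tape):
        self.value, self.adj, self.tape = value, 0.0, tape

    def __add__(self, other):
        out = Active(self.value + other.value, self.tape)

        def backprop():
            self.adj += out.adj
            other.adj += out.adj
        self.tape.ops.append(backprop)
        return out

    def __mul__(self, other):
        out = Active(self.value * other.value, self.tape)

        def backprop():
            self.adj += other.value * out.adj
            other.adj += self.value * out.adj
        self.tape.ops.append(backprop)
        return out


def send_recv(src):
    """Simulated message pass (assumption: one process plays sender and receiver)."""
    dst = Active(src.value, src.tape)

    def backprop():
        src.adj += dst.adj   # the adjoint is "sent back" and accumulated at the sender
        dst.adj = 0.0
    src.tape.ops.append(backprop)
    return dst


tape = Tape()
x = Active(3.0, tape)
w = Active(2.0, tape)
x_remote = send_recv(x)                    # value of x arrives on the "other node"
y = x_remote * w + x_remote * x_remote     # y = x*w + x^2
y.adj = 1.0                                # seed the output adjoint
tape.reverse()                             # now x.adj = w + 2x, w.adj = x
```

Because the tape is built while the program runs, this approach needs no parser for the whole code, but it also never sees compile-time constructs such as OpenMP pragmas.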

To motivate and illustrate the authors' approach, a distributed dense matrix multiplication based on Cannon's algorithm [4] is implemented. It serves as an emulation of large-scale simulation codes and covers both the distribution of the input problem using MPI and the local computation of a kernel using OpenMP.
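As a rough illustration of Cannon's algorithm, the following sketch simulates a q x q process grid inside a single process. It is not the authors' implementation: the circular shifts that would be MPI communication are plain list re-indexing, the local block product that would be the OpenMP kernel is a sequential helper, and all function names are hypothetical. The sketch assumes square matrices whose size is divisible by q.

```python
def matmul(a, b):
    """Plain local product of two dense blocks (stands in for the per-node kernel)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_into(c, d):
    """Accumulate block d into block c in place."""
    for row_c, row_d in zip(c, d):
        for j, v in enumerate(row_d):
            row_c[j] += v


def to_blocks(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def from_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    return [sum((blk[r] for blk in block_row), [])
            for block_row in blocks for r in range(len(block_row[0]))]


def cannon(a, b, q):
    """Simulate Cannon's algorithm on a q x q process grid."""
    n = len(a)
    assert n % q == 0, "sketch assumes the grid size divides the matrix size"
    s = n // q
    A, B = to_blocks(a, q), to_blocks(b, q)
    C = [[[[0.0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    # Initial skew: block row i of A shifts left by i, block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                add_into(C[i][j], matmul(A[i][j], B[i][j]))   # local kernel
        # Circular shifts: A one step left, B one step up (communication in a real code).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return from_blocks(C)


a = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
b = [[float((i + 2 * j) % 5) for j in range(4)] for i in range(4)]
c = cannon(a, b, 2)
```

After the skew, process (i, j) holds the blocks A[i][(i+j) mod q] and B[(i+j) mod q][j], so q rounds of multiply-and-shift visit every inner block index exactly once.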

References

1: W. Gropp, E. Lusk, A. Skjellum, "Using MPI: Portable Parallel Programming with the Message Passing Interface", MIT Press, 1994.
2: OpenMP Architecture Review Board, "OpenMP Application Program Interface", Specification, 2008.
3: A. Griewank, A. Walther, "Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation", 2nd Edition, SIAM, Philadelphia, 2008.
4: L.E. Cannon, "A Cellular Computer to implement the Kalman Filter Algorithm", 1969.


Civil-Comp Proceedings, ISSN 1759-3433
CCP: 100
Proceedings of the Eighth International Conference on Engineering Computational Technology
Edited by: B.H.V. Topping

Paper 7
Adjoining Hybrid Parallel Code
M. Schanen, M. Foerster, J. Lotz, K. Leppkes and U. Naumann
LuFG Informatik 12, Software and Tools for Computational Engineering, RWTH Aachen University, Germany
doi:10.4203/ccp.100.7

Full Bibliographic Reference for this paper
M. Schanen, M. Foerster, J. Lotz, K. Leppkes, U. Naumann, "Adjoining Hybrid Parallel Code", in B.H.V. Topping, (Editor), "Proceedings of the Eighth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 7, 2012. doi:10.4203/ccp.100.7
©Civil-Comp Limited 2023