Civil-Comp Proceedings
ISSN 1759-3433 CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING Edited by:
Paper 57
A Graph-Grammar Based Multi-Frontal Parallel Direct Solver for One, Two and Three-Dimensional Partial Differential Equations
P. Obrok and M. Paszynski
Department of Computer Science, AGH University of Science and Technology, Cracow, Poland
P. Obrok, M. Paszynski, "A Graph-Grammar Based Multi-Frontal Parallel Direct Solver for One, Two and Three-Dimensional Partial Differential Equations", in , (Editors), "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 57, 2011. doi:10.4203/ccp.95.57
Keywords: parallel multi-frontal solver, finite difference, graph grammar, trace theory, multi-core environment, memory optimization.
Summary
We also analyze a scheme for reducing the memory usage of the algorithm by recalculating parts of the elimination tree. This may be necessary for domains of three or more dimensions, as the memory required grows very quickly in those cases. The elimination tree is processed in parts, each corresponding to a slice of the computational domain. After some elimination steps, the results from the bottom of the tree are erased and only the topmost matrix is preserved. This process is repeated for every part of the tree. The collection of matrices obtained in this way is then solved using the same multi-frontal method.

The solver is implemented on the NVIDIA CUDA architecture and tested on a benchmark heat transfer problem. We compare its running time and memory usage with those of the state-of-the-art MUMPS solver [3] for different domain dimensionalities. We find that the memory usage of this algorithm is about one order of magnitude lower than that of MUMPS. The price is an execution time about one order of magnitude longer than that of MUMPS. The current implementation uses hand-written matrix operations, as opposed to the highly optimized BLAS routines used in MUMPS, so better performance may be achievable by optimizing the implementation.
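The slice-wise recalculation idea can be illustrated on a much smaller problem than the one treated in the paper. The following is a minimal sketch, not the authors' graph-grammar CUDA solver: it assumes a one-dimensional finite difference model problem with the matrix tridiag(-1, 2, -1), and all function names (thomas, solve_slice, condense, solve_sliced) are hypothetical. Each slice is eliminated down to a small "top" record coupling its two interface nodes, the interior results are discarded, the interface system is solved, and the slice interiors are then recomputed.

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system; sub[i] multiplies x[i-1], sup[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_slice(rhs):
    """Solve the assumed model slice matrix tridiag(-1, 2, -1) for one right-hand side."""
    m = len(rhs)
    return thomas([-1.0] * m, [2.0] * m, [-1.0] * m, rhs)


def condense(b_slice):
    """Eliminate a slice interior and keep only the six numbers seen by its interfaces."""
    m = len(b_slice)
    g = solve_slice(b_slice)                       # response to the slice load
    p = solve_slice([1.0] + [0.0] * (m - 1))       # response to the left interface value
    q = solve_slice([0.0] * (m - 1) + [1.0])       # response to the right interface value
    # Everything except the first and last entries is discarded here.
    return (g[0], g[-1], p[0], p[-1], q[0], q[-1])


def solve_sliced(b, interfaces):
    """Solve tridiag(-1, 2, -1) u = b slice by slice.

    Assumption: interface indices are sorted and leave every slice non-empty.
    """
    n, k = len(b), len(interfaces)
    edges = [-1] + list(interfaces) + [n]
    slices = [(edges[j] + 1, edges[j + 1]) for j in range(k + 1)]  # half-open ranges

    # Pass 1: process each slice, keep only its small top record.
    top = [condense(b[lo:hi]) for lo, hi in slices]

    # Assemble the interface system from the retained records.
    sub, diag, sup, rhs = [], [], [], []
    for j, s in enumerate(interfaces):
        _, g_last, _, p_last, _, q_last = top[j]          # slice left of the interface
        g_first, _, p_first, _, q_first, _ = top[j + 1]   # slice right of the interface
        sub.append(-p_last)
        diag.append(2.0 - q_last - p_first)
        sup.append(-q_first)
        rhs.append(b[s] + g_last + g_first)
    u_int = thomas(sub, diag, sup, rhs)

    # Pass 2: recompute each slice interior with the interface values now known.
    u = [0.0] * n
    for j, s in enumerate(interfaces):
        u[s] = u_int[j]
    for j, (lo, hi) in enumerate(slices):
        local = list(b[lo:hi])
        if j >= 1:
            local[0] += u_int[j - 1]
        if j < k:
            local[-1] += u_int[j]
        u[lo:hi] = solve_slice(local)
    return u


if __name__ == "__main__":
    n = 11
    h = 1.0 / (n + 1)
    u = solve_sliced([h * h] * n, [3, 7])
    reference = thomas([-1.0] * n, [2.0] * n, [-1.0] * n, [h * h] * n)
    print(max(abs(a - r) for a, r in zip(u, reference)))
```

In this toy setting only six numbers per slice survive between the two passes, at the cost of solving every slice twice. This mirrors, in a very simplified form, the trade of lower memory for longer running time described above.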