Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Computational Science, Engineering & Technology Series
ISSN 1759-3158
CSETS: 27
TRENDS IN PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, B.H.V. Topping
Chapter 13

Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach

G.A. Gravvanis1, C.K. Filelis-Papadopoulos1 and K.M. Giannoutakis2

1Department of Electrical and Computer Engineering, School of Engineering, Democritus University of Thrace, Xanthi, Greece
2Centre for Research and Technology Hellas, Informatics and Telematics Institute, Thermi, Greece

Full Bibliographic Reference for this chapter
G.A. Gravvanis, C.K. Filelis-Papadopoulos, K.M. Giannoutakis, "Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach", in P. Iványi, B.H.V. Topping, (Editors), "Trends in Parallel, Distributed, Grid and Cloud Computing for Engineering", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 13, pp 291-319, 2011. doi:10.4203/csets.27.13
Keywords: sparse linear systems, parallel approximate inverses, parallel preconditioned conjugate gradient type methods, parallel computations, GPGPU, CUDA programming.

Summary
During recent decades, explicit approximate inverse preconditioning methods have been extensively used for efficiently solving sparse linear systems on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on the use of efficient preconditioners that are close approximants to the coefficient matrix and are fast to compute in parallel.

A new class of parallel computational techniques is proposed for the parallelization of the explicit approximate inverse and the explicit preconditioned conjugate gradient type method, [4,5,9], on a graphics processing unit (GPU). The proposed parallel methods have been implemented using compute unified device architecture (CUDA) developed by NVIDIA, [1,7,10].

For the parallel construction of the approximate inverse a "fish bone" computational approach is introduced, with respect to the anti-diagonal data dependency pattern, where the massively parallel environment of the GPU offers simultaneous calculation of the elements of the inverse through a pipeline scheduling assigning each inverted L-shaped block to each hardware thread in the GPU [3,6]. The inherently parallel linear operations between vectors and matrices involved in the explicit preconditioned bi-conjugate gradient schemes exhibit significant amounts of loop-level parallelism because of the matrix-vector and the vector-vector products that can lead to high performance gain on the GPU systems, specifically designed for such computations, [8].

Finally, numerical results for the performance of the explicit approximate inverse and the explicit preconditioned conjugate gradient type method for solving characteristic two-dimensional problems, using the finite difference method on a massive multiprocessor interface of the GPU, are presented. The CUDA implementation issues of the proposed method are also discussed.

References
[1]
J. Bolz, I. Farmer, E. Grinspun, P. Schörder, "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid", in Proceedings of ACM SIGGRAPH 2003, ACM Transactions on Graphic, 22(3), 917-924, 2003.
[2]
D.J. Evans,E.A. Lipitakis, "A sparse LU factorization procedure for the solution of parabolic differential equations", in R.W. Lewis, K. Morgan, (Editors), "Proc. of Inter. Conf. on Num. Math in Thermal Problems", Pineridge Press, 954-966, 1979.
[3]
G.A. Gravvanis, "High Performance Inverse Preconditioning", Archives of Computational Methods in Engineering, 16(1), 77-108, 2009. doi:10.1007/s11831-008-9026-x
[4]
G.A. Gravvanis, "Explicit Approximate Inverse Preconditioning Techniques", Archives of Computational Methods in Engineering, 9(4), 371-402, 2002. doi:10.1007/BF03041466
[5]
G.A. Gravvanis, "Generalized approximate inverse preconditioning for solving non-linear elliptic boundary-value problems", I. J. Applied Mathematics, 2(11), 1363-1378, 2000.
[6]
G.A. Gravvanis, K.M. Giannoutakis, "Fast parallel finite element approximate inverses", CMES, 32(1), 35-44, 2008. doi:10.3970/cmes.2008.032.035
[7]
D.B. Kirk, W.W. Hwu, "Programming Massively Parallel Processors: A Hands-on Approach", Morgan Kaufmann Publishers-Elsevier, 2010.
[8]
J. Krüger, R. Westermann, "Linear algebra operators for GPU implementation of numerical algorithms", in "Proceedings of ACM SIGGRAPH 2003", ACM Transactions on Graphics, 22(3), 908-916, 2003.
[9]
E.A. Lipitakis, D.J. Evans, "Explicit semi-direct methods based on approximate inverse matrix techniques for solving boundary-value problems on parallel processors", Math. and Computers in Simulation, 29, 1-17, 1987. doi:10.1016/0378-4754(87)90062-0
[10]
J. Sanders, E. Kandrot, "CUDA by Example: An Introduction to General Purpose GPU Programming", Addison-Wesley Professional, 2010.

purchase the full-text of this chapter (price £20)

go to the previous chapter
go to the next chapter
return to the table of contents
return to the book description
purchase this book (price £95 +P&P)