Civil-Comp Proceedings
ISSN 1759-3433 CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Paper 24
GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms
V. Galiano1, H. Migallón1, V. Migallón2 and J. Penadés2
1Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
V. Galiano, H. Migallón, V. Migallón, J. Penadés, "GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms", in "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 24, 2011. doi:10.4203/ccp.95.24
Keywords: GPGPU, GPU libraries, multicore, nonlinear conjugate gradient algorithms, parallel preconditioners, ILU factorizations, two-stage methods, Bratu problem.
Summary
The algorithms described here have been implemented on an Intel Core 2 Quad Q6600 and an NVIDIA GTX 280 GPU. We report the numerical results obtained using CUDA on the GPU and compare them with those obtained on the shared memory platform using an OpenMP model. Furthermore, a mixed model is considered in order to exploit the characteristics of both parallel systems. The reported numerical experiments analyze the behavior of these algorithms in a fine grain parallel environment compared with a thread-based environment. We have analyzed the proposed algorithms in order to identify their main operations, and we have implemented some optimizations and tested some libraries in order to perform these operations efficiently. The CUBLAS and CUSPARSE libraries offer good performance, and the sparse matrix format should be chosen according to the parallel architecture, with ELLPACK-R [3] being the most efficient format. On the other hand, we have shown differences in how well the two methods adapt to the fine grain GPU architecture. We would like to point out that the use of the GPU improves the results obtained with either of the proposed methods. Moreover, the NLCG method exploits the parallelism offered by the GPU better than the NLPCG method.
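To illustrate the ELLPACK-R format mentioned in the summary, the following is a minimal sequential sketch in Python. It is not the authors' CUDA implementation. The function names to_ellpack_r and spmv_ellpack_r and the small tridiagonal test matrix are hypothetical and chosen only for this example. ELLPACK-R extends ELLPACK with an array of row lengths (here rl), so that the loop over each row stops at its last nonzero instead of running over the padding. The values are stored column by column, which is the layout that lets consecutive GPU threads, one per row, read consecutive memory addresses.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) into ELLPACK-R arrays.

    Hypothetical helper for illustration only.
    """
    n_rows = len(dense)
    # Keep only the nonzero entries of each row as (column, value) pairs.
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]
    # rl[i] is the number of nonzeros in row i: the array ELLPACK-R adds.
    rl = [len(r) for r in rows]
    width = max(rl) if rl else 0
    # Column-major storage: entry k of row i lives at index k * n_rows + i.
    val = [0.0] * (width * n_rows)
    col = [0] * (width * n_rows)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            val[k * n_rows + i] = v
            col[k * n_rows + i] = j
    return val, col, rl, n_rows


def spmv_ellpack_r(val, col, rl, n_rows, x):
    """Compute y = A x with A stored in ELLPACK-R form."""
    y = [0.0] * n_rows
    for i in range(n_rows):          # in a CUDA kernel: one thread per row
        acc = 0.0
        for k in range(rl[i]):       # stop at the real row length, skip padding
            idx = k * n_rows + i
            acc += val[idx] * x[col[idx]]
        y[i] = acc
    return y


# Small tridiagonal example matrix (an assumption made for this sketch).
A = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
val, col, rl, n_rows = to_ellpack_r(A)
y = spmv_ellpack_r(val, col, rl, n_rows, [1.0, 1.0, 1.0, 1.0])
```

In this example rl is [2, 3, 3, 2], so the first and last rows each skip one padded entry. In a GPU kernel the outer loop would be replaced by the thread index, and the inner loop would stay as written.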