Computational Technology Resources - CCP

Keywords: GPGPU, GPU libraries, multicore, nonlinear conjugate gradient algorithms, parallel preconditioners, ILU factorizations, two-stage methods, Bratu problem.

Summary

The algorithms described here have been implemented on an Intel Core 2 Quad Q6600 and an NVIDIA GTX 280 GPU. We report the numerical results obtained using CUDA on the GPU and compare them with those obtained on the shared-memory platform using OpenMP. Furthermore, a mixed model is considered in order to exploit the characteristics of both parallel systems. The reported numerical experiments analyze the behavior of these algorithms in a fine-grained parallel environment compared with a thread-based environment. We have analyzed the proposed algorithms in order to identify their main operations, and we have implemented some optimizations and tested some libraries in order to perform these operations efficiently. The CUBLAS and CUSPARSE libraries offer good performance, and the sparse matrix format should be chosen according to the parallel architecture, with ELLPACK-R [3] being the most efficient format.
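To make the remark about sparse matrix formats concrete, the following is a minimal serial sketch of a sparse matrix-vector product in the ELLPACK-R layout of [3]: padded value and column-index arrays plus an array of true row lengths. The function names, the column-major indexing `k * n + i`, and the small test matrix are illustrative assumptions, not code from the paper. On a GPU, each iteration of the outer row loop would typically be assigned to one thread.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays.

    Returns (values, cols, row_len). Hypothetical helper for illustration only.
    """
    n = len(dense)
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]
    row_len = [len(r) for r in rows]
    width = max(row_len) if row_len else 0
    # Column-major storage: the k-th nonzero of row i lives at index k * n + i,
    # so consecutive rows (threads, on a GPU) touch consecutive memory locations.
    values = [0.0] * (width * n)
    cols = [0] * (width * n)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            values[k * n + i] = v
            cols[k * n + i] = j
    return values, cols, row_len


def spmv_ellpack_r(values, cols, row_len, x):
    """Compute y = A x for a matrix stored in ELLPACK-R form."""
    n = len(row_len)
    y = [0.0] * n
    for i in range(n):  # one GPU thread per row in a CUDA version
        acc = 0.0
        # row_len[i] bounds the loop, so padding entries are never read.
        for k in range(row_len[i]):
            acc += values[k * n + i] * x[cols[k * n + i]]
        y[i] = acc
    return y


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
values, cols, row_len = to_ellpack_r(A)
y = spmv_ellpack_r(values, cols, row_len, [1.0, 2.0, 3.0])
```

The row-length array is what distinguishes ELLPACK-R from plain ELLPACK: each row stops after its own nonzeros instead of iterating over the full padded width.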

On the other hand, we have shown differences in how the two methods, NLCG and NLPCG, adapt to the fine-grained GPU architecture. We would like to point out that using the GPU improves the results obtained with either of the proposed methods. Moreover, the NLCG method exploits the parallelism offered by the GPU better than the NLPCG method.
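For orientation, here is a minimal serial sketch of a nonlinear conjugate gradient iteration with the Fletcher-Reeves update [1], applied to a one-dimensional finite-difference Bratu-type system A u = λh² exp(u) with A = tridiag(-1, 2, -1). The summary does not state the paper's exact formulation, so the 1D grid, the value λ = 1, the Newton-based line search, the restart safeguard, and all function names are assumptions made for illustration. The operations it relies on (dot products, vector updates, and a sparse matrix-vector product) are of the kind that libraries such as CUBLAS and CUSPARSE provide on a GPU.

```python
import math


def bratu_gradient(u, lam, h):
    """Residual A u - lam*h^2*exp(u), with A = tridiag(-1, 2, -1)."""
    n = len(u)
    g = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        g.append(2.0 * u[i] - left - right - lam * h * h * math.exp(u[i]))
    return g


def bratu_hessian_times(u, d, lam, h):
    """Jacobian-vector product (A - lam*h^2*diag(exp(u))) d."""
    n = len(u)
    out = []
    for i in range(n):
        left = d[i - 1] if i > 0 else 0.0
        right = d[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * d[i] - left - right
                   - lam * h * h * math.exp(u[i]) * d[i])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nlcg_fletcher_reeves(n=7, lam=1.0, tol=1e-10, max_iter=200, newton_steps=3):
    """Fletcher-Reeves NLCG on the assumed 1D Bratu discretization."""
    h = 1.0 / (n + 1)
    u = [0.0] * n
    g = bratu_gradient(u, lam, h)
    d = [-gi for gi in g]
    gg = dot(g, g)
    for it in range(max_iter):
        if math.sqrt(gg) < tol:
            return u, it
        # Line search (assumption): a few Newton steps on g(u + alpha*d) . d = 0.
        for _ in range(newton_steps):
            curv = dot(d, bratu_hessian_times(u, d, lam, h))
            if curv <= 0.0:
                break
            alpha = -dot(g, d) / curv
            u = [ui + alpha * di for ui, di in zip(u, d)]
            g = bratu_gradient(u, lam, h)
        gg_new = dot(g, g)
        beta = gg_new / gg  # Fletcher-Reeves coefficient
        d = [-gi + beta * di for gi, di in zip(g, d)]
        if dot(g, d) >= 0.0:  # safeguard: restart with steepest descent
            d = [-gi for gi in g]
        gg = gg_new
    return u, max_iter


u, iterations = nlcg_fletcher_reeves()
```

A preconditioned variant would apply a preconditioner to the residual before forming the search direction; that step is omitted here because the summary does not specify it.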

References

1: R. Fletcher, C. Reeves, "Function Minimization by Conjugate Gradients", The Computer Journal, 7, 149-154, 1964. doi:10.1093/comjnl/7.2.149
2: R. Bru, V. Migallón, J. Penadés, D.B. Szyld, "Parallel, Synchronous and Asynchronous Two-Stage Multisplitting Methods", Electronic Transactions on Numerical Analysis, 3, 24-38, 1995.
3: F. Vázquez, J.J. Fernández, E.M. Garzón, "A new approach for sparse matrix vector product on NVIDIA GPUs", Concurrency and Computation: Practice and Experience, 2010. doi:10.1002/cpe.1658

Civil-Comp Proceedings, ISSN 1759-3433
CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING

Paper 24
GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms
V. Galiano¹, H. Migallón¹, V. Migallón² and J. Penadés²
¹Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
²Department of Computer Science and Artificial Intelligence, University of Alicante, Spain
doi:10.4203/ccp.95.24

Full Bibliographic Reference for this paper
V. Galiano, H. Migallón, V. Migallón, J. Penadés, "GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms", in "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 24, 2011. doi:10.4203/ccp.95.24
©Civil-Comp Limited 2023