Computational & Technology Resources
an online resource for computational,
engineering & technology publications |
|
Civil-Comp Proceedings
ISSN 1759-3433 CCP: 84
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY Edited by: B.H.V. Topping, G. Montero and R. Montenegro
Paper 119
Solution of the Three-Dimensional Finite Difference Equations Using Parallel Reconfigurable Hardware Accelerators E.J.C. Stewart1, J. Hu1, S.F. Quigley1 and A.H.C. Chan2
1Department of Electronic, Electrical and Computer Engineering,
E.J.C. Stewart, J. Hu, S.F. Quigley, A.H.C. Chan, "Solution of the Three-Dimensional Finite Difference Equations Using Parallel Reconfigurable Hardware Accelerators", in B.H.V. Topping, G. Montero, R. Montenegro, (Editors), "Proceedings of the Fifth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 119, 2006. doi:10.4203/ccp.84.119
Keywords: reconfigurable computing, hardware acceleration, FPGAs, finite difference method, domain decomposition.
Summary
The solution of partial differential equations using the finite difference method is a
computationally challenging problem when the number of grid points becomes
large. This is likely to be the case when a 3D domain is to be solved.
One approach to the parallel computation of solutions is to use reconfigurable hardware to create custom co-processors to accelerate the algorithm. Reconfigurable Computing is based around the use of Field Programmable Gate Arrays (FPGAs) to form co-processors that can be configured to provide custom hardware accelerators. These can exploit parallelism within the problem in a much more thorough way than can be done with uniprocessor or parallel computers. The impact of Moore's law, which has had such a profound effect on the speed and power of microprocessor chips, has also been felt by programmable logic hardware. Originally FPGAs were used only for small-scale glue logic applications. However, in recent years the complexity and speed of programmable logic has increased enormously. Modern devices can contain tens of millions of programmable logic gates operating at clock speeds of up to 200 MHz; they can also contain specialized embedded hardware resources, such as RAM blocks, hardware multipliers, and even microprocessors. FPGA co-processors have much lower cost and greater flexibility than ASIC hardware (albeit with inferior performance). For the right type of application, a reconfigurable computer can rival the expensive parallel computers that are normally used to accelerate computationally expensive algorithms. The types of problem that can benefit from reconfigurable computing are established by the properties of the FPGA. In general, FPGAs are good at tasks that have very high levels of parallelism, and use very deep computational pipelines. If these requirements are not met, then the lower clock speed of an FPGA as compared to a microprocessor means that the performance gain will be severely limited. The requirement for very high levels of parallelism means that it is important to ensure that the hardware pipelines do not become starved of data, so a high throughput of data from memory is important. This in turn means that the speed-up achievable depends strongly on the method used to solve the equations, and the domain decomposition used. This paper presents the design and evaluation of a reconfigurable computing approach to the finite difference solution of a three-dimensional Laplace equation. The approach is based on the use of an array of reconfigurable computing boards within a PC. The board type used is a Celoxica/Alpha Data RC2000 card, which consists of a Xilinx 2V6000 FPGA combined with 24 MByte of static RAM and 256 MByte of dynamic RAM. The 2V6000 FPGA contains 6 million reconfigurable system gates, plus 144 blocks of 18 kbit SRAMs, and 144 18-bit hardware multipliers. A software host uses Fourier decomposition to decompose the 3D finite difference problem into a series of 2D or 1D problems that can be downloaded into the reconfigurable computing boards. Download of one of the sub-problems is accomplished whilst the FPGA board is computing the solution of a previous sub-problem. This eliminates communication and synchronization delays between the subdomains. The approach is based on the efficient use of the embedded RAM within the FPGA as a cache to store the working set for the current sub-problem, and also to perform background upload and download of data for the previous and following sub-problem to be tackled by the FPGA. This is based on techniques of FPGA memory management developed in [1]. A variety of iterations schemes have been evaluated, including point Jacobi, Gauss-Seidel and Successive Over-relaxation (SOR). For Gauss-Seidel and SOR, data dependencies degrade performance by stalling the computational pipelines. However, use of a red-black iteration scheme removes this problem. This results in an SOR solver that for n reconfigurable computing boards within the PC, can achieve a speed-up of 50n compared to a software solution of the same algorithm on a fast PC. References
purchase the full-text of this paper (price £20)
go to the previous paper |
|