Civil-Comp Proceedings
ISSN 1759-3433 CCP: 84
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY Edited by: B.H.V. Topping, G. Montero and R. Montenegro
Paper 117
Parallel Linear Solution of Large Structures on Heterogeneous PC Clusters
O. Kurc (1) and K.M. Will (2)
(1) Department of Civil Engineering, Middle East Technical University, Ankara, Turkey
O. Kurc, K.M. Will, "Parallel Linear Solution of Large Structures on Heterogeneous PC Clusters", in B.H.V. Topping, G. Montero, R. Montenegro, (Editors), "Proceedings of the Fifth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 117, 2006. doi:10.4203/ccp.84.117
Keywords: parallel, workload balancing, heterogeneous, substructuring, direct solution, structural.
Summary
Over the last two decades, extensive research effort has been devoted to the
development of parallel computing techniques for the solution of finite element
problems due to the increased availability of parallel computers [1]. In these studies,
various solution algorithms have been developed for a number of parallel
architectures but many of these studies considered systems having only
homogeneous processors. In most civil engineering offices, however, the existing
computers do not all have the same processor; together they form a heterogeneous
cluster in which each computer may have a different processor or computational speed. A
parallel solution framework which considers the computational characteristics of
these heterogeneous clusters would allow engineers to obtain solutions faster
without purchasing any additional hardware. Thus, the main purpose
of this paper is the presentation of a solution framework for PC clusters where each
computer may have different computational characteristics.
This paper focuses on the parallel linear solution of large systems on heterogeneous clusters. The parallel solution is performed by a substructure-based algorithm in which the substructures are condensed by an active-column fan-in solver and the interface equations are solved with a parallel variable band solver. One of the main challenges of this approach is the balanced distribution of computational loads among the computers, especially when their computational speeds vary. At present, a partitioning approach which balances the condensation times of substructures for direct solvers does not exist [2].
To achieve a balance in the condensation times across a heterogeneous cluster, a data preparation (balancing) phase is carried out prior to the parallel solution. The first step of this phase is cluster recognition, in which the computational speed of each computer for the condensation and interface solution algorithms is determined. Then the workload balancing iterations are initiated. First, the structure is partitioned into as many substructures as there are computers. Next, the condensation time of each substructure is estimated, and any imbalance is reduced by transferring vertices from the substructures with longer estimated condensation times to those with shorter ones. The condensation times of the newly formed substructures are then estimated and checked for balance. If the estimated condensation times are still imbalanced, the vertex transfers are repeated. This iterative process continues until the desired balance is obtained or the maximum number of iterations is reached. All of these computations are performed in parallel. Once the iterations are complete, all partitioning results created during the iterations are scanned and the one with the best condensation time estimate is chosen for the solution.
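The workload balancing iterations described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it ignores mesh connectivity (the paper transfers vertices between substructures of a partitioned graph), it runs serially, and the quadratic cost model, the function names `estimate_times` and `balance`, and the parameters `step`, `tol` and `max_iter` are all assumptions introduced here for illustration.

```python
def estimate_times(sizes, speeds, cost=lambda n: n ** 2):
    """Estimated condensation time of each substructure.

    The quadratic cost model is a placeholder assumption, not the
    estimator used in the paper.
    """
    return [cost(n) / s for n, s in zip(sizes, speeds)]


def balance(sizes, speeds, step=1, tol=0.05, max_iter=1000):
    """Move vertices from the slowest to the fastest substructure.

    Returns the best vertex counts seen during the iterations and the
    corresponding governing (maximum) estimated condensation time.
    """
    sizes = list(sizes)
    best_sizes = list(sizes)
    best_time = max(estimate_times(sizes, speeds))

    for _ in range(max_iter):
        times = estimate_times(sizes, speeds)
        slow = times.index(max(times))
        fast = times.index(min(times))

        # Stop once the spread of estimates is within the tolerance.
        if times[slow] - times[fast] <= tol * times[slow]:
            break
        # Never empty a substructure.
        if sizes[slow] <= step:
            break

        # Transfer vertices from the slowest to the fastest substructure.
        sizes[slow] -= step
        sizes[fast] += step

        # Keep the best partition seen so far, as in the final scan
        # over all partitioning results.
        governing = max(estimate_times(sizes, speeds))
        if governing < best_time:
            best_sizes, best_time = list(sizes), governing

    return best_sizes, best_time


if __name__ == "__main__":
    # Four equal substructures on computers of relative speed 1, 1, 2, 4
    # (made-up values).
    print(balance([100, 100, 100, 100], [1.0, 1.0, 2.0, 4.0]))
```

Keeping the best partition seen so far, rather than simply the last one, mirrors the final scan over all partitioning results: an individual transfer may temporarily worsen the governing estimate.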
The final step is the generation of the nodes and elements of the substructures from the partitions. During this process, interface elements whose nodes lie on two or more substructures are assigned to one of their adjacent substructures. Then the final condensation time estimates of the substructures are computed, and each substructure is assigned to a computer in such a way that the condensation time imbalance is minimized.
Once the substructures are created, the parallel solution is initiated. Each computer assembles its stiffness matrix and condenses it to the substructure interface. Then the rows of the interface matrix are assigned to the computers and assembled. The number of rows of the interface stiffness matrix to be factorized by each computer is determined according to that computer's computational speed for row-wise factorization. After the interface unknowns are computed, each computer calculates the internal displacements and element stresses.
To illustrate the effect of the data preparation phase, a square two-dimensional plate was modelled with shell elements. The model had 155,526 equations and was solved on a heterogeneous PC cluster of eight computers. The actual condensation times of the initial substructures were highly imbalanced, with a governing condensation time of 98.69 seconds. By contrast, the condensation times of the final substructures created with the presented method were better balanced, and the governing condensation time decreased to 61.70 seconds. The workload balancing iterations consumed only 7.44 seconds. Various other example problems are also presented to illustrate the efficiency of this framework. The test runs were performed on an existing eight-computer heterogeneous PC cluster.
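The speed-proportional distribution of interface rows mentioned above can be sketched as follows. This is only one plausible reading: the summary does not state how fractional shares are rounded or whether each computer receives a contiguous block of rows. The function name `assign_rows` and the largest-remainder rounding rule are assumptions made here, and the speeds are relative values that would come from the cluster recognition step.

```python
def assign_rows(n_rows, speeds):
    """Split n_rows among computers in proportion to their speeds.

    Uses largest-remainder rounding (an assumption) so that the counts
    are integers and add up to n_rows exactly.
    """
    total = sum(speeds)
    shares = [n_rows * s / total for s in speeds]  # ideal fractional shares
    counts = [int(x) for x in shares]              # round each share down
    leftover = n_rows - sum(counts)                # rows not yet assigned

    # Hand the remaining rows to the computers with the largest
    # fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical relative factorization speeds of three computers.
    print(assign_rows(100, [1.0, 1.0, 2.0]))
```

Only the number of rows per computer is computed here; mapping those counts onto actual rows of the variable band interface matrix is left open, since the summary does not describe it.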