Computational Technology Resources - CCP

Keywords: parallel, workload balancing, heterogeneous, substructuring, direct solution, structural.

Summary

Over the last two decades, extensive research effort has been devoted to developing parallel computing techniques for the solution of finite element problems, driven by the increased availability of parallel computers [1]. These studies produced solution algorithms for a number of parallel architectures, but many of them considered only systems with homogeneous processors. In most civil engineering offices, however, the existing computers form heterogeneous clusters in which each computer may have a different processor or a different computational speed. A parallel solution framework that accounts for the computational characteristics of such clusters allows engineers to obtain solutions faster without purchasing additional hardware. The main purpose of this paper is therefore to present a solution framework for PC clusters in which each computer may have different computational characteristics.

This paper focuses on the parallel linear solution of large systems on heterogeneous clusters. The parallel solution is performed by a substructure-based algorithm in which the substructures are condensed by an active-column fan-in solver and the interface equations are solved with a parallel variable band solver. One of the main challenges of this approach is distributing the computational load evenly among the computers, especially when their computational speeds vary. At present, no partitioning approach exists that balances the condensation times of substructures for direct solvers [2]. To balance the condensation times across a heterogeneous cluster, a data preparation (balancing) phase is carried out prior to the parallel solution.
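For orientation, the condensation and interface solution mentioned above can be written in the standard textbook form of static condensation. This is a generic statement of the technique, not a description of the active-column fan-in solver itself; the subscripts i (internal) and b (interface) and the superscript (s) for a substructure are notation assumed here, not taken from the paper.

```latex
% Equations of one substructure, split into internal (i) and interface (b) unknowns
\begin{bmatrix} K_{ii} & K_{ib} \\ K_{bi} & K_{bb} \end{bmatrix}
\begin{bmatrix} u_i \\ u_b \end{bmatrix}
=
\begin{bmatrix} f_i \\ f_b \end{bmatrix}

% Condensation to the interface (done independently for each substructure)
\tilde{K}_{bb} = K_{bb} - K_{bi} K_{ii}^{-1} K_{ib},
\qquad
\tilde{f}_b = f_b - K_{bi} K_{ii}^{-1} f_i

% Interface system assembled from all substructures s
\Bigl( \sum_{s} \tilde{K}_{bb}^{(s)} \Bigr) u_b = \sum_{s} \tilde{f}_b^{(s)}

% Recovery of the internal unknowns once u_b is known
u_i = K_{ii}^{-1} \bigl( f_i - K_{ib} u_b \bigr)
```

The cost of forming each condensed matrix depends on the size and structure of that substructure's internal block, which is why the condensation times, rather than the vertex counts, are the quantity to balance.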

The first step of the data preparation phase is cluster recognition, in which the computational speed of each computer is determined for both the condensation and the interface solution algorithms. The workload balancing iterations are then initiated. First, the structure is partitioned into as many substructures as there are computers. The condensation time of each substructure is then estimated, and any imbalance is reduced by transferring vertices from the substructures with longer condensation times to those with shorter ones. The condensation times of the newly formed substructures are estimated and checked for balance. If an imbalance remains, the vertex transfers are repeated. This iterative process continues until the desired balance is obtained or the maximum number of iterations is reached. All of these computations are performed in parallel. Once the iterations are complete, all partitioning results created during the iterations are scanned and the one with the best condensation time estimate is chosen for the solution. The final step is the generation of the nodes and elements of the substructures from the partitions. During this process, interface elements whose nodes lie on two or more substructures are assigned to one of their adjacent substructures. Finally, the condensation time estimates of the substructures are recomputed and each substructure is assigned to a computer in such a way that the condensation time imbalance is minimized.
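The balancing iterations described above can be sketched roughly as follows. This is a deliberately simplified illustration, not the authors' implementation: each substructure is reduced to a vertex count, the condensation time is estimated with a hypothetical cost model (vertex count squared divided by a relative speed), and vertices move only from the slowest substructure to the fastest one, ignoring mesh adjacency. All function names, parameters and default values are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def estimate_times(sizes: Sequence[int], speeds: Sequence[float],
                   cost: Callable[[int], float]) -> List[float]:
    """Estimated condensation time of each substructure on its computer."""
    return [cost(n) / s for n, s in zip(sizes, speeds)]


def balance_workload(sizes: Sequence[int], speeds: Sequence[float],
                     cost: Callable[[int], float] = lambda n: float(n) ** 2,
                     tolerance: float = 0.05, max_iterations: int = 50,
                     step_fraction: float = 0.1) -> Tuple[List[int], float]:
    """Return the best partition seen and its governing (maximum) time.

    The quadratic cost model and the fixed transfer fraction are assumptions.
    """
    current = list(sizes)
    best_sizes, best_time = list(current), float("inf")
    for iteration in range(max_iterations + 1):
        times = estimate_times(current, speeds, cost)
        governing = max(times)
        # Remember every partition; the best one is chosen at the end.
        if governing < best_time:
            best_sizes, best_time = list(current), governing
        balanced = (governing - min(times)) <= tolerance * governing
        if balanced or iteration == max_iterations:
            break
        slow = times.index(governing)   # substructure with the longest time
        fast = times.index(min(times))  # substructure with the shortest time
        # Transfer a fraction of the slow substructure, keeping it non-empty.
        moved = min(max(1, int(step_fraction * current[slow])), current[slow] - 1)
        if moved < 1:
            break
        current[slow] -= moved
        current[fast] += moved
    return best_sizes, best_time


if __name__ == "__main__":
    # Four equal substructures on computers with relative speeds 1, 1, 2 and 4.
    sizes, governing = balance_workload([1000] * 4, [1.0, 1.0, 2.0, 4.0])
    print(sizes, governing)
```

Because a fixed transfer step can overshoot and oscillate, the sketch keeps the best partition seen so far instead of the last one, which mirrors the scan over all partitioning results described in the text.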

Once the substructures are created, the parallel solution is initiated. Each computer assembles its stiffness matrix and condenses it to the substructure interface. The rows of the interface matrix are then assigned to the computers and assembled. The number of rows of the interface stiffness matrix factorized by each computer is determined from that computer's computational speed for the row-wise factorization. After the interface unknowns are computed, each computer calculates the internal displacements and element stresses of its substructure.
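The summary does not give the rule used to split the interface rows, so the following is only one plausible sketch: rows are shared in direct proportion to each computer's measured speed, with the rounding remainder handed out by the largest-remainder method. The function name `allocate_rows` and the proportional rule are assumptions for illustration.

```python
from typing import List, Sequence


def allocate_rows(n_rows: int, speeds: Sequence[float]) -> List[int]:
    """Split n_rows among computers in proportion to their relative speeds.

    Assumes every row costs the same to factorize, which is a simplification.
    """
    total_speed = sum(speeds)
    shares = [n_rows * s / total_speed for s in speeds]  # ideal fractional shares
    counts = [int(share) for share in shares]            # round each share down
    leftover = n_rows - sum(counts)
    # Give the remaining rows to the computers with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(allocate_rows(1000, [1.0, 1.0, 2.0, 4.0]))  # → [125, 125, 250, 500]
```

In a banded factorization the work per row is not strictly uniform, so a production scheme would likely weight rows by their estimated cost; the sketch leaves that out.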

To illustrate the effect of the data preparation phase, a square two-dimensional plate was modelled with shell elements. The model had 155,526 equations and was solved on a heterogeneous PC cluster of eight computers. The actual condensation times of the initial substructures were highly imbalanced, with a governing condensation time of 98.69 seconds. In contrast, the condensation times of the final substructures created with the presented method were better balanced, and the governing condensation time decreased to 61.70 seconds. The workload balancing iterations consumed only 7.44 seconds.
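Using only the figures quoted above, the gain in the condensation stage can be worked out as follows. The symbols are introduced here for the arithmetic only, and the result covers just the condensation stage and the balancing iterations, since the summary reports no timings for the other stages.

```latex
% t_0: governing condensation time of the initial substructures
% t_1: governing condensation time of the balanced substructures
% t_b: time spent in the workload balancing iterations
t_0 = 98.69\ \mathrm{s}, \qquad t_1 = 61.70\ \mathrm{s}, \qquad t_b = 7.44\ \mathrm{s}

% Reduction of the governing condensation time
t_0 - t_1 = 36.99\ \mathrm{s}, \qquad \frac{t_0 - t_1}{t_0} \approx 0.375 \;\; (37.5\,\%)

% Net saving after paying for the balancing iterations
t_0 - (t_1 + t_b) = 29.55\ \mathrm{s}, \qquad \frac{t_0 - (t_1 + t_b)}{t_0} \approx 0.299 \;\; (29.9\,\%)
```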

Various other example problems are also presented to illustrate the efficiency of this framework. The test runs were performed on an existing eight-computer heterogeneous PC cluster.

References

1: E.D. Sotelino, "Parallel Processing Techniques in Structural Engineering Applications", ASCE Journal of Structural Engineering, 129(12), 1698-1706, 2003. doi:10.1061/(ASCE)0733-9445(2003)129:12(1698)
2: B. Hendrickson, "Load balancing fictions, falsehoods and fallacies", Appl. Math. Model., 25, 99-108, 2001. doi:10.1016/S0307-904X(00)00042-1


Civil-Comp Proceedings, ISSN 1759-3433
CCP: 84
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: B.H.V. Topping, G. Montero and R. Montenegro

Paper 117
Parallel Linear Solution of Large Structures on Heterogeneous PC Clusters
O. Kurc¹ and K.M. Will²
¹Department of Civil Engineering, Middle East Technical University, Ankara, Turkey
²CASE Center, School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta GA, United States of America

doi:10.4203/ccp.84.117

Full Bibliographic Reference for this paper
O. Kurc, K.M. Will, "Parallel Linear Solution of Large Structures on Heterogeneous PC Clusters", in B.H.V. Topping, G. Montero, R. Montenegro, (Editors), "Proceedings of the Fifth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 117, 2006. doi:10.4203/ccp.84.117