Civil-Comp Proceedings
ISSN 1759-3433 CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Paper 45
High Performance Communication Framework for Large Scale Workflows X. Wang1, U. Küster1, M. Resch1 and E. Focht2
1High Performance Computing Centre Stuttgart, University of Stuttgart, Germany
X. Wang, U. Küster, M. Resch and E. Focht, "High Performance Communication Framework for Large Scale Workflows", in "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 45, 2011. doi:10.4203/ccp.95.45
Keywords: scientific workflow, workflow management, large scale, high performance communication, data streaming, coupling.
Summary
The main focus of this paper is on efficient and scalable data transfer technologies for large scale scientific workflows [3,4]. Large scale simulations are increasingly important in many scientific areas, and such workflows generate hundreds of terabytes of I/O data. Although an I/O throughput of 1.5 Gbps is commonly available on today's file systems, it is strongly affected by many factors, e.g. the number of concurrent accesses. Our framework automates the streaming of intermediate data through workflows: the interconnects among the computational nodes of PC clusters are used instead of disk-based file I/O.
Our system consists of a global task scheduler that maps and manages the execution of interdependent tasks, an interaction model that manages connections and data streaming between tasks, a special I/O library that replaces normal I/O calls with remote procedure calls (RPCs), and a message transfer layer that enables communication over the network.

The framework is simple to use in that users are not required to modify their applications. For dynamically linked executables under Linux, I/O system calls are captured by pre-loading a system-call interception layer and are then replaced by our RPC implementations. Other executables are supported through Filesystem in Userspace (FUSE) [5], which serves as our basic client library: FUSE intercepts the system calls and invokes the RPCs implemented in our special I/O library. Everything from in-house codes to closed-source commercial software can therefore be integrated easily into our framework. In addition, multiple network protocols are supported, so the system can be ported easily to clusters with different networks.

Results of executing a single experiment from a bone implant workflow show that I/O-intensive applications can benefit from our framework, with an improvement in I/O rate of 88%. We expect much higher throughput, and good scaling, when hundreds of thousands of experiments are executed concurrently.