Civil-Comp Proceedings
ISSN 1759-3433
CCP: 100
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: B.H.V. Topping
Paper 2

Addressing the Problem of Data Mobility for Data-Intensive Science

W.E. Johnston, E. Dart and B. Tierney

ESnet and Lawrence Berkeley National Laboratory, Berkeley California, United States of America

Full Bibliographic Reference for this paper
W.E. Johnston, E. Dart, B. Tierney, "Addressing the Problem of Data Mobility for Data-Intensive Science", in B.H.V. Topping, (Editor), "Proceedings of the Eighth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 2, 2012. doi:10.4203/ccp.100.2
Keywords: data-intensive science, large-scale, widely distributed systems, impact on the R&E Internet, moving massive quantities of data internationally, TCP is a "fragile workhorse", federated network testing and monitoring, high-throughput campus LANs, a new Internet architecture for data-intensive science.

Summary
A collection of science disciplines, driven by the increasing capability and sophistication of their instrumentation, are, or soon will be, "drowning" in data.

The LHC data analysis is highly distributed because the two major experiments at the LHC (ATLAS and CMS) each have a large collaboration community (more than 2900 scientists from 172 institutes in 37 countries work on the ATLAS experiment) that is scattered across Europe, North America, and Asia. This is increasingly the norm as scientific instruments become bigger and more expensive.

The LHC physics community has dealt with this problem methodically for a relatively long time. Their experience will be very useful to other data-intensive science disciplines, especially in the areas of highly distributed data management and data movement systems, and the high-performance use of the intervening networks.

Distributed systems such as the one used by the ATLAS collaboration for data movement and analysis require network performance that is predictable, like the other managed resources, for the smooth overall functioning of the system. This has given rise to a new network service that essentially provides a "virtual circuit" between specified end points, with a guaranteed bandwidth, that can be requested for a specific time interval in the future. This service is used to reliably interconnect the elements of the distributed systems.
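One production service of this kind is ESnet's OSCARS virtual circuit system, although the sketch below does not reproduce its actual interface. As a minimal illustration of advance bandwidth reservation, the following Python fragment posts a reservation request to a hypothetical service; the endpoint URL, JSON field names, and host names are all illustrative assumptions.

```python
# Hypothetical sketch of an advance bandwidth-reservation request.
# The URL, JSON fields, and host names below are illustrative
# assumptions, not the actual API of OSCARS or any production service.
import json
import urllib.request
from datetime import datetime, timedelta, timezone

start = datetime.now(timezone.utc) + timedelta(hours=1)
end = start + timedelta(hours=6)

payload = {
    "src": "dtn1.site-a.example.net",   # source end point (hypothetical)
    "dst": "dtn4.site-b.example.net",   # destination end point (hypothetical)
    "bandwidth_mbps": 10000,            # guaranteed bandwidth for the circuit
    "start": start.isoformat(),         # reservation window: a future interval
    "end": end.isoformat(),
}

req = urllib.request.Request(
    "https://circuits.example.net/api/reserve",   # hypothetical endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))   # e.g. a circuit id and a "RESERVED" status
```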

On the other hand, the Internet is full of undetected "soft errors": errors that degrade performance without causing outright failure. TCP continues to function in their presence, but over long distances they force it to operate at speeds 10-100 times slower than the link capacity.
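The scale of this slowdown can be seen from the well-known Mathis et al. model, which bounds a single TCP stream's sustained rate by MSS/(RTT*sqrt(p)) for segment size MSS, round-trip time RTT, and loss probability p. The short Python sketch below, with illustrative parameter values, shows how a loss rate that still permits multi-Gb/s throughput at metropolitan distances cripples the same stream across the Atlantic.

```python
# TCP throughput upper bound from the Mathis et al. model:
#   rate <= (MSS / RTT) * (1 / sqrt(p))
# where MSS is the segment size, RTT the round-trip time, and p the
# packet-loss probability.  Parameter values below are illustrative.
from math import sqrt

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Upper bound on sustained TCP throughput, in bits per second."""
    return (mss_bytes * 8 / rtt_s) / sqrt(loss)

MSS = 1460    # bytes, a typical Ethernet segment
LOSS = 1e-6   # one packet lost per million: a "soft error" rate far too
              # small to trigger ordinary fault alarms

for label, rtt in [("metro (2 ms)", 0.002),
                   ("US coast-to-coast (80 ms)", 0.080),
                   ("trans-Atlantic (150 ms)", 0.150)]:
    gbps = mathis_throughput_bps(MSS, rtt, LOSS) / 1e9
    print(f"{label:28s} -> at most {gbps:6.2f} Gb/s")

# On a 10 Gb/s path, the same loss rate that still allows several Gb/s
# at metro distances holds a single trans-Atlantic stream to well under
# 0.1 Gb/s, roughly 100 times below the link capacity.
```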

Moving very large volumes of data over international distances requires that the network be error-free. Achieving error-free operation of the network requires constant testing and monitoring, and an international infrastructure has been put in place to do this.
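The infrastructure referred to here is the perfSONAR measurement federation deployed across the R&E networks. As a minimal sketch of the underlying idea, regularly scheduled active throughput tests with alerts on shortfalls, the following fragment drives the real iperf3 tool; the test host, threshold, and interval are illustrative assumptions.

```python
# Minimal sketch of regularly scheduled active throughput testing, the
# idea behind the perfSONAR-style monitoring described in the paper.
# It shells out to iperf3 (a real tool); the test host, threshold, and
# interval are illustrative assumptions.
import json
import subprocess
import time

TEST_HOST = "dtn-test.example.net"   # hypothetical test point
EXPECT_GBPS = 5.0                    # alert if a 10 Gb/s path drops below this
INTERVAL_S = 6 * 3600                # test every six hours

def measure_gbps(host: str) -> float:
    """Run a 10-second iperf3 TCP test; return the received rate in Gb/s."""
    out = subprocess.run(
        ["iperf3", "-c", host, "-t", "10", "-J"],   # -J = JSON output
        capture_output=True, text=True, check=True,
    ).stdout
    report = json.loads(out)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

while True:
    rate = measure_gbps(TEST_HOST)
    if rate < EXPECT_GBPS:
        # A persistent shortfall like this is how "soft errors",
        # invisible to normal fault monitoring, are actually found.
        print(f"ALERT: {TEST_HOST} at {rate:.2f} Gb/s, expected >= {EXPECT_GBPS}")
    else:
        print(f"OK: {TEST_HOST} at {rate:.2f} Gb/s")
    time.sleep(INTERVAL_S)
```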

Once the testing and monitoring infrastructure was in place in the R&E networks, it became apparent that the campuses, and even the wide area R&E networks, were not well designed for moving massive amounts of data.

To address the campus problem, a "Science DMZ" was designed: a part of the campus network that is optimized for, and serves only, high-performance science applications.
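A further element of the Science DMZ model, as documented by ESnet, is that its data transfer hosts are tuned for long, high-bandwidth paths, in particular with TCP buffers sized to the bandwidth-delay product (BDP) of the path. A sketch of that arithmetic, with illustrative values:

```python
# Bandwidth-delay product (BDP): the amount of data that must be "in
# flight" to keep a long path full, and hence the minimum TCP buffer
# size a Science DMZ data transfer node needs.  Values are illustrative.
link_bps = 10e9   # 10 Gb/s path
rtt_s = 0.150     # trans-Atlantic round-trip time

bdp_bytes = link_bps / 8 * rtt_s
print(f"BDP = {bdp_bytes / 2**20:.0f} MiB")   # ~179 MiB

# A host whose TCP buffers are capped at a few MiB (a common OS default)
# can therefore fill only a small fraction of this path per round trip,
# no matter how fast the network is: one reason general-purpose campus
# hosts move big data so poorly, and why DMZ hosts are tuned separately.
```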

The wide area problems were addressed with an overlay network that isolates the science traffic from the general traffic.

This paper discusses all of these issues.
