Speed Record Set at UIC

CHICAGO — Researchers at the University of Illinois at Chicago (UIC) have reached a new milestone in trans-Atlantic data transmission, demonstrating the practicality of transferring very large data sets over high-speed production networks. UIC's National Center for Data Mining (NCDM) and Laboratory for Advanced Computing flashed a set of astronomical data across the Atlantic at 6.8 gigabits per second—6,800 times faster than the 1 megabit per second effective speed that connects most companies to the Internet.

In the Oct. 10 test run, 1.4 terabytes of astronomical data were transmitted from Chicago to Amsterdam in 30 minutes using UDT (UDP-based Data Transport), a new protocol developed by the NCDM. Moving the same amount of data using TCP, today's standard protocol for data transfers, would take 25 days.

Moving large data sets over the Internet faces two main hurdles. First, the network infrastructure for long-distance 1 gigabit per second and 10 gigabit per second links is still maturing, and software that can use this infrastructure is just being developed. The UIC computer clusters used for the test were connected to the SURFnet network in Amsterdam and the Abilene network in Chicago; the test also demonstrated the quality and power of these two world-leading research networks. In the past, high-speed transfers of very large data sets have usually employed specialized experimental networks and data protocols that did not allow other network traffic to share the same link.

Second, today's predominant network protocol, TCP, is not effective at moving massive data over long distances. UDP, another widely deployed network protocol, cannot transport data reliably (some data may be lost) and is not friendly to other flows (using it for large data transfers can starve other network traffic).
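The quoted figures can be cross-checked with a few lines of arithmetic. Assuming decimal units (1 terabyte = 10^12 bytes, the convention usual in press releases), the 30-minute transfer works out to a sustained rate of about 6.2 gigabits per second, consistent with the 6.8 Gbit/s peak, and the 25-day TCP figure implies an effective TCP throughput of roughly 5 Mbit/s over the same path:

```python
# Sanity check of the throughput figures quoted in the release.
# Assumes decimal units: 1 terabyte = 1e12 bytes.
data_bits = 1.4e12 * 8            # 1.4 terabytes expressed in bits
seconds = 30 * 60                 # the 30-minute transfer window

sustained_gbps = data_bits / seconds / 1e9
print(f"Sustained rate: {sustained_gbps:.1f} Gbit/s")   # ~6.2 Gbit/s

# The 25-day TCP comparison implies this effective TCP throughput:
tcp_seconds = 25 * 86400
tcp_mbps = data_bits / tcp_seconds / 1e6
print(f"Implied TCP rate: {tcp_mbps:.1f} Mbit/s")       # ~5.2 Mbit/s
```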
Efforts are currently under way to improve TCP, to develop new protocols to replace it, and to develop protocols on top of TCP and UDP that are effective for high-performance data transport. Unlike some other protocols now being studied for high-speed data transfer, UDP-based protocols can be used over today's Internet without changes to the network infrastructure. This demonstration showed not only that UDT was fast, but also that it was friendly and could coexist effectively with thousands of other network connections.

The demonstration is part of an ongoing international effort to find and test new ways of reliably moving massive data sets around the globe using advanced networks and new data transfer protocols. Such systems hold enormous promise for advancing scientific research, in addition to numerous commercial applications. Today, although it is becoming common for a global business to have important data in different cities, it is still quite difficult to integrate this data to create a common view.

"Using UDT, it is now practical for the first time to move even massive data sets over very long distances in a friendly fashion using today's networks," said Robert Grossman, NCDM director and president of Open Data Partners.

UDT is being used in research projects developing high-performance Web services, which is required in order to scale today's Web services to large remote and distributed data sets. UDT is also used as the network transport layer in the joint University of Illinois/Northwestern project on Photonic Data Services (PDS), which is developing open source data services for next-generation photonic networks, such as the OptIPuter.
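The release notes that plain UDP cannot transport data reliably. The general idea behind UDP-based protocols such as UDT is to add sequence numbers, acknowledgments, and retransmission in user space on top of UDP datagrams. The stop-and-wait loop below is a deliberately minimal sketch of that idea only, not the UDT protocol itself: UDT's real design additionally uses rate-based congestion control so that large transfers remain friendly to other traffic.

```python
# Toy sketch: reliability layered on top of UDP (NOT the UDT protocol).
# Each datagram carries a sequence number; the sender retransmits a
# chunk until the receiver acknowledges it.
import socket
import threading

MESSAGES = [b"chunk-0", b"chunk-1", b"chunk-2"]

def receiver(addr_box, ready, out):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))                # let the OS pick a free port
    addr_box.append(sock.getsockname())
    ready.set()
    while len(out) < len(MESSAGES):
        data, peer = sock.recvfrom(1024)
        seq, payload = data.split(b":", 1)
        sock.sendto(b"ACK:" + seq, peer)       # acknowledge every datagram
        if int(seq) == len(out):               # keep only the next in-order chunk
            out.append(payload)
    sock.close()

def sender(addr):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(0.5)                       # retransmit if no ACK arrives
    for seq, payload in enumerate(MESSAGES):
        packet = b"%d:%s" % (seq, payload)
        while True:
            sock.sendto(packet, addr)
            try:
                ack, _ = sock.recvfrom(1024)
                if ack == b"ACK:%d" % seq:
                    break                      # delivered; move to next chunk
            except socket.timeout:
                pass                           # assume the datagram was lost; resend
    sock.close()

addr_box, ready, received = [], threading.Event(), []
t = threading.Thread(target=receiver, args=(addr_box, ready, received))
t.start()
ready.wait()
sender(addr_box[0])
t.join()
print(received)                                # [b'chunk-0', b'chunk-1', b'chunk-2']
```

Stop-and-wait wastes most of the bandwidth on a high-latency trans-Atlantic path, which is exactly why real protocols keep many datagrams in flight and pace them against a congestion-control estimate.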
The OptIPuter is an example of what are sometimes called lambda grids: distributed computing infrastructures in which applications can set up their own photonic paths (lambdas) supporting data transport at gigabit-per-second speeds and higher. "Moving data at 6.8 gigabits per second across the Atlantic using UDT is an important milestone for the OptIPuter Project and brings us a bit closer to effective data management over lambda grids," said Larry Smarr, principal investigator of the OptIPuter Project and director of the California Institute for Telecommunications and Information Technology, a UC San Diego/UC Irvine partnership.

UDT is also being used as one of the layers of a UIC project called Open DMIX (for Data Mining, Data Integration, and Data Exploration), which is developing open source high-performance Web services for data mining. "Using UDT and the scalable data mining and data integration Web services built on top of it may emerge as an important enabling technology for the grid computing required for next generation virtual observatories," according to Alex Szalay, Alumni Centennial Professor in the Department of Physics and Astronomy at Johns Hopkins University.

The tests were made possible by support from the following manufacturers and organizations, who generously contributed their equipment, facilities, and know-how: OMNInet, StarLight, Nortel, SARA and CANARIE. Partial funding for the tests was provided by the National Science Foundation (Grants 0129609, 9977868 and 0225642) and the University of Illinois at Chicago.

The National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC) was established in 1998 to serve as a national resource for high performance and distributed data mining. The Center sponsors research projects, facilitates standards, operates testbeds, and provides outreach.
The Center is coordinating the development of the Predictive Model Markup Language (PMML), the standard for statistical and data mining models, as well as the WS-DMX standard for data mining and data exploration Web services. The NCDM also operates the Terra Wide Data Mining Testbed, a worldwide testbed for high performance and distributed data mining. See http://www.ncdm.uic.edu/.

SURFnet operates and innovates the national research network in the Netherlands, to which 150 institutions in Dutch higher education and research are connected. To remain in the lead, SURFnet makes a sustained effort to improve the infrastructure and to develop new applications that give users faster and better access to new Internet services. SURFnet's network innovation is currently funded by the Dutch government via the GigaPort project. See http://www.surfnet.nl/.

The OptIPuter, started in October 2002, is a five-year, $13.5 million project funded by the National Science Foundation. It will enable scientists who are generating massive amounts of data to interactively visualize, analyze and correlate their data from multiple storage sites connected to optical networks. The University of California, San Diego and the University of Illinois at Chicago lead the research team, with funded partners at Northwestern University, San Diego State University, the Information Sciences Institute at the University of Southern California, UC Irvine and Texas A&M University, and industrial partners IBM, Sun Microsystems, Telcordia Technologies, Inc. and Chiaro Networks. See http://www.optiputer.net/.