OSG and TeraGrid integration for high throughput in a CMS Tier-2 data exchange
Purdue University researchers have reached new milestones in grid interoperability through the successful integration of two Open Science Grid (OSG) sites running a scientific application over the National Science Foundation TeraGrid network.
Purdue is using this collaborative resource to routinely transfer data over a transcontinental wide-area network at rates that match those of Fermi National Accelerator Laboratory, which hosts the United States Tier-1 center for the Compact Muon Solenoid experiment, known as CMS.
The high-energy physics community of more than 2,000 researchers worldwide is preparing for the CMS experiment, which will study information obtained from inside a giant particle collider called the Large Hadron Collider (LHC). The CMS experiment is designed to study the collisions of protons at a center-of-mass energy of 14 trillion electron-volts. The experiment aims to prove the existence of the Higgs boson, a hypothetical, massive subatomic particle with zero electric charge, the existence of which would explain the masses of the elementary particles.
In one month of operation, Purdue and the University of California, San Diego, exchanged 105 terabytes of data. "This is nearly three times the volume of data as the next most active pair of sites exercising connectivity between research institutions in the United States," said Preston Smith, systems research engineer for Purdue's Rosen Center for Advanced Computing.
The load testing, known as Service Challenge 4, took place in summer 2006 between research institutions preparing for the CMS experiment. The experiment is under construction at CERN (Conseil Européen pour la Recherche Nucléaire), the European Organization for Nuclear Research in Geneva, Switzerland, and is expected to start taking data in fall 2007.
The CMS experiment will produce large quantities of data at a rate of about 100 megabytes per second, requiring a new type of distributed computing. All data from the CMS detector will be distributed using a hierarchical tier structure of facilities, with CERN being a Tier-0 center providing data to seven Tier-1 centers in different countries around the world, including one Tier-1 center at Fermilab.
The Tier-1 centers are connected by an international grid infrastructure to 50 regional Tier-2 centers that will provide vital computing infrastructure for physics analysis and simulation. In the United States, CMS Tier-2 facilities are Purdue University; the University of California, San Diego; Caltech; the University of Nebraska, Lincoln; the University of Wisconsin; the University of Florida; and Massachusetts Institute of Technology.
While each of these Tier-2 sites operates computing and storage resources on OSG production grid facilities and transfers data from Fermilab, Purdue also is one of nine partners on the TeraGrid. Thanks to cooperation with the TeraGrid leadership and the San Diego Supercomputer Center, a TeraGrid network route was established between Purdue and the University of California, San Diego, Tier-2 facility.
"Up to now, we expected to see the fastest exchange between Purdue and Fermilab, but this is faster," said Norbert Neumeister, assistant professor of physics at Purdue. "It's amazing."
The resources found within Tier-2 sites are vital to establishing a global research computing grid that can use resources at the world's premier science institutions for transferring and storing vast amounts of data.
"The proton-proton collisions will occur every 25 nanoseconds, and the CMS experiment will record information at a rate of 100 megabytes per second," Neumeister said. "That adds up and produces a huge amount of data."
The CMS project at Purdue and the larger LHC initiative that it is a part of are forcing computer scientists to reexamine their view of e-science as being mostly computationally intensive, said Ahmed Elmagarmid, director of the Purdue Cyber Center. "The LHC is producing more data than most other initiatives and putting ever bigger demands on the communications bandwidth," he said.
"The newly achieved giga-scale communication at Purdue addresses one of the biggest needs of the scientific community that is involved in the CMS Tier-1, Tier-2 and the larger LHC initiative." Peak rates on the Purdue-University of California, San Diego, load test reached 200 megabytes per second (1.5 gigabytes per second), according to CMS file-based monitoring. "Bursts of over 4 gigabytes per second on the finer-grained TeraGrid network traffic monitoring have been observed," Smith said. "With the upcoming acquisition of new equipment, we expect to be able to surpass 4 gigabytes per second in the near future." Smith and the Purdue CMS team, along with Frank Wuerthwein and the CMS team at the University of California, San Diego, faced several technical challenges in achieving these milestones. The dCache storage systems at both Purdue and San Diego were tuned to eliminate any storage bottlenecks. Network engineers from Purdue, the University of California, San Diego, and San Diego Supercomputer Center worked together to establish a high-speed network path from West Lafayette to San Diego. Additional researchers and staff who helped achieve these levels of greater grid interoperability are David Braun of Purdue's Rosen Center for Advanced Computing, Sebastien Goasguen, previously of Purdue's Rosen Center for Advanced Computing (now at Clemson University), Thomas Hacker of the Purdue University Discovery Park Cyber Center and Dane Skow of Argonne National Laboratory. The CMS Tier-2 site at Purdue is a joint effort between Purdue's Rosen Center for Advanced Computing and the Department of Physics. Funding is provided by the National Science Foundation.