INDUSTRY
Universities Prepare as Physicists Plan to Pop Protons
The world's largest science experiment, designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot all move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists.
Thomas Hacker, a research assistant professor in Purdue University's Discovery Park Cyber Center and with Information Technology at Purdue (ITaP), says the particle physics collider experiment taking place at the European nuclear physics facility CERN will involve scientists around the world. "Researchers usually have to be in the same location as the instrument to access the data," Hacker says. "In this case, we are building a huge scientific instrument that spans the globe to bring the data to the researchers."

At universities across the United States and at other institutions around the world, teams of computer research scientists and physicists are preparing for the largest physics experiment ever. "Like an exercise session getting you ready for the big game, we've been going to the physics gym," Hacker says. "We are testing the ability of the infrastructure using simulation data. At Purdue, everyone is building and testing systems to make sure the computing infrastructure is ready when the detector comes online later this year."

The collider will give protons a pop in hopes of catching a glimpse of the Big Bang, or at least of the subatomic particles thought to have last been seen at that event 10 billion to 15 billion years ago, which led to the formation of the universe. The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year. By comparison, 15 petabytes is the equivalent of all the information in all the university libraries in the United States seven times over, of 22 Internets, or of more than 1,000 Libraries of Congress. And there is no search function.

"Once this data is distributed to the physicists at the universities, they will require massive amounts of computing power and data storage in order to analyze it," Hacker says. "When the data transfer is live, we will stream data out to physicists as quickly as we can - real time if possible."
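Those figures translate directly into network requirements. As a rough, back-of-the-envelope sketch (it assumes only the 15-petabytes-per-year figure quoted above, spread evenly over a year, and makes no claim about CMS's actual transfer schedule), a few lines of Python show the sustained rate involved:

# Back-of-the-envelope estimate of the sustained rate implied by
# 15 petabytes per year. Assumes decimal units (1 PB = 10**15 bytes)
# and data flowing evenly around the clock; real transfers are burstier.

PETABYTE = 10**15                      # bytes
SECONDS_PER_YEAR = 365 * 24 * 3600     # about 3.15e7 seconds

annual_volume = 15 * PETABYTE          # bytes produced per year
avg_rate_bytes = annual_volume / SECONDS_PER_YEAR
avg_rate_gbits = avg_rate_bytes * 8 / 10**9

print(f"Average rate: {avg_rate_bytes / 10**6:.0f} MB/s "
      f"(about {avg_rate_gbits:.1f} gigabits per second, sustained)")

That works out to roughly half a gigabyte per second, sustained year-round, before counting any of the copies pushed on to the university sites; hence the emphasis the project places on high-speed research networks.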
"Everybody hopes to find the Higgs particle, but the ultimate goal is to discover something new and completely unexpected." The experiments will take place in CERN's Large Hadron Collider, known as the LHC. In the United States, seven universities - known as Tier 2 sites - will receive the CMS data from Fermi National Accelerator Laboratory outside Chicago (the Tier 1 site). The data will be processed and then analyzed by university physicists. (Brookhaven National Laboratory is the Tier-1 site for the CERN ATLAS project.) Internationally there are 11 Tier-1 sites and more than 100 Tier-2 sites, although outside the United States the Tier-2 sites are organized in a different, less centralized, manner. In the United States, CMS Tier-2 facilities are Purdue; the University of California, San Diego; Caltech; University of Nebraska; University of Wisconsin; University of Florida; and Massachusetts Institute of Technology. Frank Würthwein, professor of physics at the University of California, San Diego, says that although the experiment is taking place at CERN in Geneva, the U.S. Tier-2 sites play an integral role. "The actual data analysis by physicists will take place at Tier-2 sites, so it's important that we can receive whatever data our physicists need," Würthwein says. "We will take data from CERN and push it across the worldwide networks to these seven places. They will receive it, analyze it, the whole gimbang. Once we have the data in all these places, a physicist will be able to submit jobs from their office computer, or even from a laptop in Starbucks." In tests so far, the CMS Tier-2 sites have been able to support up to 50,000 jobs per day, and the goal is to be able to support 100,000 computing jobs per day by late spring. "In an exercise last fall we were able to support 50,000 jobs, so we are getting there," Würthwein says. "The next six to nine months are going to be very hectic to get as close to good tools as we can possibly get. Putting the cyberinfrastructure together for this project is no easy feat. There's a lot of work yet to do, and a lot of people will have to do a lot of heavy lifting. This is not just pushing a few buttons." Les Robertson, leader of the LHC Computing Grid project, based in CERN, says that the entire system is designed to be as user friendly for the physicists as possible. "CERN, the Tier-1s and the Tier-2s together form a worldwide computing and data grid," Robertson says. "They are bound together by a layer of software called middleware that is designed to hide the complexity of this network from the user, and use resources at sites around the globe as effectively as possible." Much of the behind-the-scenes middleware used at the Tier-2 sites is being developed by the Open Science Grid consortium. Ruth Pordes, executive director of the Open Science Grid, says the middleware used for CMS and ATLAS is an enhancement of existing software. "The software is useful now for any scientists who need to process, store and access vast amounts of data," Pordes says. "And given the ever-growing internationalism of science, we're working with our peers to create a worldwide interoperable grid." Grid computing is essential to the success of the project, Hacker says. Purdue and UC-San Diego are the only two Tier-2 sites connected to the National Science Foundation's TeraGrid research network, and Purdue also connects to Fermilab through StarLight and Indiana's I-Light, which are both a high-speed fiberoptic networks. 
"An excellent network infrastructure is critical for the success of this project," Hacker says. "Purdue is involved in many networking projects focused on high-performance networking for research, such as the Teragrid and I-Light." Indiana University is playing a key role in CERN's ATLAS project, which, like the CMS project, aims to discover insights into subatomic physics and the nature of matter. "Together, the two state universities in Indiana are playing a key role in experimental physics," Hacker says. "Because of its science grid connections and computational resources, the state of Indiana is helping to lead the way at the frontier of science."