SCIENCE
NSF funds supercomputer center at Los Alamos
Carnegie Mellon to provide expertise as a global leader in the field
The National Science Foundation (NSF) has announced a $10 million award to the New Mexico Consortium (NMC) at Los Alamos National Laboratory (LANL), Carnegie Mellon University and the University of Utah to build and operate the Parallel Reconfigurable Observational Environment (PRObE), a one-of-a-kind computer systems research center.
This innovative concept utilizes decommissioned supercomputing systems from Department of Energy (DOE) facilities to provide a very large-scale systems research capability. Targeted at both high-performance and data-intensive, or cloud, computing, the center will give systems software researchers dedicated access to clusters of 1,000 computers and control of all application and operating system software, down to and including the lowest-level hardware control systems. The PRObE center is the only one of its kind in the United States, and possibly the world.
"The need to expand research and educational opportunities for the systems research community is critical," said Garth Gibson, professor of computer science at Carnegie Mellon and respected thought leader in data storage and in data-intensive computing. "No sooner have computer systems such as LANL's Roadrunner achieved sustained petascale performance, capable of a trillion or more floating-point calculations per second, than we have recognized the need for exascale systems, which will be a thousand times faster," he said. "Designing exascale systems will be a tremendous challenge and one that will be difficult for the computer science community to meet without a resource such as PRObE."
"Computing researchers need to be able to test system-level innovations at scale," said Ed Lazowska, chair of the Computing Community Consortium and professor of computer science and engineering at the University of Washington. "This is the big gap. Nothing currently available fills it."
Academic researchers across the field recognize this need. Michael Dahlin, professor of computer science at the University of Texas at Austin, said, "Computer systems researchers need large-scale clusters to have any hope of doing much of the work we should be doing."
"About three years ago, we began to work on a way to re-utilize open/unclassified decommissioned supercomputers," said Gary Grider, co-PI and deputy division leader from LANL's High Performance Computing (HPC) Division. "We noticed that when new supercomputers are installed, there is a mad rush to get them into production with a focus on getting science applications to run quickly and well."
In the early phase of commissioning a new supercomputer, a significant amount of work goes into software development. The people who develop systems-level software get to try out new ideas only during the relatively short period while a new large computer is brought online. "This presents an issue," Grider said, "as there is no large-scale resource for these systems-level people to utilize for long periods of time to develop new concepts and functions."
The DOE continually decommissions large supercomputers, some of which are open/unclassified resources. These systems can be used for high-performance and data-intensive computing systems research; however, funding is needed to house, power and air-condition the systems and to provide systems support staff.
"NSF seemed like the natural government sponsor for such a concept," Grider said. "Also, to be flexible enough to be able to support this kind of research, it seemed appropriate to have universities involved."
PRObE builds on an existing partnership between LANL and the NMC to support educational and research collaborations with universities. Carnegie Mellon provides expertise as a leader in computer systems research. The University of Utah will adapt software developed for its network emulation testbed, Emulab, for use in PRObE.
The Emulab software has been developed over the past decade by the Flux Research Group, part of the School of Computing at the University of Utah. It is widely used in the systems research community: it powers over three dozen testbeds used around the world by thousands of researchers and educators.
PRObE will be the largest-scale Emulab installation to date. "We are excited to be part of the PRObE effort," said Robert Ricci of the University of Utah, "because we believe it addresses an important gap in the public research infrastructure."
"PRObE may be built from recycled supercomputers, but because the hardware is not exotic, the same hardware will support data-intensive computing," said CMU's Gibson, who led the DOE's Petascale Data Storage Institute. "In CMU's experience this hardware will be excellent at running data analytics for eScience or Internet service applications using open source software such as Hadoop. This will allow PRObE to serve both styles of large-scale computing, high-performance computing and data-intensive computing."
"It's good to see the NSF outsource the construction and support of a flexible large-scale experimental data center to an organization designed to do just that," said Margo Seltzer, a professor of engineering and applied sciences at Harvard University. "Let's not spend university research resources replicating engineering that is better done by others."
In addition to providing the large-scale systems research environment, PRObE will conduct an innovative summer school to train university students in how to build and manage very large high-performance computing environments. Top students selected from the summer school will be invited to intern at the PRObE center and at LANL.