Innovative Computing Technique Earns Livermore Team the Gordon Bell Prize

Using groundbreaking computational techniques, a team of scientists from Lawrence Livermore National Laboratory and IBM earned the 2007 Gordon Bell Prize with a first-of-a-kind simulation of Kelvin-Helmholtz instability in molten metals on BlueGene/L, the world’s fastest supercomputer. Sponsored by ACM and the IEEE Computer Society, SC07 showcases the latest advances in high performance computing, networking, storage and analysis.

By performing extremely large-scale molecular dynamics simulations, the team was able to study, for the first time, how a Kelvin-Helmholtz instability develops from atomic-scale fluctuations into micron-scale vortices. “This has never been done before. We were able to observe this atom by atom. There was no time scale or length scale we couldn’t see,” said Jim Glosli, lead author on the winning entry, titled “Extending Stability Beyond CPU Millennium: A Micron-Scale Simulation of Kelvin-Helmholtz Instability.” Other team members were Kyle Caspersen, David Richards, Robert Rudd and project leader Fred Streitz of LLNL, and John Gunnels of IBM.

The Kelvin-Helmholtz instability arises at the interface of fluids in shear flow and results in the formation of waves and vortices. Waves formed by Kelvin-Helmholtz (KH) instability appear in all manner of natural phenomena, such as waves on a windblown ocean, sand dunes and swirling cloud billows. While Kelvin-Helmholtz instability has been studied thoroughly for years and its behavior is well understood at the macro-scale, scientists did not clearly understand how it evolves at the atomic scale until now.
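For readers who want the quantitative background, the classical linear-stability condition for two inviscid fluids in shear (a textbook result, not part of the team’s atomistic simulation) states that an interface perturbation of wavenumber k grows when

\[
k\,\rho_1 \rho_2\,(U_1 - U_2)^2 \;>\; g\,\bigl(\rho_2^2 - \rho_1^2\bigr),
\]

where \(\rho_1, U_1\) are the density and velocity of the upper fluid and \(\rho_2, U_2\) those of the lower, denser fluid. Absent stabilizing effects such as surface tension, any shear makes sufficiently short wavelengths unstable, which is one reason the instability appears in so many natural settings.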
The insights gained through simulation of this phenomenon are of interest to the National Nuclear Security Administration’s (NNSA) Stockpile Stewardship Program, the effort to ensure the safety, security and reliability of the nation’s nuclear deterrent without nuclear testing. Understanding how matter transitions from a continuous medium at macroscopic length scales to a discrete atomistic medium at the nanoscale has important implications for such Laboratory research efforts as National Ignition Facility (NIF) laser fusion experiments and the development of applications for nanotube technology.

“This was an important simulation for exploring the atomic origins of hydrodynamic phenomena, and hydrodynamics is at the heart of what we do at the Laboratory,” Glosli said. “We were trying to answer the question: how does the atomic scale feed into the hydrodynamic scale?”

“This remarkable Kelvin-Helmholtz simulation breaks new ground in physics and in high performance scientific computing,” said Dona Crawford, associate director for Computation at Lawrence Livermore National Laboratory. “A hallmark of the Advanced Simulation and Computing program is delivering cutting-edge science for national security and the computing that makes it possible.”

This simulation of unprecedented resolution was made possible by the innovative computational technique used – a technique that could change the way high performance scientific computing is conducted. Traditionally, the hardware errors and failures that are an inevitable part of HPC have been handled by the hardware itself or by the operating system. That strategy was perfectly adequate for supercomputing systems with 1,000 to 10,000 processors. However, these traditional approaches do not work as well on a massively parallel machine the size of BG/L, with more than 200,000 CPUs (central processing units) – almost 10 times more than on any other system.
With such a large number of processors and components, hardware failures are almost certain during long production runs. Hardware failures impact system performance and consume valuable time on the machine. In partnership with IBM, the Livermore team pioneered a new strategy for recovering from hardware failure: they developed a way to use the application itself to help correct errors and failures. Their reasoning was that the application, which has a complete understanding of the calculation being run, can evaluate the errors and decide on the most efficient strategy for recovery. For example, by implementing a strategy to mitigate cache memory faults (the primary cause of failure on BG/L), the team was able to run without error for CPU-millennia.

“Applications with this capability could potentially lead to a new paradigm in supercomputer design,” said Streitz, noting that application-assisted failure recovery reduces hardware reliability constraints, opening the way for supercomputer designs that use less stable but higher performing – and perhaps less expensive – components. “That concept may allow the building of a faster machine.”

Named for one of the founders of supercomputing, the prestigious Gordon Bell Prize is awarded to innovators who advance high-performance computing; the award is widely regarded as the Oscars of supercomputing. A Livermore team led by Streitz won the 2005 Gordon Bell Prize for a simulation investigating solidification in tantalum and uranium at extreme temperatures and pressures, with simulations ranging in size from 64,000 atoms to 524 million atoms. This year, on the expanded machine, the Livermore team was able to conduct simulations of up to 62.5 billion atoms. “The scale of this Kelvin-Helmholtz simulation was enormous compared to the previous simulations,” Streitz said. “We were really pushing the limits of what is currently possible on this machine.”
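The application-assisted recovery idea described above can be illustrated with a minimal sketch. This is not the team’s actual BG/L code: the fault injection, the double-execution check, and all function names here (unreliable_step, recoverable_run) are hypothetical stand-ins for the real mechanism, which handled cache parity errors inside a molecular dynamics application. The sketch shows only the general pattern: keep a cheap in-application checkpoint, detect a corrupted result, and redo the affected step rather than aborting the whole run.

```python
import random

def unreliable_step(state):
    """Advance the 'simulation' by one step.
    A random transient fault is injected to stand in for a
    hardware error such as a cache parity flip (hypothetical)."""
    new_state = [x + 1.0 for x in state]
    if random.random() < 0.25:              # injected transient fault
        new_state[0] += random.uniform(1e5, 1e6)  # silently corrupt a value
    return new_state

def checksum(state):
    """A simple application-level integrity check."""
    return sum(state)

def recoverable_run(state, steps):
    """Application-assisted recovery: before each step, keep an
    in-memory checkpoint; detect corruption by comparing two
    independent executions of the step; on mismatch, discard the
    results and retry from the checkpoint instead of failing."""
    for _ in range(steps):
        checkpoint = list(state)            # cheap in-application checkpoint
        while True:
            candidate = unreliable_step(checkpoint)
            verify = unreliable_step(checkpoint)
            if checksum(candidate) == checksum(verify):
                state = candidate           # both executions agree: accept
                break
            # Fault detected: roll back to the checkpoint and redo the step.
    return state
```

Because the application decides the recovery policy itself, a detected fault costs only one redone step rather than a restart of the entire production run, which is the property that let the team accumulate error-free CPU-millennia.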