NASA Installs Largest Shared-Memory Supercomputer

By Susan Tellep, Director of Product Marketing, SGI
The guys at NASA are smart. Bunch of rocket scientists. But within NASA, there’s an even more elite corps, the NASA Advanced Supercomputing (NAS) division, that handles NASA’s most daunting supercomputing tasks. The challenge is that the applications NAS addresses, such as long-range earth climate modeling or aerodynamic analysis of the next-generation space shuttle, are complex and difficult to parallelize. NAS gets its best results when its researchers can apply all of a supercomputer’s processors and memory to a problem in one big chunk, rather than attempting to break these complex problems down into smaller pieces.
That task has just been made easier through teamwork with SGI, one of whose core strengths is high-performance computing. Recently, the NASA Ames Research Center at Moffett Field, California, installed the world’s largest shared-memory machine, a 1,024-processor SGI Origin 3800 supercomputer. The installation is twice the size of the previous largest, a 512-CPU system also from SGI. SGI has installed several 512-processor SGI Origin servers worldwide. NASA Ames had the first.
“The new 1,024-processor SGI Origin 3800 supercomputer at NASA Ames will lead to faster and better development of climate models for the earth science community, government and industry,” said Bill Feiereisen, chief of the NAS facility. “We have improved our ability to merge observed data and simulation by a factor of 10 with considerably greater increases in the core climate solver. Such a substantial increase in performance allows earth scientists to complete climate simulations in days rather than months, leading to a better understanding of how human activity has changed climate patterns.”
NAS pursues large shared-memory deployments so its programmers can work in an open environment, concentrating on scientific research rather than the arcane details of message-passing algorithms.
NASA Ames researchers have developed their own open multilevel parallelism (MLP) environment to achieve up to tenfold improvements in performance over the Message Passing Interface (MPI) implementations employed with architectures that do not offer shared memory. The gain is this large because NAS deals with extremely large computational databases. For example, NAS processes terabytes of weather data every day to test and improve the accuracy of its climate models.
In an MPI implementation, a programmer would be required to manually decompose these models into smaller components that could be handled by independent computational nodes. Every second, these nodes would need to pass millions of pieces of information back and forth regarding the status of their piece of the problem and how it relates to the other nodes. The more interaction required between nodes, the more difficult the programming task, and the greater the performance penalty, as processors spend more time passing information and less time doing meteorology.
With SGI’s shared-memory architecture, this type of decomposition isn’t necessary. Researchers apply the supercomputer’s full computational and memory capacity to a single, large model. This simplifies programming and yields greater performance for NAS.
“The new techniques have demonstrated a development path that will allow us to move forward to 100-fold performance improvements over the next few years,” said Jim Taft, a NASA Ames researcher. “At these performance levels, we can begin to execute climate simulations at truly high resolution while taking advantage of the huge data streams emerging from the latest earth resources satellites.”
To further expand NASA’s supercomputing capabilities, NASA Ames also entered into a memorandum of agreement that placed a separate 512-processor SGI Origin 3800 supercomputer at NASA Goddard.
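The contrast drawn above between message passing and shared memory can be made concrete with a toy sketch. The Python below is an illustration only: it uses neither real MPI nor NASA's MLP environment, and the function names `smooth_shared` and `smooth_decomposed` are hypothetical. It applies one three-point smoothing step to a one-dimensional grid, first on a single array that all the work can see, then on a grid split into private chunks that must be handed a boundary value from each neighbor before they can update their edges. The chunked version assumes `parts` is no larger than the grid length.

```python
def smooth_shared(grid):
    """One smoothing step over a single array visible to all the work."""
    new = grid[:]  # end points stay fixed
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new


def smooth_decomposed(grid, parts):
    """Same step, with the grid split into `parts` private chunks.

    Each chunk needs one boundary ("halo") value from each neighbouring
    chunk before it can update its own edge points. Returns the new grid
    and the number of halo values that had to be passed.
    Assumes parts <= len(grid), so that no chunk is empty.
    """
    n = len(grid)
    bounds = [n * p // parts for p in range(parts + 1)]
    chunks = [grid[bounds[p]:bounds[p + 1]] for p in range(parts)]

    messages = 0
    result = []
    for p, chunk in enumerate(chunks):
        # Stand-in for a receive: copy the neighbours' edge values.
        left = [chunks[p - 1][-1]] if p > 0 else []
        right = [chunks[p + 1][0]] if p < parts - 1 else []
        messages += len(left) + len(right)

        padded = left + chunk + right
        offset = len(left)
        for j in range(len(chunk)):
            k = j + offset
            if k == 0 or k == len(padded) - 1:
                # Only reached at the two ends of the global grid.
                result.append(padded[k])
            else:
                result.append((padded[k - 1] + padded[k] + padded[k + 1]) / 3.0)
    return result, messages


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(16)]
    shared = smooth_shared(grid)
    split, messages = smooth_decomposed(grid, 4)
    print("results match:", split == shared)
    print("halo values passed per step:", messages)
```

Both versions compute the same values. The difference is bookkeeping: the chunked version needs extra code to track chunk boundaries and exchange edge values on every step, and the number of exchanges grows with the number of chunks. In a real three-dimensional climate model, the boundaries are surfaces rather than single points, so the traffic is correspondingly larger.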
One of the collaborations between NASA Ames and NASA Goddard involves Goddard Earth Observing System 4 (GEOS 4), NASA’s next-generation climate-modeling system. “This collaboration between Goddard and Ames to acquire the latest supercomputing technology grants NASA scientists a significant new capability for understanding the intricacies of our planet’s climate system,” said Dr. Ghassem Asrar, associate administrator for the Office of Earth Sciences, NASA Headquarters, Washington, D.C. “For instance, the Goddard Institute for Space Studies has been able to complete in two months research that would have taken six months on its previous computing platform.”
The Data Assimilation Office (DAO) at NASA Goddard is responsible for taking global satellite observations and assimilating the data into NASA’s next generation of climate models to better understand the physics of weather. “With the SGI Origin 3800 system, NASA will more than double the amount of data it ingests to 800,000 observations each day,” said Dr. Richard B. Rood, DAO senior scientist and acting director of the NASA-NOAA Joint Center for Satellite Data Assimilation. “We will also integrate assimilation systems for several satellites so that, like the real earth, the impact of one type of data will be felt by another type of data.”
One of the main tasks of the new SGI Origin 3800 system at NASA Goddard will be to study man’s impact on the climate. The Goddard Institute for Space Studies (GISS) is taking up this challenge, with an emphasis on understanding the true causes and magnitude of global warming. “This more capable computer will allow us to employ more realistic representations of the global climate systems in our attempts to understand climate change that has already occurred and to predict climate change that will occur throughout the 21st century,” said GISS Director Dr. James E. Hansen.
“Our most pressing needs are to represent the full atmosphere (troposphere and stratosphere) with adequate vertical resolution and to represent the ocean with better horizontal and vertical resolution. These improvements will be possible with the SGI Origin 3800 system.”
Researchers like those at NAS prove that there is no limit to the potential size of a problem. In the last five years, NAS has implemented the world’s first 128-processor, 256-processor, 512-processor, and 1,024-processor shared-memory systems. The shared-memory, MLP approach allows NAS to achieve levels of scalability and efficiency that its researchers could not achieve with MPI. No doubt about it: these NASA guys are smart.