NCAR, IBM, and CU Research Ultimate Algorithms, Computer Architecture

Steve Thomas and Amik St. Cyr, computational scientists in NCAR's Scientific Computing Division, are working with a team of collaborators from SCD, IBM, and the University of Colorado to develop HOMME—a modeling environment for atmospheric research that can use tens of thousands of processors effectively. Photo: Lynda Lester, NCAR/SCD
New numerical methods for high-performance computing being developed by the Scientific Computing Division's Computational Science Section (CSS), coupled with an innovative computer architecture from IBM called BlueGene/L, may provide solutions to these challenges. Computer scientists and applied mathematicians in CSS are building an accurate, efficient, and scalable general circulation model called the High-Order Multiscale Modeling Environment (HOMME). HOMME employs advanced algorithms and computing techniques that will allow it to use tens of thousands of processors effectively. The model is currently running at blazing speed on IBM's BlueGene/L, a densely packed, massively parallel computer that requires a fraction of the power and space of most production systems.

Crunch time

"It makes for an interesting story," says CSS computational scientist Steve Thomas. "Researchers in the geosciences are saying, 'We need one hundred times the computing power within the next five years.' And we're seeing faster and faster computers, where speed of course means a faster clock, and thus chips that generate more heat. But our computing facility is already nearing the limit: we're close to maxed out in terms of floor space and the amount of electricity and cooling we can provide to the Computer Room.

"So it's crunch time, literally. We have to decide: are we going to move to a new facility, build a new facility, or make do with what we have now? This is not pie in the sky, it's not looking down the road. This is not the long-term future, it's here and now."

Faced with this critical situation, Steve, Rich Loft, and Henry Tufo in CSS have been studying ways to use low-power microprocessors effectively. Because BlueGene/L has been particularly interesting in this regard, CSS, in collaboration with researchers from CU-Denver and CU-Boulder, submitted a proposal early in 2004 to the National Science Foundation's Major Research Infrastructure program. The objective was to acquire a 1,024-node BlueGene/L system, study the performance of scalable applications on it, and evaluate BlueGene/L's production capabilities.

NSF has funded the proposal, and SCD is currently negotiating with IBM to obtain a BlueGene/L in spring 2005. Running at 5.7 teraflops peak (or 2.8 teraflops in co-processor mode), the machine would outperform blackforest, NCAR's IBM RS/6000 SP (1.962 teraflops peak). It would also occupy a fraction of the floor space and consume far less power.
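Those peak figures can be checked with simple arithmetic. As a rough sketch, assuming BlueGene/L's published node design (two 700 MHz PowerPC 440 cores per node, each able to complete four floating-point operations per cycle through its dual floating-point unit):

\[
1{,}024\ \text{nodes} \times 2\ \text{cores/node} \times 0.7\ \text{GHz} \times 4\ \text{flops/cycle} \approx 5.7\ \text{teraflops}.
\]

In co-processor mode, one core in each node is dedicated to communication rather than computation, so the available peak is roughly half of that: the 2.8 teraflops quoted above.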
Scalability: Key to performance

But to deliver the highest performance, BlueGene/L requires massively parallel applications that can scale. And while, for instance, NCAR's Community Climate System Model (CCSM) can scale to a few hundred processors, HOMME can scale to tens of thousands.

"In 2001, we ran HOMME at Lawrence Berkeley Lab on 2,000 processors and got 400 gigaflops sustained at typical climate resolutions slightly higher than the CCSM's," says Steve. "And Mark Taylor of Sandia National Laboratories recently did some benchmarking with HOMME using 9,000 processors on the ASCI Red machine. Right now we're running HOMME on 8,000 processors on the prototype BlueGene/L in Rochester, Minnesota. We've hit 1.5 teraflops, and that's just the first pass."

HOMME: A unique modeling environment for climate research

HOMME is being developed by Steve Thomas, Amik St. Cyr, Henry Tufo, and John Dennis of CSS and Theron Voran, a student of Dr. Tufo at the University of Colorado. Other participants and collaborators are John Clyne and Joey Mendoza of SCD's Visualization and Enabling Technologies Section; Jim Edwards, IBM's site analyst at NCAR; and Gyan Bhanot, Bob Walkup, and Andii Wyszogrodzki of IBM's T. J. Watson Research Center.

HOMME is written in Fortran 90 and contains three components: a dynamical core, an atmospheric physics component, and a dynamics/physics coupler.

The core. The dynamical core provides the computational foundation for solving the fluid dynamics equations necessary to study the atmosphere, and it supports several different schemes for discretizing those equations in space and time. The core is based on the spectral element method, which requires less communication between processors and runs more efficiently on a higher number of CPUs than the spherical harmonic method used in many traditional models. The method also allows modelers to add grid points over interesting geographic areas or to resolve important aspects of the flow being modeled; this process, called "adaptive mesh refinement," permits higher spatial resolution in selected areas.

The core also employs a new approach to temporal discretization (i.e., how the model state is advanced from one time level to the next), allowing modelers to take longer time steps. The approach, a combination of semi-implicit and semi-Lagrangian time-stepping, potentially more than doubles the integration rate: the speed at which a day of climate can be simulated (a sketch of the scheme appears at the end of this section). It also enhances parallelization for new computer architectures such as BlueGene/L. However, for the dynamical core to be fully useful to atmospheric scientists, it must be coupled to the physics packages employed by the community.

The physics. CSS is now integrating physics from NCAR's Community Atmosphere Model (CAM) into HOMME. This adds the ability to model moisture and its profound effects on the atmosphere: for instance, how clouds interact with radiation from the sun to affect land, oceans, and ice. Modelers generally simulate cloud formation using crude parameterizations, since directly simulating cloud processes on a global scale requires a massive increase in computational power. A technique called super-parameterization, which improves the simulation of cloud processes, is not often used because it is two to three orders of magnitude more computationally intensive than traditional techniques. With the advent of BlueGene/L, however, super-parameterization becomes a reasonable option. CSS and their collaborators have built a super-parameterization package and are currently coupling it to HOMME.

"A lot has been done this year in terms of adding more realistic physics and physical processes to HOMME," says Steve. "We're beyond Physics 101. Hopefully within the next year, we'll have a full climate model."

The coupler. CSS is working with IBM's Jim Edwards to use the Earth System Modeling Framework (ESMF) to couple HOMME's dynamical core to the physics component. ESMF is a software infrastructure that allows different weather, climate, and data-assimilation components to operate together on parallel supercomputers. The ESMF project is an interagency collaboration, with its core implementation team based in CSS.
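To make the time-stepping approach concrete, here is a minimal sketch, not HOMME's exact formulation, of how semi-implicit and semi-Lagrangian methods combine. Write the equations for a prognostic field \(\phi\) as \(d\phi/dt = L\phi + N(\phi)\) along fluid trajectories, where \(L\) collects the fast (gravity-wave) terms and \(N\) the slower nonlinear terms. One time step then reads

\[
\frac{\phi^{n+1}(x) - \phi^{n}(x_d)}{\Delta t}
= \frac{1}{2}\left[ L\phi^{n+1}(x) + L\phi^{n}(x_d) \right]
+ N\bigl(\phi^{n+1/2}\bigr),
\qquad
x_d \approx x - \Delta t\, u\bigl(x,\, t^{n+1/2}\bigr),
\]

where \(x_d\) is the departure point of the trajectory arriving at grid point \(x\). Advection is absorbed into the trajectory (semi-Lagrangian), and the fast terms are averaged between the old and new time levels (semi-implicit), so the step size is no longer limited by the fastest gravity waves or the strongest winds. The price is an elliptic, Helmholtz-type solve for \(\phi^{n+1}\) at each step, which a spectral element discretization can perform with mostly nearest-neighbor communication. That trade is what makes severalfold longer time steps, and hence a faster integration rate, possible.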
Ultimate algorithms, ultimate architecture

The work with HOMME and BlueGene/L is part of CSS's mission to track computer technology, extract performance from it, and pioneer new and efficient numerical methods. The result will be an atmospheric model capable of exploiting BlueGene/L's scalability and computational power, and of advancing NCAR's research agenda by leaps and bounds.

As CSS scientist Amik St. Cyr puts it, "We're researching the ultimate numerical algorithms tied to the ultimate architecture for producing science faster."

— Lynda Lester