PSC Provides Major Support for Unprecedented Storm-Forecast Experiment

PSC improved performance of the forecast model, automated the daily runs and coordinated a dedicated high-bandwidth link.

Spring on the Great Plains brings one of nature's most awesome performances of fierce weather, as the May 5th weekend in Kansas tragically demonstrated. Many residents of Greensburg, Kansas credited the National Weather Service, which gave a half-hour advance warning, with preventing an even worse disaster. Nevertheless, tornados are notoriously hard to predict, and better warnings -- hours in advance instead of minutes, and more reliable -- could save countless lives.

To that end, NOAA (the National Oceanic and Atmospheric Administration) this spring has mounted an unprecedented experiment in forecasting severe storms. To support it, the Pittsburgh Supercomputing Center (PSC) has brought to bear an impressive array of technology: its Cray XT3, a lead system of the National Science Foundation (NSF) TeraGrid, and a dedicated high-bandwidth network link between Pittsburgh and Oklahoma contributed by Cisco Systems, Inc.

A major goal of the 2007 NOAA Hazardous Weather Testbed (HWT) Spring Experiment is to assess how well "ensemble" forecasting -- a very computationally demanding approach -- works to predict thunderstorms, including the "supercells" that spawn tornados. An ensemble forecast consists of multiple runs of the same forecast model, which together measure the uncertainty inherent in weather forecasts. This is the first time ensemble forecasts are being carried out at the spatial resolution at which storms occur, which is finer than that of operational forecasts and therefore requires more computing. It is also the first time ensemble forecasts are being carried out in real time in an operational forecast environment.

"Ensembles have been used extensively in larger-scale models," said Steven Weiss, Science and Operations Officer of the NOAA Storm Prediction Center (SPC) in Norman, Oklahoma. "But they have never before been used at the scale of storms. This is unique -- both in terms of the forecast methodology and the enormous amount of computing. The technological logistics to make this happen are nothing short of amazing."

Collaborators in the experiment, in addition to PSC and SPC, are the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma, Norman; the NOAA National Severe Storms Laboratory in Norman; LEAD (Linked Environments for Atmospheric Discovery), an NSF Large Information Technology Research grant program and TeraGrid Science Gateway; and the National Center for Supercomputing Applications (NCSA) in Illinois, a lead TeraGrid resource provider.

To implement CAPS' daily forecast runs on PSC's Cray XT3 using the WRF (Weather Research and Forecast) model, PSC provided technological and staff assistance at several levels:
  • PSC networking staff coordinated with OneNet, a regional network of the State of Oklahoma; with National Lambda Rail (NLR), a network initiative of U.S. universities; and with Cisco Systems, which contributed the use of a dedicated "lambda" (a 10-gigabit-per-second optical-network link) for up to a 12-month period.
  • PSC implemented the lambda at its end in January, using existing equipment in the Pittsburgh metro and local-area network. NLR provides the backbone, and OneNet provides the link from Tulsa to Norman, Oklahoma.
  • This dedicated link -- from the Cray XT3 to OneNet in Tulsa to a supercomputer at the University of Oklahoma (which ingests and post-processes the data) -- makes possible the transfer of 2.6 terabytes of data per forecast day.
  • PSC staff optimized the latest version of the WRF model to run on the Cray XT3, gaining a threefold speedup in input/output (I/O) of the WRF code, substantially improving overall performance.
  • PSC also optimized the I/O for post-processing routines used to visualize and analyze the forecast output, achieving a 100-fold speedup.
  • PSC modified the reservation and job-processing logic of its job-scheduling software to implement auto-scheduling of WRF runs and related post-processing, 760 separate jobs each day, demonstrating the TeraGrid's ability to use the Cray XT3, a very large "capability" resource, on a scheduled, real-time basis.
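To put the network figures above in perspective, the following minimal sketch estimates how long one forecast day's output would take to cross the link. It assumes decimal terabytes and ignores protocol overhead; the 50% efficiency case and the 1-gigabit-per-second comparison are hypothetical illustrations, not measurements from the experiment.

```python
def transfer_time_seconds(data_bytes, link_bits_per_second, efficiency=1.0):
    """Time to move data_bytes over a link running at a fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


TERABYTE = 1e12                # decimal terabyte (assumed unit)
DAILY_BYTES = 2.6 * TERABYTE   # data per forecast day, from the article
LAMBDA_BPS = 10e9              # 10 gigabits per second, from the article

# Ideal case: the full line rate of the dedicated lambda.
ideal = transfer_time_seconds(DAILY_BYTES, LAMBDA_BPS)

# Hypothetical case: only half the line rate is achieved end to end.
half_rate = transfer_time_seconds(DAILY_BYTES, LAMBDA_BPS, efficiency=0.5)

# Hypothetical comparison: a 1-gigabit-per-second path at full rate.
one_gigabit = transfer_time_seconds(DAILY_BYTES, 1e9)

for label, seconds in [
    ("10 Gb/s, full rate", ideal),
    ("10 Gb/s, 50% efficiency (hypothetical)", half_rate),
    ("1 Gb/s, full rate (hypothetical)", one_gigabit),
]:
    print(f"{label}: {seconds / 60:.0f} minutes")
```

At full line rate the daily transfer takes roughly 35 minutes, while the hypothetical 1-gigabit-per-second path would need close to six hours, which suggests why a dedicated lambda matters for an overnight, real-time schedule.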

The NOAA HWT spring experiment forecasts require more than a hundred times more computing daily than the most sophisticated National Weather Service operational forecasts. To meet this need, the experiment relies on PSC's Cray XT3 (2,068 2.6 GHz dual-core processors, 21 teraflops peak), the most powerful "tightly-coupled" system (designed to optimize inter-processor communication) available via the TeraGrid.

Each night, from April 15 until June 1, CAPS transmits weather data to the Cray XT3, which runs a 10-member ensemble (10 runs of the model) in addition to a single higher-resolution WRF run, in time to produce a forecast for the next day by morning. The forecast domain extends from the Rockies to the East Coast, two-thirds of the continental United States. The ensemble runs are at four-kilometer horizontal resolution, with the single WRF forecast at two kilometers. A scientific objective is to assess the value of ensemble forecasts in relation to the higher-resolution forecast, and the XT3 and the high-bandwidth link to Oklahoma make it possible to do both of these demanding runs daily under real-time constraints.

Like the ensemble forecasts and the use of the Cray XT3, the dedicated lambda is unprecedented. "There's no other traffic on this lambda," said Wendy Huntoon, PSC director of networking. "This is probably the first time a lambda has been dedicated to a single research effort. All of us involved in this experiment are grateful to Cisco." Huntoon, who is also director of operations for NLR, helped to coordinate among OneNet, NLR and Cisco to implement the contributed lambda.

"The forecast runs at Pittsburgh ship terabytes of data back to Oklahoma every day," said Ming Xue, director of CAPS. "It wouldn't be possible without this network connection."
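As a rough illustration of what a 10-member ensemble adds to a single forecast, the sketch below summarizes ten values of one forecast quantity at one grid point. The numbers are invented for illustration (a hypothetical simulated radar reflectivity in dBZ), the 45 dBZ threshold is likewise an assumption, and this is not the post-processing used in the experiment.

```python
from statistics import mean, pstdev


def ensemble_summary(member_values, threshold):
    """Summarize one forecast quantity across the members of an ensemble."""
    n = len(member_values)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    return {
        # Ensemble mean: a single best-estimate value.
        "mean": mean(member_values),
        # Spread (population standard deviation): larger spread
        # indicates less agreement among members.
        "spread": pstdev(member_values),
        # Fraction of members at or above the threshold, read as a
        # simple probability of exceeding it.
        "prob_exceed": sum(v >= threshold for v in member_values) / n,
    }


# Hypothetical values from 10 ensemble members at one grid point (dBZ).
members = [38.0, 42.5, 51.0, 47.0, 35.5, 55.0, 44.0, 49.5, 40.0, 52.5]

summary = ensemble_summary(members, threshold=45.0)
print(f"mean = {summary['mean']:.1f} dBZ")
print(f"spread = {summary['spread']:.1f} dBZ")
print(f"P(>= 45 dBZ) = {summary['prob_exceed']:.0%}")
```

A single higher-resolution run yields only one value at each point, whereas the ensemble also yields a spread and a probability, which is one way to frame the comparison the experiment sets out to make.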
"This experiment represents an enormous leap forward," said University of Oklahoma meteorologist Kelvin Droegemeier, who directs LEAD and, as former director of CAPS, has led several spring forecasting experiments over the past decade. "Ensembles open up a new array of interpretative capabilities to forecasters analyzing how good the forecast is. With ensembles, you're not only forecasting the weather, you're forecasting the accuracy of the forecast."

Other parts of the experiment use capabilities developed by LEAD to test "on-demand" forecasts. These forecasts, run in response to continental U.S. forecasts that predict severe storms, use fine grid spacing (two km) over smaller domains where initial forecasts indicate high storm likelihood. They use TeraGrid computing resources at NCSA.

Since the mid-1990s, PSC has collaborated with CAPS and NOAA in spring experiments and, with steady advances in computational technology, has helped to achieve corresponding advances in the ability to predict storm-scale weather. The last major experiment, during the 2005 season, used PSC's LeMieux, the first terascale system available via the TeraGrid. In it, CAPS and NOAA learned that with sufficiently high resolution it is possible, in some cases, to predict the details of thunderstorms 24 hours in advance. That milestone in storm forecasting suggests that weather at this scale is inherently more predictable than previously thought.