Mass Storage System simulator gives more bang for the buck

By Lynda Lester, NCAR/SCD
Imagine a cybersystem that is technologically complex, is used by thousands of scientists around the world, and may be key to understanding the future of the Earth. NCAR's enormous data-storage facility, the Mass Storage System (MSS), is just such a system. SCD's Bill Anderson has developed a Mass Storage System simulator that will help SCD give atmospheric researchers the most cost-effective data storage available.
The MSS is an archive of computational analyses and observational data used for long-range and long-term atmospheric research. The bulk of the data is generated by global climate-simulation models, mesoscale weather models, and other earth science models executed on supercomputers, but the MSS also contains irreplaceable historic records and data from satellites and field experiments.
Caption: One of five data silos in the MSS, which also includes a high-speed network interface, a disk cache for temporary data storage, and robotic technology that speeds data from permanent storage to a user's computing job.
Now holding more than 2 petabytes of data, the MSS is operated and maintained by the NCAR Scientific Computing Division. Faced with budget constraints and increasing user demands, SCD has an ongoing mandate to run the system faster, better, and cheaper. Figuring out exactly how to do so has traditionally been difficult, because:
* The MSS's sheer size and array of intricately connected parts make it hard to know in advance the effect of a given change
* Daily production operations cannot be disrupted
* A wrong decision could have negative long-term effects
However, thanks to Bill Anderson, a software engineer in SCD's MSS Group, what was once an onerous challenge is now a much easier problem to solve.
Taking the MSS out for a spin
Bill has developed an MSS simulator that allows SCD to try out different MSS configurations virtually. Just as wind-tunnel, flight, and driving simulators generate test conditions that approximate real conditions under different circumstances, the MSS simulator models MSS components (tape drives, data silos, disk subsystems, and software) to predict the impact of system changes on performance.
"The MSS is complex because all the components are interdependent," says Bill. "It's hard to know how a change in one part might affect the rest of the system. For instance, you might think that adding 10 tape drives would improve response time for users, then find out the silos can't mount the tapes fast enough."
John Merrill, head of SCD's MSS Group, notes that until the MSS simulator came online, the effect of hardware and software tweaks on MSS production could not be accurately quantified in advance. "For instance," he says, "it was hard to estimate the effect of adding or removing tape drives, changing the size of the disk cache or the size range of files going into the cache, or configuring the system for future workloads. But with the simulator, we can try out various scenarios and get the metrics (such as response time, number of tape mounts, and device utilization) without having to affect production."
"It's a useful management tool," agrees Gene Harano, manager of SCD's High-Performance Systems Section. "It allows us to try out different configurations without actually making the changes or purchasing equipment."
Not available in a store near you
The MSS is not a commodity, off-the-shelf system. There has never been a single, commercially viable product that offers the services and storage capabilities required by NCAR scientists, so over the course of two decades SCD has designed and developed the MSS to user specifications. Accordingly, Bill had to design the simulator from scratch as well.
To do this, he studied the entire MSS, from the user interface to the data paths and control processors. "I relied heavily on the expertise of other people in the group to learn how the various hardware and software components worked," Bill stresses. He also talked with vendors, read documentation, and scrutinized source code, building a conceptual model of the MSS and running dozens of validation tests.
The simulator currently runs on a node of bluesky, NCAR's IBM SP Cluster supercomputer, and takes about eight wall-clock hours and 2 gigabytes of memory to simulate one month of MSS operation.
Better service and performance for scientists
"The MSS simulator has been live for about nine months," John says. "We've already used it quite a bit in determining how many tape drives we need and how large we should make our disk cache."
"The simulator helps us use money more wisely," Bill adds. "It shows us how we can alter software algorithms and hardware resources in ways that improve our bang for the buck. It's designed to help us capacity-plan better, with the goal of giving users better service and performance."
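The interdependence Bill describes, where extra tape drives buy nothing if the silo robots cannot mount tapes fast enough, can be illustrated with a toy first-come, first-served queueing model. This is a minimal sketch under stated assumptions, not the design of the MSS simulator: the fixed mount and read times, the robot-arm count, the Poisson workload, and the function names `simulate_mounts` and `poisson_arrivals` are all hypothetical.

```python
import heapq
import random
from statistics import mean


def simulate_mounts(arrivals, n_drives, n_arms, mount_s, read_s):
    """Return the response time (seconds) of each tape request.

    Hypothetical toy model, not the MSS simulator: each request takes the
    first free tape drive, waits (while holding that drive) for a free silo
    robot arm to mount its cartridge, then reads for `read_s` seconds and
    releases the drive. Service is first-come, first-served, so `arrivals`
    must be sorted in ascending order.
    """
    drive_free = [0.0] * n_drives  # min-heap: time each drive becomes free
    arm_free = [0.0] * n_arms      # min-heap: time each robot arm becomes free
    responses = []
    for t in arrivals:
        # Claim the earliest-available drive.
        start = max(t, heapq.heappop(drive_free))
        # Wait for the earliest-available robot arm to mount the tape.
        mount_start = max(start, heapq.heappop(arm_free))
        mount_end = mount_start + mount_s
        heapq.heappush(arm_free, mount_end)
        # Read the data, then release the drive.
        done = mount_end + read_s
        heapq.heappush(drive_free, done)
        responses.append(done - t)
    return responses


def poisson_arrivals(rate_per_s, horizon_s, seed=1):
    """Hypothetical workload: Poisson request arrivals over `horizon_s` seconds."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            return times
        times.append(t)


if __name__ == "__main__":
    # Illustrative numbers only: about 160 requests per hour, one robot arm,
    # 20 s per mount, 120 s per read, one simulated day.
    work = poisson_arrivals(160 / 3600, 24 * 3600)
    for drives in (12, 24):
        r = simulate_mounts(work, drives, n_arms=1, mount_s=20, read_s=120)
        print(f"{drives} drives: mean response {mean(r):.0f} s")
```

A hand-checkable case shows the effect: with three simultaneous requests, one robot arm, 10-second mounts, and 100-second reads, `simulate_mounts([0, 0, 0], 3, 1, 10, 100)` returns `[110, 120, 130]`, and doubling to 6 drives returns exactly the same list, because every request is queued behind the single arm. Reversing the constraint (1 drive, 3 arms) gives `[110, 220, 330]`, so there the drive, not the robot, is the bottleneck.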
While the MSS has evolved over the years to satisfy growing user needs and technology shifts, the simulator is now helping SCD determine more precisely how the MSS will evolve in the future. By identifying which technologies offer the most cost-effective solutions, SCD will be able to fulfill one of its core missions: providing optimal mass storage of data for the atmospheric sciences community.
General information on NCAR's Mass Storage System is available at http://www.scd.ucar.edu/main/mss.html.
Photo: Lynda Lester, NCAR/SCD