One Weather Center's Vast Storage Needs

By Mike Karp, senior analyst with Enterprise Management Associates

Today, let's put your own data storage challenges in a new context. Consider, for example, the vast storage requirements of the European Centre for Medium-Range Weather Forecasts (ECMWF). ECMWF provides its 24 member states with daily worldwide weather forecasts and other weather-related products. Among those other products is an organized online access and retrieval facility for datasets derived from 27 years of weather forecasts and observations. It also provides extensive facilities for weather modeling research.

Weather modeling is a compute-intensive and data-intensive operation, typically involving highly parallelized supercomputers that demand the highest level of I/O service. Among the people who measure this sort of thing, ECMWF is notable for using the 15th- and 16th-ranked supercomputers in the world. As you might expect, huge amounts of data (around 1 terabyte per day) are generated and stored.

The infrastructure supporting this data is suitably impressive: the archive includes more than 800 terabytes of near-line data, most of which is stored on tape in IBM and StorageTek robots. Some 88 tape drives are used to access the 25,000 tape cartridges where the current data resides. ECMWF has multiple generations of tape drives in its machine room, including IBM 6250, 3480, 3590-B, 3590-E and 3590-H models, and the STK 9480.

The organization faces the same issues the rest of us must deal with: storing increasing amounts of data, keeping the data accessible, meeting regulatory requirements, managing backup and restore operations, and protecting data assets. But it faces those issues to the nth degree. Clearly, tape continues to be a strategically important part of the group's data provisioning mix. Why?
Because tape is removable, transportable, fast and cost-effective, and because its capacity is effectively unlimited: you can always add more cartridges. Cost is no small consideration, given the enormous scope of this operation in terms of both the mass storage and the I/O requirements.

But how can ECMWF cope with terabyte-per-day growth in the data it is responsible for while keeping tape cost-effective? Like many of us, the organization is looking to do more with less, shrinking the infrastructure by consolidating onto fewer devices. The trick is to identify tape assets that will provide better access while costing less to maintain and manage.

Since May, ECMWF has been testing IBM 3592 drives in a StorageTek silo. It reports about a 3:1 improvement in throughput compared with the IBM 3590 drives. For its purposes, however (it puts many 1GB files on a single tape), the most significant improvement may be the reduction in the time required to locate a file's first byte. Considering the data access requirements at this place, that probably will make its users happy.

The 3592s are now being moved into a production environment. I'll let you know what comes of that.