IBM's General Parallel File System Vital to Future Grid Implementations

A large-scale grid environment built using IBM's General Parallel File System (GPFS) was named the Best Commercial Application in the Fourth Annual High Performance Bandwidth Challenge at this week's Supercomputing Conference (SC2003), demonstrating the important role that GPFS will play as major grid installations are deployed around the world in industry and commercial environments. With GPFS, it's now possible to rapidly transport huge amounts of data around the world.

The Bandwidth Challenge asks contestants from science and engineering research communities to demonstrate emerging techniques or applications that consume large amounts of network resources. IBM and the San Diego Supercomputing Center (SDSC) won this year using GPFS in a large-scale grid environment spanning several sites and long distances. Using GPFS, each machine in the distributed system has the same view of the file systems and can access the same files simultaneously across the TeraGrid Wide Area Network.

"GPFS is indispensable as major grid installations are deployed in growing numbers," said Dave Turek, vice president, IBM Deep Computing. "This win has positive implications for Grid deployments in all environments."

"We were extremely pleased with the performance achieved in this distributed file system, which we believe heralds a new paradigm for grid computing," said Phil Andrews, Director of High Performance Computing, San Diego Supercomputing Center. "In this approach, data transfers across a wide area network are completely transparent to the user, avoiding any changes to their normal mode of operation."

"This is the first time we have used GPFS at multiple locations over the TeraGrid network," said Rob Pennington, director of NCSA's Computing and Data Management Directorate. "We have now proven that machines scattered across the country can be connected through a cyberinfrastructure like the TeraGrid and work as one machine. GPFS is an important component in creating this virtual machine."
The General Parallel File System is a high-performance shared-disk file system that provides data access from all nodes in a Linux or UNIX cluster environment. Parallel and serial applications can readily access shared files using standard UNIX file system interfaces, and the same file can be accessed concurrently from multiple nodes. GPFS provides high availability through logging and replication, and can be configured for failover after both disk and server malfunctions. Currently GPFS is deployed in clusters for applications like weather simulations, engineering design, seismic analysis, digital content creation and distribution, and financial modeling.

GPFS is being implemented for computing systems at the National Center for Supercomputing Applications (NCSA) and the San Diego Supercomputing Center, both part of the TeraGrid system. The TeraGrid (http://www.teragrid.org/) is a National Science Foundation project to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research.

At the conference, SDSC exhibited a cluster of 40 Intel Itanium 2-based IBM eServer xSeries systems, each connected to a Gigabit Ethernet LAN, which is connected via a Force10 switch over the SCinet network. During the SC2003 demonstration, GPFS was extended beyond the individual machine rooms at the two centers, using IBM servers on the TeraGrid distributed among the SDSC and NCSA booths on the show floor, at SDSC (University of California, San Diego), and at NCSA (University of Illinois at Urbana-Champaign). The demonstration showed how TeraGrid disk servers at both centers move data across the TeraGrid network to compute nodes in the booths, where the data can then be used by scientific applications.
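The point above about standard UNIX file system interfaces can be illustrated with a minimal sketch: applications on a GPFS cluster use ordinary POSIX calls, with no GPFS-specific API. The `/gpfs/scratch` mount point here is hypothetical, and the snippet falls back to the local directory so it runs anywhere.

```python
import os

# Hypothetical GPFS mount point; any POSIX path behaves the same way.
# Fall back to the current directory so this sketch runs on any machine.
mount = "/gpfs/scratch" if os.path.isdir("/gpfs/scratch") else "."
path = os.path.join(mount, "demo.dat")

# Writer (e.g., a process on one cluster node): plain POSIX open/write.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"hello from node A\n")
os.fsync(fd)  # flush to stable storage so other nodes observe the data
os.close(fd)

# Reader (potentially a process on a different node in the cluster):
# the same file is visible through the same standard interface.
with open(path, "rb") as f:
    data = f.read()
print(data.decode(), end="")

os.remove(path)
```

On an actual GPFS file system, the two halves could run on different nodes of the cluster and still see a single consistent file, which is what lets unmodified applications work across the grid.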