SDSC & IBM Team Wins SCinet Network Bandwidth & StorCloud Challenges

A team of high-performance computing engineers from the San Diego Supercomputer Center (SDSC) and IBM demonstrated expert management of large-scale data resources using innovative cyberinfrastructure tools at the 2004 Supercomputing Conference in Pittsburgh, Pa. Using StorCloud SAN-attached storage and IBM's General Parallel File System (GPFS), along with computation and visualization resources at several TeraGrid sites, the team presented a new computation and visualization to conference attendees. With these tools, Enzo scientists were able to watch the process of massive star formation and destruction.

“To achieve the promise of grid computing, high-performance computing applications need coordinated access to the set of resources that comprise cyberinfrastructure – superior compute platforms, on-demand remote data access, visualization tools and access to archival storage,” said Dr. Fran Berman, director of SDSC. “The TeraGrid cyberinfrastructure offers these distinctive resources to high-performance applications.”

The SDSC/IBM team won the StorCloud Challenge award for the highest achieved bandwidth and I/Os per second for its Enzo submission. As part of the submission, the team also broke a world record by sorting a terabyte of random data in 487 seconds (8 min, 7 sec), more than twice as fast as the previous record of 1,057 seconds (17 min, 37 sec). The storage bandwidth achieved was 15 GB per second.

The team also received the Best Spirit of the SCinet Bandwidth Challenge Award for enabling a scientific application to achieve 27 Gb per second over the TeraGrid network, utilizing more than 95 percent of the available bandwidth.

The computation illustrates how a scientist can schedule a computation and visualization in automatic succession at different sites using the Grid Universal Remote metascheduler, without moving any files from one site to another. A global parallel file system that spans sites allows data to be shared without duplicating hardware and data at each site, making it a cost-effective, high-performance solution for partner sites. No matter where users go on the grid, their files are available at any site that mounts the file system.

The demonstration also showcased an important component of cyberinfrastructure. Using the Grid Universal Remote developed by SDSC team members, engineers were able to reserve resources across distributed sites in a coordinated fashion. User-settable reservations at SDSC and Purdue University provided the framework to make this possible. The Grid Universal Remote gives users direct access to local cluster scheduling, within policy limits; previously, this was possible only with manual intervention by system administrators.

“Our vision is to provide scientists with an easy-to-use, seamless environment that allows them to utilize all the unique distributed resources available on the grid,” said Berman. “The TeraGrid team really stepped up to the plate on this challenge, providing an unprecedented level of team technology coordination.”

Resources used included 120 TB of IBM TotalStorage DS4000 (FAStT) storage systems, as well as 80 processors serving storage and data from the show floor to NCSA and SDSC. The computation was performed on SDSC’s premier high-performance compute system, DataStar.
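The workflow described above, in which a user-settable reservation at each site lets a compute step and a visualization step run in succession against a single shared file system, can be pictured with a short sketch. The Python code below is a minimal, hypothetical illustration only; the class and method names (MetaScheduler, reserve, run_workflow) and the mount path are assumptions made for this sketch and do not represent the actual Grid Universal Remote interface.

```python
# Hypothetical sketch of coordinated cross-site reservations over a shared
# global file system. Names and paths are illustrative assumptions, not the
# real Grid Universal Remote API.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reservation:
    site: str          # e.g. "SDSC" or "Purdue"
    nodes: int         # node count requested from the local scheduler
    start: datetime    # user-settable start time, within site policy limits
    duration: timedelta


class MetaScheduler:
    """Toy stand-in for a metascheduler that chains reservations across sites."""

    def __init__(self, shared_fs_mount: str):
        # A global parallel file system mounted at every site means each step
        # reads and writes the same path; no files are staged between sites.
        self.shared_fs_mount = shared_fs_mount
        self.reservations: list[Reservation] = []

    def reserve(self, site: str, nodes: int, start: datetime,
                duration: timedelta) -> Reservation:
        """Place a user-settable reservation with a site's local scheduler."""
        r = Reservation(site, nodes, start, duration)
        self.reservations.append(r)
        print(f"Reserved {nodes} nodes at {site} from {start:%H:%M} for {duration}")
        return r

    def run_workflow(self) -> None:
        """Execute the reserved steps in succession against the shared file system."""
        for step, r in enumerate(sorted(self.reservations, key=lambda r: r.start), 1):
            print(f"Step {step}: running at {r.site}, "
                  f"data in {self.shared_fs_mount} (no staging needed)")


if __name__ == "__main__":
    now = datetime.now()
    ms = MetaScheduler(shared_fs_mount="/gpfs/projects/enzo")  # assumed path

    # Computation at one site, visualization immediately afterward at another.
    ms.reserve("SDSC", nodes=256, start=now, duration=timedelta(hours=2))
    ms.reserve("Purdue", nodes=32, start=now + timedelta(hours=2),
               duration=timedelta(hours=1))

    ms.run_workflow()
```

The point of the sketch is the design choice the article highlights: because every site mounts the same file system, chaining the steps is purely a scheduling problem, and no data movement step appears anywhere in the workflow.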