TeraGrid Software Project Wins CLADE 2006 Best Paper

A paper that describes a system for aggregating processors on demand from across the distributed resources of the National Science Foundation TeraGrid has won the best paper award for the 2006 CLADE (Challenges of Large Applications in Distributed Environments) workshop, to be held June 19 in Paris, France. The paper, “Creating Personal Adaptive Clusters for Managing Scientific Jobs in a Distributed Computing Environment,” describes a virtual environment built on top of an existing middleware tool called GridShell. The combination, including the pre-existing middleware, will be renamed MyCluster. It is already in production use on the TeraGrid, where it has handled about 100,000 jobs and 900 teraflops of scientific computation.
“MyCluster solves a problem,” says lead author Edward Walker of the Texas Advanced Computing Center (TACC), “in how to efficiently support multiple users in the submission and management of many thousands of simultaneous serial jobs across a heterogeneous mix of large compute clusters connected in a distributed environment.” Walker’s co-authors are Jeff Gardner of Pittsburgh Supercomputing Center (PSC), Vladimir Litvin of the California Institute of Technology, and Evan Turner, also of TACC.
MyCluster responds to a TeraGrid user survey, which found that many users need to submit, manage and monitor hundreds or even thousands of jobs simultaneously. Two large scientific projects in particular provided the initial motivation: the Compact Muon Solenoid (CMS) particle physics project, in which Litvin participates, and the National Virtual Observatory (NVO) astronomy project, in which Gardner participates. Both projects require extensive analysis of enormous amounts of data, with a computational demand of between 50,000 and 500,000 simultaneous jobs.
The NSF TeraGrid comprises a heterogeneous mix of compute clusters and other systems of varied architectures, at present totaling over 100 teraflops of capability, running different operating systems at eight resource-provider sites. The TeraGrid therefore benefits from a middleware tool, provided by MyCluster, that offers users a seamless environment in which to harness aggregate capabilities across sites transparently.
Walker will present the paper at the last session of CLADE on June 19. CLADE is held in conjunction with the 15th International Symposium on High Performance Distributed Computing (HPDC-15), also in Paris.
The TeraGrid, sponsored by the National Science Foundation Office of Cyberinfrastructure, is a partnership of people and comprehensive resources to enable discovery in U.S. science and engineering research. Through high-performance network connections, the TeraGrid integrates a distributed set of very high-capability computational, data management and visualization resources to make U.S. research more productive. With Science Gateway collaborations and education and mentoring programs, the TeraGrid also connects and broadens scientific communities. For more information, see its Web site.