Virtual Clustering

TACC-developed MyCluster software allows researchers to aggregate resources across the TeraGrid and into the clouds.

The goal of the TeraGrid — the NSF-sponsored cyberinfrastructure of people, computing resources and services — is to enable scientists to leverage the most powerful supercomputers to accomplish their research goals, regardless of where they are located. However, to achieve its full potential as a flexible, high-powered innovation accelerator, the TeraGrid needed a way for researchers to harness all the different systems on the Grid together, maximizing the efficiency and usefulness of the network.

“In the academic grid computing space, sharing is key,” said Edward Walker, researcher in the Distributed and Grid Computing group at the Texas Advanced Computing Center (TACC). “You’re trying to allow a user to aggregate and leverage federated resources from across institutional boundaries to do large-scale science.”

Screenshot: a MyCluster virtual login session, with a dynamic interface showing how and where a user's projects are running.
This wasn’t a trivial task a few years ago, when Jeffrey P. Gardner, senior research scientist in high-performance computing at the University of Washington, was looking to run his research team’s large-scale cosmic simulations on the TeraGrid. “We needed to run a humongous number of independent jobs,” Gardner recalled. “One of our users had several million computations. The advantage of having a large number of independent tasks is that you can farm them out anywhere. But the TeraGrid had no real way to do that.”

Fortunately, it was precisely at that time, in 2006, that Walker introduced the first version of the MyCluster software tool (then called GridShell). “MyCluster was instantly useful,” Gardner said, “and allowed researchers to spend a lot more time on the science and the computing rather than the management of the project.”

MyCluster creates a virtual cluster: a group of linked computers that work together closely and can be managed remotely. Interacting with the user’s preferred job management system (with which most scientists are familiar), the tool sets up job proxies on whatever system can accomplish the computing tasks most quickly. When these proxies run, the provisioned computing resources appear on the user’s personal computer as a virtual cluster that dynamically shrinks or expands over time. “All the user has to worry about is submitting his own jobs into his own personal cluster,” Walker said. “The user doesn’t have to worry about the infrastructure or deal with the heterogeneous usage policies at the different supercomputing sites.”

As Gardner has experienced, the ability to coordinate resources at multiple supercomputing centers and departmental systems from a single virtual cluster enables scientists to accomplish projects that otherwise could not be attempted. “What I really want to be able to do is just submit my job to ‘The TeraGrid,’ and have the scheduler send some jobs to TACC, and some to Pittsburgh Supercomputing Center, to get my computation done as quickly as possible,” Gardner said. “And MyCluster accomplishes this meta-scheduling solution.”

Among its notable achievements, MyCluster helped Michael Deem (Rice University) run a million jobs for a zeolite crystal structure simulation on the TeraGrid in just six months — no small feat (see paper). The tool also helped the National Virtual Observatory create a way for astronomical researchers to find, retrieve and analyze data from ground- and space-based telescopes across the globe (see iSGTW article).
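The dynamic behavior Walker describes can be pictured with a small, purely illustrative Python sketch. It is not MyCluster code, and the site names, timings, and function names are invented for the example: independent tasks are queued once into the personal cluster, job-proxy “slots” join the pool as back-end sites grant resources and leave when their allocations end, and tasks drain to whichever slot is free.

```python
import queue
import random
import threading
import time

# The user's independent serial tasks, submitted once to the personal cluster.
tasks = queue.Queue()
for i in range(30):
    tasks.put(f"task-{i:03d}")

completed = []  # (task, site) pairs, recorded as work finishes

def proxy_slot(site, queue_wait, lifetime):
    """One job proxy: a worker slot granted by a back-end site for a limited time.
    It joins the pool after waiting in that site's queue, drains tasks while its
    allocation lasts, then disappears -- so the virtual cluster grows and shrinks."""
    time.sleep(queue_wait)                       # time spent waiting in the site's batch queue
    deadline = time.monotonic() + lifetime
    while time.monotonic() < deadline:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return                               # nothing left to do; the slot retires early
        time.sleep(random.uniform(0.05, 0.15))   # stand-in for the real computation
        completed.append((task, site))

# Hypothetical sites with different queue waits and allocation lengths (seconds).
sites = [("siteA", 0.0, 2.0), ("siteB", 0.4, 1.0), ("siteC", 0.8, 3.0)]
workers = [threading.Thread(target=proxy_slot, args=s) for s in sites]
for w in workers:
    w.start()
for w in workers:
    w.join()

for site, _, _ in sites:
    done_here = sum(1 for _, s in completed if s == site)
    print(f"{site}: {done_here} tasks")
print(f"total finished: {len(completed)} of 30")
```

In the real tool the proxies are batch jobs at production sites rather than threads, and the user interacts only with a familiar job management interface, but the shrinking-and-growing pool of proxy slots is the essential idea.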
“From an ease of use standpoint, MyCluster simplifies your job tremendously because you let the tool make all the scheduling decisions for you,” Gardner said. Just as importantly, it allows users with large numbers of serial, single-processor jobs to use TeraGrid machines, something that usage policies at some centers would otherwise prevent.

MyCluster 2.0, currently in development and slated to arrive in spring 2009, will expand on the original feature set, adding a dynamic, virtual global file system and compatibility with cloud computing solutions like Amazon EC2. It will also provide an even more transparent and unobtrusive infrastructure.

“It’s been said that the most profound software systems are those that disappear,” Walker stated. “The interface is the job management system that the user picks, and as much as possible, we want the user to not interact with the software. MyCluster is this transparent tool that goes out and does something for the user.”

MyCluster 2.0 will aim for that invisibility with extended features, expanding the capabilities of virtual clustering throughout the TeraGrid and up into the clouds.

This work is based in part on work supported by the National Science Foundation under grants #0721931 and #0503697. To learn more, explore the MyCluster User Guide.

Aaron Dubrow
Texas Advanced Computing Center
Science and Technology Writer