Investing in Open Source

Unprecedented Ranger supercomputer helps TACC experts and outside developers bring open source software into the Petascale Era
In supercomputing, hardware typically garners all the hype. A new processor, switch, or fabric makes waves, eliciting excitement in the high-performance computing community. But beneath every system lies layer upon layer of low-level software (assigning protocols, scheduling jobs, making connections) without which even the flashiest machine would be just so much silicon and steel.
“It’s true that software doesn’t get the same amount of attention as hardware,” said Karl W. Schulz, Associate Director of the Texas Advanced Computing Center (TACC). “But from our point of view, software is what makes or breaks the system. If you just plop down hardware and turn it on, nobody can do useful research on it. It’s the software that people interact with to make the hardware scream.”
These middleware packages are the rarely mentioned but crucially important glue that holds HPC together. Over the last year, TACC’s HPC researchers have embarked on an ambitious mission to bring this software into the Petascale Era. Working with academic and industry development teams around the world, they have made remarkable progress on a number of fronts, enabling TACC’s preeminent system, Ranger, to maximize its potential and perform science across its full 62,976 cores.
Open Source Commitment
Whereas much of the software we’re familiar with (operating systems, word processors, graphics programs) is proprietary, with carefully protected code and hefty price tags, a parallel stream of software has evolved that is developed communally, with source code that is open and freely available.
[Image caption: Layer upon layer of low-level open source software, which assigns protocols, schedules jobs, and makes connections between nodes, allows Ranger to maximize its potential.]
From its inception, TACC has embraced the open source software movement as a necessary component of advanced computing.
“On a system like Ranger, everything about it is unprecedented. The hardware is new; the scale of what we’re trying to pull off is new. If you depend on purely commercial software, you’re at the mercy of the vendor,” Schulz explained.
The Linux operating system, the foremost example of open source software, runs on TACC’s Ranger and Lonestar supercomputers; in fact, the majority of the software on TACC’s HPC systems is open source as well. With code freely available to all, open source software can be tailored and tuned to individual systems and improved by a wide range of contributors who bring their unique expertise to bear on the problem of crafting complex code.
“Because we move so quickly and we’re really pushing the boundary of HPC, having access to all the source code gives us an opportunity to integrate everything into our own software stack, and sometimes that’s half the battle,” Schulz said.
Because it is non-proprietary, open source software can evolve rapidly and organically, with far more frequent releases, allowing it to stay nimble and keep up with the brisk pace of new hardware. Perhaps just as important, open source software is the de facto basis of most academic science and engineering applications. By hewing to an open source approach with Ranger, TACC ensured that researchers could get onto the system and ramp up their work quickly without learning any new programs.
When Ranger first came online in February 2008, however, much of the open source software needed to capitalize on its massive parallelism hadn’t yet been tested at full scale.
“You can’t fix the software until you have access to a system to run on,” Schulz said.
“So we made time available on Ranger to a number of different development groups for them to test, improve and optimize their codes.”
Among the groups that have made significant strides because of this access are MVAPICH and Open MPI, InfiniBand-aware MPI stacks for HPC clusters that synchronize and coordinate thousands of parallel processes; the OpenFabrics Alliance, whose OFED subnet manager handles routing throughout Ranger’s expansive InfiniBand fabric; Sun Grid Engine (SGE), a batch scheduling system that manages large numbers of remote and distributed jobs; and the Lustre file system, which aggregates large storage pools into a high-performance parallel file system that can support thousands of clients. [see sidebar for a list of open source links]
In the early stages of the development process, many of these leading packages were not yet ready to take advantage of Ranger’s full scale. With eager scientists pushing hard to access the system, the HPC team at TACC and their developer colleagues had to scramble to make sure that Ranger could live up to users’ demands. This required a massive team effort.
[Image caption: Karl Schulz, TACC Associate Director for the High Performance Computing group, worked with software teams around the world to speed the development of Ranger-scale tools. Photo courtesy of Josh Simons.]
“As HPC systems are going to the Petaflop era with evolving technologies, close collaboration and coordination between systems, middleware and applications designers are needed to understand the strengths and limitations of the Petaflop systems and make design choices to deliver best performance and scalability to the applications,” said Dhabaleswar K. (DK) Panda, Professor of Computer Science at The Ohio State University and leader of the MVAPICH development team.
Fritz Ferstl, Director of Sun Grid Engineering, concurred: “When it comes to HPC, it’s all about scalability. And overall system scalability is determined by the quality of integration and balanced tuning, because the weakest component will set the upper bound.”
“TACC’s Ranger is one of the biggest computer systems on earth, and prior to Ranger there was no testbed for driving up scalability to such sizes,” he continued, “so the process of integrating all the components, including the open source software, at TACC was a unique opportunity to analyze and remove bottlenecks and carefully tune system configurations to enable the huge performance delivered by the Ranger system today.”
Improving the software stack at the heart of Ranger’s HPC environment benefits not only academics and supercomputing centers but industrial vendors as well, according to Aviram Gutman, VP of Software Engineering at Mellanox Technologies. “Utilizing open-source software enables more companies to take advantage of high-performance computing for better scalability and productivity.”
Low-level Improvements Lead to High-Impact Gains
Less than a year into Ranger’s tenure as the largest academic supercomputer in the world, researchers are now running simulations across the entire system, enabling big science with societal impact.
“Without being able to advance the low-level software (the MPI stacks, batch schedulers and so forth), users wouldn’t be able to run their science problem at the full scale of the system or explore problems that they couldn’t solve previously,” said Tommy Minyard, TACC Associate Director.
Among the examples Minyard cited of significant research enabled by advances in open source software are the work of the Southern California Earthquake Center (SCEC), whose recent full-system run on Ranger simulated dynamic earthquake ruptures the team had never been able to model before, and that of the National Oceanic and Atmospheric Administration (NOAA), which used two-thirds of Ranger to test next-generation hurricane forecasts.
“Without the open-source underlying software, you can’t run these big problems,” he concluded.
Even in cases where the capability previously existed, the considerable performance and speed improvements have a system-wide impact. “Since Ranger is such an important resource and there is such a large dollar amount associated with it, you want to maximize your return on investment for compute cycles,” Schulz explained. “The faster speeds mean shorter turnaround times for the user, which is increased productivity in research, and also a better effective utilization of cycles.”
Part of the return on investment comes further down the line, as the knowledge and improvements gained through the development process trickle down to the HPC systems that follow: future National Science Foundation Track 2 systems, smaller university and industry clusters, and even the multi-core parallelism that is expected to become standard throughout the consumer market.
“The advances made at TACC and similar labs will become a part of IT mainstream ‘tomorrow,’” Ferstl said.
“So open source software provides the basis for commoditizing the high performance and high productivity software infrastructure of the future.”
Indeed, the teamwork between TACC staff and vendor personnel to diagnose issues at scale benefits both partners, as well as the Lustre community at large, which includes more than half of the top 50 supercomputers in the world.
“TACC were key collaborators in verifying Lustre on its largest IB fabric to date and supporting Sun in isolating and resolving issues related to scalability,” said Peter Bojanic, Director of the Lustre Group at Sun Microsystems, Inc.
Clearly, the researchers who do this work, at TACC, Sun, Ohio State, and elsewhere, are passionate about making high-performance computing as efficient, productive, and useful as possible. So even though few will ever recognize the advances in open source software that they instigated, they reject the ‘unsung hero’ moniker.
“It’s our job,” Schulz said. “It’s what a supercomputing center is supposed to do.”
*************************************************************
For more information on software and tools available at TACC, visit the Software resource page.
Aaron Dubrow
Science and Technology Writer
Texas Advanced Computing Center