Turbocharge your job submissions!

Image caption: Falkon usage across various systems, December 2007 to October 2008 (the ANL/UC TeraGrid 316-processor cluster, the SiCortex 5,832-processor machine, and the IBM Blue Gene/P 4K- and 160K-processor machines).

Over the past year, Falkon has seen wide deployment and usage across a variety of systems: the TeraGrid, the SiCortex at Argonne National Laboratory, the IBM Blue Gene/P supercomputer at the Argonne Leadership Computing Facility (ALCF), and the Sun Constellation supercomputer on the TeraGrid.

Applications that run thousands of jobs can cause headaches. Huge numbers of job submissions to a site often create bottlenecks, make system administrators grumpy and, worse, bring down remote gateway nodes, rendering the resources useless and losing jobs in the process. Traditional techniques commonly used in the scientific community do not scale to today's largest grids and supercomputers, let alone tomorrow's. But the new class of applications known as many-task computing, discussed in the recent article "Many Task Computing: Bridging the performance-throughput gap," has spawned the development of a new framework, Falkon, that enables applications to scale up painlessly and use these large systems efficiently.

Minutes to milliseconds

Falkon (Fast And Light-weight tasK executiON) is designed to help restructure applications so that job wait times and job submission overheads drop from minutes to milliseconds, while also reducing the network bandwidth they consume. It leaves many of the higher-overhead features, such as accounting and persistence, to the local resource managers or to the applications themselves. Falkon focuses on the efficient handling of many independent tasks on large-scale distributed systems with many processors.
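To make the pattern concrete, here is a minimal, purely illustrative sketch (not Falkon's actual API; the worker count, task count, and all names are hypothetical). Instead of submitting every computation as its own batch job and paying minutes of queue wait each time, a many-task application acquires a pool of workers once and then streams lightweight tasks to them, paying only milliseconds of dispatch overhead per task. Python's standard library stands in for the executor:

    # Conceptual sketch only: NOT Falkon's API. Python's process pool stands in
    # for a set of workers acquired once from the resource manager.
    from concurrent.futures import ProcessPoolExecutor, as_completed
    import time

    def run_task(task_id: int) -> float:
        """One short, independent task; real workloads range from milliseconds
        (e.g. comparing two protein structures) to hours (e.g. docking)."""
        start = time.time()
        # ... the actual computation would go here ...
        return time.time() - start

    def main() -> None:
        # Acquire workers once, then dispatch many tasks to them, rather than
        # submitting each task as a separate job to the batch scheduler.
        with ProcessPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(run_task, i) for i in range(1000)]
            for fut in as_completed(futures):
                fut.result()  # collect results as tasks finish

    if __name__ == "__main__":
        main()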

Falkon has demonstrated vast improvements in performance and scalability for a wide variety of tasks — tasks with execution times ranging from milliseconds to hours, compute- and data-intensive tasks, and tasks with varying arrival rates. The improvements extend across diverse applications from astronomy to medicine, economic modeling and beyond, and to scales of billions of tasks on hundreds of thousands of processors.

One researcher who adopted Falkon is Andrew Binkowski at the Midwest Center for Structural Genomics at Argonne National Laboratory. Binkowski and his team model three-dimensional protein structures in their basic research towards drug design. Since proteins with similar structures tend to behave in similar ways, the team compares the modeled structures to existing, known proteins in order to predict their functions, a computationally intensive task.

“As the Protein Data Bank (a repository of known proteins) expands almost exponentially, it becomes more difficult to coax desktop machines to do the types of analysis required,” says Binkowski. “We turned to Falkon as a way to utilize our existing software applications.”

What makes Falkon fly faster

The Falkon framework uses three novel techniques to enable rapid and efficient job execution and to improve application performance and scalability. First, multi-level scheduling, in which resource allocation for a job is separated from job dispatch, enables on-the-fly resource allocation and minimizes wait-queue times. Second, Falkon's streamlined, distributed task dispatcher achieves dispatch rates ten to a thousand times higher than those of conventional centralized schedulers. Third, Falkon's data-aware scheduler coordinates tasks and data so that data transfer across the network and from shared or parallel file systems is minimized.
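As a purely illustrative example of the third technique, the toy dispatcher below prefers a worker that already holds a task's input file in its local cache and falls back to the least-loaded worker otherwise. This is a sketch of the idea only, not Falkon's implementation; the class and function names are hypothetical.

    # Toy "data-aware" dispatcher: illustrative only, not Falkon's code.
    from collections import defaultdict

    class Worker:
        def __init__(self, name: str):
            self.name = name
            self.cache = set()   # input files already staged on this worker
            self.load = 0        # number of tasks assigned so far

    def pick_worker(task_input: str, workers: list) -> Worker:
        """Prefer a worker that already caches the input, so the shared file
        system and the network are touched as little as possible."""
        for w in workers:
            if task_input in w.cache:
                return w                           # cache hit: no transfer needed
        return min(workers, key=lambda w: w.load)  # cache miss: least-loaded worker

    def dispatch(tasks: list, workers: list) -> dict:
        """Assign each task (named by its input file) to a worker, recording
        staged files so later tasks on the same data land on the same worker."""
        assignments = defaultdict(list)
        for task_input in tasks:
            w = pick_worker(task_input, workers)
            w.cache.add(task_input)
            w.load += 1
            assignments[w.name].append(task_input)
        return assignments

    if __name__ == "__main__":
        workers = [Worker("w1"), Worker("w2")]
        print(dispatch(["a.pdb", "b.pdb", "a.pdb", "b.pdb"], workers))
        # tasks that reuse an input land on the worker already holding it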

We can ask bigger questions

"Falkon has allowed us to ask bigger questions and perform experiments on a scale never before attempted — or even thought possible,” says Binkowski.  “This is the difference between comparing a newly determined protein structure to a family of related proteins versus comparing it to the entire protein universe.” 

The team has done all of this using existing software packages that were not designed for high-throughput or many-task computing, with Falkon coordinating and driving the execution of many loosely coupled computations that are treated as “black boxes,” requiring no application-specific code modifications.
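For illustration only, wrapping an existing program as such a black box might look like the sketch below; the executable name and flags are hypothetical, not a real tool's interface.

    # Illustrative only: run an existing, unmodified analysis program as a
    # black-box task. "./analyze_structure" and its flags are hypothetical.
    import subprocess

    def black_box_task(input_file: str, output_file: str) -> int:
        """Invoke the program exactly as it would run on a desktop; the
        framework only decides where and when this function executes."""
        result = subprocess.run(
            ["./analyze_structure", input_file, "-o", output_file],
            capture_output=True,
            text=True,
        )
        return result.returncode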

“Whereas identifying similarities in protein binding pockets (for protein structure analysis) is characterized by millions of discrete jobs taking seconds to complete, docking and scoring a small-molecule compound (for drug discovery) can require several hours to converge on a solution. In both cases, we are able to tailor our workflows to achieve the best possible scientific results and still get the throughput and efficiency we need to take advantage of the large computing resources we have available.”

Ioan Raicu and Ian Foster

Source: iSGTW http://www.isgtw.org/