PSC Provides Direct Link from Galaxy to the XSEDE Backbone

Mountains of genomics data that had to work their way through a bottleneck of network connections now have a direct, high-speed link to the world’s most powerful data-processing resources — thanks to network engineering at the Three Rivers Optical Exchange (3ROX).

3ROX, a high-performance Internet hub operated and managed by the Pittsburgh Supercomputing Center (PSC), has put into place a high-bandwidth link from Galaxy, a data-intensive bioinformatics program at Penn State, to the network backbone of the National Science Foundation’s XSEDE (Extreme Science and Engineering Discovery Environment) program. This link opens the high-performance computing (HPC) resources of XSEDE to a research community that has not traditionally been a big user of HPC but, with emerging genomics technologies, will benefit greatly from using it.

This is the first dedicated link to the XSEDE network backbone from a site that is not an XSEDE “service provider,” said Wendy Huntoon, PSC director of networking, and Penn State is the pilot site because of Galaxy. “This link,” she added, “enables a much more efficient capability for Galaxy to get its work done.”

Galaxy, an open, web-based platform for biomedical research, allows biologists, who traditionally have not needed HPC technologies in their research, to run complex data analyses through easy, web-based protocols. Galaxy has more than 10,000 users who run 4,000 to 5,000 analyses daily. The volume of genomics data, in particular, has exploded over the last few years as a result of “next-generation sequencing,” which reads DNA sequences far faster than prior technologies.

Genomics researchers, however, need to assemble the sequences accurately into complete genomes and analyze them, and the skyrocketing quantities of data pose a research bottleneck. 3ROX and XSEDE now offer a solution. The new link to XSEDE, facilitated by 3ROX, is a 10-gigabit-per-second (10 billion bits per second) fiber-optic link that greatly improves Galaxy’s connectivity to XSEDE sites.
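To put the 10-gigabit-per-second figure in perspective, the short Python sketch below estimates the ideal time to move a data set across such a link. The one-terabyte data set size and the function name are assumptions made for illustration, not figures from PSC or Galaxy, and the calculation ignores protocol overhead and competing traffic, so real transfers would take longer.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


LINK_BITS_PER_SECOND = 10e9  # 10 billion bits per second, as stated above
DATASET_BYTES = 1e12         # hypothetical data set: one decimal terabyte

seconds = transfer_seconds(DATASET_BYTES, LINK_BITS_PER_SECOND)
print(f"{seconds:.0f} seconds, or about {seconds / 60:.1f} minutes")
```

Under these assumptions, a terabyte of sequence data would cross the link in roughly 13 minutes at full line rate.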

“Next-generation sequencing is the biological version of the radio telescope,” says Anton Nekrutenko, associate professor of biochemistry and molecular biology at Penn State, who co-developed Galaxy. “These emerging technologies place huge demands on data analysis and storage.”

“The network connection to XSEDE through PSC is a huge breakthrough,” adds Nekrutenko. “It provides us with the ability to run up to 150,000 jobs per month, and we expect to quadruple that as this link gets fully up and running. It allows biologists to take advantage of HPC resources in ways they otherwise could not, not only the computing, but the storage resources at XSEDE sites. It democratizes research by making XSEDE useful for a scientific community that traditionally has not been a heavy user of high-performance computing.”

A four-year, $1.5 million grant to 3ROX in 2010 through NSF’s Academic Research Infrastructure (ARI) program provided support for the new high-bandwidth link. “This ARI grant is intended to advance ‘meritorious scientific research,’” said Huntoon, “and we were able to provide the equipment from this funding.”

Through XSEDE’s Extended Collaborative Support Service (ECSS), XSEDE staff are working with Galaxy scientists to develop a capability that will allow biologists to use XSEDE data analysis and storage resources transparently, as needed. Led by ECSS “Science Gateways” manager Suresh Marru (Indiana University), ECSS consultants Terri Schwartz (San Diego Supercomputer Center) and Josephine Palencia (PSC) are collaborating with Galaxy staff to incorporate distributed data analysis and management capabilities into future versions of the Galaxy software.

More about 3ROX: http://www.psc.edu/networking