Cyberinfrastructure for Plant Biologists

The iPlant Collaborative releases new web-based tools to help plant scientists integrate massive datasets

Developed by iPlant staff at Cold Spring Harbor Laboratory’s Dolan DNA Learning Center (DNALC), DNA Subway presents complex bioinformatics and visualization tools – predominantly open-source software – in an intuitive and appealing interface.

Plants are the source of our food supply and breathable air. They are vital for carbon capture, biofuel production, and new pharmaceuticals. In fact, so much of society’s current and future health and well-being is tied to plant science that, over the past decade, the National Science Foundation (NSF) has supported projects in the field with more than a billion dollars in funding.

What is still needed, however, is a way to integrate the massive, disparate datasets, algorithms, and tools, leveraging the NSF’s billion-dollar investment to create a comprehensive network of knowledge that will advance humanity.

In 2008, the NSF initiated the “iPlant Collaborative,” a $50 million, five-year project to create the cyber- (or computer) infrastructure needed to tackle “grand challenge” questions in plant biology.

“We are facing a lot of challenges in food production and food security going forward, and iPlant puts together the computational framework by which researchers will address these grand challenges,” said Dan Stanzione, co-director of the iPlant Collaborative and deputy director of the Texas Advanced Computing Center. “iPlant is the first attempt at this scale to build a cyberinfrastructure that fills the gap between the nation’s physical cyberinfrastructure – the supercomputers and networks – and the kinds of computational analysis that plant scientists do every day.”

The iPlant tree visualization application allows the user to interactively explore extremely large phylogenetic trees. This image shows the National Center for Biotechnology Information (NCBI) taxonomy tree, containing approximately 260,000 species, zoomed in on the order Lamiales.

iPlant will initially tackle two crucial problems in plant biology. The first, the “iPlant Tree of Life” (iPToL), attempts to create a phylogenetic tree representing the diversity of the world’s green plants and the relationships among them. The second problem involves understanding how a plant’s DNA combines with environmental conditions to give that plant its unique traits.

“Why do some plants flower faster? What is the relationship between their DNA and how they behave?” Stanzione asked. “We want to take genetic information about plants and understand how that maps to expressed characteristics.”

This March, iPlant announced the beta release of the first set of computational environments and software frameworks designed to help plant scientists make discoveries faster. The “Discovery Environment,” “DNA Subway” prototype, and “Tree of Life” visualization tool provide the first glimpse into the types of infrastructure that iPlant will integrate and distribute.

Web-based and easy-to-use, these tools allow scientists to perform remote computation and analysis on supercomputers. By drawing on the resources and expertise of the nation’s supercomputing centers and their staff, plant scientists, in collaboration with computer scientists and information scientists, will move closer to addressing critical questions in plant biology.

The “Discovery Environment” (DE) is at the heart of iPlant. Modeled after Web 2.0 applications such as Wikipedia and Flickr, the DE allows community members to build content in a democratic way, making connections between different types of data and integrating information into a single user interface. The DE incorporates existing bioinformatics tools and runs them seamlessly on remote high-performance computing resources. It also provides secure data management and editing environments for robust production calculations.

A beta version of the Discovery Environment provides a working demonstration of its architecture and features, including the ability to analyze tree and trait data. iPlant is accepting account requests for access to this release, which is intended to give the community a sense of the direction that development is taking.

The Discovery Environment provides a modern, common web interface and platform that exposes the computing, data, and application resources made available to the community. It will provide access not only to tools built by the collaborative, but also to many community-contributed tools.

The Tree of Life visualization tool is just one of many applications that will ultimately be built into the Discovery Environment, some by the iPlant Collaborative team but many more by the community. The TreeVis tool revolutionizes the way phylogenetic trees are represented by introducing interactivity, scalability, and new ways of visualizing the connections between plants. The tool allows researchers to explore the relationships between distant or closely related species, to trace the historical sequence of genome reorganizations, or to examine patterns of adaptation in terms of geographic variance, climatic change, or co-evolution.

The “DNA Subway,” which complements the Discovery Environment, is a learning environment where students, educators, and researchers can access the large-scale datasets and high-powered informatics tools that drive modern biology. Using the subway metaphor, the application leads users through several “stations” where they are able to annotate genes and perform genome comparisons. The tool is currently available to all interested users.

Together, these new frameworks leverage existing and emerging applications, as well as the network of computational resources, to help scientists analyze, visualize, and make meaningful discoveries about plants and their DNA, far faster than ever before.

Said Stanzione, “By removing the grunt work of science – the need to manually convert data sets or to make a tool fit the data set that you’re working on – this project enables us to get to solutions more quickly, speeding scientific progress.”