Planting Seeds for a Fertile Future

The iPlant Collaborative links plant biologists through emerging cyberinfrastructure

Story Highlights:

  • In January 2008, the NSF initiated the “iPlant Collaborative,” a $50 million, 5-year project to tackle “grand challenge” problems in plant biology.

  • Over the course of the last year, plant scientists chose the two most pressing research topics to explore in depth: the creation of a phylogenetic tree of life and an exploration of how plant genes express a plant’s characteristics and functions.

  • TACC has become an active collaborator in iPlant, applying its computational resources, as well as the talent of the center’s staff, to help discover answers to the world’s food problem.

 

September 2009 saw the passing of Norman E. Borlaug, pre-eminent plant biologist and architect of the “Green Revolution,” which led to the eradication of mass starvation throughout much of the world. By combining high-yield hybrid seeds with modern fertilizers, the Green Revolution enabled farmers to produce 10 times more food on the same amount of land.

The benefits from the Green Revolution are dwindling, however. To continue to feed the world’s population, science needs a new breakthrough, one that will likely be enabled by computers.

For this very reason, in January 2008, the National Science Foundation (NSF) initiated the “iPlant Collaborative,” a $50 million, 5-year project to create the cyberinfrastructure needed to tackle “grand challenge” questions in plant biology.

Unlike any other single research entity in the U.S., iPlant will provide the capacity to draw upon resources and talent in remote locations to enable plant scientists, computer scientists, and information scientists from around the world to collaboratively address questions of global importance.

“The iPlant Collaborative will harness the best scholars and research in plant biology to tackle some of the profound issues of our day and for our future,” said NSF Director Arden L. Bement when the project was first announced.

Plants are vital to our survival on Earth in ways both obvious and subtle.

Screenshot from the iPlant Collaborative website.

“To continue to be able to feed the world is a crop production problem,” said Dan Stanzione, deputy director of the Texas Advanced Computing Center (TACC) and iPlant’s co-director. “Plants are the source of all of our breathable atmosphere. So if we want air that we can continue to breathe, that’s a plant problem, too.”

There are other implications as well. Plants are important for carbon capture, biofuels, and new pharmaceuticals. Much of society’s health and well-being is tied up in plant science and our ability to improve the output of crops.

According to Stanzione, advanced computing can help. “iPlant is the first attempt at this scale to build a cyberinfrastructure that fills the gap between building a supercomputer and what scientists do in their labs.”

This has meant identifying the technical and structural obstacles that prevent researchers from effectively using available high-performance computing systems and searching for solutions.

There is another crucial reason for the NSF’s large investment in iPlant. Over the past decade, NSF has supported individual projects in the plant sciences with more than a billion dollars in funding. What was needed was a way to integrate the disparate data those projects were generating and to leverage that billion-dollar investment to create a comprehensive network of knowledge.

The iPlant project differs from most scientific research in that it did not identify a pre-ordained set of scientific questions to answer. “We took a more radical approach: to simply propose an organization, a team, and a process and not really propose to solve any specific problems,” said Stanzione. The idea was to let the community identify and prioritize grand challenges, and then iPlant would develop a cyberinfrastructure that could capably address those grand challenges.

The community needed training, specialized software, and middleware tools, but first they needed to come to a consensus on what “grand challenge” questions they ought to address. This decision would drive the future effort.

At iPlant’s May 2008 kickoff meeting, leading plant scientists converged to begin choosing the most pressing questions in the field. The meeting led to a series of workshops where major challenges were discussed, from how climate change is affecting crops to how plant form arises from and feeds back to cellular, biochemical, and informational processes. This, in turn, led to six Grand Challenge proposals.

By early 2009, iPlant had chosen two problems to tackle. The first is the creation of a phylogenetic tree of life containing 50,000 species, representing the diversity of the world’s green plants and the evolutionary relationships among them. The second involves understanding how plant genes express a plant’s characteristics.

The first project is crucial because it will help identify points in a species’ evolution where a new trait, like drought tolerance, arose to provide ecological advantages to that species. The second, equally important project tries to mine and map the complex relationships between a plant’s genes and its physical characteristics, each of which is typically the result of a combination of genes acting under a variety of conditions.

“We want to take DNA information about plants and understand how that maps to expressed characteristics,” Stanzione explained. “Why do some plants flower faster? What is the relationship between their DNA and why they behave a certain way?”

After the grand challenge questions and teams were identified, the teams formed working groups to identify the use requirements for specific tools that iPlant will build to make computational plant science a more user-friendly experience in the future.

“If we want to build the tree of life, what are the specific problems, tools and data sets that are required?” Stanzione asked. “We’re in the requirements-gathering and software-building phase to turn these resources into real tools that use the cyberinfrastructure that we provide at TACC.”

iPlant will provide users with several unique online discovery environments. There, simple, step-by-step processes will allow researchers to input, analyze, visualize, and make meaningful observations from massive data sets, whether they are genetic sequences or ecological responses. Demonstration versions of these discovery environments will be available in the coming months.

iPlant developers touring the TACC visualization lab at The University of Texas at Austin. [Photo by Freddy Rojas, TACC.]

Another aspect of iPlant that is key to NSF’s support is its Education, Outreach and Training efforts. In working to prepare the next generation of scientists to use computational thinking and cyberinfrastructure tools, iPlant’s goal is to facilitate significant advances in plant science research for generations to come.

With the arrival of Stanzione, TACC has become an active collaborator in iPlant, applying the talent of the center’s staff in data analysis and storage, rapid genetic sequencing, and biological visualization. “We’re drawing from our broad expertise in various aspects of cyberinfrastructure and leveraging that across the project,” he said. The University of Arizona, Cold Spring Harbor Laboratory in New York, University of North Carolina-Wilmington, and Purdue University are the other primary partners on the project.

In a June 2009 Nature article, iPlant’s original principal investigator, Richard Jorgensen, described the project as a “shotgun marriage between biologists and computer scientists.” It is from the increasing overlap and synergy between these two disciplines that researchers believe the great breakthroughs in plant biology will come.