SYSTEMS
Supercomputing enables identification of microbe DNA in soil
The traditional method for studying a microbe is to cultivate it in the lab and examine its biology in detail. However, lab cultivation is possible for only a small fraction of microbe species. Scientists have thus turned to metagenomics – the computation-reliant study of DNA extracted from environmental samples rather than from cultivated organisms.
In metagenomics, scientists grind up samples containing many different organisms and extract all the DNA they can, not knowing which pieces of DNA came from which organisms. A one-gram soil sample can contain up to several million species of microbes all mixed together. The scientists sequence small, random fragments of the DNA to identify species and determine how they function, explained Jonathan Eisen, a researcher at the University of California, Davis. Eisen heads the Genomic Encyclopedia of Bacteria and Archaea project of the Department of Energy’s Joint Genome Institute, a project that aims to catalogue genomic data for all major branches of microorganisms.
“Metagenomics is very much pushed by the available sequencing technology, and it totally depends on the algorithms and computing to make sense of the data,” said Folker Meyer, who runs Argonne National Laboratory’s MetaGenome Rapid Annotation using Subsystem Technology (MG-RAST) server, the primary data repository and analysis resource for the metagenomics community.
At Argonne National Laboratory, computational biologists Folker Meyer and Elizabeth Glass view charts of metagenomic data analysed using grid computing resources. Image courtesy of ANL.
MG-RAST, which came online in 2008, is a free, fully automated online service for annotating the metagenome (the set of fragments of sequenced DNA) of an environmental sample. It currently has more than 1,500 users and houses more than 2,600 private and about 300 public metagenome datasets.
Researchers upload their sample’s metagenome, and MG-RAST uses a variety of computing resources – Argonne’s 800-core cluster, TeraGrid and cloud computing – to compare the DNA fragments to those from every other sample in the system as well as to gene sequences in several other publicly available databases. Through its work with the nonprofit organization Fellowship for the Interpretation of Genomes on the “Project to Annotate 1,000 Genomes,” the MG-RAST team also has access to a large collection of smaller curated genome data sets. The software uses similarity to known genes to guide the reconstruction of the various species in the sample and to provide information on their functions.
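The idea of matching fragments to known genes by similarity can be illustrated with a toy sketch. This is not MG-RAST's actual method, which the article does not describe in detail; it simply counts shared short subsequences (k-mers) between a fragment and each reference gene. The function names, the reference sequences and the parameters k and min_shared are all hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(fragment, references, k=4, min_shared=1):
    """Return (name, shared_kmer_count) for the most similar reference.

    Returns (None, 0) when no reference shares at least min_shared
    k-mers with the fragment, i.e. the fragment stays unclassified.
    """
    frag_kmers = kmers(fragment, k)
    best_name, best_score = None, 0
    for name, ref_seq in references.items():
        score = len(frag_kmers & kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_shared:
        return None, 0
    return best_name, best_score


# Hypothetical reference "database" of known genes (made-up sequences).
references = {
    "gene_A": "ATGGCGTACGTTAGC",
    "gene_B": "TTGACCGGATATCCG",
}

print(best_match("GCGTACGTT", references))  # fragment taken from gene_A
print(best_match("CCCCCCCC", references))   # no similar gene in the database
```

In this sketch the second fragment comes back unclassified because nothing in the toy database resembles it. Production tools use far more sensitive alignment methods and much larger databases, which is why the work is spread across clusters and grids.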
The databases do not contain the genome of every species of microbe, often making it difficult to classify the organisms in a sample. “It is estimated there are at least 200 major groups of bacteria, and we (the public sector) only have genome data for about 10 of them,” said Eisen.
“Although there is still much work ahead, metagenomics provides a powerful new tool to help researchers better understand microbes they cannot grow in the lab,” Meyer said. “Metagenomics is more or less unleashing our ability to study the genomics of microbes from all sorts of environments across the planet.”
—Amelia Williamson, for iSGTW