Fleshing Out the Genome
Genomics, the study of all the genetic sequences in living organisms, has leaned heavily on the blueprint metaphor. A large part of the blueprint, unfortunately, has been unintelligible, with no good way to distinguish a bathroom from a boardroom, that is, to link genomic features to cell function. A national consortium of scientists led by BIATECH, a Seattle-area non-profit research center, and Pacific Northwest National Laboratory, a Department of Energy research institution in Richland, Wash., now suggests a way to put this house in order.
The researchers offer a powerful new method that integrates experimental and computational analyses to ascribe function to genes that had been termed "hypothetical" – sequences that appear in the genome but whose biological purposes were previously unknown. The method portends a way not only to fill in the blanks in any organism's genome but also to compare the genomes of different organisms and their evolutionary relation to one another.
The new tools and approaches offer the most-comprehensive-to-date "functional annotation," a way of assigning the mystery sequences biological function and ranking them based on their similarity to genes known to encode proteins. Proteins are the workhorses of the cell, playing a role in everything from energy transport and metabolism to cellular communication.
This new ability to rank hypothetical sequences according to their likelihood to encode proteins "will be vital for any further experimentation and, eventually, for predicting biological function," said Eugene Kolker, president and director of BIATECH, an affiliate scientist at PNNL and lead author of a study in the Feb. 8 Proceedings of the National Academy of Sciences that applies the new annotation method to a strain of the metal-detoxifying bacterium Shewanella oneidensis.
"In a lot of cases," said James K. Fredrickson, a co-author and PNNL chief scientist, "it was not known from the gene sequence if a protein was even expressed. Now that we have high confidence that many of these hypothetical genes are expressing proteins, we can look for what role these proteins play."
Before this study, nearly 40 percent of the genetic sequences in Shewanella oneidensis, a bacterium of key interest to DOE for its potential in nuclear and heavy metal waste remediation, were considered hypothetical. This work identified 538 of these genes that expressed functional proteins and messenger RNA, accounting for a third of the hypothetical genes.
The researchers enlisted analytic software to scour public databases and applied expression data to improve gene annotation, identifying similarities to known proteins for 97 percent of these hypothetical proteins. All told, computational and experimental evidence provided functional information for 256 more genes, or 48 percent, but the team could confidently assign exact biochemical functions for only 16 proteins, or 3 percent. Finally, they introduced a seven-category system for annotating genomic proteins, ranked according to a functional assignment's precision and confidence.
Kolker said that "a big part of this was the proteomics" – a systematic screening and identification of proteins, in this case those that were expressed in the microbe when subjected to stress. The proteomic analyses were done by four teams led by Kolker; Carol S. Giometti, Argonne National Laboratory; John R. Yates III, The Scripps Research Institute; and Richard D. Smith, W.R. Wiley Environmental Molecular Sciences Laboratory, based at PNNL. BIATECH's analysis of these data involved handling more than 2 million files.
Fredrickson coordinates a consortium known as the Shewanella Federation. In addition to BIATECH, PNNL and ANL, the Federation includes teams led by study co-authors James M. Tiedje, Michigan State University; Kenneth H. Nealson, University of Southern California; and Monica Riley, Marine Biological Laboratory.
The Federation is supported by the Genomics: GTL Program of the DOE's Offices of Biological and Environmental Research and Advanced Scientific Computing Research. Other collaborators included the National Center for Biotechnology Information of the National Library of Medicine, National Institutes of Health; Oak Ridge National Laboratory; and the Wadsworth Center.
BIATECH is an independent nonprofit biomedical research center located in Bothell, Wash. Its mission is to discover and model the molecular mechanisms of biological processes using cutting-edge, high-throughput technologies and computational analyses that will improve both human health and the environment. Its research focuses on applying integrative interdisciplinary approaches to the study of model microorganisms and advancing our knowledge of their cellular behavior.
PNNL (www.pnl.gov) is a DOE Office of Science laboratory that solves complex problems in energy, national security, the environment and life sciences by advancing the understanding of physics, chemistry, biology and computation. PNNL employs 3,900, has a $650 million annual budget and has been managed by Ohio-based Battelle since the lab's inception in 1965.