Imagine popping a movie into the VCR or DVD player and watching a list of credits for two hours: no movie, no plot, no dialogue, just the cast. That’s the problem facing contemporary biology. The Human Genome Project has provided researchers with a growing list of genes, basically a cast of thousands of characters running life inside the cell. But the key to understanding life, in both sickness and health, is the script that outlines how these cellular players interact, communicate, and cue each other. In healthy cells, genes and the proteins they produce interact harmoniously to carry out vital life functions. When signals are botched and genes miss their cues, the result is disease.
Richard Young at the Whitehead Institute for Biomedical Research, David Gifford at the Massachusetts Institute of Technology, and colleagues have developed the first comprehensive script describing how the yeast genome produces life. The results will be published in Science on October 25, 2002. The researchers have identified how the leading players (called transcription factors) coordinate the action of all other genes in a living cell.
"In the whole history of biology to date, we’ve come to understand gene regulation for only a few dozen genes. In the next few years, this new technology will allow us to unravel gene regulation across the entire genome of any organism, and thus the script underlying fundamental life processes," says Young.
"The implications for human health could be immense," adds Young. "The whole is so much more than the sum of the parts. Knowing how the key players work together in a healthy cell gives us an entirely new way of looking at all diseases and new targeted approaches to developing therapeutics and vaccines."
Until now, unraveling the complex relationships between genes and proteins was a painstakingly slow process, involving years of individual biochemical and molecular experiments. Advanced high-throughput biological and computing technologies provide a way to script these relationships globally, allowing researchers to do in weeks what would otherwise have taken years.
"A key hurdle we had to overcome was assembling snippets of conversation between genes into a coherent script. We developed new computational techniques, which allowed us to discover the script, visualize it over time, and to check it for accuracy," explains Gifford.
"This study makes a significant contribution to understanding the fundamental organizing principles of life," said James Anderson, program director at the National Institute of General Medical Sciences, a component of the National Institutes of Health.
"It also amply illustrates the synergy that can be generated by biologists and computational experts working together to tackle a systems-level research problem. We are fortunate indeed that such collaborations are becoming more common and that cultural and institutional barriers to such collaborations are beginning to give way."
Transcription Factors Take the Lead
Genomes contain two codes. The first, discovered and elucidated over the last 50 years, is biology’s central dogma: genes, composed of DNA sequences, specify how proteins are made. But what determines the types and amounts of the various proteins that characterize a particular cell type, be it a skin cell or a blood cell? The second code, also written in the genome, is the screenplay that coordinates the production of these proteins and, in turn, how living cells develop and respond to changes in their environment.
Special proteins called transcription factors read a cell’s script and act as lead actors, cueing the production of necessary proteins (including the transcription factors themselves) by binding to the regulatory regions of specific genes. This regulation of gene expression is the basis for all cellular functions, and it is highly complicated even in the simplest cells.
Young’s group used a technique called genome-wide location analysis, developed in their lab, to analyze the binding of most of the known transcription factors in baker’s yeast. The method is based on DNA microarray technology, which displays ordered segments of DNA and gives researchers a compact format for quickly analyzing how proteins interact with the DNA of an entire genome.
Of the 141 transcription factors known for baker’s yeast, Young and his colleagues observed that 106 bound to about 2,300 locations in the genome, representing about 37 percent of the yeast’s known genes. The picture is complex, because a transcription factor can bind to more than one gene, and one gene can be bound by more than one transcription factor.
The remaining 35 transcription factors showed no detectable binding under the current experimental conditions. Many of these remaining factors and the genes they control come into play under less typical environmental conditions (such as heat), interactions the group is now studying. In addition, some protein-gene binding events were filtered out by the group’s stringent statistical requirements.
Whereas earlier work required approximately 300 researcher-years to find just some of the binding sites of a single transcription factor (Gal4), the present experiment needed only about one researcher-week for each of the yeast’s more than 100 regulatory proteins. "Without this increase in productivity, we just wouldn’t be able to create a comprehensive view of how the genes in the cell are controlled," Young says.
Missed Cues Lead to Disease
The result is a vast network of interactions between proteins and genes, a complex script that specifies the roles of all the players involved in a cell’s life. Each transcription factor was tied to a group of "supporting actors" it controlled: additional genes involved in cell growth, metabolism, or environmental response. Often, the transcription factors cued one another, increasing or decreasing each other’s production.
Understanding how biological processes are regulated on a whole-genome scale will help in developing targeted pharmaceutical approaches. For instance, identifying the control mechanism that tells a cell when to divide is key to finding out what goes wrong in diseases such as cancer, where cells divide uncontrollably.
"The pharmaceutical industry is based on therapeutics developed for correcting faulty protein products, which result from breakdowns in metabolic pathways. A new area of pharmaceutical industry will develop based on drugs targeting breakdowns in genome regulatory networks. Perhaps we can correct some problems even before a faulty protein is produced," predicts Young.
The group is already working on the same analysis for the human genome, whose regulatory network involves about 1,700 transcription factors. Although high-quality sequence data currently exist for only about a third of the human genome, about 95 percent of it should be available next spring.
IMAGE CAPTION: Whitehead and MIT researchers have identified how the cell's leading actors (called transcription factors) coordinate the action of all other genes in a cell to produce life. The resulting script is the key to understanding life, in both health and sickness, and outlines how these cellular players interact, communicate, and cue each other.