CTC CBSU RepeatFinder

A new e-Science tool for bioinformatics lets scientists sift through genomes the way prospectors pan for gold, straining gravel through fine-mesh screens. Researchers at the Boyce Thompson Institute for Plant Research (BTI), a not-for-profit research organization, and computational scientists at the Cornell Theory Center’s Computational Biology Service Unit (CBSU) have announced the launch of RepeatFinder, an innovative and unique resource for computational genomicists. RepeatFinder, developed jointly by David Stern and Jude Maul of BTI and Jarek Pillardy of CBSU, is designed to help scientists search the non-coding regions of a genome for small clues to the mysteries of evolution.
For most of his career, David Stern, a biologist and vice president for research at BTI, has studied simple plants known as algae as model systems. He examines how plant genes respond to environmental signals. His recent work tests the genetic response of the green alga Chlamydomonas reinhardtii to various environmental stresses, research reported in the November 2002 issue of The Plant Cell, a journal of the American Society of Plant Biologists.
Maul, who was familiar with BLAST, the publicly available software used by academic and government genomics research projects worldwide, began to characterize the kinds of sequences lying between the well-known genes of the Chlamydomonas chloroplast DNA. What they found was shocking. As described in the paper “The Chlamydomonas reinhardtii Plastid Chromosome: Islands of Genes in a Sea of Repeats,” also in the current issue of The Plant Cell, they found an explosive proliferation of small repeated sequences: literally thousands of them packed into an otherwise compact genome. These sequences have been named short dispersed repeats (SDRs).
It immediately became obvious to the biologists that sorting out the relatedness, origin and functions of SDRs would require tools unlike any then available: a new suite of surveyor’s tools. This led Stern and Maul to the CBSU. Stern works in a wet lab, measuring gene activity and investigating the complex biochemical ballets that take place as cells encounter changing conditions. Although he appreciated the advances being made in computer science and recognized the potential of the new field of bioinformatics, Stern saw no practical application for it in his own research, at least until recently. His interest became more than casual when he joined the effort to sequence the chloroplast genome of Chlamydomonas, data he needed for his research. As part of this effort, he recruited a young research assistant, Jude Maul, to his lab because Maul had experience working with plant material as well as interest and ability in computing. As biologists, Stern’s team members are more than surveyors defining (sequencing) the last regions of a genome; they want to explore the landscape. Stern and Maul made a worthwhile contribution to the genome survey, which was released to the public on November 2, 2002, finding a few new genes in the process. However, they know full well that much of the information stored in any genome lies between the known functional genes. The contents of these so-called intergenic regions have sometimes been wrongly called “junk,” partly because, to date, few researchers have had the tools to examine them carefully. Maul represents a new breed of computational biologist, the intellectual link between science and technology. In this case he linked Stern’s lab and CTC’s CBSU to collaborate on the development of new tools.
Stern describes the impact of this relationship in the following way: “Today’s biologists are relying more than ever on sophisticated data analysis programs, but developing and modifying such programs is beyond most of their reach. On the other hand, computational scientists rarely have the biology background to understand the nature of the computing problems that biologists are encountering. The CBSU offered us a chance to communicate directly with computational scientists seeking biological problems to address.”
“Stern and Maul came to us with a basic understanding of BLAST, the fundamental tool for comparing and searching gene sequences. They also had a specific research question. It was up to us to build the right system for their needs,” says Pillardy. “As we began to explore the project, we all saw that this tool had potential for a broad group of users, and we are excited to offer it to the entire Cornell community.” Pillardy provided the perspective of a computational scientist: he devised fast and efficient methods, customized versions of the BLAST algorithms, to search for and compare the short sequences that interested the group. Working with Maul, who had determined the scope of the modifications, Pillardy produced an efficient system that first made the analysis at least 100 times faster on the biologists’ workstations. However, this was just the first step. Cornell scientists and their colleagues around the world can now search for these short and fascinating traces of evolution in other genomes, using Web services that give them real-time access to high-performance cluster computers from their labs.
Comparing several algal chloroplast DNA sequences, the researchers were surprised to find that SDRs had invaded nearly all the intergenic regions of the Chlamydomonas chloroplast DNA but are not commonly found in other algae. Is this the tip of an iceberg? The project continues.
“For us, as biologists, the big question is whether the SDRs serve a useful biological function or whether they are, in fact, evolutionary debris,” says Stern. “Our working hypothesis is that the SDRs facilitate reshuffling of the chloroplast genome, and in doing so allow it to arrive at a more advantageous gene arrangement that would give it a functional advantage. I could go on, but this is the bottom line.” Stern is testing this hypothesis with collaborators at Penn State, coauthors of the Plant Cell paper cited above. This study is RepeatFinder’s maiden voyage across the “sea of repeats.” “We are excited at the potential of this and future versions to impact genome analysis not only in plants, but in other organisms,” says Stern. “So-called ‘junk’ DNA could become a thing of the past, as we begin to better define its composition and investigate its function.”