BIOLOGY
Greene builds supercomputer to identify gene interactions in human tissues
- Written by: Tyler O'Neal, Staff Editor
- Category: BIOLOGY
Dartmouth researchers and their collaborators have developed a supercomputer to crunch big biomedical data in order to recognize how genes work together in human tissues.
The findings, which shed light on genetic interactions that underlie human diseases, appear in the journal Nature Genetics. A PDF of the study is available on request.
Sequencing of the human genome was completed in 2003, but the sequence alone does not explain what makes us human: our genomes work through the dynamic interactions of genes and their products in genetic pathways. Scientists studying interactions between genes often assume that all human tissues are the same, treating the body as essentially "human soup." Today, researchers know that proteins work together in biological systems and that, in humans, they work together in different ways in different tissues.
In a prior paper, Casey Greene, an assistant professor of genetics at the Geisel School of Medicine at Dartmouth, and his collaborators developed a computer system that "virtually dissects" tissues to identify which genes are active in particular cell types. The computer uses gene-activity measurements from biopsies to separate cells mathematically and identify the genes that are turned on in a specific cell type. Drawing on a large database of such measurements to track cell lineages allows scientists to refine their analysis across thousands of samples. The method has proven far faster and more effective than existing techniques. Knowing which genes are active is important, but it is only part of the story.
"In our new paper, we discover how genes work together in distinct tissues and cell lineages by training a computer to extract this information from big data," says Greene, a co-lead author. "It's impossible to directly measure most of these cell lineages to identify these gene-gene interactions, but it's interesting that while we can't do this directly, a computer can."
In the new study, the researchers developed an algorithm that trains a computer to estimate how much each of more than 1,000 different datasets reflects interactions between genes in each of more than 140 human tissues. From this training, the supercomputer builds a mathematical formula that identifies what links similar patterns and what distinguishes them from other, unrelated patterns. The researchers then used that information to build a gene-gene network for each of these tissues. They found that these tissue-specific networks provided much more information about human diseases than "human soup" networks. Finally, they carried out follow-up experiments testing predictions related to many different diseases to demonstrate the power of their approach.
Greene's research aims to develop new data mining techniques and tools to improve our understanding of living organisms. That work puts him at the forefront of investigating the genomic data that hold the secrets to what makes each of us different and what predisposes us to disease. More than 1.5 million genome-wide measurements of gene expression are currently available for researchers to download and study, but making sense of so much data requires new algorithms that can construct a data-driven portrait of biology. Traditional methods of data analysis are powerful, but they require researchers to frame precisely the question they wish to answer. Greene's lab is developing new deep learning algorithms that can automatically put these genomic data into context, answering important questions about biological processes and even identifying new questions to ask. Greene also wants to make these deep learning discovery methods available to biologists around the world by creating robust, easy-to-use web-based tools they can apply in their own research.