Machine learning model built by Sabeti lab helps design better viral diagnostics

Researchers have developed an automated method that predicts the effectiveness of viral diagnostic tests and designs optimized ones.

The surge of the Omicron variant has highlighted an urgent need for diagnostic tests that accurately detect viruses, even when they mutate. Now, scientists at the Broad Institute of MIT and Harvard have developed the first fully automated system that uses machine learning to design viral diagnostics.

The method, called ADAPT, helps scientists create diagnostics that are highly sensitive, meaning that they can detect low levels of virus, and specific, meaning that they detect only the virus of interest and not others. The researchers used their approach to create diagnostics for each of the nearly 2,000 viruses known to infect vertebrates, including SARS-CoV-2.

Designing a viral diagnostic involves carefully selecting the best places in a virus’s DNA or RNA for the test to target. Researchers choose those sequences mostly by hand, guided by some rules, but there is also a lot of trial and error. ADAPT, which uses trained algorithms to predict the best sequences for a diagnostic, promises to help scientists rapidly design tests that are more effective for a large number of different viruses and can be quickly modified and scaled as viruses evolve.

“ADAPT is really about developing countermeasures that target the virus that's circulating right now and being prepared to move with the virus as it changes,” said Pardis Sabeti, senior author of the study and an institute member at the Broad. Sabeti is also a Howard Hughes Medical Institute investigator, a professor at the Center for Systems Biology and the Department of Organismic and Evolutionary Biology at Harvard University, and a professor in the Department of Immunology and Infectious Disease at the Harvard T. H. Chan School of Public Health.

“As we’ve watched SARS-CoV-2 adapt in real-time, we’ve learned just how much we need to change with it and other viruses.”

BUILDING A BETTER MODEL

In 2018, a team led by then-graduate student Hayden Metsky in the Sabeti lab began developing a machine learning model to analyze the wealth of viral sequence data being generated by labs around the world.

“Current techniques in machine learning and optimization are really well suited to making sense of all this data,” Metsky said. “Our goal was to better leverage the diverse sequencing data out there to design more effective diagnostics.”

To develop ADAPT, the team first focused their efforts on CRISPR-based tests, which use programmable “guide RNAs” and CRISPR enzymes that find specific viral sequences and generate a fluorescent signal.

The scientists then designed a large number of these tests, each to look for a different target from viral genomes. They used a recently developed Broad technology called CARMEN to measure the effectiveness of thousands of combinations of guide RNAs and viral targets simultaneously. 

Using this large trove of test efficiency data, the researchers then trained a machine learning model to predict which guide RNAs would generate strong signals in a diagnostic test across different viral strains and variants. Metsky says this means that a diagnostic will be likely to detect different lineages — known and even novel ones — as a virus evolves. ADAPT also automatically incorporates new viral genomes from public databases into the design process so that it stays up-to-date as new variants emerge.

“At the core of building good diagnostics is knowing what to target and how to target it,” Sabeti said. “We spend a lot of time building technologies to do that, but we’ve shown that with thoughtful algorithmic work, we can get these methods to work much, much better.”

DETECTING SARS-COV-2 AND BEYOND

Early in 2020, when COVID-19 was beginning its march around the world, Sabeti and Metsky, by then a postdoctoral fellow, quickly refocused their efforts.

“When we concentrated on COVID in mid-January 2020, it was remarkable how quickly the global community was generating genomic data on the virus, with 20 genomes at the time and that number growing exponentially,” Metsky said. “We had been building machine learning models and algorithms that accounted for viral variation based on genomic data, and wanted to apply our work to rapidly generate highly sensitive assays for SARS-CoV-2 that maintained that sensitivity as the virus evolves.”

Metsky and the team used ADAPT to create diagnostics for SARS-CoV-2 and 66 other viruses that are genetically related or cause similar symptoms. When they tested four of ADAPT’s designs in the lab, they found that the tests were more sensitive than diagnostics developed according to more traditional rules.

Though the team first used their approach to create CRISPR-based diagnostics, they say ADAPT can be applied to other sequence-based tests as well, and they are already adapting it for qPCR, the most widely used viral diagnostic tool.

Metsky and Broad software engineer Priya Pillai also built a website where researchers can find and visualize diagnostics the team designed for known viruses, or run ADAPT on new data to develop their own. As ADAPT and its user base grow, the team will continue to improve their website to make it easy to use for labs with little in-house supercomputing power or bioinformatics expertise.

Ultimately, the team says other researchers could use ADAPT to create new, highly effective diagnostics for known or emerging viruses. In the meantime, Metsky says tests that distinguish between SARS-CoV-2 and other respiratory viruses that cause similar symptoms will continue to be critical, and ADAPT could be useful in developing those tests. “If COVID becomes endemic, we’ll need to do a better job identifying the wide swath of respiratory viruses that are circulating, including their vast and ever-changing variation,” he said.

Changes in air pollution linked with dry spells in Asia, summer heatwaves in Europe

Air pollution increases in South and East Asia, combined with pollution cuts in Europe, may have had an important influence on European and Asian weather patterns in recent decades, new research has found.

Analysis of weather records and climate models by scientists at the University of Reading revealed that changes in air pollution levels in the two regions were likely the primary driving force behind changing atmospheric conditions that favored prolonged summer extremes in Europe, as well as causing dry spells in Central Asia.

The research shows that the air pollution changes during 1979-2019 reduced the temperature gradient between the two regions, significantly weakening the jet stream over Asia.

These high-altitude winds have a strong influence on atmospheric circulation in the Northern Hemisphere, and shape weather across Europe and other mid-latitude areas.

Dr. Buwen Dong, an NCAS scientist at the University of Reading, said: “Our findings suggest changes to air pollution had a greater influence on Northern Hemisphere summer weather than we thought.

“The research counters previous suggestions that the weakening of the summer jet stream was the result of rapid warming in the Arctic due to greenhouse gas emissions. It highlights another significant role human activity plays in driving extreme weather over vast regions.”

Air pollution is known to have a direct impact on surface temperatures, since pollution particles prevent heat from the sun from reaching the ground.

Increases in pollution in China and other areas of South and East Asia during the past 40 years, therefore, resulted in lower surface temperatures, while cuts in Europe led to clearer skies and hotter temperatures.

Temperature changes in different latitudes reduced vertical wind shear and therefore weakened the summertime Eurasian subtropical westerly jet, the ribbon of wind which extends east over Central Asia and northern China from the North Atlantic Jet Stream, by 7% over the period.

The researchers looked at the effect of greenhouse gases and pollution particles separately and found that the former strengthened the jet stream, but that this effect was overpowered by the impacts of air pollution.

Dr. Dong said: “As Southeast Asian countries fulfill commitments to cut their air pollution levels over the coming decades, we would expect to see the jet stream strengthen over Eurasia once again, potentially reducing the likelihood of prolonged heatwaves but increasing the likelihood of strong cyclones in mid-latitudes.”

UH researcher wins $2M grant to innovate drug discovery for breast cancer

With a $2 million recruitment grant from the Cancer Prevention and Research Institute of Texas (CPRIT), a University of Houston researcher is setting up a lab to develop drugs that will work on traditionally undruggable targets in cancer. Gül Zerze, assistant professor in the William A. Brookshire Department of Chemical and Biomolecular Engineering at the UH Cullen College of Engineering, is one of 12 cancer researchers recruited to Texas by CPRIT last November.

Zerze’s initial target is breast cancer. 

“One out of nearly six Texas women diagnosed with breast cancer will die of the disease. Importantly, Texan women of color are disproportionately impacted by the high mortality rate compared to white Texan women (41% higher mortality rate reported for Black Texan women in 2016). This high mortality rate, despite the substantial efforts made for early diagnosis, calls for better therapeutics urgently,” said Zerze, whose research will also be expanded more broadly to address other cancers.

The CPRIT recruitment grants for the latest class, totaling $38 million, are meant to “form a critical ecosystem of distinguished cancer-fighting talent” in Texas. Zerze was persuaded to come to UH from Princeton University, where she was a postdoctoral researcher specializing in computational modeling and simulations of a special class of proteins called intrinsically disordered proteins (IDPs).

The vast majority (approximately 70%) of proteins implicated in human cancers are either IDPs or have large intrinsically disordered regions, and many of these targets are considered ‘undruggable’ due to the scarcity of high-resolution methods that can offer a fundamental understanding of them. 

“Computational and data science methodologies offer a promising avenue to fill in this gap to enable developing drugs against these traditionally undruggable targets,” said Zerze, whose methodology will include rapid screening.  

Despite the significant progress made in cancer treatment options in the last 20 years, many cancer targets have yet to be drugged. Among those holding promise are transcription factors (TFs), which are proteins involved in converting (or transcribing) DNA into RNA. TFs contain large amounts of disordered proteins which participate in transcriptional condensates that form via liquid-like phase separation (LLPS).

“Transcriptional condensates are shown to be aberrant in tumor cells, but the progress to develop drugs against TFs that participate in LLPS has been limited by the extremely dynamic nature of activation domains of TFs. We are developing a computational platform that will enable discovering drugs against these aberrant condensates by systematically interrogating the way transcription factors form, through the liquid-like phase separation of intrinsically disordered regions,” said Zerze. 

Through collaborations within the University and the MD Anderson Cancer Center, the drug candidates will be rapidly tested.  

“The ideas proposed here will save lives and the products that will come out of this project have a great potential for commercialization and founding companies to contribute to the Texas economy,” said Zerze.