Rochester researchers develop a counterfactual method to verify predictions of drug safety

Scientists increasingly rely on models trained with machine learning to solve complex problems. But how do we know the solutions are trustworthy when the complex algorithms the models use cannot easily be interrogated or explain their decisions to humans?

That trust is especially crucial in drug discovery, for example, where machine learning is used to sort through millions of potentially toxic compounds to determine which might be safe candidates for pharmaceutical drugs.

From left: PhD student Geemi Wellawatte; Andrew White, an associate professor of chemical engineering; and Aditi Seshadri ’22 in Wegmans Hall. White’s lab has developed a way to verify the predictions of machine learning models used in drug discovery by using counterfactuals. (University of Rochester photo / J. Adam Fenster)

“There have been some high-profile accidents in computer science where a model could predict things quite well, but the predictions weren’t based on anything meaningful,” says Andrew White, an associate professor of chemical engineering at the University of Rochester.

White and his lab have developed a new “counterfactual” method that can be used with any molecular structure-based machine learning model to better understand how the model arrived at a conclusion.

Counterfactuals can tell researchers “the smallest change to the features that would alter the prediction,” says lead author Geemi Wellawatte, a Ph.D. student in White’s lab. “In other words, a counterfactual is an example as close as possible to the original, but with a different outcome.”

Counterfactuals can help researchers quickly pinpoint why a model made a prediction, and whether it is valid.

The research identifies three examples of how the new method, called MMACE (Molecular Model Agnostic Counterfactual Explanations), can be used to explain why:

  • a molecule is predicted to permeate the blood-brain barrier
  • a small molecule is predicted to be soluble
  • a molecule is predicted to inhibit HIV

The lab had to overcome some major challenges in developing MMACE. They needed a method that could be adapted to the wide array of machine learning methods used in chemistry. In addition, searching for the most similar molecule in any given scenario was difficult because of the sheer number of possible candidate molecules.

Aditi Seshadri in White’s lab helped solve that problem by suggesting the group adapt the STONED (Superfast traversal, optimization, novelty, exploration, and discovery) algorithm developed at the University of Toronto. STONED efficiently generates similar molecules, the fuel for counterfactual generation. Seshadri is an undergraduate researcher in White’s lab and was able to help on the project via a Rochester summer research program called “Discover.”

White says his team is continuing to improve MMACE, for example by trying other databases in their search for the most similar molecules and by refining the definition of molecular similarity.

UConn prof solves a mystery of massive black holes, quasars with supercomputer simulations

A discovery that provides new insight into how galaxies evolve

At the center of galaxies, like our own Milky Way, lie massive black holes surrounded by spinning gas. Some shine brightly, with a continuous supply of fuel, while others go dormant for millions of years, only to reawaken with a serendipitous influx of gas. It remains largely a mystery how gas flows across the universe to feed these massive black holes.

UConn Assistant Professor of Physics Daniel Anglés-Alcázar, lead author on a paper published today in The Astrophysical Journal, addresses some of the questions surrounding these massive and enigmatic features of the universe by using new, high-powered supercomputer simulations. 

“Supermassive black holes play a key role in galaxy evolution and we are trying to understand how they grow at the centers of galaxies,” says Anglés-Alcázar. “This is very important not just because black holes are very interesting objects on their own, as sources of gravitational waves and all sorts of interesting stuff, but also because we need to understand what the central black holes are doing if we want to understand how galaxies evolve.”

Distribution of gas across scales, with the gas density increasing from purple to yellow. The top left panel shows a large region containing tens of galaxies (6 million light-years across). Subsequent panels zoom in progressively into the nuclear region of the most massive galaxy and down to the vicinity of the central supermassive black hole. Gas clumps and filaments fall from the inner edge of the central cavity, occasionally feeding the black hole.

Anglés-Alcázar, who is also an Associate Research Scientist at the Flatiron Institute Center for Computational Astrophysics, says a challenge in answering these questions has been creating models powerful enough to account for the numerous forces and factors that play into the process. Previous works have looked either at very large scales or the very smallest of scales, “but it has been a challenge to study the full range of scales connected simultaneously.”

Galaxy formation, Anglés-Alcázar says, starts with a halo of dark matter that dominates the mass and gravitational potential in the area and begins pulling in gas from its surroundings. Stars form from the dense gas, but some of it must reach the center of the galaxy to feed the black hole. How does all that gas get there? For some black holes, this involves huge quantities of gas, the equivalent of ten times the mass of the sun or more swallowed in just one year, says Anglés-Alcázar.

“When supermassive black holes are growing very fast, we refer to them as quasars,” he says. “They can have a mass well into one billion times the mass of the sun and can outshine everything else in the galaxy. How quasars look depends on how much gas they add per unit of time. How do we manage to get so much gas down to the center of the galaxy and close enough that the black hole can grab it and grow from there?”

The new simulations provide key insights into the nature of quasars, showing that strong gravitational forces from stars can twist and destabilize the gas across scales, and drive sufficient gas influx to power a luminous quasar at the epoch of peak galaxy activity.

In visualizing this series of events, it is easy to see the complexities of modeling them, and Anglés-Alcázar says it is necessary to account for the myriad components influencing black hole evolution.

“Our simulations incorporate many of the key physical processes, for example, the hydrodynamics of gas and how it evolves under the influence of pressure forces, gravity, and feedback from massive stars. Powerful events such as supernovae inject a lot of energy into the surrounding medium and this influences how the galaxy evolves, so we need to incorporate all of these details and physical processes to capture an accurate picture.”

Building on previous work from the FIRE (“Feedback In Realistic Environments”) project, the new technique outlined in the paper greatly increases model resolution, allowing the team to follow gas as it flows across the galaxy with more than a thousand times better resolution than previously possible.

“Other models can tell you a lot of details about what’s happening very close to the black hole, but they don’t contain information about what the rest of the galaxy is doing, much less what the environment around the galaxy is doing. It turns out it is very important to connect all of these processes at the same time; this is where this new study comes in.”

The supercomputing power is similarly massive, Anglés-Alcázar says, with hundreds of central processing units (CPUs) running in parallel in a computation that consumed millions of CPU hours.

“This is the first time that we have been able to create a simulation that can capture the full range of scales in a single model and where we can watch how gas is flowing from very large scales all the way down to the very center of the massive galaxy that we are focusing on.”

For future studies of large statistical populations of galaxies and massive black holes, researchers need to understand the full picture and the dominant physical mechanisms under as many different conditions as possible, says Anglés-Alcázar.

“That is something we are definitely excited about. This is just the beginning of exploring all of these different processes that explain how black holes can form and grow under different regimes.”

Pluto launches a cloud-based life sciences data analysis and management platform to speed discovery

The startup’s cloud-based collaboration platform combines intuitive design with powerful bioinformatics analysis for life sciences researchers

The Wyss Institute for Biologically Inspired Engineering at Harvard University has announced that a new startup, Pluto Biosciences (Pluto), has launched to commercialize a cloud-based life sciences data management platform that incorporates technology developed within the Institute’s Predictive Bioanalytics Initiative. The platform combines intuitive design with native biological data storage and analysis within a collaborative online environment. With a license from Harvard’s Office of Technology Development (OTD), Pluto plans to make the technology more broadly available to researchers in academia and biotech, offering an interactive home for all their lab data online.

The platform is the brainchild of former Wyss Senior Staff Scientist Rani Powers, Ph.D., who is now the founder and CEO of Pluto. After leading several successful product launches at life sciences and software startup companies and working with teams of scientists at the Wyss Institute, she was inspired to create an entirely new kind of platform that makes biological data analysis accessible to research labs across the globe.

“Our mission is to empower every researcher with their own digital lab space,” said Powers. “Whether you’re a PI trying to summarize and report experimental results for a grant, a scientist in pharma looking to compare data against published experiments, or a grad student trying to share results with a collaborator, the platform we built at the Wyss Institute allows you to manage and analyze your projects in one place, speeding up science and reducing busywork.”

Speeding up the pace of discovery

A molecular biologist by training, Powers has also been developing software for over a decade. Her experiences both at the lab bench and with complex data sets revealed that scientists across many life sciences organizations struggle to manage, analyze, and share their experimental data due to complicated, incompatible software programs.

“At tech companies, we encounter technical challenges related to data storage and user experience every day. These aren’t easy problems, but they’re addressable with a combination of engineering and design, and solving them is crucial for creating products that users love. So I wondered: why weren’t we applying this approach to the software we use for science?” said Powers.

When the COVID-19 pandemic struck, Powers saw first-hand how a platform like the one she envisioned could dramatically speed up the process of innovation. “Seeing so many people collaborating at a large scale to solve a pressing scientific problem was incredible – the Wyss Institute’s innovation machine was firing on all cylinders. But because we were working with collaborators at the University of Maryland and Mount Sinai, a lot of time was spent emailing people back and forth to coordinate a plan for generating, storing, and analyzing results. If we’d had a centralized collaboration platform housing everyone’s experimental data and results, our ability to respond to the pandemic would have been even greater,” said Powers.

The idea turned into a five-month project known internally as OrbitSeq, which was launched in February 2021 and became the first technology development project under the Wyss Institute’s Predictive BioAnalytics Initiative. In addition to working directly with Wyss scientists to learn their pain points and design the platform to address them, Powers was able to leverage the Wyss Institute’s ecosystem of collaborators and contacts to gain external perspectives and refine the strategy for ultimately spinning the project out as a startup.

Simple design, sophisticated science

Powers knew that it was crucial to design usability into the platform from the beginning. “Although cloud-based bioinformatics and data analysis tools exist for scientists to use, their interfaces are unnecessarily complicated to navigate. In fact, a bioinformatics director at one university told me that he was always having to ‘learn which buttons to ignore’ in these tools to get the result that he wanted. Popular consumer apps have demonstrated that it’s possible to combine simple design with powerful computation, so that was the driving vision for the platform,” she said. 

The new platform’s drag-and-drop interface allows users to upload raw low- or high-throughput assay data and perform different types of analyses to generate visualizations ranging from simple bar plots to more detailed volcano plots and heat maps, which can be difficult for people without specific bioinformatics or coding knowledge to create today. More importantly, plots and other results are stored securely in the cloud alongside the raw data used to generate them, so they are fully reproducible, shareable, and always accessible when needed.

But ease-of-use and simple design were only half of the equation. To provide maximum utility, the platform also interfaces with third-party bioinformatics and next-generation sequencing services, eliminating the need for large data transfer systems and other common time-consuming obstacles in the path of scientific discovery and innovation. Beyond an individual lab’s data, the platform includes access to thousands of publicly available experiments for easy comparison, allowing researchers to learn not only about their own results but also about the broader implications of their work.

Early success, long-term potential

Pluto was launched in July 2021, making Powers a member of the Wyss’ Lumineers Class of 2021. Over the next few months, Pluto plans to expand the list of organisms, experiment types, and analysis modules its product supports, including launching new features for predictive analysis and biomarker identification. The team is also implementing algorithms for comparisons across datasets. Aiming to make an impact across a variety of life sciences organizations, they believe Pluto has the potential to speed up the scientific process for researchers around the globe.

“The platform that Rani and her team have built isn’t just another bioinformatics tool – it’s a collaborative online ecosystem that allows scientists to share and build on each other’s advances more quickly and easily in a seamless way that really hasn’t been done for lab-based researchers before,” said the Wyss Institute’s Founding Director Don Ingber, M.D., Ph.D. “The Wyss Institute is proud to have contributed to this important advance in the quest to make data analysis and management more accessible for scientists of all kinds, and we are excited to see Pluto launched so that others can benefit from this new technology.” Ingber is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and Boston Children’s Hospital, and Professor of Bioengineering at the Harvard John A. Paulson School of Engineering and Applied Sciences.