HKU scientists develop a deep learning approach to predict disease-associated mutations of the metal-binding sites

In recent years, artificial intelligence (AI) has become a key player in high-tech fields such as drug development. AI tools help scientists uncover the patterns hidden in large biological datasets using optimized computational algorithms. AI methods such as deep neural networks improve decision making in biological and chemical applications, e.g., the prediction of disease-associated proteins, the discovery of novel biomarkers and the de novo design of small-molecule drug leads. These state-of-the-art approaches help scientists develop potential drugs more efficiently and economically.

The research team (from left): Dr Haibo Wang, Dr Mohamad Koohi-Moghadam, Professor Hongzhe Sun and Dr Hongyan Li at the Department of Chemistry, HKU.

A research team led by Professor Hongzhe Sun from the Department of Chemistry at the University of Hong Kong (HKU), in collaboration with Professor Junwen Wang from Mayo Clinic, Arizona in the United States (a former HKU colleague), implemented a robust deep learning approach to predict disease-associated mutations of the metal-binding sites in a protein. This is the first deep learning approach for the prediction of disease-associated metal-relevant site mutations in metalloproteins, providing a new platform to tackle human diseases. The research findings were recently published in an educational journal.

Metal ions play pivotal structural and functional roles in the (patho)physiology of human biological systems. Metals such as zinc, iron, and copper are essential for all life, and their concentrations in cells must be strictly regulated. A deficiency or an excess of these physiological metal ions can cause severe disease in humans. Mutations in the human genome are strongly associated with different diseases. If these mutations occur in the coding region of DNA, they might disrupt the metal-binding sites of proteins and consequently initiate severe diseases in humans. Understanding disease-associated mutations at the metal-binding sites of proteins will facilitate the discovery of new drugs.

The team first integrated omics data from different databases to build a comprehensive training dataset. Statistics from the collected data showed that different metals have different disease associations. Mutations in zinc-binding sites play a major role in breast, liver, kidney, immune system and prostate diseases. By contrast, mutations in calcium- and magnesium-binding sites are associated with muscular and immune system diseases, respectively. For iron-binding sites, mutations are more associated with metabolic diseases. Furthermore, mutations of manganese- and copper-binding sites are associated with cardiovascular diseases, with the latter also being associated with nervous system disease.

The team used a novel approach to extract spatial features from the metal-binding sites using an energy-based affinity grid map. These spatial features were merged with physicochemical sequential features to train the model. The final results show that using the spatial features enhanced the performance of the prediction, with an area under the curve (AUC) of 0.90 and an accuracy of 0.82. Given the limited number of advanced techniques and platforms in the field of metallomics and metalloproteins, the proposed deep learning approach offers a method to integrate experimental data with bioinformatics analysis. The approach will help scientists predict DNA mutations that are associated with diseases such as cancer, cardiovascular diseases, and genetic disorders.

Workflow of data collection and feature extraction to train the deep learning model.
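For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and AUC can be computed from a classifier's output scores. It is not the team's code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made purely for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease-associated mutation, 0 = benign.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.2f}")
print(f"AUC      = {roc_auc(toy_labels, toy_scores):.2f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two numbers are usually reported together.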

Professor Sun said: "Machine learning and AI play important roles in the current biological and chemical science. In my group, we worked on metals in biology and medicine using an integrative omics approach including metallomics and metalloproteomics, and we already produced a large amount of valuable data using in vivo/vitro experiments. We now develop an artificial intelligence approach based on deep learning to turn these raw data into valuable knowledge, leading to uncover secrets behind the diseases and to fight with them. I believe this novel deep learning approach can be used in other projects, which is undergoing in our laboratory."

Where do baby sea turtles go? New supercomputing technique may provide answers

The international group of scientists worked on the model, which will give communities, scientists and government agencies a tool to help conserve sea turtles.

A team of Florida researchers and their collaborators created a first-of-its-kind supercomputer model that tracks where sea turtle hatchlings go after they leave Florida’s shores, giving scientists a new tool to figure out where young turtles spend their “lost years.”

Nathan Putman, a biologist with LGL Ecological Research Assoc. based in Texas, led the study, which included 22 collaborators across Mexico, the southeastern United States, the Caribbean, and Europe. Co-authors include UCF Associate Professor Kate Mansfield, who leads UCF’s Marine Turtle Research Group, and UCF assistant research scientist Erin Seney.

“The model gives community groups, scientists, nonprofit agencies and governments across borders a tool to help inform conservation efforts and guide policies to protect sea turtle species and balance the needs of fisheries and other human activity,” Putman said. The team’s simulation model and findings were published this week in the online journal Ecography.

The model is built to predict loggerhead, green turtle and Kemp’s ridley abundance, according to the authors. To create the model, the team looked at ocean circulation data over the past 30 years. These data are known to be reliable and are routinely used by the National Oceanic and Atmospheric Administration and other agencies. The team also used sea turtle nesting and stranding data from various sources along the Caribbean, Gulf of Mexico and Florida coasts. The dataset includes more than 30 years of information from UCF, which has been monitoring sea turtle nests in east Central Florida since the late 1970s. Mansfield, Seney, and Putman previously worked together on other sea turtle studies in the Gulf of Mexico.

A sea turtle hatchling heads to the ocean. CREDIT: G.Stahelin

“The combination of big data is what made this supercomputer model so robust, reliable and powerful,” Putman said.

The group used U.S. and Mexico stranding data—information about where sea turtles washed ashore for a variety of reasons—to check if the supercomputer model was accurate, Putman said. The model also accounts for hurricanes and their impact on the ocean, but it does not take into consideration manmade threats such as the 2010 Deepwater Horizon oil spill in the Gulf of Mexico, which occurred during the years analyzed in the study.
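The article does not say which statistic the team used to compare model output with stranding records. As one plausible illustration only, the sketch below computes a Spearman rank correlation between predicted abundance and observed stranding counts. The function names, the coastal regions implied by the list positions and all numbers are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five coastal regions (not data from the study).
predicted_abundance = [0.12, 0.40, 0.33, 0.75, 0.05]
observed_strandings = [3, 9, 11, 20, 1]

print(f"rank correlation = {spearman(predicted_abundance, observed_strandings):.2f}")
```

A rank-based measure is used here because predicted abundance and stranding counts are on different scales; only their ordering across regions is compared.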

The supercomputer model also predicts where the turtles go during their “lost years” – the period after the turtles break free from their eggs on the shoreline and head into the ocean in the Gulf of Mexico and northwest Atlantic. The turtles spend years among sargassum in the ocean, and data about that time are scarce. Better data exist for when they are larger juveniles and return to forage closer to coastlines. What young sea turtles do between hatching and returning to nearshore waters makes up the “lost years” and is the foundation of sea turtle populations. Understanding where and when the youngest sea turtles go is critical to understanding the threats these young turtles may encounter and to better predicting population trends throughout the long lives of these species, said Mansfield.

This work was supported in part by a National Academy of Sciences Gulf Research Program grant awarded to Mansfield, Seney and Putman to synthesize available sea turtle datasets across the Gulf of Mexico.

“While localized data collection and research projects are important for understanding species’ biology, health and ecology, the turtles studied in one location typically spend different parts of their lives in other places, including migrations from offshore to inshore waters, from juvenile to adult foraging grounds, and between foraging and nesting areas,” said Seney, who helped coordinate data compilation from the multiple locations. “Our extensive collaborations on this project allowed us to study the Gulf of Mexico’s three most abundant sea turtle species and to integrate nesting beach data for distant nesting populations that ended up having close connections to the 1- to 3-year-old turtles living and stranding along various portions of the U.S. Gulf Coast. Without the involvement of our Mexican and Costa Rican collaborators, a big piece of this picture would have been missing.”

This work was funded with support from the Gulf Research Program of the National Academy of Sciences under award number 2000006434 and from a Florida RESTORE Act Centers of Excellence Grant through the Florida Institute of Oceanography under sub‐agreement no. 4710‐1126‐00‐H. The content is the sole responsibility of the authors and does not necessarily reflect the views of the Gulf Research Program, the National Academy of Sciences or the Florida Institute of Oceanography.

German-built artificial intelligence tracks down leukemia

Largest metastudy to date on acute myeloid leukemia

Artificial intelligence can detect one of the most common forms of blood cancer - acute myeloid leukemia (AML) - with high reliability. Researchers at the German Center for Neurodegenerative Diseases (DZNE) and the University of Bonn have now shown this in a proof-of-concept study. Their approach is based on the analysis of the gene activity of cells found in the blood. Used in practice, this approach could support conventional diagnostics and possibly accelerate the beginning of therapy. The research results have been published in the journal "iScience."

Artificial intelligence is a much-discussed topic in medicine, especially in the field of diagnostics. "We aimed to investigate the potential on the basis of a specific example," explains Prof. Joachim Schultze, a research group leader at the DZNE and head of the Department for Genomics and Immunoregulation at the LIMES Institute of the University of Bonn. "Because this requires large amounts of data, we evaluated data on the gene activity of blood cells. Numerous studies have been carried out on this topic and the results are available through databases. Thus, there is an enormous data pool. We have collected virtually everything that is currently available."

Fingerprint of Gene Activity

Schultze and his colleagues focused on the "transcriptome", which is a kind of fingerprint of gene activity. In each and every cell, depending on its condition, only certain genes are actually "switched on", which is reflected in their profiles of gene activity. Exactly such data - derived from cells in blood samples and spanning many thousands of genes - were analysed in the current study. "The transcriptome holds important information about the condition of cells. However, classical diagnostics is based on different data. We therefore wanted to find out what an analysis of the transcriptome can achieve using artificial intelligence, that is to say trainable algorithms," said Schultze, who is a member of the Bonn-based "ImmunoSensation" cluster of excellence. "In the long term, we intend to apply this approach to further topics, in particular in the field of dementia."

The current study focused on AML. Without adequate treatment, this form of leukemia leads to death within weeks. AML is associated with the proliferation of pathologically altered bone marrow cells, which can ultimately enter the bloodstream. As a result, both healthy cells and tumor cells drift in the blood. All these cells exhibit typical gene activity patterns, which were all considered in the analysis. Data from more than 12,000 blood samples, drawn from 105 different studies, were taken into account: the largest dataset to date for a metastudy on AML. Approximately 4,100 of these blood samples were derived from individuals diagnosed with AML; the remaining ones had been taken from individuals with other diseases or from healthy individuals.

High Hit Rate

The scientists fed their algorithms parts of this data set. The input included information about whether a sample came from an AML patient or not. "The algorithms then searched the transcriptome for disease-specific patterns. This is a largely automated process. It's called machine learning," said Schultze. Based on this pattern recognition, further data was analysed and classified by the algorithms, i.e. categorized into samples with AML and without AML. "Of course, we knew the classification as it was listed in the original data, but the software did not. We then checked the hit rate. It was above 99 percent for some of the applied methods. In fact, we tested various methods from the repertoire of machine learning and artificial intelligence. There was actually one algorithm that was particularly good, but the others were close behind."
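The train-then-check procedure described above can be sketched in a few lines. The study does not name its algorithms, so the example below swaps in a simple nearest-centroid classifier as a stand-in; the three-gene expression profiles, the labels and every function name are invented for illustration and are not taken from the study.

```python
def centroid(rows):
    """Mean expression value per gene across the given samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(samples, labels):
    """Learn one centroid per class from labelled training samples."""
    classes = sorted(set(labels))
    return {
        c: centroid([s for s, y in zip(samples, labels) if y == c])
        for c in classes
    }


def predict(model, sample):
    """Assign the class whose centroid is closest to the sample."""
    return min(model, key=lambda c: squared_distance(model[c], sample))


def hit_rate(model, samples, labels):
    """Share of held-out samples classified correctly."""
    hits = sum(predict(model, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


# Hypothetical expression profiles over three genes (invented values).
train_samples = [
    [8.0, 1.0, 5.0], [7.5, 1.5, 5.2], [8.2, 0.8, 4.9],   # labelled AML
    [2.0, 6.0, 5.1], [2.5, 5.5, 4.8], [1.8, 6.2, 5.0],   # labelled other
]
train_labels = ["AML", "AML", "AML", "other", "other", "other"]

# Held-out samples: the labels are known to us but never shown to the model.
test_samples = [[7.8, 1.2, 5.0], [2.2, 5.9, 5.1], [7.0, 2.0, 4.7], [3.0, 5.0, 5.3]]
test_labels = ["AML", "other", "AML", "other"]

model = train(train_samples, train_labels)
print(f"hit rate on held-out samples = {hit_rate(model, test_samples, test_labels):.2f}")
```

The key design point is the split: the model only ever sees the training labels, so the hit rate on the held-out samples measures how well the learned patterns generalize to unseen data.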

Application in Practice?

Put into application, this method could support conventional diagnostics and help save costs, said Schultze. "In principle, a blood sample taken by the family doctor and sent to a laboratory for analysis could suffice. I guess that the cost would be less than 50 euros." Classical AML diagnostics includes a variety of methods. Some of these cost a few hundred euros per run, Schultze noted. "However, we have not yet developed a workable test. We have only shown that the approach works in principle. So we have laid the groundwork for developing a test."

Schultze emphasised that the diagnosis of AML will continue to require specialised physicians in the future. "The aim is to provide the experts with a tool that supports them in their diagnosis. In addition, many patients go through a real odyssey until they finally end up with a specialist and get a diagnosis." This is because, in the early stages, the symptoms of AML can resemble those of a bad cold. However, AML is a life-threatening disease that should be treated as quickly as possible. "With a blood test, as it seems possible on the basis of our study, it is conceivable that the family doctor would already clarify a suspicion of AML. And when the suspicion is confirmed, the patient is referred to a specialist. Possibly, the diagnosis would then happen earlier than it does now and therapy could start earlier."