Dartmouth deploys big data, AI tools to support research on harmful blue-green algae

Robotic boats and aerial drones combine with water sampling to study eastern lakes

A team of scientists from research centers stretching from Maine to South Carolina will develop and deploy high-tech tools to explore cyanobacteria in lakes across the East Coast.

The multi-year project will combine big data, artificial intelligence and robotics with new and time-tested techniques for lake sampling to understand where, when, and how cyanobacterial blooms develop.

The research team brings together experts in freshwater ecology, computer science, engineering and geospatial science from Bates College, Colby College, Dartmouth, the University of New Hampshire, the University of Rhode Island and the University of South Carolina.

"It is rare to have teams from so many different specialties converge to study a problem like this," said Alberto Quattrini Li, an assistant professor of computer science at Dartmouth and the overall project lead. "By working together, we can increase the amount of data that can be collected and increase prediction capabilities."

Freshwater lakes provide a variety of human and ecological services, such as drinking water and food production. But lakes across the country and the world are increasingly threatened by harmful cyanobacterial blooms.

Sometimes known as blue-green algae, cyanobacterial blooms degrade lake water quality and threaten human health through toxins that can damage multiple organ systems.

Scientists know that land-use changes and global climate change are the main drivers of cyanobacterial blooms, but much remains unknown about what influences the timing and location of blooms in individual lakes. Researchers are also looking to understand how cyanobacteria are affected by extreme precipitation events.

"We suspect that individual blooms result from a complicated interaction of conditions that include nutrient loading during the past spring, recent trends in temperature and precipitation, and current in-lake conditions," said Kathryn Cottingham, a professor of biology at Dartmouth. "Until now, we haven't had the tools or technologies to track conditions at the right spatial or temporal scales to understand those drivers."

The project will use robotic boats, buoys, and camera-equipped drones to measure physical, chemical, and biological data in lakes where cyanobacteria are detected. When combined, the technology will generate large volumes of data related to the lakes and the development of harmful blooms. The project will also build new algorithmic models to assess the findings.
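As a purely hypothetical illustration of how multi-sensor measurements might feed a predictive model (the project's actual algorithms are not described here), the sketch below combines three in-lake readings into a logistic bloom-risk score. The features and weights are invented for illustration:

```python
import math

# Hypothetical sketch: combine sensor readings into a bloom-risk score.
# The weights below are invented; a real model would be fit to field data.

def bloom_risk(chlorophyll_ugL, temp_C, phosphorus_ugL):
    """Logistic combination of three in-lake measurements (toy weights)."""
    z = 0.08 * chlorophyll_ugL + 0.15 * (temp_C - 20) + 0.05 * phosphorus_ugL - 3.0
    return 1.0 / (1.0 + math.exp(-z))

# A warm, nutrient-rich reading scores much higher than a cold, nutrient-poor one
warm_rich = bloom_risk(chlorophyll_ugL=25, temp_C=28, phosphorus_ugL=40)
cold_poor = bloom_risk(chlorophyll_ugL=2, temp_C=12, phosphorus_ugL=5)
```

In practice the project's models would be trained on the large volumes of boat, buoy, and drone data the text describes, rather than hand-set as here.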

Lakes in New Hampshire, Maine, Rhode Island, and South Carolina will be studied as part of the project.

Information collected through the research could lead to better predictions of when and where cyanobacterial blooms take place. Those predictions might allow earlier actions to protect public health in recreational lakes and in lakes that supply drinking water.

With technology covering the water and air, researchers will also collect information on population and land use around the lakes to determine how those factors might impact bloom formation.

Project technology will be shared with lake managers and citizens so that community members can conduct their own monitoring. Local homeowners will form a corps of "citizen scientists" to support the project.

Undergraduate and graduate students will also participate in the project. The interdisciplinary training is intended to prepare the next generation of scientists to address societal issues.

Russian mathematicians find gold in big data

Russian mathematicians and geophysicists have made a standard technique for ore prospecting several times more effective. Their findings are reported in Geophysical Journal International, one of the most respected scientific periodicals on computational geophysics.

The controlled-source electromagnetic method, known as CSEM, dates back to the mid-20th century. It involves deploying grounded electrodes that inject an oscillating electric current into the Earth. The electromagnetic field is then measured on the surface. The resulting data enable mapping the electrical resistivity of the subsurface rock by solving what is known as an inverse problem. This is useful because a low resistivity suggests the presence of metal ore. A considerable limitation of CSEM, which has restricted its scope of application, is its high demand for computing resources.
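The depth that such an oscillating field probes is governed by the standard electromagnetic skin-depth relation, which ties penetration depth to rock resistivity and source frequency. The resistivity values below are illustrative, not from the study:

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m

def skin_depth_m(resistivity_ohm_m, frequency_hz):
    """EM skin depth: the depth at which the field amplitude falls by 1/e."""
    omega = 2 * math.pi * frequency_hz
    return math.sqrt(2 * resistivity_ohm_m / (MU0 * omega))

# Resistive host rock (~1000 ohm*m) vs a conductive ore body (~1 ohm*m) at 1 Hz:
host = skin_depth_m(1000, 1.0)  # roughly 16 km
ore = skin_depth_m(1, 1.0)      # roughly 500 m
```

The contrast is why low resistivity stands out in CSEM data: conductive ore attenuates and perturbs the field far more strongly than the surrounding rock.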

Now, a research group led by Michael Zhdanov from the Applied Computational Geophysics Lab at the Moscow Institute of Physics and Technology has created a numerical method that makes the calculations feasible for modern supercomputers.

"Solving the inverse problem involves calculating -- thousands of times -- the electromagnetic field from a given distribution of electric current," said paper co-author Mikhail Malovichko of Skoltech and the MIPT Applied Computational Geophysics Lab. "We have proposed a new numerical method that speeds up the forward-problem calculation on alternating current severalfold, thus making the inverse problem tractable on modern supercomputers."
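A minimal sketch of why forward-model speed dominates the inversion: the toy below recovers a half-space conductivity by repeatedly calling a stand-in forward model and counting the calls. The "physics" here is invented for illustration; the real method solves a full 3D electromagnetic problem at every step, which is exactly the cost the new algorithm reduces:

```python
import math

calls = 0

def forward(sigma):
    """Toy stand-in for the expensive 3D EM forward simulation.
    Every step of the inversion must call it at least once."""
    global calls
    calls += 1
    # Amplitude decays with sqrt(conductivity) in this toy (not real physics)
    return math.exp(-math.sqrt(sigma))

observed = forward(0.04)  # pretend surface measurement from sigma = 0.04 S/m

# Misfit-driven inversion by bisection on sigma (forward is monotone decreasing)
lo, hi = 1e-4, 1.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if forward(mid) > observed:  # amplitude too high -> conductivity too low
        lo = mid
    else:
        hi = mid
sigma_est = 0.5 * (lo + hi)
```

Even this one-parameter toy needs dozens of forward calls; a realistic 3D inversion needs thousands, each a large simulation, which is why a severalfold speedup of the forward problem makes the whole inversion tractable.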

However, to use the algorithm for prospecting, it first needs to be verified using precise data on real ore deposits. Highly reliable reference data are supplied by the most expensive geological prospecting technique there is -- exploration drilling.

Fortunately, such data turned out to be available on the Sukhoi Log gold deposit, 900 kilometers northeast of Irkutsk, Russia. Discovered in the 1960s, the deposit is one of the largest worldwide. That said, the precious metal concentration in the rock is fairly low. For this reason, Sukhoi Log was thoroughly scrutinized to enable extracting ore only where it is economically viable.

"The Soviet Union spent an immense amount of money to drill more than 800 boreholes in an endeavor whose economic feasibility was not subject to any checks anyway," said study co-author Andrei Tarasov, who is an associate professor at the Department of Geophysics, St. Petersburg State University. "This makes Sukhoi Log the ideal place for testing newly developed geological surveying techniques by comparing their predictions with the precise data available from drilling."

By processing the large arrays of available data, the MIPT-Skoltech team created a detailed 3D map of the area and tested the new algorithm's ability to solve the inverse problem in CSEM. The new model enables prospectors to make do with as few exploratory holes as possible: The drilling is only employed to verify model predictions.

The technique developed by the Russian researchers can also be used to search for other kinds of ore, including copper-nickel, volcanogenic massive sulfide, and polymetallic deposits.

AI, big data predict which research will influence future medical treatments

An artificial intelligence/machine learning model that predicts which scientific advances are likely to eventually translate to the clinic has been developed by Ian Hutchins and colleagues in the Office of Portfolio Analysis (OPA), a team led by George Santangelo at the National Institutes of Health (NIH). The work, described in a Meta-Research article published October 10 in the open-access journal PLOS Biology, aims to shorten the sometimes decades-long interval between scientific discovery and clinical application. The method estimates the likelihood that a research article will be cited by a future clinical trial or guideline, an early indicator of translational progress.

Hutchins and colleagues have quantified these predictions, which are highly accurate with as little as two years of post-publication data, as a novel metric called "Approximate Potential to Translate" (APT). APT values can be used by researchers and decision-makers to focus attention on areas of science that have strong signatures of translational potential. Although numbers alone should never be a substitute for evaluation by human experts, the APT metric has the potential to accelerate biomedical progress as one component of data-driven decision-making.

Caption: This image depicts the co-citation network of seminal fundamental publications that led to the clinical development of cancer immunotherapy treatments. Large dots (center) represent the most influential clinical trials that formed part of the evidence base for FDA approval of these treatments. Heat mapping indicates the extent to which the research was human-focused; at the extremes, each green dot represents a fundamental research publication and each red dot a publication describing human research. This network was generated using open-access data from the new modules of the iCite webtool described in two new articles from Hutchins and colleagues.
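As a rough, invented illustration of the kind of prediction involved (this is not NIH's actual model, features, or data), the toy below fits a logistic classifier that guesses whether a paper will be cited by a clinical trial from two hypothetical early signals:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented training set:
# (fraction of citing papers that are human studies, citations in first 2 years)
# -> later cited by a clinical trial (1) or not (0)
data = [
    ((0.9, 30), 1), ((0.8, 12), 1), ((0.7, 25), 1),
    ((0.1, 40), 0), ((0.2, 3), 0), ((0.05, 15), 0),
]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):  # batch gradient descent on logistic loss
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * (x2 / 40) + b)
        err = p - y
        gw[0] += err * x1
        gw[1] += err * (x2 / 40)
        gb += err
    w[0] -= lr * gw[0] / len(data)
    w[1] -= lr * gw[1] / len(data)
    b -= lr * gb / len(data)

# Score two hypothetical new papers with identical early citation counts
translational = sigmoid(w[0] * 0.85 + w[1] * (20 / 40) + b)
basic = sigmoid(w[0] * 0.1 + w[1] * (20 / 40) + b)
```

The point of the sketch is only that early, content- and citation-based features can separate papers by translational trajectory; the real APT model is trained on far richer article content and citation data.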

The model that computes APT values makes predictions based on the content of research articles and the articles that cite them. A long-standing barrier to research and development of metrics like APT is that such citation data have remained hidden behind proprietary, restrictive, and often costly licensing agreements. To remove this impediment, increase transparency, and facilitate reproducibility, OPA has aggregated citation data from publicly available resources to create the NIH Open Citation Collection (NIH-OCC), the details of which appear in a Community Page article in the same issue of PLOS Biology. The NIH-OCC comprises over 420 million citation links at present and will be updated monthly as citations continue to accumulate. For publications since 2010, the NIH-OCC is already more comprehensive than leading proprietary sources of citation data.

Citation data from the NIH-OCC are used to calculate both APT values and Relative Citation Ratios (RCRs). The latter, a measure of scientific influence at the article level, normalized for the field of study and time since publication, was developed previously by Santangelo's team at NIH, and has already been widely adopted in both the scientific and evaluator communities. Upon publication of these two articles, APT values and the NIH-OCC will be freely and publicly available as new components of the iCite webtool that will continue as the primary source of RCR data (https://icite.od.nih.gov/). The OPA team encourages the use of iCite to improve research assessment and decision-making that can contribute to optimizing the scientific enterprise.
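A simplified sketch of the RCR idea, assuming the field citation rate is approximated by averaging the rates of an article's co-cited papers (the actual method derives the field rate from the co-citation network and benchmarks it against NIH-funded articles):

```python
# Simplified sketch of the Relative Citation Ratio (RCR) concept.
# Not NIH's exact computation; numbers are invented for illustration.

def citation_rate(citations, years_since_publication):
    """Citations per year since publication (the article citation rate)."""
    return citations / years_since_publication

def simple_rcr(article_citations, article_years, field_rates):
    """Article citation rate divided by the average rate of its field,
    where the field is approximated by the article's co-cited papers."""
    acr = citation_rate(article_citations, article_years)
    fcr = sum(field_rates) / len(field_rates)
    return acr / fcr

# An article with 60 citations over 4 years, in a field averaging
# 5 citations/year, is cited three times as often as its field
rcr = simple_rcr(60, 4, field_rates=[4.0, 5.0, 6.0])
```

Normalizing by a field-specific expected rate is what lets RCR compare articles across fields and publication ages, unlike raw citation counts.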