Galactic RainCloudS: supercomputing in the cloud unveils the enigmas of our galaxy

The Galactic RainCloudS project, an initiative led by members of the Faculty of Physics, the Institute of Cosmos Sciences (ICCUB) of the UB, and the Institute for Space Studies of Catalonia (IEEC), won first place in the Cloud Funding for Research call of the European project Open Clouds For Research Environments (OCRE).

The project competed against 27 proposals from twelve countries in a wide range of research disciplines. This first edition of Cloud Funding for Research funds the use of commercial cloud computing resources for research. The project counts on collaboration from the private sector: Pervasive Technologies, which brings experience in artificial intelligence and cloud supercomputing; Google, which provides the Google Cloud computing infrastructure; and Telefónica, which offers experience in cloud resource management. Galactic RainCloudS is a pioneering project in Europe in the use of cloud computing infrastructures for research in astronomy.

Image: ESA/Hubble & NASA, V. Antoniou

Professor Xavier Luri, director of the ICCUB and principal investigator of the project, highlights that “The Galactic RainCloudS project is a pioneer in Europe in the use of commercial cloud infrastructures for astronomy research, and results from the will to show the scientific community the benefits of using cloud resources”.

The key to the project lies in interdisciplinarity: combining the extraordinary volumes of data from the European Space Agency’s Gaia satellite with the computational power and flexibility of cloud infrastructures, and with data mining techniques, will enable the University of Barcelona team to study the links between past galaxy collisions and star formation in a holistic way, using the Milky Way and its satellite galaxies as an experimental laboratory. “Cloud computing is like renting powerful customized computers for a certain period, which will enable us to make the calculations necessary to study the interaction between galaxies”, notes Mercè Romero, a researcher at ICCUB.

The project also includes the development of a system to detect traces of past small galaxy collisions with the halo of our galaxy. Teresa Antoja, a researcher at ICCUB, notes that “the existence of granularities in the galactic halos is a prediction of the current cosmological model of the formation of our Universe: the active search for substructures of this type in the Gaia data can provide vital information on the history of the Milky Way and the nature of dark matter”.

Artificial intelligence and cloud supercomputing

The participation of the private sector in this project shows the closeness between research and companies in the use of cutting-edge technologies, as well as their shared interests. “At Pervasive Technologies, we are glad to offer our knowledge of artificial intelligence and cloud computing to a pioneering research project. We will work to get the highest performance out of cloud infrastructures and artificial intelligence for this project”, notes Rodolfo Lomascolo, CEO of Pervasive Technologies.

To be successful, the Galactic RainCloudS project must have, among other things, big data infrastructure. “The Gaia satellite data hide the answers to many questions we want to solve, but we need the right tools to retrieve them”, notes Roger Mor, data scientist at Pervasive Technologies and ICCUB collaborator. He adds: “The big data platforms and artificial intelligence services available in the commercial cloud are fundamental tools to find out, for instance, whether the interaction of Sagittarius with the Milky Way caused the reignition of star formation in our galaxy between 5 and 7 billion years ago, as some studies state”.

Enrique González Lezana, head of the cloud sales specialist team at Telefónica Tech, says that “Telefónica has accompanied the University of Barcelona in the definition and deployment of the Google Cloud architecture that will host the supercomputing solution required for the Galactic RainCloudS project”. He adds: “The deployed infrastructure will enable the processing and analysis of big data in a flexible, scalable way, adjusted to the needs of the researchers of the University of Barcelona. Telefónica will work with the UB throughout the entire process to guarantee the successful implementation of the project, with teams specialized in Google Cloud services and technologies”.

The project launched this May and will last a year. “Galactic RainCloudS is a necessary step in the transition of the research world toward the efficient use of cloud computing resources. In this sense, we are pioneers in its use at the University of Barcelona, and we hope our experience encourages others to adopt it. The research teams’ needs are becoming more specific, and we are working for this project to open the doors of commercial cloud computing to future projects in all research disciplines”, concludes Xavier Luri.

UTSA, UCCS researchers team up to identify methods to predict future cyberattacks

Malicious software, commonly known as “malware,” represents a major threat to modern society.

A UTSA-led research team is investigating ways to accurately predict these attacks. Mechanical Engineering Professor Yusheng Feng and doctoral student Van Trieu-Do in the Margie and Bill Klesse College of Engineering and Integrated Design, in collaboration with professor Shouhuai Xu from the Department of Computer Science at the University of Colorado at Colorado Springs, are studying how to use mathematical tools and supercomputer simulation to foresee cyberattacks.

According to recent findings by the Atlas VPN team, blockchain hackers netted nearly $1.3 billion across 78 hack events in Q1 2022. Hacks on the Ethereum and Solana ecosystems alone accounted for over $1 billion in losses during that quarter.

These pervasive security threats motivated the UTSA researchers to develop and use cyber defense tools and sensors to monitor threats and collect data, which can be used for various purposes in developing defense mechanisms.

“The current damages call for studies to understand and characterize cyber attacks from different perspectives and at various levels of intrusion. There are multiple variables that go into predicting the potential damage these attacks may cause as the aggressors get more sophisticated,” said Feng.

Using predictive situational awareness analysis, the team studied the distinctive nature of the attacks to accurately predict the threats that target and potentially harm personal devices, servers, and networks.

“Most studies on cyberattacks focus on microscopic levels of abstractions, meaning how to defend against a particular attack,” Feng said. “Cyber attackers can successfully break in by exploiting a single weakness in a computer system.”

The study aims to analyze the macroscopic levels of abstractions.

“Such macroscopic-level studies are important because they would offer insights towards holistic solutions to defending cyberattacks,” he added.

Feng explains, “It’s very hard to single out the cause of each attack; however, we have big data with a time series for each IP address (location). In this research, we use ‘causality’ where there are interrelationships among IP addresses with similar patterns of temporal features to identify the threat.”

The researchers used Granger causality (G-causality) to study the vulnerabilities posed by multiple threats from a regional perspective, analyzing cause and effect to identify cyber vulnerabilities, that is, how infiltrators attack an entity, in this case, IP addresses.

G-causality is a statistical concept of causation based on prediction: to characterize causality, a well-defined mathematical notion has to be established. The research team used Granger causality to determine the nature of cyberattack signals so that the signals can be compared and analyzed holistically.
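To illustrate the idea (this is a minimal sketch, not the team's actual pipeline or data), a lag-1 Granger test can be written in pure Python: fit an autoregression of one series with and without a lagged copy of the other series, and compare the residuals with an F-statistic. The synthetic time series below are an assumption, constructed so that x drives y.

```python
import random

def ols(X, y):
    """Least squares via normal equations; returns the residual sum of squares."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for col in range(k):                      # Gaussian elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):            # back-substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return sum((t - sum(br * xi for br, xi in zip(beta, row))) ** 2
               for row, t in zip(X, y))

def granger_f(x, y):
    """F-statistic: does lagged x help predict y beyond y's own past (lag 1)?"""
    n = len(y) - 1
    ys, yl, xl = y[1:], y[:-1], x[:-1]
    rss_r = ols([[1.0, a] for a in yl], ys)                   # restricted: y's past only
    rss_u = ols([[1.0, a, b] for a, b in zip(yl, xl)], ys)    # unrestricted: + x's past
    return (rss_r - rss_u) / (rss_u / (n - 3))

# Synthetic example (assumption): x is noise, and y is driven by lagged x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

print("F(x -> y) =", granger_f(x, y))   # large: x G-causes y
print("F(y -> x) =", granger_f(y, x))   # small: y does not G-cause x
```

In the study's setting, x and y would be per-IP-address attack time series rather than synthetic noise; a large F in one direction but not the other is what flags an asymmetric, causality-like relationship.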

The team also plans to expand the current body of research and study further what other kinds of causality will impact users and how to develop the appropriate defense tools to protect against sophisticated attacks.

Japanese scientists develop a statistical randomness-based framework for processing big datasets efficiently with memory limit

Any high-performance computing system should be able to handle a vast amount of data in a short amount of time, a requirement on which entire fields are based. Usually, the first step in managing a large amount of data is either to classify it based on well-defined attributes or, as is typical in machine learning, to "cluster" it into groups such that data points in the same group are more similar to one another than to those in other groups. However, for an extremely large dataset, which can contain trillions of sample points, even grouping the data points into clusters is infeasible without huge memory requirements.

"The problem can be formulated as follows: Suppose we have a clustering tool that can process up to lmax samples. The tool classifies l (input) samples into M(l) groups (as output) based on some attributes. Let the actual number of samples be L and G = M(L) be the total number of attributes we want to find. The problem is that if L is much larger than lmax, we cannot determine G owing to limitations in memory capacity," explains Professor Ryo Maezono from the Japan Advanced Institute of Science and Technology (JAIST) in Ishikawa, Japan, who specializes in computational condensed matter theory.

Interestingly enough, very large sample sizes are common in materials science, where calculations involving atomic substitutions in a crystal structure often involve possibilities running into the trillions. However, a mathematical theorem called "Polya's theorem," which exploits the symmetry of the crystal, often simplifies such calculations to a great extent. Unfortunately, Polya's theorem only works for problems with symmetry and is, therefore, of limited scope.

In a recent study published in Advanced Theory and Simulations, a team of scientists led by Prof. Maezono and his colleague Keishu Utimula, who received his Ph.D. in materials science from JAIST in 2020 and is the first author of the study, proposed an approach based on statistical randomness to estimate G for sample sizes much larger (~a trillion) than lmax. The idea, essentially, is to pick a sample of size l much smaller than L, identify M(l) using machine learning "clustering," and repeat the process while varying l. As l increases, the estimated M(l) converges to M(L) = G, provided G is considerably smaller than lmax (which is almost always satisfied). However, this is still a computationally expensive strategy, because it is hard to know exactly when convergence has been achieved.

To address this issue, the scientists implemented another ingenious strategy: they made use of the "variance", or the degree of spread, in M(l). From simple mathematical reasoning, they showed that the variance of M(l), or V[M(l)], should have a peak for a sample size ~ G. In other words, the sample size corresponding to a maximum in V[M(l)] is approximately G! Furthermore, numerical simulations revealed that the peak variance itself scaled as 0.1 times G, and was thus a good estimate of G.
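A toy simulation makes both observations concrete (this is our own illustrative sketch, not the authors' code). To keep it self-contained, the "clustering tool" is replaced by simply counting the distinct hidden group labels in a random subsample, so M(l) is cheap to compute; the dataset size L, the true group count G, and the range of sample sizes are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)
G = 50                       # true number of groups (unknown in practice)
L = 100_000                  # full dataset size, too big to cluster at once
data = [rng.randrange(G) for _ in range(L)]   # hidden group label per point

def M(l, trials=200):
    """'Cluster' `trials` random subsamples of size l; return the list of
    group counts M(l) observed in each trial (distinct labels stand in
    for the output of a real clustering tool)."""
    return [len(set(rng.sample(data, l))) for _ in range(trials)]

# Scan sample sizes: V[M(l)] should peak at a sample size of the order
# of G, giving an estimate of G without ever clustering all L points.
sizes = list(range(5, 301, 5))
variances = {l: pvariance(M(l)) for l in sizes}
l_star = max(variances, key=variances.get)

print("variance peaks at l =", l_star)    # of the order of G
print("mean M(l) at l = 300:", mean(M(300)))   # converges toward G = 50
```

For very small l, M(l) is almost deterministically equal to l; for very large l, it is almost deterministically G; in between, different subsamples disagree the most, which is why the variance peaks at an intermediate sample size comparable to G.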

While the results are yet to be mathematically verified, the technique shows promise of finding applications in high-performance computing and machine learning. "The method described in our work has much wider applicability than Polya's theorem and can, therefore, handle a broader category of problems. Moreover, it only requires a machine learning clustering tool for sorting the data and does not require a large memory or whole sampling. This can make AI recognition technology feasible for larger data sizes even with small-scale recognition tools, which can improve their convenience and availability in the future," comments Prof. Maezono excitedly.

Sometimes, statistics is nothing short of magic, and this study proves that!