CXL-based memory disaggregation technology opens up a new direction for big data solution frameworks

A CXL solution developed by the Computer Architecture and Memory Systems Laboratory at KAIST provides new insights into memory disaggregation, ensuring direct access and high-performance capabilities

A team from the Computer Architecture and Memory Systems Laboratory (CAMEL) at KAIST presented a new compute express link (CXL) solution whose directly accessible, high-performance memory disaggregation opens new directions for big data memory processing. Professor Myoungsoo Jung said the team's technology significantly improves performance compared with existing remote direct memory access (RDMA)-based memory disaggregation.

CXL is a new dynamic multi-protocol built on peripheral component interconnect express (PCIe), designed to utilize memory devices and accelerators efficiently. Many enterprise data centers and memory vendors are paying attention to it as the next-generation multi-protocol for the era of big data.

Emerging big data applications such as machine learning, graph analytics, and in-memory databases require large memory capacities. However, scaling out memory capacity via a conventional memory interface like double data rate (DDR) is limited by the number of central processing units (CPUs) and memory controllers. Therefore, memory disaggregation, which allows a host to connect to another host's memory or to standalone memory nodes, has emerged.

RDMA is a technology that lets a host directly access another host's memory via InfiniBand, the network protocol commonly used in data centers. Most existing memory disaggregation technologies today employ RDMA to obtain a large memory capacity: a host shares another host's memory by transferring data between local and remote memory.

Figure 1. A comparison of the architectures of CAMEL's CXL solution and conventional RDMA-based memory disaggregation.

Although RDMA-based memory disaggregation provides a large memory capacity to a host, two critical problems remain. First, scaling out the memory still requires adding extra CPUs: since passive memory such as dynamic random-access memory (DRAM) cannot operate by itself, it must be controlled by a CPU. Second, redundant data copies and software fabric interventions in RDMA-based memory disaggregation lengthen access latency; remote memory access can be multiple orders of magnitude slower than local memory access.

To address these issues, Professor Jung's team developed a CXL-based memory disaggregation framework that includes CXL-enabled customized CPUs, CXL devices, CXL switches, and CXL-aware operating system modules. The team's CXL device is a purely passive, directly accessible memory node containing multiple DRAM dual inline memory modules (DIMMs) and a CXL memory controller. Since the CXL memory controller manages the memory in the CXL device, a host can utilize the memory node without processor or software intervention. The team's CXL switch scales out a host's memory capacity by hierarchically connecting multiple CXL devices, allowing hundreds of devices or more. Atop the switches and devices, the team's CXL-enabled operating system removes the redundant data copies and protocol conversions of conventional RDMA, significantly decreasing access latency to the memory nodes.
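The core idea, passive memory devices pooled behind a switch into one host-visible address space, can be sketched in miniature. The toy model below is our own illustration of the concept, not the team's implementation; all class and method names are invented for the sketch:

```python
# Toy model of pooled, directly accessible memory: passive devices
# behind a switch, presented to the host as one flat address space.

class MemoryDevice:
    """A passive memory node: storage plus a controller, no CPU."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = bytearray(capacity)

class Switch:
    """Routes a host's loads and stores to the right device, so
    capacity scales by adding devices rather than CPU-attached DIMMs."""
    def __init__(self, devices):
        self.devices = devices
        self.total = sum(d.capacity for d in devices)

    def _route(self, addr):
        # Map a flat host address to (device, offset).
        for dev in self.devices:
            if addr < dev.capacity:
                return dev, addr
            addr -= dev.capacity
        raise IndexError("address beyond pooled capacity")

    def load(self, addr):
        dev, off = self._route(addr)
        return dev.cells[off]

    def store(self, addr, value):
        dev, off = self._route(addr)
        dev.cells[off] = value

# The host sees one large pool: four devices of 1 KiB each.
pool = Switch([MemoryDevice(1024) for _ in range(4)])
pool.store(3000, 42)   # routed to the third device, offset 952
print(pool.total)      # 4096
print(pool.load(3000)) # 42
```

The point of the sketch is the routing step: the host issues ordinary loads and stores against a flat address space, and no remote CPU or software stack sits on the access path.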

In a test loading 64 B (one cache line) of data from memory pooling devices, CXL-based memory disaggregation showed 8.2 times higher data-load performance than RDMA-based memory disaggregation, and performance similar to that of local DRAM. In the team's evaluations on big data benchmarks, such as a machine learning-based test, the CXL-based technology also showed up to 3.7 times higher performance than prior RDMA-based memory disaggregation technologies.

Figure 2. A performance comparison between CAMEL's CXL solution and prior RDMA-based disaggregation.

“Escaping from the conventional RDMA-based memory disaggregation, our CXL-based memory disaggregation framework can provide high scalability and performance for diverse datacenters and cloud service infrastructures,” said Professor Jung. He went on to stress, “Our CXL-based memory disaggregation research will bring about a new paradigm for memory solutions that will lead the era of big data.” 

Chinese-built divide-and-conquer algorithm offers a promising route for big data analysis

We live in the era of big data. The huge volume of information we generate daily has major applications across science, technology, economics, and management. For example, more and more companies now collect, store, and analyze large-scale data sets from multiple sources to gain business insights or measure risk.

However, as Prof. Yong Zhou, one of the authors of a new study notes: “Typically, these large or massive data sets cannot be processed with independent computers, which poses new challenges for traditional data analysis in terms of computational methods and statistical theory.”

Together with colleagues at the Chinese University of Hong Kong, Zhou, a professor at China’s East China Normal University, has developed a new algorithm that promises to address these computational problems.

He explains: "State-of-the-art numerical algorithms already exist, such as optimal subsampling algorithms and divide-and-conquer algorithms. In contrast to the optimal subsampling algorithm, which samples small-scale, informative data points, the divide-and-conquer algorithm randomly divides large data sets into sub-datasets and processes them separately on multiple machines. While the divide-and-conquer method makes effective use of computational resources for big data analysis, a robust and efficient meta-method is usually required when integrating the results."
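The divide-and-conquer scheme Zhou describes can be sketched in a few lines. The toy example below is our own illustration, not the authors' code: it splits a simulated regression data set into K blocks, fits each block independently (the step that could run on K machines), and combines the sub-estimates with the simplest possible meta-step, plain averaging:

```python
# Divide-and-conquer sketch: estimate on blocks, then combine.
import numpy as np

rng = np.random.default_rng(0)
n, K = 100_000, 10
x = rng.normal(size=n)
y = 2.0 * x + 1.0 + rng.normal(size=n)   # true intercept 1, slope 2

def fit_block(xb, yb):
    # Ordinary least squares (intercept, slope) on one block.
    X = np.column_stack([np.ones_like(xb), xb])
    return np.linalg.lstsq(X, yb, rcond=None)[0]

# "Divide": each block is processed separately.
estimates = [fit_block(xs, ys)
             for xs, ys in zip(np.array_split(x, K), np.array_split(y, K))]

# "Conquer": average the K sub-estimates into one.
beta = np.mean(estimates, axis=0)
print(beta)  # ≈ [1.0, 2.0]
```

As the quote notes, simple averaging is the naive meta-step; the study's contribution is a more robust and efficient way to integrate the block results.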

In this study, the researchers focused on large-scale inference for a linear expectile regression model, which has wide applications in risk management. They propose a communication-efficient divide-and-conquer algorithm in which the summary statistics from the subsystems are combined via the confidence distribution. Zhou explains: "This is a robust and efficient meta-method for integrating the results. More importantly, we studied the relationship between the number of machines and the sample size. We found that the requirement on the number of machines is a trade-off between statistical accuracy and computational efficiency."
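Expectile regression replaces the symmetric squared-error loss with an asymmetrically weighted one: residuals above the fit get weight τ, residuals below get weight 1 − τ, so a τ-expectile fit can be computed by iteratively reweighted least squares. The sketch below illustrates that standard formulation on a single data block; it is not the paper's algorithm, and it omits the confidence-distribution meta-step the authors use to combine blocks:

```python
# Fit a tau-expectile regression by iteratively reweighted least squares.
import numpy as np

def expectile_fit(X, y, tau=0.9, iters=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS start (= tau 0.5)
    for _ in range(iters):
        resid = y - X @ beta
        w = np.where(resid > 0, tau, 1.0 - tau)  # asymmetric weights
        Xw = X * w[:, None]                      # W X
        # Solve the weighted normal equations X' W X beta = X' W y.
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = x + rng.normal(size=5000)
X = np.column_stack([np.ones_like(x), x])

# tau = 0.5 recovers the mean (OLS) fit; tau = 0.9 shifts the line
# upward, tracking the upper tail of y given x, as used in risk work.
print(expectile_fit(X, y, tau=0.5)[0])  # intercept ≈ 0
print(expectile_fit(X, y, tau=0.9)[0])  # intercept well above 0
```

In a divide-and-conquer setting, each machine would run a fit like this on its own block and ship back only summary statistics for the meta-step.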

Zhou adds: “We believe the algorithm we have developed can significantly help to address the computational challenges arising from large-scale data.”

Japanese astrophysicist's supercomputer simulations of plasma jets reveal magnetic fields far, far away

Radio telescope images enable a new way to study magnetic fields in galaxy clusters millions of light-years away

For the first time, researchers have observed plasma jets interacting with magnetic fields in a massive galaxy cluster 600 million light-years away, thanks to the help of radio telescopes and supercomputer simulations. The findings can help clarify how such galaxy clusters evolve.

Galaxy clusters can contain up to thousands of galaxies bound together by gravity. Abell 3376 is a huge cluster forming as a result of a violent collision between two sub-clusters of galaxies. Very little is known about the magnetic fields that exist within this and similar galaxy clusters.

"It is generally difficult to directly examine the structure of intracluster magnetic fields," says Nagoya University astrophysicist Tsutomu Takeuchi, who was involved in the research. "Our results clearly demonstrate how long-wavelength radio observations can help explore this interaction." A black hole (marked by the red x) at the centre of galaxy MRC 0600-399 emits a jet of particles that bends into a "double-scythe" T-shape that follows the magnetic field lines at the galaxy subcluster's boundary.

An international team of scientists has been using the MeerKAT radio telescope in the Northern Cape of South Africa to learn more about Abell 3376's huge magnetic fields. One of the telescope's very high-resolution images revealed something unexpected: plasma jets emitted by a supermassive black hole in the cluster bend to form a unique T-shape as they extend outwards for distances as far as 326,156 light-years away. The black hole is in galaxy MRC 0600-399, which is near the centre of Abell 3376.

The team combined their MeerKAT radio telescope data with X-ray data from the European Space Agency's XMM-Newton space telescope to find that the bend in the plasma jets occurs at the boundary of the subcluster in which MRC 0600-399 sits.

"This told us that the plasma jets from MRC 0600-399 were interacting with something in the heated gas, called the intracluster medium, that exists between the galaxies within Abell 3376," explains Takeuchi.

To figure out what was happening, the team conducted 3D 'magnetohydrodynamic' simulations using ATERUI II, one of the world's most powerful supercomputers, located at the National Astronomical Observatory of Japan.

The simulations showed that the jet streams emitted by MRC 0600-399's black hole eventually reach and interact with magnetic fields at the border of the galaxy subcluster. The jet stream compresses the magnetic field lines and moves along them, forming the characteristic T-shape.

"This is the first discovery of an interaction between cluster galaxy plasma jets and intracluster magnetic fields," says Takeuchi.

An international team has just begun construction of what is planned to be the world's largest radio telescope, called the Square Kilometre Array (SKA).

"New facilities like the SKA are expected to reveal the roles and origins of cosmic magnetism and even to help us understand how the universe evolved," says Takeuchi. "Our study is a good example of the power of radio observation, one of the last frontiers in astronomy."