Japan’s Next-Generation Supercomputer Configuration Is Decided

Architecture based on scalar system

  • Supercomputer will boast a performance of 10 petaflops upon completion in 2012, as initially planned
  • Scalar system will be built from the world's fastest CPUs (128 gigaflops)
  • CPUs will feature error-recovery function; network will have excellent fault tolerance and operability

Fujitsu and Japan's Institute of Physical and Chemical Research, RIKEN, have announced that RIKEN has decided to employ a new system configuration with a scalar processing architecture for its next-generation supercomputer. The supercomputer is being sponsored by Japan's Ministry of Education, Culture, Sports, Science, and Technology (MEXT) as part of its project for the "Development and Use of an Advanced, High-Performance, General-Purpose Supercomputer" (Next-Generation Supercomputer Project).

The project is a joint effort between RIKEN and Fujitsu aimed at developing a supercomputer capable of achieving a performance of 10 petaflops. As originally planned, it is on track for completion in 2012.

With RIKEN taking the lead role in development, conceptual design planning for the next-generation supercomputer began in September 2006. In 2007, RIKEN completed this phase, and following design evaluation, it began the system development phase. The initial plan called for a hybrid system featuring both scalar units and vector units, but NEC Corporation, which was in charge of developing the vector unit, recently notified RIKEN that it would be unable to participate in the project's production stage, which effectively ended the plans for a hybrid system.

Meanwhile, MEXT's interim assessment group conducted a technical assessment of the project's system architecture. Based on MEXT's assessment, RIKEN performed its own review of the architecture and, following a report to MEXT and additional assessments, has moved forward with a decision on the new system configuration.

RIKEN decided to pursue a scalar configuration after assessing the project's prospects, with the aim of meeting its original goals: a LINPACK performance of 10 petaflops, and completion and readiness for shared use in 2012. The next-generation supercomputer will use what is currently the world's fastest CPU (128 gigaflops), developed and manufactured by Fujitsu using 45-nm process technology. The system's nodes will be linked by a direct-connection network with wide-band communications channels. This configuration is designed to deliver both energy efficiency and massively parallel computing capability. In addition, RIKEN plans to work with the organizations in charge of promoting use of the supercomputer to provide adequate support to users of vector-based systems.

1. Background

In fiscal 2006, MEXT began promoting the development of a world-class supercomputer as part of its Next-Generation Supercomputer Project. RIKEN organized a next-generation supercomputer development division to lead development, with the aim of achieving the world's highest performance.

In April 2007, RIKEN adopted a plan for a hybrid system architecture consisting of scalar and vector units and carried out the development in accordance with the plan. On May 13, 2009, however, NEC expressed its intention to withdraw from the project following the detailed design phase, stepping out of the subsequent prototyping and production stages. This effectively ended plans for a hybrid system.

At the same time, MEXT conducted an interim assessment of the Next-Generation Supercomputer Project and evaluated the technical aspects of the system architecture. Based on the result, RIKEN reviewed the system architecture and, following a report to MEXT and additional assessments, committed to a new scalar configuration.

2. System overview

The new architecture is a scalar supercomputer built as a distributed-memory parallel computing system. Configuration details are described below.

  1. CPU
    The system will adopt Fujitsu's SPARC64™ VIIIfx CPU (8 cores, 128 gigaflops), manufactured using the company's 45-nm process technology. As the world's highest-performance general-purpose CPU, it combines a computational speed of 128 gigaflops per CPU with high energy efficiency. Its error-recovery function(7) also enhances operability.
  2. Network
    The network linking the computing nodes is a direct-connection network optimized for the supercomputer's architecture. Direct-connection networks are generally known for their flexibility and expandability, but they tend to be weaker in fault tolerance and operability. The network developed for this system, by contrast, uses a multidimensional mesh/torus connection that achieves excellent fault tolerance and operability while retaining flexibility and expandability.
    The network also provides a wide-band communications channel between neighboring nodes, and from a programming standpoint it can be configured as a virtual torus network of up to three dimensions. This allows applications that work on adjacent data, which are common in scientific and technical computing, to run efficiently.
  3. System software and other features
    A system software suite is being developed in parallel to make full use of the network's performance and of the CPUs' error-recovery function. Linux has been chosen as the operating system, and the suite will include compilers for standard programming languages as well as communications libraries.
    In an ultra-large-scale system, the reliability of each hardware component must be maximized so that a single component failure does not disrupt the operation of the entire system. The CPU's error-recovery function and the network's excellent fault tolerance and operability will allow the next-generation supercomputer to be reliably available to researchers and engineers throughout Japan.
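
The per-CPU and system-level figures quoted above can be related with simple arithmetic. The sketch below is a back-of-the-envelope illustration only: the 8-core count and the 128-gigaflop figure come from this announcement, but the 2.0 GHz clock, the 8 floating-point operations per cycle per core, and the efficiency factor are assumptions introduced here, not published specifications. Real LINPACK results depend on much more than a single efficiency ratio.

```python
# Back-of-the-envelope peak-performance arithmetic. The core count (8) and
# per-CPU peak (128 gigaflops) come from the text; the 2.0 GHz clock and
# 8 floating-point operations per cycle per core are ASSUMPTIONS used only
# to show how such a peak figure is derived.
import math

CORES_PER_CPU = 8              # from the text
CLOCK_HZ = 2.0e9               # assumption: 2.0 GHz clock
FLOPS_PER_CYCLE_PER_CORE = 8   # assumption: e.g. 4 fused multiply-add units x 2 flops


def peak_flops_per_cpu() -> float:
    """Theoretical peak = cores x clock x floating-point ops per cycle."""
    return CORES_PER_CPU * CLOCK_HZ * FLOPS_PER_CYCLE_PER_CORE


def cpus_needed(target_flops: float, efficiency: float = 1.0) -> int:
    """CPUs required to reach target_flops when sustained = efficiency x peak.

    `efficiency` is a hypothetical LINPACK-to-peak ratio, not a published figure.
    """
    return math.ceil(target_flops / (peak_flops_per_cpu() * efficiency))


if __name__ == "__main__":
    print(f"peak per CPU: {peak_flops_per_cpu() / 1e9:.0f} gigaflops")  # 128
    print("CPUs for 10 PF at 100% efficiency:", cpus_needed(10e15))     # 78125
    print("CPUs for 10 PF at 90% efficiency:", cpus_needed(10e15, 0.9))
```

Under these assumptions, about 78,000 CPUs would be needed just to reach 10 petaflops of theoretical peak; any sustained efficiency below 100% raises that count proportionally, which is why a 10-petaflop LINPACK target implies a system somewhat larger than the bare division suggests.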

3. Development schedule

Following the prototype and assessment period, installation of the next-generation supercomputer will begin in fiscal 2010. It should be partially operational by the end of fiscal 2010 and complete and ready for shared use by 2012.

For more information: RIKEN, http://www.riken.jp/engn/index.html

Supplemental Information

(1) Project for "Development and Use of an Advanced, High-Performance, General-Purpose Supercomputer"

A project aimed at developing, building, and using the world's most advanced and powerful next-generation supercomputer, and at developing and disseminating the technologies involved. Computational science is playing an increasingly central role in modern science, alongside theory and experimentation. To advance it further, the project's goal is to bring online in fiscal 2010 (with completion in 2012) a next-generation supercomputer, which is viewed as a key technology in Japan's long-term national strategy and a core national technology. With the aim of maintaining Japan's leadership in science and technology, academic research, industry, medicine, and pharmaceuticals, MEXT launched an initiative with the following goals:

  1. Develop and build the world's most advanced and powerful next-generation supercomputer (10-petaflop class).
  2. Develop and disseminate software that makes full use of the next-generation supercomputer.
  3. Construct a world-leading supercomputing research and education center focused on achieving goal number one.

RIKEN is leading the development effort, as part of the close collaboration between industry, government and academia in Japan.

Next-Generation Supercomputer Facility (conceptual drawing)

Next-Generation Supercomputer Installed in Lab (conceptual image)

(2) Scalar units, vector units

A scalar unit is a CPU that operates on small pieces of data one after another in sequential order. This type of computing is well suited to computations involving complex data access, such as structural analyses of nanoscale devices and analyses of gene and protein data. A vector unit is a CPU that operates continuously on large blocks of data. This type of computing is suited to global-scale circulation analyses of the atmosphere or oceans, airplane or automobile aerodynamics, and other applications that involve continuous data processing.

Although the next-generation supercomputer will consist only of scalar units, applications that ran on earlier supercomputers with vector units will be supported through parallelization and tuning. Other ways to assist users of vector-based supercomputers are also being considered.
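The distinction between "continuous" and "complex" data access can be made concrete with two textbook kernels. In the sketch below, a dot product streams through arrays in order, while a sparse matrix-vector product in compressed sparse row (CSR) form reads its input through an index array, so its accesses jump around memory. Both are generic illustrations chosen here, not workloads named in this announcement, and pure Python does not itself use vector hardware; the point is only the memory-access pattern.

```python
# Two memory-access patterns. dot() walks contiguous arrays element by
# element (the regular, continuous pattern vector units handle well);
# csr_matvec() reads x through col_idx, so accesses are irregular (the
# "complex data access" the text associates with scalar units).
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Contiguous, unit-stride access: element i of a pairs with element i of b."""
    return sum(x * y for x, y in zip(a, b))


def csr_matvec(values: List[float], col_idx: List[int], row_ptr: List[int],
               x: List[float]) -> List[float]:
    """y = A @ x for a sparse matrix A stored in compressed sparse row format."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # indirect access via col_idx
        y.append(acc)
    return y


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
    # A = [[10, 0, 2], [0, 3, 0], [1, 0, 4]] stored in CSR form
    values = [10.0, 2.0, 3.0, 1.0, 4.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
```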

(3) 45-nm process technology

45-nm process technology refers to the degree of miniaturization achieved in forming circuits on the surface of a semiconductor wafer.

Fujitsu SPARC64™ VIIIfx microprocessor

(4) Direct-connection network, torus network

Networks fall into two kinds: direct-connection and indirect-connection. In a direct-connection network, nodes are linked to each other directly, so the network as a whole consists of many links between pairs of nodes. In an indirect-connection network, nodes are connected through switches rather than directly to one another. A three-dimensional torus network is a kind of direct-connection network in which the nodes are arranged in a three-dimensional structure and each node is linked to six others, so that the nodes along each dimension form a ring.

Direct-Connection Network, 3D Torus Network
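The six-neighbor, ring-per-dimension structure of a 3D torus can be expressed in a few lines. This is a minimal sketch assuming nodes are addressed by integer (x, y, z) coordinates; the function names and the 4×4×4 size are hypothetical, and this announcement describes the actual network only as a multidimensional mesh/torus that programs can view as a torus of up to three dimensions.

```python
# Neighbor lookup in a 3D torus: each node (x, y, z) links to six others,
# one in each direction along each dimension, with wraparound so that the
# nodes along every dimension form a ring. Sizes here are illustrative.
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six neighbors of `node` in a torus of shape `dims`.

    Assumes every dimension has at least 3 nodes; otherwise neighbors repeat.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Modulo arithmetic wraps the last node back to the first.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def hop_distance(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of links between two nodes, going either way round each ring."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)
    return total


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(torus_neighbors((0, 0, 0), dims))
    print(hop_distance((0, 0, 0), (3, 3, 3), dims))  # 3
```

The wraparound links are what distinguish a torus from a plain mesh: in the 4×4×4 example, nodes (0, 0, 0) and (3, 3, 3) are three hops apart rather than nine.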

(5) Organizations in charge of promoting use of the supercomputer

Organizations that support and select users for next-generation supercomputer facilities.

(6) Distributed-memory parallel computing system

A category of supercomputer defined by its memory architecture, in which a node cannot directly access another node's memory. As a result, any data held by another node must first be transferred to the node that needs it. For these transfers to be fast enough, a distributed-memory parallel computing system typically requires a high-performance network.
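Because a node cannot read another node's memory, computations on adjacent data typically exchange boundary values ("halos") before each step. The sketch below simulates this in a single Python process under stated assumptions: the "nodes" are list slices arranged in a ring (a one-dimensional torus), the function names are hypothetical, and the list copies stand in for the network messages a real system would send through a communications library.

```python
# Simulated halo exchange on a distributed-memory system. Each "node" owns a
# private slice of a 1D periodic array and cannot read its neighbors' slices
# directly, so boundary values are copied ("sent") into ghost cells before a
# nearest-neighbor stencil is applied. Node count and data are illustrative.
from typing import List


def split(data: List[float], nodes: int) -> List[List[float]]:
    """Partition data into equal contiguous chunks (assumes len(data) % nodes == 0)."""
    size = len(data) // nodes
    return [data[i * size:(i + 1) * size] for i in range(nodes)]


def exchange_halos(chunks: List[List[float]]) -> List[List[float]]:
    """Return each chunk padded with one ghost cell from each ring neighbor.

    The copies stand in for messages; a real system would use a library such as MPI.
    """
    n = len(chunks)
    padded = []
    for rank, chunk in enumerate(chunks):
        left = chunks[(rank - 1) % n][-1]   # last value owned by left neighbor
        right = chunks[(rank + 1) % n][0]   # first value owned by right neighbor
        padded.append([left] + chunk + [right])
    return padded


def stencil_step(padded: List[float]) -> List[float]:
    """Three-point average over the owned cells (ghost cells are read, not updated)."""
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def distributed_step(data: List[float], nodes: int) -> List[float]:
    """Split, exchange halos, compute locally, then gather the results."""
    results = [stencil_step(p) for p in exchange_halos(split(data, nodes))]
    return [v for chunk in results for v in chunk]


def serial_step(data: List[float]) -> List[float]:
    """Reference: the same periodic stencil computed on one node."""
    n = len(data)
    return [(data[i - 1] + data[i] + data[(i + 1) % n]) / 3.0 for i in range(n)]


if __name__ == "__main__":
    data = [float(i) for i in range(12)]
    print(distributed_step(data, nodes=4) == serial_step(data))  # True
```

Once the ghost cells are filled, the distributed computation matches the serial one; filling them is exactly the communication that a high-performance network is meant to make cheap.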

(7) Error-recovery function

If a fault occurs, the CPU can detect and correct erroneous data and retry instructions.
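
The "detecting and correcting erroneous data" part of this function can be illustrated with a classic error-correcting code. The sketch below uses a Hamming(7,4) code, a standard textbook technique chosen here only for illustration; this announcement does not say which codes the SPARC64 VIIIfx uses, and instruction retry is not modeled.

```python
# Hamming(7,4) single-error correction: a textbook example of detecting and
# correcting erroneous data. Illustrative only; this is NOT a description of
# the specific mechanism used in the SPARC64 VIIIfx.
from typing import List


def encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(c: List[int]) -> List[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    # Each syndrome bit re-checks one parity group (1-based positions).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 = no error, else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)
    for pos in range(7):
        corrupted = list(word)
        corrupted[pos] ^= 1          # simulate a single-bit fault
        assert decode(corrupted) == data
    print("all single-bit errors corrected")
```

A code like this corrects any single flipped bit in a 7-bit word; practical hardware typically uses wider codes, combined with mechanisms such as instruction retry for faults that a code alone cannot repair.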