ClearSpeed Breaks GigaFLOP per Watt Performance Barrier for Supercomputing
Acceleration technology sets new energy efficiency standards for Linpack benchmark

ClearSpeed Technology, the leader in double precision coprocessor acceleration technology, today announced Linpack benchmark results that set new standards for energy efficient computation for high performance computing (HPC) clusters. ClearSpeed Advance accelerator boards rated at only 25 Watts power consumption per board added 28.5 GigaFLOPS (GFLOPS) each to a cluster of Hewlett Packard ProLiant DL380 G5 servers running the high performance Linpack benchmark. With two Advance accelerator boards in each of the four servers, the cluster performance was increased to over 364 GFLOPS while adding only 200 Watts to the overall power levels.

Without ClearSpeed acceleration, the four node cluster delivered 136 GFLOPS from its 8 Intel Xeon 5160 (Woodcrest) dual core processors while consuming 1,940 Watts of power. A similarly configured single accelerated node delivered 90 GFLOPS, compared with 34 GFLOPS for the non-accelerated node. The ClearSpeed accelerated cluster completed the Linpack benchmark run in just 18.4 minutes while using only 40% of the energy required by the non-accelerated cluster, which took 48.4 minutes to finish.

“Consuming no more power than it takes to turn on the lights in a normal living room, we have increased the performance of the cluster more than two and a half times,” said John Gustafson, ClearSpeed chief technical officer for HPC.
“With the additional Linpack performance exceeding one GFLOP per Watt and almost perfect scaling, we have demonstrated that ClearSpeed accelerator technology can combine unmatched performance with the economic benefits of reduced energy consumption for HPC clusters.”

To put these results in context, the performance delivered by the ClearSpeed accelerated four node HP cluster (a total of 16 CPU cores) is equivalent to that of the number one Top500 (www.top500.org) installation from November 1996, a massive 2048 CPU Hitachi system at the Center for Computational Science at the University of Tsukuba in Japan that delivered 368.2 GFLOPS. Even more impressively, the test cluster is contained in a half populated 14U rack and can operate in a standard office environment.

Benchmark Results

System Specification                        Linpack Result (GFLOPS)   Elapsed Time
4 nodes (16GB) w/o Advance boards           136.0                     48.4 minutes
4 nodes (16GB) w/ 2 x Advance boards each   364.2                     18.4 minutes
1 node (16GB) w/o Advance boards            34.0
1 node (16GB) w/ 2 x Advance boards         90.1

Note: Previously published Linpack results for similar single node systems were 34.9 GFLOPS for the standard node and 93 GFLOPS for an accelerated node with two ClearSpeed Advance boards. The variations are a result of small differences between system configurations and problem sizes used during the benchmark runs.

Top500 Results from November 1996

Rank  Site                                                             System                           Processors  Rmax (GFLOPS)  Rpeak (GFLOPS)
1     Center for Computational Science, University of Tsukuba, Japan   CP-PACS/2048, Hitachi            2048        368.2          614.4
2     National Aerospace Laboratory, Japan                             Numerical Wind Tunnel, Fujitsu   167         229            281.26
3     University of Tokyo, Japan                                       SR2201/1024, Hitachi             1024        220.4          307
4     Sandia National Laboratories, United States                      XP/S140, Intel                   3680        143.4          184
5     Oak Ridge National Laboratory                                    XP/S-MP 150                      3072        127.1          154

Specifications of benchmark system supplied by Hewlett Packard and tested by ClearSpeed Technology

Four HP ProLiant DL380 G5 servers, each with:
- Two 3.0 GHz dual core Intel Xeon 5160 processors
- 16 GB fully buffered DIMM Memory
- Embedded NC373i Multifunction Gigabit Network Adapter
- 1000 Watt Hot-Plug Power Supply
- Two ClearSpeed Advance accelerator boards
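
The power and performance figures quoted in this release can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and it assumes each run drew constant power at the quoted levels (1,940 Watts for the base cluster, plus 25 Watts per board), which the release does not confirm.

```python
# Figures quoted in the release.
BASE_POWER_W = 1940.0    # four-node cluster without accelerators
BOARD_POWER_W = 25.0     # rated power per Advance board
NUM_BOARDS = 8           # two boards in each of four servers
BASE_GFLOPS = 136.0      # Linpack result without acceleration
ACCEL_GFLOPS = 364.2     # Linpack result with acceleration
BASE_MINUTES = 48.4      # elapsed time without acceleration
ACCEL_MINUTES = 18.4     # elapsed time with acceleration

# Added power and added performance from the accelerator boards.
added_power_w = BOARD_POWER_W * NUM_BOARDS
added_gflops = ACCEL_GFLOPS - BASE_GFLOPS
added_gflops_per_board = added_gflops / NUM_BOARDS
added_gflops_per_watt = added_gflops / added_power_w

# Overall speedup of the accelerated cluster.
speedup = ACCEL_GFLOPS / BASE_GFLOPS

# Energy per run, ASSUMING constant power draw at the quoted levels.
base_energy_wh = BASE_POWER_W * BASE_MINUTES / 60.0
accel_energy_wh = (BASE_POWER_W + added_power_w) * ACCEL_MINUTES / 60.0
energy_ratio = accel_energy_wh / base_energy_wh

print(f"added power:          {added_power_w:.0f} W")
print(f"added GFLOPS/board:   {added_gflops_per_board:.1f}")
print(f"added GFLOPS/Watt:    {added_gflops_per_watt:.2f}")
print(f"speedup:              {speedup:.2f}x")
print(f"energy ratio:         {energy_ratio:.2f}")
```

Under these assumptions the added performance works out to a little over one GFLOPS per Watt, and the accelerated run uses a bit over 40% of the energy of the non-accelerated run, in line with the roughly 40% quoted above. The release does not state how energy was measured, so the small difference should not be over-interpreted.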
The four servers were connected with an HP Procurve 2824 Switch.

The Linpack Benchmark and the Top500
The Linpack Benchmark was introduced by Jack Dongarra. A detailed description, as well as a list of performance results on a wide variety of machines, is available in PostScript form from netlib. A parallel implementation of the Linpack benchmark and instructions on how to run it can be found at its Web site. The Linpack Benchmark solves a dense system of linear equations. For the Top500, a version of the benchmark is used that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine. This performance does not reflect the overall performance of a given system, as no single number ever can. It does, however, reflect the performance of a dedicated system for solving a dense system of linear equations. Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.

Top500 Description
The Top500 table shows the 500 most powerful commercially available computer systems known. To keep the list as compact as possible, only a part of the data evaluated is shown on the website, including:
- Nworld - Position within the Top500 ranking
- Manufacturer - Manufacturer or vendor
- Computer - Type indicated by manufacturer or vendor
- Installation Site - Customer
- Location - Location and country
- Year - Year of installation/last major update
- Field of Application
- #Proc. - Number of processors
- Rmax - Maximal Linpack performance achieved
- Rpeak - Theoretical peak performance
- Nmax - Problem size for achieving Rmax

Information about the Top500 can be found at its Website.
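
As a rough illustration of what a Linpack-style measurement involves, the sketch below solves a dense system of linear equations by Gaussian elimination with partial pivoting and converts the elapsed time into a GFLOPS rate. It is a toy in pure Python, not the benchmark code used for the results in this release. The function names are hypothetical, and the operation count 2/3 n^3 + 2 n^2 is the conventional Linpack figure, assumed here rather than taken from this document.

```python
import random
import time


def linpack_flops(n):
    """Conventional operation count for solving an n x n dense system."""
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows (lists of floats); b is a list of n floats.
    The inputs are copied, not modified.
    """
    n = len(b)
    a = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k up.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            x[i] -= m * x[k]
    # Back substitution on the upper triangular system.
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / a[i][i]
    return x


def measure_gflops(n, seed=0):
    """Time one solve of a random n x n system and return the GFLOPS rate."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    return linpack_flops(n) / elapsed / 1e9


if __name__ == "__main__":
    print(solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
    print(f"{measure_gflops(100):.4f} GFLOPS at n = 100")
```

The problem size n in this sketch plays the role of the scalable problem size described above: a Top500 run picks the size that yields the best rate (Nmax) and reports that rate as Rmax.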