NCSA answers questions about GPUs

Graphics processing units (GPUs) aren't just for graphics anymore. These high-performance "many-core" processors are increasingly being used to accelerate a wide range of science and engineering applications, in many cases offering dramatically increased performance compared to CPUs.

Applications in biomolecular simulation, computational chemistry, astrophysics, condensed matter physics, weather modeling, and seismic stack migration have already benefited substantially from GPUs or show substantial promise for using them.

But many questions surround the use of GPUs. Here IACAT and NCSA staff who work with GPUs provide some answers.

How is a GPU different from a CPU?
CPUs have a few (two to eight) large, complex cores, while GPUs have up to a few hundred small, simple cores. Unlike CPU cores, GPU cores are not general purpose: they are focused solely on computation, with little support for I/O devices, interrupts, and complex instructions.
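
As a quick, hypothetical illustration (not part of any NCSA distribution), a few lines of CUDA C can ask the runtime how many multiprocessors, each a group of simple cores, a given GPU provides:

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Print a few properties of each CUDA device: the GPU reports its
       count of multiprocessors, each of which groups several simple cores. */
    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int d = 0; d < count; ++d) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s, %d multiprocessors, %.0f MHz, %zu MB global memory\n",
                   d, prop.name, prop.multiProcessorCount,
                   prop.clockRate / 1000.0, prop.totalGlobalMem / (1024 * 1024));
        }
        return 0;
    }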

What advantages do GPUs offer?
GPUs can deliver up to a teraflop (a trillion calculations per second) of computing power from roughly the same silicon area as a conventional microprocessor, while using a small fraction of the power per calculation. That means high performance in a smaller footprint, at lower cost, and with lower power consumption.

So are CPUs obsolete?
No, because GPUs still require CPUs to access data from disk, to exchange data between compute nodes in a multi-node cluster, and so on. CPUs are very good at executing serial tasks, and every application has those. And as more and more cores are combined on a single chip, CPUs are becoming parallel as well.

What is the downside of using GPUs?
GPUs are not right for every application: only applications with a substantial amount of parallelism can benefit from them. GPUs also require a fresh approach to programming, because their programming model is different from the conventional serial programming model.
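
To make the difference concrete, here is a minimal, hypothetical sketch, a simple vector-scaling operation rather than one of the applications mentioned above, written first as a conventional serial loop and then as a CUDA C kernel in which the loop body becomes the work of a single GPU thread:

    #include <cuda_runtime.h>

    /* Serial CPU version: one core walks through the array element by element. */
    void scale_cpu(float *y, const float *x, float a, int n)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i];
    }

    /* CUDA C version: the loop disappears; each lightweight GPU thread
       handles a single element, identified by its thread index. */
    __global__ void scale_gpu(float *y, const float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i];
    }

In the GPU version the programmer launches thousands of these lightweight threads at once and lets the hardware schedule them across the GPU's many simple cores.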

Can "personal supercomputers" made up of a small number of high-performance GPUs replace the big clusters at NCSA and other supercomputing centers?
A desktop system with four GPUs can provide significant computing capability for researchers who have a code, like NAMD, that can take advantage of these processors. But when the time comes to scale up to larger simulations or to explore a larger parameter space, hundreds or thousands of desktop-size runs could be required, stretching the time to solution on a "personal supercomputer" longer and longer ... perhaps to infinity!

In these cases a larger resource is called for: Lincoln, a Dell supercomputing cluster at NCSA. Lincoln can run hundreds of tightly coupled "desktop-size" jobs at once, or a relatively wide (100-node) parallel job, with ease.

How do I adapt my application to use GPUs?
There are two approaches to porting applications to GPUs. The first treats the GPU as an accelerator: computationally demanding sections of an application are offloaded to it while the bulk of the application remains on the CPU. This incremental approach minimizes the effort required to get started with GPU programming, but it runs into memory and bus bandwidth limitations, and as the offloaded sections speed up, the formerly trivial components remaining on the CPU can become the dominant bottleneck.
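
As a rough sketch of this accelerator-style approach, with a hypothetical heavy_kernel standing in for the ported hot section, the CPU keeps the master copy of the data and ships it across the bus for every offloaded call; those transfers are where the bandwidth limitations appear:

    #include <cuda_runtime.h>

    /* Stand-in for the computationally demanding section ported to the GPU. */
    __global__ void heavy_kernel(float *d_data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            d_data[i] = d_data[i] * d_data[i] + 1.0f;
    }

    /* The CPU keeps the master copy of the data and offloads one hot section;
       every call pays for a round trip over the PCIe bus. */
    void offload_step(float *h_data, int n)
    {
        float *d_data;
        size_t bytes = (size_t)n * sizeof(float);

        cudaMalloc((void **)&d_data, bytes);
        cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);   /* host -> device */

        heavy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);

        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);   /* device -> host */
        cudaFree(d_data);
    }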

The second approach is to rewrite the code to run entirely on the GPU, using the CPU only when necessary (in this case, the CPU can be thought of as a decelerator). This approach requires a data-centric program design, rather than traditional compute-centric methods. The system main memory becomes a cache for the GPU, and data is transferred only if and when it is needed. Optimal use of this method decomposes the computational task so multiple accelerators can work independently of each other; this allows the GPUs to process substantial amounts of data.
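
A comparable sketch of the data-centric approach, again with a hypothetical kernel (update) standing in for the real computation: the working set is allocated once in GPU memory and stays there, and results are copied back to the host only when they are actually needed:

    #include <cuda_runtime.h>

    /* Stand-in for one simulation step, running entirely on the GPU. */
    __global__ void update(float *d_state, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            d_state[i] += 1.0f;
    }

    void run_on_gpu(float *h_result, int n, int steps)
    {
        float *d_state;
        size_t bytes = (size_t)n * sizeof(float);

        /* The working set is allocated once in GPU memory and stays there;
           system main memory is used only for staging. */
        cudaMalloc((void **)&d_state, bytes);
        cudaMemset(d_state, 0, bytes);

        for (int s = 0; s < steps; ++s)
            update<<<(n + 255) / 256, 256>>>(d_state, n);   /* no host<->device traffic in the loop */

        /* Copy results back only when the host actually needs them. */
        cudaMemcpy(h_result, d_state, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_state);
    }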

Where can I learn more about GPU programming?
NCSA and IACAT offer online training materials on GPU programming.

What GPU resources do NCSA and IACAT offer?
The Institute for Advanced Computing Applications and Technologies (IACAT) offers a 32-node cluster that combines both GPU and FPGA (field-programmable gate array) technology.

The cluster is used for Electrical and Computer Engineering courses, for training workshops, as a platform to experiment with GPU clusters and develop accelerated cluster management software, and as a resource for science and engineering researchers to explore the potential of these architectures to accelerate scientific computing. For information on accessing IACAT's cluster, contact Mike Showerman at mshow@ncsa.illinois.edu.

NCSA has a 47 teraflop cluster, called Lincoln, that combines multi-core CPUs and GPUs. Lincoln is a TeraGrid resource. For information on applying for an allocation of time on Lincoln, go to: http://www.teragrid.org/userinfo/access/allocations.php.

What tools are deployed on these GPU clusters?
Both IACAT's GPU cluster and Lincoln have NVIDIA's CUDA C and Portland Group's PGI x64+GPU compilers installed. The IACAT GPU cluster also offers NVIDIA's OpenCL development suite and home-grown utilities for cluster management. NCSA's Innovative Systems Laboratory works closely with technology vendors to test the latest tools and technologies.

What comes next?
NCSA and IACAT continue to experiment with novel cluster node configurations. We are eagerly awaiting the next generation of NVIDIA products so that we can build the next-generation GPU cluster.

Contributors:

  • Jay Alameda, NCSA senior technical program manager
  • Volodymyr Kindratenko, NCSA senior research scientist
  • Craig Steffen, NCSA senior research scientist

For more information: