NCSA's Dell-built Lincoln supercomputer outshines Cray systems in molecular dynamics simulations

The graphics processing units in NCSA's Lincoln cluster speed molecular dynamics simulations that drive the development of detergents and drug-delivery systems.

A few years ago, a graphics processing unit had one job: Thrill videogamers by throwing as many pixels up on the screen as fast as possible. Let them see through their sniper sight in more detail. Reduce the lag between when they press a button on the controller and when their swords clash.

Seeing that power and blazing speed, researchers have begun moving some scientific work to these processors, which are often called GPUs. In fact, after years of working with scientists on experimental GPU-based systems, NCSA launched the Lincoln supercomputer in 2008. Lincoln pairs NVIDIA GPUs with traditional Intel CPUs. GPUs, in other words, have moved into the supercomputing game.

'You use them all the time'

A team from Temple University is already harnessing those GPUs in its daily research. At Temple's Institute for Computational Molecular Science, researchers use the Lincoln cluster to model surfactants, molecules used in common household products like detergents and shampoo. They are also exploring another class of surfactants as a way of controlling the delivery of drugs in the body and improving their effectiveness.

"These are molecules that you use all the time, but you don't see them," says Axel Kohlmeyer, the institute's associate director. "You can use them to change the properties of liquids. In practice, that means if you have something dirty on your clothes, the surfactant can attach to it and mix with water, so you can wash it away."

By modeling surfactants on a computer, researchers can also wash away expensive and slow laboratory work as they design new products.

But mixtures of surfactants, water, and other molecules are exceptionally challenging to model. The molecules aggregate into micelles, vesicles, and other structures that can trap materials. This self-assembly takes place at the micrometer scale over the course of hundreds of nanoseconds, and it often involves hundreds of millions of atoms.

To capture this complicated and long (relatively speaking) activity, the team relies on a clever approach and a powerful simulation code.

'Just a fun summer project'

The approach is known as coarse-grain molecular dynamics. With this strategy, molecular fragments are modeled as spherical "pseudo-particles," dramatically reducing the number of particles and thus the number of particle interactions that must be computed. Three water molecules, for example, might be represented by a single sphere in a coarse-grain calculation.

"A natural way to model it is to break a molecule into pieces by their chemical properties, and then sum up the interactions of the fragments into one pseudo-particle each," Kohlmeyer says. Some details are lost, but how the molecules self-assemble and how they interact with other molecules can still be determined with high fidelity.

The team from Temple published results using this approach in the Journal of Chemical Theory and Computation in 2009.

Recently, the group began testing a new simulation code called HOOMD-blue. HOOMD, for short, is a molecular dynamics code written specifically for GPUs. It was created in 2007 by Joshua Anderson, now at the University of Michigan, while he was a graduate student at Iowa State University. About a dozen developers around the country, including Anderson, now contribute to the open-source application.

"When I started out, it was just a fun summer project," Anderson says. A new software development kit for GPUs, called CUDA, had just been released by NVIDIA, "and I just wanted to play with it."

Within a couple of months, he had code running 30 times faster on a single GPU than on a single traditional processor. Since then, algorithm improvements and a new generation of GPUs have pushed that figure to more than 60 times faster. That "new" generation is already more than a year old, and another performance-doubling hardware generation is expected to be announced soon.

A scripting language in HOOMD makes it very extensible and adaptable. In fact, it allowed the Temple team to easily create a parallel version of the code and run it on multiple processors simultaneously.
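To give a flavor of that scripting interface, here is a minimal input script in the style of HOOMD-blue's early hoomd_script Python API. The exact module and argument names have changed across versions, so treat this as an illustrative sketch of the idea rather than the team's production setup.

```python
# Illustrative HOOMD-blue input script in the style of the early
# hoomd_script Python API; names vary between HOOMD versions.
from hoomd_script import *

# Create a random configuration of 10,000 particles at low packing fraction.
init.create_random(N=10000, phi_p=0.1)

# Standard Lennard-Jones pair interactions with a 2.5 sigma cutoff.
lj = pair.lj(r_cut=2.5)
lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)

# Integrate all particles in the NVT ensemble.
integrate.mode_standard(dt=0.005)
integrate.nvt(group=group.all(), T=1.2, tau=0.5)

# Everything above is plain Python, so loops, parameter sweeps, and custom
# analysis can be scripted around the run() call.
run(100000)
```

Because the input file is an ordinary Python script, users can wrap runs in loops, swap in their own potentials, or drive many simulations from a single piece of code.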

"I didn't envision taking [parallelism] to the level that Axel has done, but I wanted that functionality to be available," Anderson says.

"HOOMD uses many modern techniques of portable and flexible programming, yet keeps it simple and easy enough for people to add functionality quickly and consistently. I wish more scientific software projects would spend that much effort on these fundamental issues. My life would be much easier," Kohlmeyer adds.

'It still blows my mind'

Validation runs of the coarse-grain method on NCSA's Lincoln supercomputer have shown tremendous speedups.

"The outcome is quite spectacular ... With two GPUs we can run a single simulation as fast as on 128 CPUs of a Cray XT3," Kohlmeyer says. And with HOOMD, they have a straightforward way of running hundreds of those simulations in tandem.

What's the Temple team doing with this approach and computing power? Getting real science done, and gearing up for even bigger simulations.

Using NCSA's Abe supercomputer and partnering with a team of experimental chemists at the University of Pennsylvania, they're simulating self-assembling dendrimeric molecules that can be tailored to particular shapes and properties. Ultimately, these molecules could be used to build customized "containers" for drugs. Self-assembled capsules would be built around the drug, which might otherwise be destroyed as it made its way through the body.

The team plans to move those simulations to Lincoln soon.

"We will be able to screen potential modifications to the individual molecules on the computer and save people tons of hours and money in the lab," Kohlmeyer says. "A machine like Lincoln is perfect for that, as one would need to run many variations at the same time."

"We can try things that were undoable before. It still blows my mind."

Figures 1-3. Self-assembly of a bilayer structure of dendrimeric amphiphiles in water. This simulation demonstrates how the structure of the monomeric molecules can determine the final configuration of the self-assembled structure. The initial configuration was set up with an even distribution of monomers and water. This whole process happens in only 20 nanoseconds of coarse-grain molecular dynamics.

Figure 1. The monomers quickly assemble into "hands."

Figure 2. The monomers combine into a larger three-dimensional structure.

Figure 3. The monomers rearrange into a flat bilayer.


Not just for graphics anymore
Graphics processing units (GPUs) aren't just for graphics anymore. These high-performance "many-core" processors are increasingly being used to accelerate a wide range of science and engineering applications, in many cases offering dramatically increased performance compared to CPUs.

But many questions surround the use of GPUs. Here are answers to some of the most common ones. You can learn more about GPUs and the GPU resources offered by NCSA and the Institute for Advanced Computing Applications and Technologies at www.ncsa.illinois.edu.

How is a GPU different from a CPU?
CPUs have a few (two to eight) large, complex cores, while GPUs have up to a few hundred small, simple ones. Unlike CPU cores, GPU cores are not general purpose. They are focused solely on computation, with little support for I/O devices, interrupts, and complex assembly instructions.

What advantages do GPUs offer?
GPUs can deliver up to a teraflop (1 trillion calculations per second) of computing power from the same silicon area as a comparable microprocessor, while using a small fraction of the power per calculation. That means high performance in a smaller footprint, at lower cost, and with lower power consumption.

So are CPUs obsolete?
No. GPUs still rely on CPUs to access data from disk, to exchange data between compute nodes in a multi-node cluster, and to handle other tasks. CPUs are also very good at executing serial tasks, and every application has some. And as more and more cores are combined on a single chip, CPUs themselves are becoming parallel processors.

What is the downside of using GPUs?
GPUs are not right for every application. Only applications that have a substantial amount of parallelism can benefit from GPUs. GPUs also require a fresh approach to programming.
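To illustrate that fresh approach: instead of looping over data on one core, the programmer writes a kernel that thousands of lightweight GPU threads execute at once, each on its own piece of the data. The sketch below uses Python with the PyCUDA bindings as one possible toolkit (chosen here only for illustration); it adds two vectors element by element, with each GPU thread handling exactly one element.

```python
import numpy as np
import pycuda.autoinit                 # initializes a CUDA context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# A CUDA kernel: each thread handles exactly one array element.
mod = SourceModule("""
__global__ void add(float *out, const float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}
""")
add = mod.get_function("add")

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.empty_like(a)

# Launch enough 256-thread blocks to cover all n elements.
add(drv.Out(out), drv.In(a), drv.In(b), np.int32(n),
    block=(256, 1, 1), grid=((n + 255) // 256, 1))

assert np.allclose(out, a + b)
```

The payoff comes only when there are many such independent elements to work on, which is why applications with little parallelism see no benefit.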

This project is funded by the Department of Energy, the National Science Foundation, the National Institutes of Health, and Procter & Gamble.

Team members
Russell H. DeVane
David LeBard
Benjamin Levine
Axel Kohlmeyer
Wataru Shinoda
Christopher J. Wilson
Joshua Anderson
Alex Travesset
Rastko Sknepnek
Carolyn Phillips
Trung Dac Nguyen
Michael Klein