Berkeley Lab Team Wins Best Paper at CloudCom


Cloud computing has proven to be a cost-efficient model for many commercial web applications, but will it work for scientific computing? Not unless the cloud is optimized for it, writes a team from the Lawrence Berkeley National Laboratory.

After running a series of benchmarks designed to represent a typical midrange scientific workload (applications that use fewer than 1,000 cores) on Amazon's EC2 system, the researchers found that EC2's interconnect severely limits performance and causes significant variability. Overall, the cloud ran six times slower than a typical midrange Linux cluster and 20 times slower than a modern high-performance computing system.

The team's paper, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud," was honored with the Best Paper Award at the IEEE's International Conference on Cloud Computing Technology and Science (CloudCom 2010), held Nov. 30-Dec. 1 in Bloomington, Ind.

"We saw that the communication pattern of the application can impact performance, Applications like PARATEC with significant global communication perform relatively worse than those with less global communication," says Keith Jackson, a computer scientist in the Berkeley Lab’s Computational Research Division (CRD) and lead author of the paper.

He also notes that the EC2 cloud performance varied significantly for scientific applications because of the shared nature of the virtualized environment, the network, and differences in the underlying non-virtualized hardware.

The benchmarks and performance monitoring software used in this research were adapted from the large-scale codes used in the procurement process of the National Energy Research Scientific Computing Center (NERSC). NERSC is located at the Berkeley Lab and serves approximately 4,000 Department of Energy (DOE) supported researchers annually in disciplines ranging from cosmology and climate to chemistry and nanoscience. In this study, the researchers essentially cut these benchmarks down to midrange size before running them on the Amazon cloud.

"This set of applications was carefully selected to cover both diversity of science areas and the diversity of algorithms," said John Shalf, who leads NERSC’s Advanced Technologies Group."They provide us with a much more accurate view of the true usefulness of a computing system than ‘peak flops’ measured under ideal computing conditions." 

The benchmark modifications and performance analysis in this research were done in collaboration with the DOE’s Magellan project, funded by the American Recovery and Reinvestment Act. "The purpose of the Magellan Project is to understand how cloud computing may be used to address the computing needs for the Department of Energy's Office of Science. Understanding how our applications run in these environments is a critical piece of the equation," says Shane Canon, who leads the Technology Integration Group at NERSC.

In addition to Canon, Jackson and Shalf, Berkeley Lab's Lavanya Ramakrishnan, Krishna Muriki, Shreyas Cholia, Harvey Wasserman and Nicholas Wright are also authors on the paper.

"This was a real collaborative effort between researchers in Berkeley Lab's CRD, Information Technologies and NERSC divisions, with generous support from colleagues at UC Berkeley—it is a great honor to be recognized by our global peers with a Best Paper Award," adds Jackson.

The award is the second such honor for Jackson and Ramakrishnan this year. Along with Berkeley Lab colleagues Karl Runge of the Physics Division and Rollin Thomas of the Computational Cosmology Center, they won the Best Paper Award at the Association for Computing Machinery’s ScienceCloud 2010 workshop for "Seeking Supernovae in the Clouds: A Performance Study."

The Department of Energy's Office of Advanced Scientific Computing Research and the National Science Foundation funded the work, and CITRIS at the University of California, Berkeley, donated Amazon EC2 time.
