High-performance computing in the PCAST Report

The President's Council of Advisors on Science and Technology (PCAST) is charged with, among other things, reviewing federal investment in research and development for networking and information technology (NIT). Its recent report, "Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology," argues that NIT is critical to a wide range of national priorities and urges broad and sustained support for research. It is a long and detailed report, running over 120 pages, and it has a lot of good material in it (here comes the disclaimer: I was a member of the working committee that helped draft the report). I'd like to focus on its recommendations related to high-performance computing (HPC).

The report recognizes the key role that HPC has played, and continues to play, in many of the nation's priorities, including health, energy, and national security. In fact, computing has become so important to such a broad range of activities that many federal agencies make extensive use of computing (not just HPC) to carry out their missions. Just how important is it? One way to gauge that is to follow the money.

The total federal expenditure on NIT R&D is about $4 billion, and a significant fraction of it falls in a category labeled "HPC R&D." That makes it look as if a significant fraction of the nation's investment in information technology research goes into HPC. But these figures are misleading. Because of the often opaque way federal expenditures are reported, the cost of the computing infrastructure used to perform research in areas like medicine, astrophysics, and materials science was included in the NITRD budget figures, much of it under the HPC Research and Development category. This greatly overstates the investment in NIT research and development by conflating research that relies on the use of high-performance computers with research into improving high-performance computing, but it does underscore the key role of computing in many fields. The report calls for budget figures that make clear what is being spent on NIT R&D and what is being spent on the infrastructure to make use of computing. Still, the size of the funding devoted to the use of computing in general, and HPC in particular, is strong evidence of the continued importance of HPC.

We are always looking for ways to identify the best, strongest, and fastest. HPC is no different, and HPC systems have historically been ranked by a single benchmark; the results are posted twice a year as the Top500 list, which is usually dominated by the United States. A Chinese system placed at the top of the most recent list, in November 2010, raising questions about whether U.S. leadership in HPC had come to an end. However, being at the top of this list does not mean that a system is the most effective at solving a wide range of problems. The benchmark used for the Top500, the solution of a large, dense system of linear equations, stresses just one part of a system: the rate at which floating-point operations can be performed. That focus neglects system characteristics that are often more important for applications, such as the rate at which data can be moved within the system. While the HPC community has long known that no single benchmark adequately captures the usefulness of a system, the PCAST report explicitly calls for a greater focus on what I'll call sustained performance, the ability to compute effectively on a wide range of problems:

"But the goal of our investment in HPC should be to solve computational problems that address our current national priorities,"

Addressing this is becoming critical, because systems designed solely to rank at the top of the Top500 list will not provide the computational tools needed for productive science and engineering research.
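To make that distinction concrete, here is a minimal sketch in C (my own illustration, not code from the report or from the benchmarks themselves) contrasting two kernels. The first, a naive dense matrix multiply like the computation at the heart of the Top500 benchmark, performs on the order of 2n^3 floating-point operations on only 3n^2 values, so its speed is limited mainly by the floating-point units. The second, the STREAM-style "triad," does just two floating-point operations for every three array elements it touches, so its speed is set by how fast memory can move data, no matter how many flops the processor can issue.

    /* Illustrative only: two kernels with very different demands on a system. */
    #include <stddef.h>

    /* Dense matrix multiply: roughly 2*n^3 flops on 3*n^2 values,
     * so performance is dominated by floating-point rate. */
    void dgemm_like(size_t n, const double *a, const double *b, double *c)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < n; k++)
                    sum += a[i*n + k] * b[k*n + j];
                c[i*n + j] = sum;
            }
    }

    /* STREAM-style triad: 2 flops but 3 memory operations per element,
     * so performance is dominated by memory bandwidth. */
    void triad(size_t n, double *a, const double *b, const double *c, double s)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }

Many science and engineering applications spend most of their time in loops that look more like the triad than like the matrix multiply, which is why a ranking based on floating-point rate alone says little about sustained performance.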

The last few decades have seen both a tremendous expansion in computing capability and an unparalleled stability in computing methods. Applications that were written 10 years ago can run with little or no change, even on some of the largest parallel systems. This business-as-usual approach to high-end computing, though astoundingly effective for several decades, is nearing its end. There isn't just one barrier to further progress, but many. The end of Moore's law, and of the rapid gains in performance that have gone with it, has been predicted before, but this time it may be for real (at least in terms of performance). In the past, as one barrier to progress appeared, we were able to make some tradeoff to continue progress. For example, until recently, processor clock rates kept increasing, but this required the acceptance of ever larger power requirements, to the point where the processors in a laptop can cause burns and supercomputers require 10 megawatts or more to operate. But we are running out of tradeoffs to make. Building the next few generations of high-end machines using today's technology, or even the natural evolution of today's technologies, will not be effective. In addition to the technical challenges, systems designed along the lines of current ones will be very costly, with the risk that paying for them will not leave enough resources with which to create the next computing revolution.
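A rough note on why that clock-rate tradeoff ran out (my own back-of-the-envelope addition, not a figure from the report): the dynamic power of a processor grows approximately as

    P ≈ a · C · V^2 · f

where f is the clock frequency, V the supply voltage, C the switched capacitance, and a the fraction of the circuitry active in each cycle. Higher clock frequencies generally require higher supply voltages, so pushing f upward drives power up much faster than linearly. That is the wall that ended routine frequency scaling, and there is no comparable tradeoff left to exploit.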

To avoid this fate, the report calls for "substantial and sustained" investment in a broad range of basic research for HPC, specifically:

"To lay the groundwork for such systems, we will need to undertake a substantial and sustained program of fundamental research on hardware, architectures, algorithms and software with the potential for enabling game-changing advances in high-performance computing."

The list of topics, all areas in which too little is known and in which there is currently too little research, is quite sobering. Breakthroughs in even a few of them would transform computing. NSF, DARPA, and DOE are taking the first steps to address these needs, but they need to be able to do more. And the community, in particular the HPC community, needs to be willing to take more risks. The past two decades have seen significant stability in HPC; even with a factor of 10,000 or more increase in scale, the programming models and many of the algorithms have changed only slowly. This is the time to rethink all aspects of computing: the hardware, the software, and the algorithms. Without a sustained investment in basic research into HPC, the historic increase in performance of HPC systems will slow down and eventually end. With such an investment, HPC will continue to provide scientists and engineers with the ability to solve the myriad challenges that we face.


William Gropp is the Paul and Cynthia Saylor Professor of Computer Science, a principal investigator on the Blue Waters sustained-petaflop supercomputer, and the deputy director for research of the Institute for Advanced Computing Applications and Technologies.