Cray CTO Steve Scott Discusses Adaptive Supercomputing

In this article written for Supercomputing Online, Steve Scott, Chief Technology Officer of Cray, describes "adaptive supercomputing," the company's visionary approach to building future supercomputer hardware and software systems.

Adaptive Supercomputing is Cray's vision and plan that builds on our historic strengths to address application requirements that are increasingly diverse and challenging. In Adaptive Supercomputing, the system automatically adapts to the applications, rather than requiring the programmer to adapt the applications to the system, as happens with today's HPC systems. Cray will implement Adaptive Supercomputing in phases starting in 2007. Our trans-petaflop Cascade system, targeted for the DARPA HPCS program in the 2010 timeframe, is a key component of the vision.

While standard microprocessors provide excellent performance on a wide variety of applications, they are not best suited to all problems. There is a growing realization that, when it comes to high performance computing, one size does not fit all. What is needed is a single system architecture that can apply multiple processor technologies to heterogeneous applications and diverse workloads. This architecture should automate the process of selecting the best processor type for each application (or segment of an application). It should be balanced to deliver high performance at large scale, and it should be substantially easier to program and more robust than today's HPC systems. This is the goal of Cray’s Adaptive Supercomputing vision.

Applications in the HPC space are growing in complexity. Users want to be able to work with larger datasets, higher resolution, more variables, and increased integration and communication between applications. More and more HPC users want to tackle multi-scale, multi-physics problems.
The Earth Sciences community, for example, is already working with coupled models that include an increasing number of heterogeneous applications with dramatically different spatial and time scales. Today, interoperability requires that these applications run on a single processor technology (scalar or vector) that is not optimal for all of them. An analogous situation exists in the CAE community, where most applications were developed and optimized to focus on a single discipline (CFD, CSM, CEM, acoustics, etc.). For competitive advantage, automakers and other CAE users are looking toward coupling these applications to conduct full-vehicle simulations that concurrently optimize for emissions, safety, comfort and other factors.

As we announced in November 2005, Cray will continue to use AMD Opteron processors for the microprocessor-based supercomputer products we develop, at least through the end of this decade. The two companies are also actively collaborating on Cray's Phase 3 proposal for the HPCS program. Our Adaptive Supercomputing systems will provide heterogeneous computing by augmenting the AMD Opteron processors with custom processing technologies that can extract more performance from the transistors on a chip with less control overhead. Vector processing and field programmable gate arrays (FPGAs) are two promising technologies for doing this. Both allow higher processor performance, at lower power, on a subset of HPC applications, and both reduce the number of processors required to solve a given problem.

No matter what processors are used in an HPC system, the most crucial consideration is balance. It is far too easy to build an unbalanced machine, which may perform well on embarrassingly parallel applications or achieve high Linpack numbers, but fail to perform well on challenging applications or diverse workloads.
A balanced hardware design is important not just for scalable performance, but also to improve programmability and breadth of applicability. Balanced systems also require fewer processors to scale to a given level of performance, reducing failure rates and administrative overhead. Cray has a strong history of designing balanced supercomputer systems, with excellent bandwidth and latency characteristics, and our Adaptive Supercomputing systems will continue this tradition.

Perhaps the biggest drain on productivity is the time spent trying to structure an application to fit the attributes of the target machine. If the machine is a cluster with limited interconnect bandwidth, for example, the programmer must carefully minimize communication, and make sure that any sparse data to be communicated is first bundled together into larger messages to reduce communication overheads. If the machine uses conventional microprocessors, care must be taken to maximize cache re-use and eliminate global memory references, which tend to stall the processor. In other words, if your machine looks like a hammer, then you’d better make all your codes look like nails! This can lead to “unnatural” algorithms and data structures, and it significantly reduces programmer productivity.

We will provide several high-productivity programming models that significantly ease the burden of parallel programming, along with innovative debugging and performance tuning tools specifically designed for very large-scale computing. We will, of course, support MPI for legacy purposes, but we will supplement MPI with several alternatives that are facilitated by the globally addressable memory. Unified Parallel C (UPC) and Co-Array Fortran (CAF) programs are simpler and easier to write than their MPI counterparts, and look much closer to the serial code that most programmers are familiar with. We’re also developing the Global View programming model, which takes these advantages another giant step forward.
Global View programs present a single, global view of the program’s data structures, and begin with a single main thread. Parallel execution then spreads out dynamically as work becomes available. A typical reaction to a program written in the Global View style is, “Gee, I didn’t realize that was parallel code!”

Cray’s implementation of Adaptive Supercomputing leverages our investment in multiple processor technologies and aims to bring these together into a single system, supported by our continued focus on creating balanced systems with high-bandwidth, low-latency networking, and an integrated software layer that allows for higher user productivity.

Cray’s Adaptive Supercomputing architecture will evolve over time. Today, we have separate, individually architected machines. Plans are already under way for Phase 1, where we will create an integrated user environment that provides a foundation for these differing processor architectures. Eventually, we will integrate the processor technologies into a single system. Beyond that, in Phase 3, we expect to see continuous technology breakthroughs where the software tools become “smarter” and extend the adaptive characteristics of the system, and users will find it easier to map their applications onto the systems.

Cray's Adaptive Supercomputing vision offers a comprehensive environment for HPC users that not only allows them to run their code on the optimal processor type, but also streamlines the process so they can spend more time solving scientific problems, and less time programming.
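
The message bundling mentioned earlier, where sparse data bound for the same destination is packed into larger messages on a bandwidth-limited cluster, can be sketched in a few lines of Python. This is only an illustration of the idea, not Cray code: the function names, the sample updates, and the toy cost model (a fixed overhead per message plus a cost per item) are all assumptions made for the example.

```python
from collections import defaultdict


def bundle_by_destination(updates):
    """Group (destination_rank, value) pairs into one list per destination."""
    bundles = defaultdict(list)
    for dest, value in updates:
        bundles[dest].append(value)
    return dict(bundles)


def transfer_cost(message_sizes, per_message_overhead, per_item_cost):
    """Toy cost model (an assumption): fixed overhead per message plus a per-item cost."""
    return sum(per_message_overhead + size * per_item_cost for size in message_sizes)


# Hypothetical sparse updates: (destination rank, value).
updates = [(0, 1.5), (2, 0.25), (0, 3.0), (1, 7.0), (2, 4.0), (0, 2.0)]

bundles = bundle_by_destination(updates)

# Unbundled: every update travels as its own one-item message.
unbundled_cost = transfer_cost(
    [1] * len(updates), per_message_overhead=10, per_item_cost=1
)

# Bundled: one message per destination, carrying all of its items.
bundled_cost = transfer_cost(
    [len(values) for values in bundles.values()],
    per_message_overhead=10,
    per_item_cost=1,
)

print(f"messages: {len(updates)} unbundled, {len(bundles)} bundled")
print(f"cost: {unbundled_cost} unbundled, {bundled_cost} bundled")
```

Under this toy model the six single-item messages collapse into three bundles, and the fixed per-message overhead is paid three times instead of six. The bookkeeping needed to get there is exactly the kind of machine-driven restructuring the article describes as a drain on programmer productivity.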