LBNL’s Erich Strohmaier to Share HPC Performance Expertise at Conferences

For Lawrence Berkeley National Laboratory’s Erich Strohmaier, an internationally known expert in assessing and improving the performance of high performance computing systems, getting an accurate assessment has parallels with his occasional marathon running. In both cases, how the system performs over the long haul, rather than its short-term potential, is what matters. And as one of the co-founders of the twice-yearly TOP500 list of the world’s most powerful supercomputers, Strohmaier has found that the most important information comes from looking at how the list changes over time, rather than from the systems on any single list. The 23rd edition of the much-anticipated TOP500 list will be released on June 23.

A member of the Future Technologies Group in the Computational Research Division at the U.S. Department of Energy’s Berkeley Lab, Strohmaier is a co-PI in the Performance Evaluation Research Center funded under DOE’s SciDAC program. He is also working on performance characterization and benchmarking projects for other federal agencies. In early May, he held the third in a series of annual workshops in Oakland on the performance of existing and emerging HPC systems. In June, he will chair the all-day tutorial session on benchmarking at the International Supercomputer Conference 2004, to be held June 22-25 in Heidelberg, Germany. After that, he is off to another conference in France to present a co-authored paper, “Performance Characteristics of the Cray X1 and Their Implications for Application Performance Tuning.”

Supercomputing Online caught up with Strohmaier for a few minutes between projects to get his views on the current state of performance characterization, modeling and benchmarking for supercomputers.

SC Online: To start, can you clarify the difference between performance characterization, modeling and benchmarking?

Strohmaier: Performance characterization really focuses on applications and how you can characterize the performance behavior of an application independent of hardware. Our basic assumption there is that the efficiency of data access is the most important characteristic. We try to simulate data access with a non-uniform, pseudorandom process and then correlate the performance of this simulated data access with the performance of the actual code. Modeling generates mathematical models of performance and often relates to either the hardware itself or a specific kernel on that system. Benchmarking is the process of actually measuring the performance of a system, using a specific code as the benchmark.

SC Online: So, where do we stand today with regard to benchmarking?

Strohmaier: One thing that has been a constant struggle in HPC over the last three decades is coming up with a standard measure of performance. A number of efforts in this area have come and gone. During the last two years, a series of new benchmarks has emerged to try to fill this gap. One of them is at the core of our research using Apex-Map, our synthetic code. The other is the HPC Challenge benchmark coming from Jack Dongarra and the DARPA community. These new initiatives will be part of the focus of our May workshop.

SC Online: At the risk of sounding like a commercial, could you talk a little about the tutorial you have organized for the ISC2004 conference?

Strohmaier: The objective of the tutorial is to help people make sense of the growing disparity between advertised peak performances and actual achieved application performances.
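The disparity Strohmaier describes is, to a large degree, a data-access story. The short Python sketch below is a rough illustration of the idea behind synthetic probes such as Apex-Map, not the actual benchmark code: it performs the same arithmetic over a contiguous index stream and over a non-uniform, pseudorandom one. The array size, access count and distribution are arbitrary choices made only for this example.

```python
# Minimal sketch: the same arithmetic, timed over two different access patterns.
# Contiguous (unit-stride) access is compared with a non-uniform, pseudorandom
# index stream, the idea behind synthetic characterization probes like Apex-Map.
# All sizes and the distribution exponent are illustrative choices only.
import time
import numpy as np

N = 1 << 24                    # ~16M doubles (~128 MB), far larger than cache
M = 1 << 22                    # number of accesses to time
data = np.random.rand(N)

# Contiguous index stream over the start of the array
idx_seq = np.arange(M)

# Non-uniform pseudorandom stream: a cubic bias makes a small "hot" region
# popular, while most accesses still scatter across the whole array
r = np.random.rand(M)
idx_rand = (N * r ** 3).astype(np.int64)

def timed_gather(indices):
    t0 = time.perf_counter()
    s = data[indices].sum()    # gather plus a reduction over the chosen stream
    return time.perf_counter() - t0, s

t_seq, _ = timed_gather(idx_seq)
t_rand, _ = timed_gather(idx_rand)
print(f"contiguous: {t_seq * 1e3:.1f} ms   non-uniform random: {t_rand * 1e3:.1f} ms")
print(f"same arithmetic, about {t_rand / t_seq:.1f}x slower from the access pattern alone")
```

On most machines the pseudorandom stream runs several times slower even though the arithmetic is identical – the kind of gap Strohmaier’s characterization work tries to capture.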
Analyzing this performance disparity can be quite difficult because of the wide variety of HPC architectures, not to mention the strongly varying performance requirements of the scientific algorithms in use today. We’ll address these issues by discussing such topics as performance measurement tools and methodologies, new HPC benchmarking initiatives and performance comparisons across architecturally dissimilar platforms. We will also look at the performance requirements of modern high-end scientific applications, the performance characterization of scientific algorithms, and performance modeling and prediction techniques for scientific HPC applications. We have assembled a great team of presenters, and I’ll add my own commercial here – you can read more on the ISC Web site.

SC Online: One of the enduring assessments of performance is the TOP500 list, which you have worked on since it was launched. You’ve used Linpack as a benchmark all that time. Any plans to change that?

Strohmaier: The TOP500 list is based on the idea of defining a system as one of the top 500 largest, fastest supercomputers. To do that, we needed a benchmark, and for traditional reasons we still use the Linpack benchmark. In originating the TOP500 list, however, we did leave open the possibility of replacing Linpack, but that’s not likely to happen anytime soon. Linpack has an incredible track record – we have data for 30 years that cover almost every machine ever built.

It’s a serious benchmark, but the performance characterization is on the easy side. To get good performance from Linpack, you need to run as large a job as possible. That in turn gives you a long execution time for a big run – the typical execution time we see is six hours or more. If anything goes wrong in those six hours, the system is not going to run at its optimal speed. That’s why it’s a serious benchmark, especially for new systems. New systems are sometimes shaky, and one processor or one node slowing down gives the system an imbalance. For Linpack, the overall speed is determined by the slowest processor.

It’s an easy benchmark in that there is not a lot of communication, and the communication that does exist is organized in big, regular data exchanges. This, together with its use of dense matrix multiplication as its kernel, allows the performance to get close to peak performance. One reason people criticize Linpack is that its performance is much higher than that of other applications. In part, it’s a perception problem. As for replacing Linpack, in the end it all boils down to whether there would be performance numbers for all the systems out there. It will be very difficult for a new benchmark to get the wide coverage that Linpack has established.

SC Online: How did you get involved in performance evaluation and benchmarking?

Strohmaier: When I was earning my Ph.D. in physics, I developed and used software for lattice QCD research. When I graduated, jobs in physics were hard to find, and my first job was a benchmarking project on Japanese supercomputers. This was at the University of Mannheim, where Professor Hans Meuer had been publishing statistics on the supercomputer market since the mid-1980s. His statistics were based on counting vector processors. With the emergence of the new MPP systems in the early 1990s, that counting method no longer worked. So we decided to come up with a different idea of which system was a supercomputer – and which one wasn’t.
Instead of evaluating on the technological basis of architecture, we decided to use performance as a yardstick. But we didn’t want to use theoretical peak performance, because then anybody could put a pile of computers in a garage and call it a supercomputer. We wanted to make sure that the systems were capable of performing serious calculations. That meant we wanted to run a benchmark, and the best one available was Linpack.

After working on the TOP500 list in Germany for three years, I got a job at the University of Tennessee in Knoxville working with Jack Dongarra. There was no physics for me after that. In 2000 I moved to Berkeley Lab, which is famous for physics research, but I’m still focusing on benchmarking.

SC Online: What’s the latest trend you’re seeing?

Strohmaier: The trend toward cluster computing, away from monolithic MPPs. Initially this was mostly at academic research institutions, and now we’re seeing some industrial applications, such as oil exploration. There are certain applications where clusters can work well, but it’s still questionable whether they will succeed as general production systems running a number of different applications. You can get a good cost benefit from a cluster – if your application runs well on the system.

SC Online: Have there been any constants over the years?

Strohmaier: The pace of change – and how constant that pace has remained. Every six months, up to 200 systems drop off the bottom of the list because they are too small. Whether the economy is in a recession or booming, that pace hasn’t changed much. And while the performance of individual processors follows Moore’s Law, the TOP500 list exceeds Moore’s Law, because we multiply per-processor performance by the number of processors, which keeps growing as well. Sometimes I’ll be in the early stages of preparing the next list and be thinking there’s really nothing new, and whoosh! I get 500 new submissions.

SC Online: Looking out three to five years, what do you see as the key issues in HPC performance?

Strohmaier: I think the key areas will be tools and how to deal with systems that have tens of thousands of processors. How will we debug the performance of a code on that many processors? There’s no clear solution on the horizon. At the same time, DARPA is showing a renewed interest in architecture innovations – not only more processors, but new features and different kinds of processors. It’s hard to predict the performance of applications on systems like that.

As for processors, there are opposing forces in the marketplace. There is a strong push toward uniformity, due to the cost advantage of standard processors. We are seeing companies stop making their own processors and switch to major processor vendors. But some applications don’t run well on these and benefit from specialized systems, and that can overpower the cost advantage. The best example of this was the Cray-1 – it had no OS, no tools and no support. But it was 100 times faster than any other computer, so it was a success. It’s not clear that that can happen again.

SC Online: The TOP500 list seems to have taken on a life of its own. Looking back, have there been any surprises?

Strohmaier: We really didn’t anticipate that it would become so popular. And after the second list, we realized that the value of the ranking was not as a snapshot, but as a mirror of what’s going on in the HPC market.
It really answered a lot of questions. At the beginning of the 1990s, for example, people were asking, “Are MPPs real supercomputers?” The TOP500 clearly showed that a large number of customers were switching to MPPs.

Another thing we didn’t realize was how much both sites and manufacturers would use the TOP500 list to claim bragging rights. Both research organizations and vendors regularly point to their standing on the list. It really has become the definitive resource for performance measurement.
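For readers curious about what the Linpack figure Strohmaier keeps returning to actually measures, the sketch below is a rough, scaled-down illustration rather than the real High-Performance Linpack (HPL) code: it times the solution of a dense linear system and divides the standard operation count by the elapsed time. The problem size and the use of NumPy are assumptions made purely for this example; a real TOP500 run distributes a vastly larger problem across all the processors of a machine, which is why a single slow node can drag down the overall number.

```python
# Rough illustration of how a Linpack-style measurement reduces to a single
# FLOP/s figure: time a dense solve of Ax = b and divide the standard
# operation count (2/3*n^3 + 2*n^2) by the elapsed time.
# NumPy/LAPACK stands in here for the real HPL code, and n is tiny by
# comparison with an actual TOP500 run.
import time
import numpy as np

n = 4000                               # illustrative size only
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)              # LU factorization plus triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
print(f"n = {n}: {elapsed:.2f} s, about {flops / elapsed / 1e9:.1f} GFLOP/s")
```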