HPC User Forum Focuses On Computational Chemistry & Benchmarking

PRINCETON, New Jersey - A record 110 attendees participated in the semi-annual HPC User Forum meeting held here last week. The major focus areas for the Princeton meeting were computational chemistry and performance modeling and benchmarking, according to Steering Committee Chairman Larry Davis of the DoD HPC Modernization Program. The HPC User Forum is directed by a steering committee of users from government, industry and academia, and is operated for the users by market analyst firm IDC. Vernon Turner, IDC group vice president for High-Performance Computing and Enterprise Servers, thanked Debra Goldfarb, now at IBM, for her key role in the formation and success of the HPC User Forum.

Earl Joseph of IDC, who serves as executive director of the HPC User Forum, gave an HPC market update. He said year-over-year revenue declined 7.2 percent in 2002, to $4.7 billion, but the market is projected to average six percent annual growth (CAGR) through 2006, with 12 percent growth in unit sales. Clusters, Linux and biotechnology are strong growth areas. In October 2003, the HPC User Forum plans to hold its second round of dialogue meetings between U.S. and European users in Paris (Oct. 20) and London (Oct. 22); contact Earl Joseph (ejoseph@idc.com) for details. The group will shape its future technical agenda with the help of a survey recently emailed to 1,200 members of the HPC community.

John Grosh, OUSD (S&T) and co-chair of the High-End Computing Revitalization Task Force (HECRTF), said the Earth Simulator triggered the formation of HECRTF. What the Japanese did right, he said, was to develop and fund a long-term plan with goals tied to the needs of their citizens, and then provide resources commensurate with those goals. The biggest gains from the Earth Simulator, he added, could come in fields other than climate and weather, such as nanoscience, and a lack of U.S. attention to HEC now will cause devastating problems in years to come. Grosh said the HECRTF plan has been completed and will be released in coming months; the task force is now working with OMB and others to secure funding for the plan.

Hirofumi Sakuma of the Earth Simulator (ES) gave an update on the program. As reported in the Bulletin of the American Meteorological Society, Earth Simulator scientists completed a 50-year run of the Ocean General Circulation Model at 10-kilometer resolution, using 180 nodes of the ES system. They are preparing a much longer run that will include sea ice. For the Atmospheric General Circulation Model, they would like to use all 640 nodes (64 gigaflops per node).

Robb Graham, Instrumental, Inc., discussed synthetic benchmarks under development for scoring performance and identifying strengths and weaknesses in computer architectures (system usability). The five sets of benchmarks cover computation, network, memory, I/O and the operating system, and they are being used as part of the overall benchmark set for acquisition decisions by the DoD HPC Modernization Program (HPCMP). Bill Ward, ERDC, talked about the use of application benchmarks to provide system selection advice within the DoD HPCMP. The current package includes both application and synthetic tests, with six codes (including a vector serial code) in the application set. One goal is to embed the whole process in an Oracle database in the future.

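For readers unfamiliar with what such synthetic tests look like, the sketch below is a minimal, hypothetical pointer-chasing microbenchmark of the general kind a "memory" test set might contain, measuring average load latency by walking a randomly permuted list; the sizes, seed and structure are illustrative assumptions, not code from the HPCMP suite Graham described.

    /* Hypothetical memory-latency microbenchmark (pointer chasing).
     * Walking a randomly permuted cycle defeats hardware prefetching,
     * so each load pays close to full memory latency. Illustrative only;
     * not part of the HPCMP synthetic benchmark suite. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (1 << 22)   /* 4M pointers (~32 MB), larger than cache */
    #define STEPS (1 << 24)   /* dependent loads to time */

    int main(void)
    {
        size_t *next = malloc(N * sizeof *next);
        size_t i, j;
        if (!next) return 1;

        /* Sattolo's algorithm: build a single random cycle over 0..N-1. */
        for (i = 0; i < N; i++) next[i] = i;
        srand(12345);
        for (i = N - 1; i > 0; i--) {
            j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        clock_t t0 = clock();
        for (i = 0, j = 0; i < STEPS; i++)
            j = next[j];                 /* serialized, cache-hostile loads */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        printf("average load latency ~ %.1f ns (check: %zu)\n",
               1e9 * secs / STEPS, j);
        free(next);
        return 0;
    }
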
Cray Henry, director of the DoD HPCMP, said the government "performs a simulation before we put anything on an aircraft," with similar approaches in the Army and Navy. In FY2004, the program will support about 700 projects involving 4,500 users, with aggregate real-time requirements of 133 teraflops-years. The annual investment in HPC is on the order of $40 million, with HPC assets seen as having a half-life of 3-5 years. The program uses dedicated application and synthetic benchmarks today, but would like to move to all-synthetic tests because dedicated applications are more expensive to work with. It would first have to be proven, though, that synthetic benchmarks and application "signatures" can be correlated.

Allan Snavely, San Diego Supercomputer Center, said the User Forum "has come a long way in addressing some of the performance modeling and benchmarking issues we started grappling with early on," adding that "time-to-solution is what you really want to measure, but it is complicated and problem-dependent, and policymakers need a simple number." Snavely updated the group on the PmaC HPC Benchmark Suite (www.sdsc.edu/PmaC) and said tutorials are scheduled at SC2003 and SIAM PPCS.

Phil Mucci, Innovative Computing Laboratory, University of Tennessee at Knoxville, noted the growing processor-DRAM latency gap and said architectural changes in the 1990s (prefetching, streams, 64-bit addressing and others) tried to alleviate the problem. He described ICL benchmarking activities, including the Performance Application Programming Interface (PAPI), whose goal is to feed performance analysis back into advanced compilation techniques. PAPI is now available for nearly all HPC architectures.

Jeff Vetter, Lawrence Livermore National Laboratory, stressed the need to understand performance at every stage of the life cycle: design, integration, procurement, installation and optimization. To plan for future applications, computer architects need more information than just sustained performance requirements; they need to know the nature of each application. "The Sequoia toolkit helps us characterize applications," he said, illustrating this with a case study involving unstructured mesh transport.

David Bailey, LBNL, discussed the Performance Evaluation Research Center (PERC), a collaborative effort funded by DOE under the SciDAC program. Its mission is to develop a science of performance and to engineer tools for performance analysis and optimization. He said PERC's focus is on grand challenge-scale applications and on "improving our ability to influence future systems." Bailey asked how performance monitoring and modeling will be done on systems with 10,000 or more CPUs, and said the solution is intelligent, highly automated performance tools applicable over a wide range of system sizes and architectures.

Bryan Biegel, NASA Ames Research Center, gave an update on the NAS Parallel Benchmarks (NPB). In 2002, NPB 2.4 Class D became available, with the goal of scaling with science and HEC systems, as did the data-intensive NPB 2.4 BT-IO benchmark, which checkpoints BT every five time steps. NPB 3.0 addresses new parallelism paradigms: HPF, OpenMP and Java in 2002, and multi-level parallelism (MPI/OpenMP and shared memory/OpenMP) in 2003. Near-term plans are to release a multi-level reference implementation, an Unstructured-Adaptive benchmark, NPB Class E and GridNPB.

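To ground Mucci's PAPI discussion above, here is a minimal sketch of how an application might read hardware counters through PAPI's low-level C interface. The kernel, array sizes and choice of events (total cycles and floating-point operations) are illustrative assumptions, not an ICL-supplied example, and event availability varies by platform.

    /* Minimal PAPI sketch: count cycles and floating-point operations
     * around a small kernel. Illustrative use of the low-level API only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long counts[2];
        double a[1000], b[1000], sum = 0.0;
        int i;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles */
        PAPI_add_event(eventset, PAPI_FP_OPS);    /* floating-point ops */

        for (i = 0; i < 1000; i++) { a[i] = i; b[i] = 2.0 * i; }

        PAPI_start(eventset);
        for (i = 0; i < 1000; i++)                /* kernel being measured */
            sum += a[i] * b[i];
        PAPI_stop(eventset, counts);

        printf("cycles=%lld fp_ops=%lld (sum=%g)\n",
               counts[0], counts[1], sum);
        return 0;
    }
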
Sun Microsystems' John Gustafson talked about the company's Purpose-based Benchmarks (PBBs). "Our thesis is that if you use existing benchmarking schemes, they won't allow you to design radical new machines. Our objective is to create a set of representative, scalable HPC benchmarks that are as simple as possible, but no simpler. We hope to test the productivity of programming models, not just execution time but how long it takes you to get the program up and running. This definitely has implications for benchmark tests." The PBBs are based on the concept of acceptability, that is, what results are acceptable from the user's standpoint. The answer might be a more fuel-efficient car rather than a gigaflop rate.

John McCalpin, IBM, discussed simple and composite metrics for system throughput. "The question I'm asking is, how simple can you get? How much can you explain with one parameter? Then, what's the right second parameter? What's the minimal number of parameters I need?" McCalpin said the number of degrees of parallelism in even the simplest HPC machines creates a huge potential performance range. "I used a composite model that makes bytes/flop a simple function of cache size. This revised metric is a much better performance predictor for SPECfp_rate2000." The composite methodology is simple to understand and measure, and it is based on a mathematically correct model, he said.

Daniel Pressel, U.S. Army Research Laboratory, described "Envelope," a new approach to estimating delivered performance. The approach, based on back-of-the-envelope calculations using dozens of hardware and software parameters, "can predict single-processor performance to within a factor of two of the measured value, and within 10-15 percent in many cases." Pressel said Envelope has been a more accurate predictor of machine performance than STREAM or Linpack.

On the computational chemistry front, Yuan-Ping Pang, Mayo Foundation and Mayo Medical School, described the DARPA-funded project Rapid Identification of Countermeasures to Chemical and Biological Weapons, which aims to skip time-consuming experimental methods in favor of an in silico approach. In conjunction with the project, the team built a 1.1-teraflop computer with 470 Xeon processors. "You need two kinds of machines to do real-time biological simulations: a more-expensive centralized machine and a low-cost local machine." The team helped determine the 3D structure of a SARS protein 20 days after the SARS genome was identified. "Supercomputing will give us the ability to predict any natural or man-made mutations of known toxins, and to develop countermeasures for them."

Winifred Huo, NASA Ames Research Center, discussed computational chemistry for space applications. She said chemistry codes tend to be difficult to parallelize. "The accepted premise is that the path is from serial to parallel code. Data locality and data dependency are common issues. But if you try to localize, your parallel code becomes very different from your serial code." She said the major bottleneck in scalable quantum chemistry codes is in the physics: whenever two electrons coalesce, the event is non-local, which is very different from CFD.

David Filkin, DuPont, said "HPC supports, integrates and enables all the sciences that we pursue: biology, physics and chemistry." He said DuPont is studying new materials for use in the photolithographic process in microelectronics. "Future materials need to last through 75 million exposures to light, and to date all candidate materials degrade well before this. They must be transparent (non-light-absorbing) at 157 nm." The computational goal is to explore the transparency of candidate materials. Experimentation takes two to three weeks per polymer (people and lab time), versus one molecule per day if computational chemistry is working well.

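As a concrete anchor for the bytes-per-flop discussion in McCalpin's talk above, and for the STREAM comparison Pressel mentioned, the sketch below shows a STREAM-triad-style kernel and the simple arithmetic behind its memory demand. The array size, timing method and reported figures are illustrative assumptions; this is not the official STREAM benchmark code.

    /* STREAM-triad-style kernel illustrating bytes-per-flop arithmetic.
     * Each iteration reads b[i] and c[i] and writes a[i] (3 doubles = 24
     * bytes) while performing 2 flops, so the kernel asks the memory
     * system for 12 bytes per flop. Not the official STREAM code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 4000000   /* large enough to spill typical caches */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        double q = 3.0;
        long i;
        if (!a || !b || !c) return 1;

        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        clock_t t0 = clock();
        for (i = 0; i < N; i++)
            a[i] = b[i] + q * c[i];        /* triad: 2 flops, 24 bytes */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (secs <= 0.0) secs = 1e-9;      /* guard against coarse timers */

        printf("triad: ~%.1f MB/s, ~%.1f Mflop/s (a[0]=%g)\n",
               24.0 * N / secs / 1e6, 2.0 * N / secs / 1e6, a[0]);
        free(a); free(b); free(c);
        return 0;
    }
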
Mark Gordon, Iowa State University, discussed high-performance computational chemistry on DoD HPCMP systems. He said single-configuration methods fail in a number of important cases, in which case you need to turn to multi-configurational SCF (MCSCF). The ultimate method is full configuration interaction (FCI). "It's exact for a given atomic basis, but it scales exponentially, so you can't use it for a large, full-size problem. But it's valuable as a benchmark to evaluate how other methods are doing."

Ruth Pachter, Air Force Research Laboratory, spoke on the structure and properties of materials for Air Force applications. She focused on applications in quantum chemistry and molecular dynamics, particularly nonlinear optical materials and nanomaterials, and also pointed to interest in developing polymers, alloys for engine applications, composites and ceramics, because different methods are used for those applications.

Oak Ridge National Laboratory's Robert Harrison described NWChem, a software suite that has historically emphasized molecular chemistry but is being extended for other uses. NWChem is portable among HPC systems and is used at 1,000 sites today, he reported. "The main issue for current machines is managing the memory hierarchy. Over the next five years, machines will have increasingly deep memory hierarchies. We can implement NWChem on parallel machines by moving data, rather than the less-efficient method of sending messages. Message passing is useful for some applications, but for the kind we deal with, we don't want to pay for that dependency." He said vendors are not providing the tools needed to manage memory hierarchies.

David Koester, MITRE Corporation, introduced the session on the DARPA HPCS program, whose goal, he said, is to provide a new generation of economically viable HPCS for the national security and industrial user community in 2009-10. "We want to focus on the lost dimension of HPC: user and system efficiency and productivity. We want to look at benchmarks other than just Linpack."

Burton Smith, Cray Inc., described the company's "Cascade" project for Phase II of the DARPA HPCS program. "Cray's approach is to implement very high global bandwidth and use it better than we do today, and to provide configurability in the bandwidth so you can use it exactly where needed. We also want to provide performance measurement tools to enable optimization." Smith said connection costs badly trail Moore's Law. "Most of the hardware cost today is connection cost. It's important to make this bandwidth less expensive and to use bandwidth wisely."

John McCalpin discussed IBM's "PERCS" initiative for the Phase II program; PERCS stands for Productive, Easy-to-use, Reliable Computing Systems. "This is a set of enhancements to the IBM mainstream product, a dynamic system that adapts to application needs. It combines innovation with commercial viability." McCalpin said PERCS will fold into IBM's mainstream product line in the post-POWER6 timeframe.

John Gustafson described Sun Microsystems' "Hero" effort for Phase II, noting that "Hero" doesn't stand for anything. He said one-quarter of Sun's revenue today comes from HPC. "With Hero, we're starting with a clean slate for HPC customer workloads, not trying to adapt our commercial systems but asking, what does this market need?" Speaking about Hero innovations, Gustafson said "memory bandwidth will go way up and be matched to floating-point speed, and our Purpose-based Benchmarks measure productivity explicitly."

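Gordon's remark above that Full CI "scales exponentially" can be made concrete with the standard determinant count: for n_alpha spin-up and n_beta spin-down electrons distributed over M orbitals (ignoring point-group symmetry), the size of the FCI expansion is

    N_{\mathrm{det}} \;=\; \binom{M}{n_\alpha}\binom{M}{n_\beta},
    \qquad \text{e.g. } M = 20,\ n_\alpha = n_\beta = 10:\quad
    \binom{20}{10}^{2} = 184{,}756^{2} \approx 3.4 \times 10^{10}.

The 20-electron, 20-orbital example is illustrative rather than taken from Gordon's talk; the point is that the count grows combinatorially with both electrons and basis functions, which is why FCI is practical only as a small-system benchmark.
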
Jack Collins, Advanced Biomedical Computing Center, NCI, talked about coping with the biological data explosion. "All the bio data helps us understand what's going on, but it's hard to deal with. Mining data must be faster than its generation," he said. "We're trying to see if you can take a blood sample, run it through a mass spectrometer and see if the person has ovarian cancer. We would need to add 72 processors a month to keep up with data growth, and that's a single lab working on a single problem." He explained how the center is using Star Bridge systems to match their applications and obtain higher performance.

Suresh Shukla of Boeing delivered a well-received talk on HPC purchasing decisions. He said the main components of these decisions (cost, applications, requirements and technology) are sometimes out of phase. "Decisions on airplanes are made 8-10 years in advance. We have to go to management today, a bad time for the airline industry, about what Boeing will build a decade from now." Shukla said that in recent years HPC users have focused more on cost reduction than on solving bigger problems. "I have nothing against reducing costs, but have we forgotten the value an engineer or scientist can add to a design?"

IDC's Mike Swenson gave a life science market update. "Hardware is the biggest component today, but we see services becoming the biggest part (about 40 percent) by 2006. Also, in silico R&D will double in this decade," he said. Market drivers for computational requirements are the "$1,000 genome," virtual high-throughput screening, systems biology and personalized healthcare. He said IDC is working on a gene sequencing series and a systems biology series of market studies.

Fred Shields, Fujitsu Technology Solutions, Inc., reported that Fujitsu will install a 2,048-CPU Linux cluster using an InfiniBand interconnect in Japan in March 2004, that a Fujitsu PRIMEPOWER HPC2500 system is number seven on the Top500 list, and that Fujitsu implemented the first 1.76-terabit long-haul DWDM network. Eric Pitcher, VP of product marketing at Linux Networx, reported recent orders from Los Alamos National Laboratory (a 512-CPU and a 2,816-CPU system), SNL (256 CPUs), ARL and White Sands, Presearch (systems integrator for the Naval Warfare Center), John Deere and Audi. He said the company is developing a strong ISV program and has established a solutions center in conjunction with hardware and software partners.

The next HPC User Forum meeting will be held April 12-14, 2004 in Dearborn, Michigan, and will focus on HPC in manufacturing (automotive, aerospace) and HPC architecture. Attendees will hear from leaders in each of these areas. For more information about the HPC User Forum and its activities, go to http://www.idc.com/hpc.