ScicomP7: IBM's Scientific Computing Users Conference at GWDG in Goettingen

GWDG (Gesellschaft für wissenschaftliche Datenverarbeitung mbH) in Göttingen, Germany, organized the spring ScicomP7 meeting from March 4th to 7th, 2003. More than 80 participants from the scientific computing arena came from all over the world, from the Arctic Region Supercomputing Center down to Italy. They discussed usage, applications, tools, hardware and the IBM roadmaps for processors and software. The program was divided into two tracks, IBM sessions and user sessions on climate, tools, optimization and systems. Some of the trends, plans and talks are summarized here. PowerPoint or PDF versions of most of the talks are now available on the web.

ScicomP

The IBM System Scientific Computing User Group, SCICOMP, is an international organisation of scientific and technical users of IBM systems. Its purpose is to share information on software tools and techniques for developing scientific applications that achieve maximum performance and scalability on IBM systems, and to gather and provide feedback to IBM to influence the evolution of their systems. SCICOMP holds periodic meetings which include technical presentations by both IBM staff and users, focusing on recent results and advanced techniques. Discussions of problems with scientific systems are held, aimed at providing advisory notifications to IBM detailing the problems and potential solutions. Mailing lists are maintained to further open discussion and to provide the sharing of information, expertise and experience that scientific and technical application developers need but may not easily find anywhere else.

GWDG

The organizer, GWDG in Göttingen, Germany, is an institution jointly founded in 1970 by the Bundesland of Lower Saxony (Niedersachsen) and the Max-Planck-Gesellschaft. It is the computer centre of the University of Göttingen and a computing and competence centre for the entire Max-Planck-Gesellschaft. Apart from this it carries out research in computer science and supervises the training of computer technicians. At the moment GWDG operates an IBM RS/6000 SP with 224 POWER3 CPUs and 172 GByte RAM (336 GFlop/s peak, rank 333 in the Top500) and three eServer pSeries 690 Regatta systems with 96 POWER4 CPUs and 96 GByte RAM (422 GFlop/s peak, rank 373), for a total of 758 GFlop/s peak performance. The Unix workstation cluster comprises about 60 servers of varying performance and configuration for compute-intensive applications.

ScicomP7: Abstracts of the Talks

Professor Dr. A. Tilgner (Institute of Geophysics, University of Göttingen) gave the keynote, "Geophysics and High Performance Computing", dealing with geophysical fluid dynamics.

IBM Vendor Talks

Pratap Pattnaik, IBM, gave an overview of the evolution of the IBM Power architecture. He discussed the new simultaneous multithreading (SMT) technique, which is made possible by multiple functional units, shared registers, shared issue queues and a dynamic instruction selection and issue unit. Software can assign each thread a priority from 0 to 7 (highest).

Peg Williams, IBM, highlighted IBM's HPC strategy and direction. She outlined the vision of how IBM plans to evolve the HPC roadmap to deliver leadership function, performance and scalability. The hardware strategy is based on Power processors, SMPs such as Regatta (32-way) and Squadron (64-way), and switch interconnects up to the Federation switch.
The software is characterized by data management, GPFS (General Parallel File System), MPI, LAPI, LoadLeveler for resource management, tools, the systems management software PSSP (Parallel System Support Programs), and the new CSM (Cluster Systems Management), which will be used on pSeries and xSeries (Intel IA-32 based) systems. In 2003-2004 the POWER5 arrives, with a clock of 1.4-2 GHz, an enhanced distributed switch, SMT, better floating-point performance and faster memory. The POWER5+ will run at 2-3 GHz. The Squadron systems, due in 2004, scale from 1 to 64 processors in an SMP; "mainframe-like" partitioning is possible, with multiple partitions on a single chip. The POWER6, scheduled for 2006/2007, will be used across all non-Intel IBM series (iSeries, pSeries, zSeries) and brings large frequency enhancements. The Federation switch is capable of 4 GByte/s per port (2x2 GByte/s), delivering more than 2 GByte/s per link for MPI at an MPI latency of about 5 to 9 microseconds. AIX will be the leadership, high-end operating system for the enterprise, and interoperability between AIX and Linux is possible. The AIX release plan foresees version 5.3 in 4Q 2004 (POWER5) and 5.4 in 4Q 2006. CSM for pSeries and Linux is the successor of PSSP and is used for system administration, security, installation and configuration.

BlueGene/L

Sid Chatterjee, IBM, presented the architecture and software of BlueGene/L, the massively parallel computer system being developed at IBM in collaboration with Lawrence Livermore National Laboratory. BlueGene/L targets a machine with 65,536 nodes and a peak performance of 360 trillion floating-point operations per second (360 TFlop/s). The clock will be 700 MHz; the processor comes out of embedded systems. User programs execute exclusively on the compute nodes, while the outside world interacts only with the I/O nodes; an I/O node together with its compute nodes forms a processing set. The machine is controlled through service nodes on a control surface. Since the system is viewed as a cluster of 1,024 I/O nodes, the 65,536 nodes are reduced to a manageable size. Linux is the operating system. In November 2002 LLNL shifted 48 million US$ to BlueGene/L as part of the ASCI contract. The machine should deliver previously unattainable levels of performance for a range of scientific applications such as molecular dynamics, turbulence modeling and three-dimensional dislocation dynamics.

Ray Paden, IBM, presented GPFS and parallel I/O. GPFS is a mature, robust parallel file system available on IBM systems running either AIX or Linux. He examined GPFS features useful to the HPC applications programmer; a minimal sketch of the kind of parallel access pattern such a file system serves is shown below.

Christoph Pospiech, IBM S&TC, showed the differences between AIX and Linux, with their pros and cons. He detailed the software environment for HPC workloads: Linux distributions, compilers, tools and extensions, and IBM's plans regarding mathematical libraries. Following a simple decision graph, one can choose the operating system: if one needs a broad spectrum of ISVs (Independent Software Vendors) and large SMPs and already knows AIX, then AIX is the right choice; otherwise Linux. There is an IBM HPC open source software project. The current IBM Linux offering lacks some functionality, part of which will be filled in the future by IBM products. All changes will be handed back to the community, without porting and tuning. Bugs that are POWER4-relevant will be handled on an "as time permits" basis; beyond this there is no IBM support. There is no Linux support for the SP Switch.
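To make the parallel I/O discussion more concrete, here is a minimal sketch (not taken from the talk) of collective parallel I/O with MPI-IO, one common way applications exercise a parallel file system such as GPFS: every MPI rank writes its own contiguous slice of one shared file in a single collective call. The file name "demo.dat" and the block size are illustrative assumptions only.

#include <mpi.h>
#include <stdlib.h>

#define BLOCK 1024                       /* doubles per rank; illustrative */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills a local buffer with its own data. */
    double *buf = malloc(BLOCK * sizeof(double));
    for (int i = 0; i < BLOCK; i++)
        buf[i] = rank + i * 1e-6;

    /* All ranks open the same file in the shared (e.g. GPFS) name space. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank r writes at byte offset r*BLOCK*sizeof(double); the collective
       call lets the MPI library and the file system coordinate requests. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

On a parallel file system the collective write allows the requests of all ranks to be merged into large, well-aligned operations, which is typically where such systems deliver their bandwidth.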
User Sessions

John Hague, IBM, optimized ECMWF's (the European Centre for Medium-Range Weather Forecasts) weather forecasting code on up to 960 IBM p690 processors. The centre has very stringent requirements: it must produce a 10-day weather forecast every day. A mixture of carefully tuned MPI and OpenMP was found to give the best performance on up to 960 processors.

Deborah Salmond, ECMWF, reported that the centre is now running the full operational forecasting suite on the IBM p690. The migration from the VPP5000 is complete, and parallel running with the VPP5000 shows that performance is better than expected. In addition to the operational suite, the system's exceptional throughput enables ECMWF to run multiple research experiments simultaneously.

Jörg Behrens, GWDG, compared MPI, OpenMP and hybrid parallelization on IBM's RS/6000 SP and pSeries 690 systems, evaluating the different approaches on both platforms; a minimal sketch of such a hybrid scheme is given at the end of this session summary.

Jesus Labarta, CEPBA-UPC, presented work on both OpenMP and MPI run times to improve the run time of parallel programs by detecting their behavior and dynamically adapting the assignment of resources. Regarding MPI, the work focusses on run-time prediction of communication patterns, is being done on MPICH, and targets large systems like BG/L.

David Skinner, NERSC/LBL, focused on how multiple concurrent versions of the IBM C/C++ and Fortran compilers can be maintained in an HPC environment. Keeping multiple compiler versions available helps preserve basic functionality and supports code performance testing.

Werner Krotz-Vogel, Pallas, discussed scalable trace handling and analysis with Vampir 3.x. Large, long-running applications and applications with a large number of processes can easily produce traces that are difficult to handle and to analyze. The new generation of Vampirtrace and Vampir now supports the analysis of such applications.

Sigismondo Boschi, CINECA, discussed experiences in scientific computing on the IBM SP4: pain and pleasure.

Roman Hatzky, Max-Planck-Gesellschaft, optimized a particle-in-cell (PIC) code on the IBM Regatta hardware. The code is a plasma physics code that simulates the time evolution of ion-temperature-gradient-driven (ITG) turbulence in a cylindrical geometry as a first approximation to the stellarator Wendelstein 7-X. The applied numerical method is a particle-mesh method.

Jim Edwards, IBM, discussed debugging a performance problem on the p690 Regatta. Scientists at the National Center for Atmospheric Research (NCAR) noticed intermittent performance problems in jobs using all of the processors of a node: applications using 32 MPI tasks slowed down by up to 100% under certain conditions. He looked at the symptoms of the problem, the workarounds and possible solutions.

Reinhold Bader, Leibniz Computing Center (LCC), benchmarked IA-32 and IA-64 Linux clusters. LCC wanted to purchase a Linux cluster containing 90 IA-32 nodes as well as 16 64-bit 4-way nodes. They ran a number of synthetic and application benchmarks and tested systems such as the Pentium 4 with multithreading, POWER4 in both MCM- and SCM-based systems, and Itanium 2 processors.
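As an illustration of the hybrid programming model referred to in several of these talks, the following minimal sketch (a generic assumption, not code from ECMWF or GWDG) uses MPI between tasks and OpenMP threads within each task: every rank sums a slice of a global series with an OpenMP reduction, and the partial sums are combined with MPI_Allreduce. The slice size and the toy workload are illustrative only.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_PER_RANK 1000000L             /* slice size per MPI task; illustrative */

int main(int argc, char **argv)
{
    int provided;
    /* Request thread support so MPI and OpenMP can coexist safely. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Thread-parallel partial sum over this rank's slice of the series. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < N_PER_RANK; i++) {
        long global_i = (long)rank * N_PER_RANK + i;
        local += 1.0 / (double)(global_i + 1);      /* toy workload */
    }

    /* Combine the per-rank results across the machine (master thread only). */
    double total = 0.0;
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads=%d sum=%f\n",
               size, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}

In practice a p690 node would host one or a few MPI tasks with up to 32 OpenMP threads each; the balance between tasks per node and threads per task is exactly the kind of trade-off these tuning studies examined.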
Joachim Hein, EPCC, examined the performance of an application code with nearest-neighbour interactions in two dimensions on the HPCx system. HPCx consists of 40 IBM pSeries 690 Regatta frames, each containing 32 1.3 GHz POWER4 processors, a total of 1,280 processors. It is the fastest POWER4 system in the world, the fastest academic computer in Europe, and ranked as the ninth fastest computer in the world. The major focus of the HPCx service is capability computing, concentrating on enabling users to run on 512 CPUs and above.

Peter Endebrock, RRZN/Universität Hannover, summarized first experiences from regular service on the new Regatta system. He described some problems and how they were solved, presented figures for a large individual user code, and showed where the bottlenecks were and how the performance could be improved.

Bill Kramer, NERSC/LBL, presented "Creating Science-Driven Computer Architecture: Blue Planet". Systems designed for commercial purposes such as transaction processing and web serving do not deliver cost-effective performance improvements for large-scale scientific codes; NERSC calls this the "divergence problem" in high performance computing. Scientific performance is dominated by bandwidth, memory latency and interconnect hierarchy, areas where current processor and system roadmaps are failing. Blue Planet is a "science-driven architecture" concept developed by a team from Berkeley Lab and IBM. Kramer discussed the underlying scientific needs, explained why the current technology roadmap is getting worse for science, and presented the Blue Planet ideas that start to address this problem.

http://www.spscicomp.org/

Uwe Harms, Harms-Supercomputing-Consulting, Munich, Germany