NCSA’s Dan Reed Discusses DTF/Teragrid and HPC’s Bright Future

By Steve Fisher, Editor In Chief -- With all the events in HPC over the last ten days, Supercomputing Online had to talk to one of the community's biggest players, NCSA. In this interview, Director Reed discusses DTF/Teragrid, the recent DOE SciDAC program funding, and what he refers to as the triad of high performance computing research.

Supercomputing: Congratulations on the recent DTF/Teragrid announcement. Would you please tell the readers a bit about the computational aspects of the Teragrid project that NCSA will be leading?

REED: The primary focus of our work, well, there are many, but two in particular. One is cluster software, because the IA-64 clusters that we'll be rolling out will build on a lot of the cluster software development that has gone on here as well as at a number of other places. The other is the grid infrastructure to connect the systems, again building on a bunch of work here, at Argonne and at other partner sites. Those are two of the things that NCSA will be investing a lot of time and energy in. The Teragrid project involves four institutions: NCSA, Argonne, CalTech and San Diego. All four will be involved in each of the major areas of work, grid infrastructure, cluster software, data management, wide area networking and applications, but each institution brings unique strengths to the partnership as well. I mentioned clusters and grids as two of the things NCSA will be working on. San Diego will be working on large-scale data management, as well as these other things. Argonne will be working on grid infrastructure and some visualization work, and CalTech will be working on applications. But as I said, all four groups will really be working on all the components of the Teragrid. There has to be a seamless national infrastructure where all the pieces interoperate.
Supercomputing: Does undertaking such a large-scale project like DTF/Teragrid, one that depends so heavily on clustered systems, signal the demise of the stand-alone supercomputer system?

REED: No, not at all. I think there will always be a role for systems that target specific applications or problems at a particular institution or organization. What I do think we're seeing is a transition in the way science and engineering is being done in the U.S. There will always be a place for the single-investigator, small-research-group model. But increasingly, some of the problems people want to solve require the expertise, infrastructure and talents of national or even international groups, whether that means connecting distributed instruments, managing and integrating data archives from multiple sites, or bringing together technical expertise that spans a larger problem. For example, if you wanted to understand the interactions of environment, population dynamics and climate change, those are issues that really require multidisciplinary expertise. One of the motivations for the DTF/Teragrid is really to start to put in place the backbone of an infrastructure that we think will support this next generation of problems that need to be solved. There's a whole series of groups emerging that are committed to tackling these problems. They range from the national and international detector groups associated with the LHC that's going to come up at CERN, to earthquake engineering, to things like the proposed national observatory that's going to integrate telescopes and data archives in the U.S. and internationally. All of those things have the flavor of distributed technical groups and unique scientific instruments coupled with high performance computing, data archives and networks, data analysis, visualization and distributed collaboration. So the infrastructure of the Teragrid is really trying to support those things.
It will also support the single investigator and traditional computational science applications at the individual sites. We're really targeting both of those models of computing. I think there's a transition going on from mostly the one to include the distributed groups. But single-site supercomputing systems are definitely not going to go away, any more than, if you look at the history of computing, each succeeding technology made the previous one go away. PCs didn't make desktop systems and workstations go away, and those didn't make mainframes go away. They're each layers of an onion of technology that provide solutions to important problems.

Supercomputing: Do you feel that the development of a distributed system with this much power and flexibility could potentially be as significant to scientific research as the advent of the Internet?

REED: Oh boy, that's a pretty open-ended question there. (laughter)

Supercomputing: (laughter) I know, I apologize for that one.

REED: Well, it's a variation on the answer I just gave you. There is a power of exponentials that comes into play in distributed systems. The Internet's precursors have been around a long time, going back to the early ARPAnet days, but it was only when there were enough connected sites that the behavioral dynamics really took off into the Internet we take for granted now. I do think there is a reasonable chance that this kind of backbone infrastructure can seed a larger distributed computational grid. That's what we hope will happen. And the reason is really the one I was alluding to: all of these distributed groups, many of them are building clusters of their own and collaborating with other groups, and what we're trying to do with the Teragrid is the intellectual equivalent of building a transcontinental electrical power distribution system with some high-end generators at its core.
What we hope is that this will be the catalyst to connect regional grids and campus grids, and even, in the limit, to push down to mobile and environmental sensing technology and integrate those with this infrastructure. What we hope will happen over time is that the model of computing, at least research computing, becomes one where you reach out and access resources without really worrying about where or when or how they're connected, the same way you access electric utilities without worrying about where the transmission lines or generating stations are. You'd be able to tap data archives, computing resources, visualization resources and data mining resources without worrying about where they are. So the issues there are not just the physical infrastructure of the networks and the computing facilities, but all the middleware and application software necessary to knit those together.

Supercomputing: What are your thoughts on the recent SciDAC program funding announcements benefiting institutions like LBNL, ORNL, Fermilab and others? All in all, I'd say the high performance computing community has really had an extraordinary 10 days or so. Wouldn't you agree?

REED: The SciDAC announcements actually involve NCSA and Argonne as well; it's not just the national labs. There are a bunch of academic institutions involved in the SciDAC program too. So we're very happy about the DOE SciDAC funding also, because we're collaborators with those national labs. Those kinds of things are very important. There are two or three things that are important to the future of high performance scientific computing. One is to have production infrastructure at the leading edge that allows open scientific research; the DTF/Teragrid is one of those components. Another is ongoing research and development of scientific applications and the software and middleware necessary to effectively exploit that high-end infrastructure.
So the SciDAC activities on scientific applications, scalable cluster software, data management and performance analysis, coupled with the application piece, are really what are going to make effective use of high-end computing facilities. The third thing that's important is continuing long-term investigation of next generation infrastructure: looking at longer term computer architecture design, how we get to cost-effective petaflops, and what the scalability issues are for high-end systems. Those pieces of long-term research, coupled with the other two things, are really the three legs of the stool, if you will, the triad of high performance computing research. So yes, it has been an extraordinary ten days. I think the future of high performance computing infrastructure in the U.S. is very bright, and we're very pleased about what is going on.

--------

Supercomputing Online wishes to thank Director Reed for his time and viewpoints. It would also like to thank NCSA's Karen Green for her assistance.