NCSA/The Alliance at SC2001: An Interview with Dan Reed

By Steve Fisher, Editor In Chief -- NCSA and the Alliance have made a number of announcements this week and participated in numerous demonstrations, including the prototype Teragrid on the show floor. In this interview, NCSA & Alliance Director Dan Reed discusses the Teragrid, the Titan cluster’s performance on the Top 500 list, and the realm of possibilities.

Supercomputing: How are you doing? How’s your show thus far?

REED: We’re really excited. We’ve got a bunch of new technologies here to show off. We shipped a big chunk of our Itanium cluster here, and it’s the biggest IA-64 cluster in the world. We’ve also got new tiled display wall technology, and we’re distributing The Alliance X-In-A-Box software packages on CD. These contain cluster development tools, grid tools, tiled display wall visualization tools, and Access Grid collaboration tools.

Supercomputing: Please tell the readers about the collaborative visualization work with SGI and Cambridge that was announced this week.

REED: That’s the outgrowth of a long-standing cooperation with SGI on how to leverage technology in the Virtual Director work that Donna Cox and her team have done to integrate science, visualization, and outreach. One of the interesting things about this job is that, unlike in my previous life as a researcher and teacher, you really get to see the impact of science and technology on the public. These kinds of projects are great examples of how you take leading-edge science and apply great tools to make the excitement obvious to the general public. I’m convinced that every child is born a scientist, and we often beat it out of them in the education process. So part of the real joy of this is seeing people excited about what science means.

Supercomputing: How did the Teragrid prototype work go?

REED: Well, there’s hardware in the NCSA, Argonne, Caltech, and San Diego booths.
We have applications running across all four sites. San Diego shipped some hardware, we broke down some of our own hardware to deploy in the Argonne and Caltech booths, and we brought hardware for our own booth. We put it all together with a high-speed network, and it’s worked very well. So there are applications at all four sites, including applications that span multiple sites. There are lots of benefits to that. First, it was an early opportunity to bring all the teams together and start looking at the technical issues we’ll face with the deployed hardware next year. It has been really positive to see people working together, trading insights and expertise, and making things happen. Second, the application piece itself has been interesting. Some of the grid apps, some of the Cactus apps, and other things that run across the machines are early exemplars of what we think will be commonplace in the future.

Supercomputing: That thing must have been a bear to set up and get going. You didn’t have more than what… three days?

REED: We started setup on Friday, so yes, there were a lot of late nights by a lot of people. Several people worked all day, all night, and then the next day. But if you talk to them, they’re really excited about it. This is really a labor of love for a lot of people.

Supercomputing: Your cluster Titan showed very well in the Top 500. If you would, please give us a little background on the architecture of the cluster and what it’s all about.

REED: Titan is an Itanium cluster. We’ve been working with Intel and IBM for almost two years. There’s been a lot of early work porting applications to Itanium, tuning, optimizing, looking at software, and so on. The hardware has been on site in Urbana for several months, and it’s about to go into production for the national user community next month. It’s ranked number 34 on the Top 500 list.
To our knowledge, it’s the biggest Itanium cluster in the world. People already have some breakthrough applications for it, such as molecular dynamics codes running at benchmark rates that most folks have never seen before, so they’re really excited about what’s going on. It has 320 processors in 160 dual-processor nodes. It’s connected with Myrinet, and its peak performance is a teraflop. It sits in the new machine room where the first of the Teragrid clusters are going to go, so we’re using this hardware to gain experience for the next-generation McKinley clusters for the Teragrid.

Supercomputing: What do you see as the most significant development at this year’s show?

REED: Well, obviously grids are the story of the day. Clusters and grids are almost inseparable, though, because clusters are by and large the engine powering the grid. But we really see grid technology becoming the topic that everybody talks about. Some people are still trying to figure out what it means, but there’s no doubt that whatever it means, they want some of it. A lot of what we’ve been saying about grids over the last four or five years is starting to come true. In some sense, I think it’s a lot like what happened with the Internet. As a geek, when the Web browser first came out, my first reaction was that there was nothing there I couldn’t do with FTP. What I failed to recognize was that lowering the energy barrier for people to actually interact with remote information was going to qualitatively change what happened. I think we’re in the early stages of the same thing with the grid, which is transforming the Internet from a largely passive information source, where you can reach out with a browser and grab something but by and large can’t do anything with it other than look at it. That’s true even of dynamic media.
The grid is about being able to connect remote instruments, data sources, and computing systems so that you can ask questions that require synthesizing information, rather than having to do it all in your head. Coupling those resources is going to change a lot of things. Obviously the vanguard will be in high-end computing and breakthrough science, but I think it’s going to percolate down just like the Web did and have a lot of ramifications for the general population.

Supercomputing: Is there anything you’d like to add?

REED: Well, as I said at the outset, we’re really excited about where we are, and the Teragrid is the beginning of something new. We’re already looking beyond the Teragrid to what petaflop systems will look like in the future. The Teragrid is on the order of ten teraflops, thirteen and a half to be precise. We’re trying to move toward a world where petaflops become the high-end capability, but equally important is where cost-effective teraflops will come from. Think about teraflops at $50,000 to $100,000 and what that would do for science. That’s going to happen in a few years. One of the early ways we’re looking at that, and one of the things we’re demoing here, is running Linux on PlayStation 2s. We’re looking at where that commodity technology is going to go as a successor to cluster technology. Look at what happened with PCs: every technological revolution has eaten up part of the market that came before it. The PC market has cut into the high-end proprietary market and the workstation market. The game market is huge compared to the PC market, and the economic forces behind it have brought prices to a different level. So making petaflops out of commodity toys is not outside the realm of possibilities.

Supercomputing Online would like to thank Dan Reed for his time and insights.