Two Washington Insiders Discuss DTF/TeraGrid

By Steve Fisher, Editor In Chief -- In light of the NSF's, and indeed Washington, DC's, allocation of $53 million to build the Distributed Terascale Facility (DTF), Supercomputing Online solicited the opinions of two gentlemen who know something about DC: Bob Borchers, Director of NSF's ACIR division, and Steve Wallach, PITAC member and Vice President & Chief Technology Officer, Chiaro Networks.

Supercomputing: Does undertaking such a large-scale project like DTF/TeraGrid, which depends so heavily on clustered systems, signal the eventual demise of the stand-alone supercomputer system?

BORCHERS: Well, it will be tough for "conventional systems," whatever that is these days, to compete on a price-performance basis. It is already true that the bulk of fast machines, like the SP and the Compaq, are clusters.

Supercomputing: Do you feel that the development of a distributed system with this much power and flexibility could potentially be as significant to scientific research as the advent of the Internet?

BORCHERS: Time will tell. Depending on software, I believe simulation is already that important to science. When you compare to the Internet broadly, you have to take in the social aspects, and there is no way the impact can be the same.

Supercomputing: Would a system like DTF/TeraGrid have to be prioritized? Will it be prioritized? If so, what sort of research will be the primary beneficiary of the greater time or resources? Who decides how much time is awarded to researchers?

BORCHERS: Resources on the TeraGrid will be allocated on the basis of scientific merit by the National Resource Allocations Committee (NRAC), which now allocates our other resources.

Supercomputing: Is the funding just from the NSF's annual/normal budget, or did you folks have to pull some strings in DC to get extra money allocated for this project?

BORCHERS: $45 million of the $53 million is new money from the Major Research Equipment account. The other $8 million is PACI base budget reallocations.

Supercomputing: Does undertaking such a large-scale project like DTF/TeraGrid that depends so heavily on clustered systems signal the demise of the stand-alone supercomputer system?

WALLACH: No, not really. I believe this is a natural evolution. Even before the DTF, there existed some semblance of a DTF; however, the resources were not as integrated as they will be in the DTF. To a large degree, DTF-like structures existed within the stand-alone supercomputer centers, where the connectivity used high-speed LANs. Now that long-haul communications are as fast as, if not faster than, LANs, the existing structures can physically evolve. Of course, this is not as simple as I may make it sound. Latency across the country is significantly higher than latency within your own campus. But on the flip side, bandwidth using DWDM techniques is much higher than the I/O bandwidth of most computer systems. The DTF will serve as the vehicle to eliminate the inconsistencies mentioned above and to accelerate the convergence of communications and computation.

Supercomputing: Do you feel that the development of a distributed system with this much power and flexibility could potentially be as significant to scientific research as the advent of the Internet?

WALLACH: The Internet and the DTF serve two different purposes. The Internet permitted real-time exchange of ideas and provided an exceptional medium for collaboration. There are some examples of distributed processing (SETI is one example).

We must differentiate the Internet from high-speed networking. I sometimes confuse the issue myself. Due to the demands of the Internet, long-haul telecommunications capacity increased exponentially. This in turn permits DTF-like architectures to evolve. And with 10 Gigabit Ethernet and OC-192 (10 gigabits/sec), we have an impedance match between what a computer system can deal with and what long-haul telecommunications can carry. The DTF uses this impedance match and will put the next steps in place: using multiple OC-192 channels to facilitate reliable transfers of gigabytes and terabytes of data. The significance to scientific research is that with compute, storage, and telecommunications all integrated within one coherent system, researchers will no longer have to know (in ideal circumstances) where the data is located or where the computation is performed. When we access a Web server today, we know the URL, but we have no idea where the server is physically located.
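Wallach's points about latency versus bandwidth, and about using multiple 10 Gbit/s channels, can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the round-trip times (roughly 1 ms on a campus LAN, 60 ms coast to coast) and the choice of four parallel channels are assumptions for the sake of the example, not figures from the interview.

```python
# Back-of-the-envelope look at a 10 Gbit/s link (OC-192 / 10 Gigabit Ethernet class).
# RTT values below are illustrative assumptions, not figures from the interview.

LINK_RATE_BPS = 10e9  # ~10 gigabits per second

def bandwidth_delay_product(rate_bps: float, rtt_s: float) -> float:
    """Bytes that must be 'in flight' to keep the link full for one round trip."""
    return rate_bps * rtt_s / 8

def transfer_time(num_bytes: float, rate_bps: float) -> float:
    """Seconds to move num_bytes at the given line rate (ignoring protocol overhead)."""
    return num_bytes * 8 / rate_bps

campus_rtt = 0.001          # ~1 ms round trip on a campus LAN (assumed)
cross_country_rtt = 0.060   # ~60 ms round trip coast to coast (assumed)

for label, rtt in [("campus", campus_rtt), ("cross-country", cross_country_rtt)]:
    bdp = bandwidth_delay_product(LINK_RATE_BPS, rtt)
    print(f"{label}: {bdp / 1e6:.2f} MB in flight to fill a 10 Gbit/s pipe")

# Moving a terabyte over one 10 Gbit/s channel versus four channels in parallel.
one_tb = 1e12
print(f"1 TB over one channel:   {transfer_time(one_tb, LINK_RATE_BPS) / 60:.1f} minutes")
print(f"1 TB over four channels: {transfer_time(one_tb, 4 * LINK_RATE_BPS) / 60:.1f} minutes")
```

Under these assumed numbers, the cross-country case needs on the order of 75 MB in flight to keep the pipe full, versus about 1 MB on campus, while a terabyte moves in roughly 13 minutes on one channel and a few minutes on four. This is the shape of the trade-off Wallach describes: line rate is no longer the bottleneck; tolerating wide-area latency is.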
=======

Supercomputing Online thanks both Bob Borchers and Steve Wallach for their time and viewpoints, and apologizes for its inability to delve deeper into these important issues with them due to “launch” time constraints. That will not happen again if they are kind enough to receive our questions in the future.

=======