Q&A with Anne Trefethen, United Kingdom's e-Science Core Programme

Anne Trefethen leads the United Kingdom's e-Science Core Programme and is executive director of the Interdisciplinary e-Research Centre at Oxford University. She sat down with NCSA's J. William Bell at SC05, the annual high-performance computing conference, to discuss the state of the e-Science program and its implications for similar efforts.

Start out by giving us a feel for what the U.K. e-Science program is and what your high-level goals are.

The U.K. e-Science program has been going now for almost five years. In that time, the investment has gone into people--research--not machines or networks. The goal of the program has been to use science problems as a driver to create the infrastructure to support future scientific research. We've been trying to build an infrastructure that allows scientists to form collaborative teams distributed globally… What we've also been trying to do is make sure the focus is not on computers. Historically the focus has been on using distributed computers, but it's really about using distributed resources of any kind. So making it easy for scientists to use large facilities. To be able to use collaborative technologies. To bring in specialists during their development phase or at a critical moment. To be able to access databases remotely. Being able to do that means a lot of work to agree on data standards and other standards. And we have found that for the majority of the science applications in the U.K. the focus is on data--data management, data federation, data curation--and there are many issues to solve here.

You talked about using science problems as the driver. NCSA is starting a defined, explicit push to incorporate those community engagement issues into the way that we build infrastructure. Why has that been important for the e-Science program?

It really was important.
The program was set up with joint funding for these science problems and for generic infrastructure…The part of the program that I direct is looking at the generic technologies to support e-Science, while the science applications have been funded by various research councils in the U.K. oriented toward particular fields like medicine or the natural sciences. Each looks at its separate discipline, but the resulting science applications are interdisciplinary, bringing together the application scientists and the computer scientists. This program is somewhat unusual because it's funded across all research councils. That's made it easier for scientists to collaborate with other disciplines because the funding has been there to promote that… At the time the program was started, we weren't sure, as technologists, that this was the right way to do it. Now, I'm utterly convinced that it was. In the beginning, we took what the U.S. had developed and created an infrastructure that people could begin to use. It helped our technologists get up to speed on what's required in building these types of things. It helped our scientists begin to understand what was useful. But you always have that tension between actually solving science and building a sustainable infrastructure.

So is that tension inherent in combining science and cyberinfrastructure, or is it a matter of how the challenges tend to be attacked?

It's the nature of how it's attacked. I don't think there's inherently a real issue, but, because there's been a cross-over of technologies, there's an issue of bringing them closer together and making them mature.

You mentioned the decision to use some U.S. technology as the base. Flesh that out for us.

We had three pieces of software that we used as the base of our grid. Those were the Globus Toolkit, SRB, and Condor. [The Globus Alliance's Globus Toolkit and the University of Wisconsin's Condor were both developed, in part, under the auspices of NCSA.
The Storage Resource Broker is a product of NCSA's sister site, the San Diego Supercomputer Center.] We created a starter kit including those utilities. That was used as the basis for some national grid efforts, but we also used it, if you like, to build the knowledge base in the U.K. Nobody had used these technologies very much. We were using them as the technology to build grids, but partly also to understand what the issues were. This was back in the summer of 2001. Condor was a mature piece of software; it's been developed for many, many years. SRB, on the other hand, was not that mature. What that led to was a very strong collaboration between teams in the U.K. and [teams in the U.S.]. As I said earlier, though, most of the applications in the U.K. had data issues to solve, which unfortunately this set of software didn't address, and so we invested in the development of data services for data access and integration--OGSA-DAI.

Once you have people invested and have them doing real work on real systems--regardless of their discipline--how do you keep them invested? How do you give them the opportunity to expand their horizons?

[Important] activity right now surrounds the concept of sustainability. We're working very closely with the group in the U.K. that funds networking and digital libraries, JISC, to embed this infrastructure in the universities…They're leveraging e-Science work to develop what they call virtual research environments. Essentially, what they are doing is raising the level, bringing to the existing environments the capabilities of e-Science technologies.

You had talked about generic technologies. Are the virtual research environments an attempt to give those more specialized faces for particular disciplines?

They're trying to meet the needs of particular groups. For instance, one of the VRE projects is working with archeologists to determine what they need to bring in databases, collect data, and compare them.
Another is looking at what a VRE for the humanities would look like. At this stage, that means collecting requirements, building prototypes, and getting input from the user communities. This goes all the way down to the most basic, generic resources. But the focus of the VREs is on the higher levels, which oftentimes means some kind of application or community portal.

Sounds similar to what NCSA is doing with cyberenvironments, though we're trying to avoid focusing too much on that front end. Is the growth of the audience for these things--regardless of how deep they go down the stack toward an all-encompassing solution or of what you call them--changing what is expected of these environments?

Yes, that's right. We're trying to give people an environment to work in. If someone has developed various components for a particular computational environment, say MATLAB, that person should be able to use any grid resource and work through common tools. I'm sure we share the same vision--that of enabling the whole scholarly knowledge cycle, as we call it in the U.K.: being able to generate data through simulation, curate it, link it to publications, and use it in further research. That whole cycle of data usage is important. We have to acknowledge that and capture it.