Harnessing cloud computing for data-intensive research on oceans, galaxies
Private companies, universities and government agencies are joining forces to bring scientific research into the era of "cloud computing," the name for massive clusters of computers connected through the Internet.
The University of Washington has won three recent awards from the National Science Foundation related to cloud computing. Two of the grants will fund projects examining ocean climate simulations and analyzing astronomical images. Both provide tools that let researchers use cloud computing to interact easily with the massive datasets that are becoming increasingly common in science.
A third grant to the UW provides curriculum and training to teach cloud computing.
The projects are funded through NSF's Cluster Exploratory program, which will access a cloud datacenter established for educational use in 2007 through a partnership between Google, IBM and six academic institutions, of which the UW was the first member. NSF joined the group last year.
Climate modelers are beginning to use computer simulations in more exploratory ways, said Bill Howe, a researcher at the UW's eScience Institute, a newly established group to support data-intensive research at the university. Instead of running a simulation to test a single hypothesis, climate scientists are now running long-term simulations and then sifting through tens of thousands of gigabytes of resulting data to discover trends.
"Using current tools, you can comfortably analyze and visualize datasets that fit in the computer underneath your desk," Howe said. "But you can't comfortably and interactively explore datasets at this new scale."
Howe's project aims to provide that interactivity for tens of thousands of gigabytes of simulation results. He created a tool, GridFields, to visualize the polygonal mesh of climate simulation output, and is now working to redesign GridFields to be efficient in a cloud computing environment. Collaborators at the University of Utah have an award under the same program to extend an accompanying system that makes it easier to write and keep track of computer programs.
"We need to get smart sooner rather than later on how to design and build a system that doesn't just live out on these machines at government or company data centers, but extends the cloud right down to your computer," Howe said.
Someday the tool should be easy enough that undergraduates and high-school students could sift through raw data themselves, he said.
A second grant will use cloud computing to study astronomical images. Astronomy has changed dramatically during the past decade, said Andrew Connolly, a UW associate professor of astronomy who was awarded the grant with UW research scientist Jeffrey Gardner. Scientists once competed for time on telescopes, recorded data and then studied the individual images in detail. Now telescopes continuously record high-resolution images that are available to all, providing millions of times more information.
"In the past I could have spent a couple of hours working on a single image. But now, if I have to multiply it by factors of many tens of thousands, that couple of hours each becomes something that's not feasible," Connolly said.
Companies such as Google, Microsoft, Amazon and Yahoo! have now created frameworks that make it easier to store and process information in the cloud and make the information available over the Web.
"We want to use these frameworks to enable science, and make it so that astronomers can come in and do the work that they need to do without needing to learn the intricacies of how to work with thousands of machines," Connolly said.
His grant will prepare astronomers to deal with data from telescopes scheduled to begin operating in the next few years, such as the Large Synoptic Survey Telescope, of which the UW is a founding institution. The telescope's 27-foot mirror is connected to a 3.2-billion-pixel camera that takes pictures every 15 seconds. It is expected to record more than 30,000 gigabytes of data and detect more than 100 million astronomical sources every night.
"Cloud computing enables us to scale to the point where we can actually analyze that sort of data," Connolly said.
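To give a rough sense of that scale, the short Python sketch below works through the arithmetic implied by the figures quoted for the Large Synoptic Survey Telescope. The nightly volume and the 15-second interval come from the article. The length of an observing night and the number of observing nights per year are illustrative assumptions, not figures from the project, so the results are order-of-magnitude estimates only.

```python
# Back-of-the-envelope scaling of the nightly data volume quoted in the article.

GB_PER_NIGHT = 30_000         # from the article: more than 30,000 gigabytes per night
SECONDS_BETWEEN_IMAGES = 15   # from the article: one picture every 15 seconds

# Assumptions for illustration only (not from the article):
OBSERVING_HOURS_PER_NIGHT = 10
OBSERVING_NIGHTS_PER_YEAR = 300


def images_per_night(hours, seconds_between_images):
    """Number of exposures taken in one night of continuous observing."""
    return hours * 3600 // seconds_between_images


def gb_per_image(gb_per_night, n_images):
    """Average data volume attributed to each exposure, in gigabytes."""
    return gb_per_night / n_images


def petabytes_per_year(gb_per_night, nights_per_year):
    """Yearly volume in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return gb_per_night * nights_per_year / 1_000_000


if __name__ == "__main__":
    n = images_per_night(OBSERVING_HOURS_PER_NIGHT, SECONDS_BETWEEN_IMAGES)
    print("Images per night:", n)
    print("Average GB per image:", gb_per_image(GB_PER_NIGHT, n))
    print("PB per year:", petabytes_per_year(GB_PER_NIGHT, OBSERVING_NIGHTS_PER_YEAR))
```

Under these assumptions a single night yields a few thousand exposures and a year of observing accumulates several petabytes, which is far more than a desktop workstation can hold or process.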
The third grant funded a three-day workshop, held in Seattle last July, in which UW computer science and engineering faculty and students showed computer science professors how to teach cloud computing skills.
"The rapid evolution of sensors is transforming all sciences from data-poor to data-rich," said Ed Lazowska, a UW professor of computer science and engineering who led the workshop. "The challenge is to use modern cloud computing resources, such as Amazon Web Services, and modern computer science advances, such as data mining and machine learning, to explore these massive volumes of data. This new computational science will be pervasive and will have enormous impact. UW is fortunate to be in on the ground floor."
The UW is the only institution to have won three awards through NSF's new data-intensive computing programs, and its combined award value of nearly $700,000 is the largest of any institution.