Google, IBM announce initiative to address internet-scale computing challenges

Google and IBM today announced an initiative to promote new software development methods that will help students and researchers address the challenges of internet-scale applications in the future. The goal of this initiative is to improve computer science students’ knowledge of highly parallel computing practices to better address the emerging paradigm of large-scale distributed computing.

IBM and Google are teaming up to provide hardware, software and services to augment university curricula and expand research horizons. With their combined resources, the companies hope to lower the financial and logistical barriers for the academic community to explore this emerging model of computing.

The University of Washington was the first to join the initiative. A small number of universities will also pilot the program, including Carnegie Mellon University, Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley and the University of Maryland. In the future, the program will be expanded to include additional researchers, educators and scientists.

"Google is excited to partner with IBM to provide resources which will better equip students and researchers to address today’s developing computational challenges," said Eric Schmidt, CEO of Google. "In order to most effectively serve the long-term interests of our users, it is imperative that students are adequately equipped to harness the potential of modern computing systems and for researchers to be able to innovate ways to address emerging problems."

Fundamental changes in computer architecture and increases in network capacity are encouraging software developers to take new approaches to computer-science problem solving. For web software such as search, social networking and mobile commerce to run quickly, computational tasks often need to be broken into hundreds or thousands of smaller pieces to run across many servers simultaneously.
Parallel programming techniques are also used for complex scientific analysis such as gene sequencing and climate modeling.

"This project combines IBM’s historic strengths in scientific, business and secure-transaction computing with Google’s complementary expertise in Web computing and massively scaled clusters," said Samuel J. Palmisano, chairman, president and chief executive officer, IBM. "We’re aiming to train tomorrow’s programmers to write software that can support a tidal wave of global Web growth and trillions of secure transactions every day."

For this project, the two companies have dedicated a large cluster of several hundred computers (a combination of Google machines and IBM BladeCenter and System x servers) that is planned to grow to more than 1,600 processors. Students will access the cluster via the Internet to test their parallel programming course projects. The servers will run open source software including the Linux operating system, Xen systems virtualization and Apache’s Hadoop project, an open source implementation of Google’s published computing infrastructure, specifically MapReduce and the Google File System (GFS).

At the University of Washington, students were able to harness the power of distributed computing to produce complicated programs, such as software that scans the large volume of Wikipedia edits to identify spam and software that organizes global news articles by geographic location.

"In 2006, when I helped Christophe Bisciglia, a former UW student now a senior engineer at Google, to develop the program, our goal was to understand the challenges that universities face in teaching important new concepts such as large-scale computing and develop methods to address this issue," said Ed Lazowska, Bill & Melinda Gates Chair of Computer Science & Engineering at the University of Washington. "A year later, we’ve seen how our students have mastered many of the techniques that are critical for large-scale internet computing, benefiting our department and students."

"Carnegie Mellon applauds Google and IBM for helping to provide the resources that will help professors better prepare our students for the challenges presented by highly parallel computing," said Randal Bryant, Dean of the School of Computer Science at Carnegie Mellon University. "We are quite pleased to be among the first universities participating in this program this fall."

To simplify the development of massively parallel programs, Google and IBM have created the following resources:
  • A cluster of processors running an open source implementation of Google’s published computing infrastructure (MapReduce and GFS from Apache’s Hadoop project)
  • A Creative Commons licensed university curriculum developed by Google and the University of Washington focusing on massively parallel computing techniques available at: its Web site
  • Open source software designed by IBM to help students develop programs for clusters running Hadoop. The software works with Eclipse, an open source development platform. The plugin is currently available at: http://lucene.apache.org/hadoop/
  • Management, monitoring and dynamic resource provisioning of the cluster by IBM using IBM Tivoli systems management software
  • A website to encourage collaboration among universities in the program. This will be built on Web 2.0 technologies from IBM’s Innovation Factory.