Grant to Support Text-Mining Research

The Andrew W. Mellon Foundation has granted nearly $600,000 over two years to a multi-institutional project directed by John Unsworth, dean of the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign. The project builds on the D2K (Data to Knowledge) software developed by Michael Welge's Automated Learning Group at the National Center for Supercomputing Applications, and it will include partners in humanities research computing at the University of Georgia, the University of Maryland, and the University of Virginia. The project will produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries at Tufts University, the University of Illinois, the University of Virginia, Indiana University, the University of North Carolina, and other institutions.
"In search-and-retrieval," Unsworth says, "we pose specific queries and get back answers to those queries; by contrast, the goal of data-mining is to produce new knowledge by exposing unanticipated patterns. Over the last decade, many millions of dollars have been invested in creating digital library collections: the software tools we'll produce in this project will make those collections significantly more useful for research and teaching."
Professor Stephen Ramsay, of the University of Georgia's English Department, agrees: "literary criticism and data mining share an important common ground: both are concerned with the isolation of patterns in data. Students of literature are often trying to detect patterns of change in the language or structure of literary works. Sometimes, this search for pattern is ordered toward the demonstration of some interpretive insight, but this order is just as often reversed: we notice patterns in texts and those patterns inspire interpretive insight."
Professor Matthew Kirschenbaum, a faculty member in the University of Maryland's English Department and a Fellow at the Maryland Institute for Technology in the Humanities (MITH), says that "information visualization will be the essential scholarly genre of the 21st century. It is already commonplace in astronomy, biology, chemistry, economics, engineering, environmental sciences and geology, geography, meteorology, physics, and mathematics. The basic intellectual and imaginative leap for information visualization in the humanities will be the leap from documentary to algorithmic forms of evidence. At the same time, we must understand the 'iconology' of these visual displays, their roots in long-standing traditions of image-making, cognitive design, and knowledge representation."
Professor Martha Nell Smith, Director of MITH, observes that "the cross-institutional collaboration in this initiative will help ensure that we build tools that are widely usable, that are standards-based, and that will advance the production and preservation of digital scholarship in the humanities, in all its diversity."
Professor Bernard Frischer, Director of the University of Virginia's Institute for Advanced Technology in the Humanities (IATH), points out that "digital scholarship in the humanities requires extensive multimedia collections, and it seeks to explore and document the complex relationships among items in such collections. This, in turn, requires a close collaboration between humanists and computing specialists."
Tom Horton, of the University of Virginia's Computer Science Department, will oversee a distributed software development process for this project. He notes that "developing successful software tools to work effectively in such complex situations is always a challenge, so we'll follow principles of user-centered software design in order to create data mining and visualization tools that will give scholars what they need to be effective, efficient and creative as they work with digital library materials."
The Andrew W. Mellon Foundation is a private foundation, with assets of approximately $4 billion, which makes grants on a selective basis to institutions in higher education; museums and art conservation; performing arts; population; conservation and the environment; and public affairs. Information about the Foundation is available on its website, http://www.mellon.org/. The Mellon Foundation provided a $56,000 planning grant for this project in 2003.