NCSA, GSLIS Receive $1.2 Million Mellon Foundation Grant

The Andrew W. Mellon Foundation has awarded $1.2 million to the National Center for Supercomputing Applications (NCSA) and the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign. The grant will support the development of an environment for drawing knowledge from humanities data.

The project will address what principal investigator Michael Welge, leader of NCSA's Data Intensive Technologies and Applications Division, calls the "80 percent problem": 80 percent of the information needed for business and research is unstructured, meaning it's not in easily searchable databases (think of email, text documents, and even images, audio, and video); 80 percent of the required information is "open source," meaning it's not proprietary or top secret; and people are spending 80 percent of their time hunting for the information they need and just 20 percent actually using it.

"There are trillions and trillions of bytes of data available, but the collections are dispersed and finding the relevant material is time consuming," Welge says. "Someone who wants to research 19th century novels or the work of Cervantes has a wealth of information available to them, but without tools to help them they'll spend a long time searching that haystack for their particular needle."

The NCSA/GSLIS team will build on NCSA's successful D2K software -- which helps draw insight from structured data in a variety of research and business domains -- and IBM's Unstructured Information Management Architecture to develop a Software Environment for the Advancement of Scholarly Research (SEASR). SEASR (pronounced "Caesar") will provide the needed bridges from unstructured data, to structured data, to knowledge. The software will help scholars find the data they need, extract the most relevant information, and analyze what is found to generate fresh insights.

"Leveraging the power of information technology for these processes will advance humanities research by increasing the quantity of evidence that researchers can explore and the variety of questions they are able to ask," says John Unsworth, GSLIS dean and a co-principal investigator for the SEASR project.

"This project will have a broad impact on both the humanities and the social sciences because of the staggering growth in the amount of information that exists in a digital format," says Vernon Burton, director of the Illinois Center for Computing in Humanities, Arts, and Social Science. "It is of utmost importance to have automated tools for extracting useful knowledge from vast multi-modal datasets."

SEASR's developers plan to make the software easy to use and modular, so that components created to address particular questions can be re-used by other researchers.

"The SEASR team will accelerate the development of tools and algorithms for supporting humanities computing, allowing humanities scholars to be able to focus on their research," says co-principal investigator Loretta Auvil, NCSA.

While the SEASR team will initially focus on the humanities, other disciplines in the sciences, engineering, and even national defense have similar needs to manage, analyze, and extract meaning from unstructured and structured data, and future efforts could extend SEASR to serve other communities.