IU Data To Insight Center to lead Sloan-funded investigation into non-consumptive research
Indiana University's Data To Insight Center (D2I) will lead a project, funded by a $600,000 grant from the Alfred P. Sloan Foundation, to conduct the first investigation of non-consumptive research for a major mass-digitized collection of content. D2I's partners on the project include the HathiTrust Research Center (HTRC) and the University of Michigan's Department of Electrical Engineering and Computer Science.
"This funding will enable us to pursue a research track around non-consumptive research uses of the HathiTrust digital corpus," said principal investigator Beth Plale, professor in the IU Bloomington School of Informatics and Computing and director of the Data To Insight Center. "At the end of the project we expect to have cyberinfrastructure in place that successfully demonstrates that non-consumptive research can be carried out safely under the conditions of unintended malicious user algorithms."
Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. Rather than reading the material, researchers use specialized algorithms to analyze text as a massive data set, and the Sloan grant will help ensure the work can be conducted in a secure environment.
In some cases, HTRC would own the algorithms used by researchers, so HTRC needs to examine the security requirements for users, the algorithms and the data, all within the context of using the suite of algorithms available in the Software Environment for the Advancement of Scholarly Research (SEASR).
In other cases, researchers would own and submit their own algorithms. For these cases, the Sloan Foundation funding will be used to create a prototype of what Plale called a "data capsule framework," which would give scholars the freedom to experiment with new algorithms on a huge body of information while technological "trust but verify" mechanisms confirm compliance with non-consumptive research policy.
Without taking into account the actual content of materials, researchers using their own complex algorithms might analyze such massive data sets for anything from something as simple as the repetition of words to complex linguistic structures or the evolution of word usage across time, space or even demographic class.
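As a rough illustration of the simplest analysis mentioned above, counting repeated words, the following Python sketch returns only aggregate counts and never the text itself. It is not HTRC or SEASR code; the function name `word_counts` and the two-sentence sample corpus are hypothetical and exist only for this example.

```python
import re
from collections import Counter


def word_counts(volumes):
    """Return aggregate word counts across volumes, not the text itself."""
    counts = Counter()
    for text in volumes:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical stand-in for digitized volumes, for illustration only.
sample = ["The whale, the whale!", "Call me Ishmael. The sea calls."]

# The researcher sees only the tallies, here the single most frequent word.
print(word_counts(sample).most_common(1))
```

In this sketch the researcher receives word tallies rather than readable pages, which is the general idea behind analyzing a collection without consuming it. A production system would need far stronger safeguards than this.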
The HathiTrust repository contains almost 8.6 million digitized volumes, and about 2.2 million of those -- roughly 26 percent -- are in the public domain and currently available for non-consumptive research.
The model for implementing non-consumptive research is founded on a principle of trust but verify: the researcher should generally be trusted to do the right thing and be given the freedom to carry out creative research, but with mechanisms in place to ensure good behavior and adherence to rules. The security aspects of the project leverage research by Atul Prakash of the University of Michigan, also a principal investigator on the project with Plale.
Leveraging cyberinfrastructure at Indiana University, including FutureGrid, and at the University of Illinois at Urbana-Champaign, the HTRC will provision a secure computational and data environment. "This collaborative cyberinfrastructure test-bed will serve as a proving ground for our research agenda around non-consumptive uses of the collection," said Robert H. McDonald, associate director in the IU Data to Insight Center and another principal investigator on the project.
"In defining new methods of non-consumptive research of the HathiTrust digital corpus, the HathiTrust Research Center and the IU Data to Insight Center are enabling research faculty and the HathiTrust partner libraries to engage in groundbreaking new research across the corpus while maintaining the security and integrity of the collection and the researcher's fair-use access to its content," said Brenda Johnson, Ruth Lilly Dean of Libraries at Indiana University.
For questions about the HathiTrust Research Center and its Non-Consumptive Research Agenda, contact Beth Plale at 812-855-4373.