Cornell launches data-driven science unit

Written by: Writer; Category: BIG DATA; Published: March 3, 2009, 3:19 am

Cornell University announced today the establishment of the DISCOVER Research Service Group (DRSG) to facilitate data-driven science at Cornell by developing cross-disciplinary data archival and discovery tools. DISCOVER will conduct pilot projects in selected strategic areas, such as building data discovery portals that use access-layer protocols now under development at Fedora Commons and the Virtual Observatory.

"Addressing the magnitude of data being generated by today's large-scale research programs is essential," said Robert A. Buhrman, Senior Vice Provost for Research at Cornell and the group's initial sponsor. "That's why Cornell is expanding our investment in data-driven science by launching a new research service group that is dedicated to systematically tackling some of these data challenges."

Cornell's Department of Astronomy and the University Library, in partnership with the Cornell Center for Advanced Computing, will work closely with DISCOVER, which comprises research groups from multiple disciplines and core data management and curation staff.

"Assimilating data into knowledge is typically more challenging than acquiring the data," noted James M. Cordes, Professor of Astronomy, who is the DISCOVER Co-PI along with Janet A. McCue, Associate University Librarian. "Many research groups face the same or similar problems in handling their data, so a more collective and synergistic approach will avoid repetition, promote the adoption of best practices, and be more efficient and affordable," explained Cordes.

"And, the Library is a natural partner in these efforts," added McCue. "Many aspects of data curation, from discovery to preservation, fit comfortably with the Library's mission."

The overarching goal of the DISCOVER Research Service Group is to provide accessible paths for the curation, preservation, and mining of scientific data. Systems are needed to make data sets physically accessible across both space (over a wide network) and time (for the indefinite future), and to do so transparently, using modern Web-based tools that are expected to evolve.

"DISCOVER will identify the specifications and business model for curation and mining systems that are appropriate to generalization," said David A. Lifka, Director of the Cornell Center for Advanced Computing, "and it will conduct bona fide research activities needed for the development of such systems."
