BIG DATA
Indiana develops Komadu
The Indiana University Data to Insight Center (D2I) has released a new suite of software tools, Komadu, designed to help researchers track and verify digital data, a crucial step in computational research.
As today’s researchers deal with ever-expanding data sets and share them with colleagues around the world, it’s increasingly important for that data to have a documented history, proving its validity and quality. Called "data provenance," this history reveals the origins of each data object, as well as the processes applied to it by various research teams. Good data provenance can have a transformational impact on scientific discovery.
"The Komadu tools are made for capturing, representing, and using data provenance, which tells us where a piece of digital data came from, particularly digital data that has undergone transformation by software algorithms," said Beth Plale, director of the Data to Insight Center and managing director of IU’s Pervasive Technology Institute. "Who carried out a transformation on a piece of data, why, and when are all critical bits of information to someone interested in using the data in a different setting. Data provenance, for instance, can expose errors that crop up when one day’s run of an image processing pipeline differs from another day because of a missing file."
The Data to Insight Center has been leading the data provenance charge for nearly a decade. In 2005, D2I researchers published one of the first papers on provenance, which helped to define the field. Shortly afterward, D2I researchers developed Karma, a data provenance tool. Karma's experimental uses included studying the provenance of data in the computer networks of the Global Environment for Network Innovations (GENI) and in ice sheet data captured by a NASA polar-orbiting satellite.
As Karma’s successor, Komadu now complies with the W3C PROV (World Wide Web Consortium provenance) specification, which guides the online exchange of provenance data. Plale says that this compliance positions Komadu to play a contributing role in Linked Data activity, which makes diverse data resources more easily available.
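To make the "who, why, and when" of provenance concrete, the sketch below models a processing run using three core W3C PROV ideas: an activity (a run of a pipeline step), an agent (who ran it), and entities (the files it used and generated). It can render those relations as PROV-N statements and compare two days' runs to flag a missing input file, as in Plale's image-pipeline example. This is a hypothetical illustration only: the `Activity` class, `missing_inputs` function, the `ex:` prefix, and all file and run names are assumptions, not Komadu's API or data format.

```python
from dataclasses import dataclass, field


# Hypothetical, minimal model of W3C PROV concepts.
# This is NOT Komadu's API; it only illustrates the ideas.
@dataclass
class Activity:
    """A PROV activity: one run of a processing step."""
    id: str
    agent: str    # who carried out the run (PROV wasAssociatedWith)
    started: str  # when it started (xsd:dateTime, e.g. 2014-03-03T09:00:00)
    used: set[str] = field(default_factory=set)       # input entity ids (PROV used)
    generated: set[str] = field(default_factory=set)  # output entity ids (PROV wasGeneratedBy)

    def to_prov_n(self) -> list[str]:
        """Render this run as PROV-N statements under an assumed 'ex' prefix."""
        lines = [f"entity(ex:{e})" for e in sorted(self.used | self.generated)]
        lines.append(f"activity(ex:{self.id}, {self.started}, -)")
        lines.append(f"agent(ex:{self.agent})")
        lines.append(f"wasAssociatedWith(ex:{self.id}, ex:{self.agent}, -)")
        lines += [f"used(ex:{self.id}, ex:{e}, -)" for e in sorted(self.used)]
        lines += [f"wasGeneratedBy(ex:{e}, ex:{self.id}, -)" for e in sorted(self.generated)]
        return lines


def missing_inputs(reference: Activity, other: Activity) -> set[str]:
    """Return inputs the reference run used that the other run did not."""
    return reference.used - other.used


# Usage: two days' runs of the same (hypothetical) image pipeline.
monday = Activity("run_mon", "alice", "2014-03-03T09:00:00",
                  used={"img_001", "img_002", "calib"}, generated={"mosaic_mon"})
tuesday = Activity("run_tue", "alice", "2014-03-04T09:00:00",
                   used={"img_001", "img_002"}, generated={"mosaic_tue"})

print(missing_inputs(monday, tuesday))  # {'calib'}
print("\n".join(monday.to_prov_n()))
```

Recording inputs as sets keeps the day-to-day comparison a simple set difference; a real provenance system would also capture versions or checksums so that a changed file, not just a missing one, shows up.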
The Data to Insight Center was formed in 2009 through partial funding from the Lilly Endowment to IU’s Pervasive Technology Institute. From the beginning, one of the center’s prime goals has been to couple core data management research with “translational research” that directly results in benefits for people and society.
"Komadu is the most recent product to emerge from the Data to Insight Center’s translational research thrust," said Plale. "These tools are a tangible outcome of Lilly’s investment in Indiana University, and one that my colleagues and I are certain will have a huge impact on big data research in a range of fields – bettering the lives of people around the world."
Komadu is available under an Apache license from GitHub: https://github.com/