BIG DATA
Award-Winning Digital Preservation Prototype Extended to West Virginia
The Transcontinental Persistent Archives Prototype (TPAP), a testbed for preserving electronic records collections from the National Archives and Records Administration (NARA) that must be maintained for “the life of the Republic,” has announced the addition of a sixth partner site at the U.S. Navy’s Allegany Ballistics Laboratory near Keyser, West Virginia.
The TPAP project, whose sites nationwide are linked by data preservation technology developed by the San Diego Supercomputer Center (SDSC) at UC San Diego, is addressing key challenges in safeguarding, preserving, and providing access to authentic electronic records as the nation’s information becomes increasingly digital. Along with SDSC in San Diego, the six project sites include two NARA sites in or near the nation’s capital, the University of Maryland, Georgia Tech, and the new site in West Virginia.
A key aspect of the Transcontinental Persistent Archives Prototype is the collaborative nature of the research. “Extending this prototype to the Allegany Ballistics Laboratory in West Virginia applies advanced SDSC data preservation technology in association with the first deployment of the high-performance, low-latency networking capabilities of the Department of Defense’s Defense Research and Engineering Network (DREN) in the state of West Virginia,” said Robert Chadduck, principal technologist for NARA’s Electronic Records Archives Program.
“This materially advances the nation’s window onto the electronic records archives of the future where shared knowledge can be managed and distributed across multiple institutions and platforms spanning the country.” “The capabilities being demonstrated in this extended testbed are essential to ensuring continuing access to electronic records that document our nation’s history, our democratic processes, the rights of American citizens and our national experience.”
The TPAP project, built on the SDSC Storage Resource Broker (SRB) data grid system, received an Internet2 Driving Exemplary Applications (IDEA) Award in 2006 for enabling transformational progress in digital preservation research. The project’s results are expected to be a major contribution to the nation’s ability to sustain a “memory” in digital form.
With digital data growing exponentially across all sectors of society, the powerful freedoms it offers are accompanied by an array of threats, from the creeping incompatibility of obsolete hardware and software to data corruption, viruses, hard drive crashes, and a lack of tools able to organize, manage, and access this avalanche of data. Today’s high-end data collections are reaching petabyte size (one petabyte is one million gigabytes, the equivalent of 500 billion pages of printed text), and are expected to keep growing rapidly.
Experts working on these challenges, from archivists and librarians to computer scientists, are urging stepped-up efforts to implement a preservation capability to maintain at-risk data, so that future generations will have the same access to information such as digital maps of the Iraq War as today’s historians have to maps of the Civil War.
“The testbed uses SDSC’s Storage Resource Broker data grid software. To minimize the labor needed to maintain the preservation environment, we’re working on an upgrade of the system to the new open-source Integrated Rule-Oriented Data System (iRODS),” said Reagan Moore, Distinguished Scientist and director of SDSC’s Data Intensive Computing Environments (DICE) Division. “This will allow more complex and automated data management procedures, which are required as the size and diversity of digital data collections continue their rapid growth.”
The TPAP testbed, which already holds almost four terabytes (a terabyte is equivalent to 30,000 Encyclopedia Britannicas) of NARA federal government records in more than five million files, gains its archiving power from the “data virtualization” supported by the SDSC Storage Resource Broker technology. This data grid manages the properties of shared electronic records collections that may be distributed across multiple storage systems. The SRB also supports federation of the six independently administered sites, enabling the unification of the records so that they appear to users as a single virtual repository.
This unified virtual environment enables archival staff to easily and flexibly add, manage, access, and replicate data from one site to another, ensuring flexible sharing and reliable access even if data is lost at one or more sites. The system also allows archivists to verify the authenticity and integrity of replicated data, which is essential for reliable long-term archiving. In another key demonstration, the prototype has been used to manage the evolution of storage technologies by migrating digital data to new hardware and software.
In addition to adding a new site, the project extension also includes a research and education partnership between NARA and West Virginia University to study electronic records and promote civic awareness of electronic records as educational resources.
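For readers curious how replication and integrity checking of the kind described above can work in principle, the following is a minimal Python sketch. It is not the SRB or iRODS software and uses none of their interfaces. The site directories and the function names (`replicate`, `audit`, `repair`) are hypothetical stand-ins, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, site_dirs: list[Path]) -> str:
    """Copy a record to every (hypothetical) site; return its registered checksum."""
    registered = sha256_of(source)
    for site in site_dirs:
        site.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, site / source.name)
    return registered


def audit(name: str, registered: str, site_dirs: list[Path]) -> dict[str, bool]:
    """Map each site name to True if its replica exists and matches the checksum."""
    results = {}
    for site in site_dirs:
        replica = site / name
        results[site.name] = replica.is_file() and sha256_of(replica) == registered
    return results


def repair(name: str, registered: str, site_dirs: list[Path]) -> int:
    """Restore missing or damaged replicas from an intact one; return how many."""
    status = audit(name, registered, site_dirs)
    intact = [site for site in site_dirs if status[site.name]]
    if not intact:
        raise RuntimeError("no intact replica left to repair from")
    repaired = 0
    for site in site_dirs:
        if not status[site.name]:
            site.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(intact[0] / name, site / name)
            repaired += 1
    return repaired


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        record = root / "record.txt"
        record.write_bytes(b"example federal record")
        sites = [root / "site_a", root / "site_b", root / "site_c"]
        checksum = replicate(record, sites)
        # Simulate damage at one site and loss at another.
        (sites[1] / record.name).write_bytes(b"corrupted")
        (sites[2] / record.name).unlink()
        print(audit(record.name, checksum, sites))
        print("repaired:", repair(record.name, checksum, sites))
        print(audit(record.name, checksum, sites))
```

The sketch registers a checksum when a record is ingested, then compares every replica against it. A replica that is missing or altered is restored from any copy that still matches, which is why holding copies at several independent sites protects against loss at one of them.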
The Transcontinental Persistent Archives Prototype is the product of an eight-year research effort that includes the contributions of NARA’s Electronic Records Archives Program, the National Science Foundation’s Office of Cyberinfrastructure, SDSC, the University of Maryland, and Georgia Tech.