iRODS Version 2.0: Cyberinfrastructure unites collaborations

Federation Capability Supports Nuanced Data Sharing for Collaborative Research
The Data-Intensive Cyber Environments (DICE) group has announced the release of version 2.0 of iRODS, the Integrated Rule-Oriented Data System. The new version of the award-winning software adds a number of important features, including federation of independent iRODS installations, which lets them “talk” to each other and supports large-scale collaboration by giving users seamless access to data distributed across different iRODS systems.
Core development of the open source iRODS data system is led by the Advanced Center for Data Intensive Cyber Environments at the Institute for Neural Computation at the University of California, San Diego and the National Center for Data Intensive Cyber Environments at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Version 2.0 is freely available for download as open source software, along with user information and release notes, from the iRODS wiki [irods.org].
“A major new feature in iRODS 2.0 is the ability to federate two or more independent iRODS data grids,” said Reagan Moore, director of the Data-Intensive Computing Environments group and professor in SILS at UNC. “Federation lets communities maintain independent iRODS installations, while choosing to share some or all of their data under explicit management policies.” iRODS does this by mapping the policies to computer-actionable rules that control all remote operations as well as data exchange between separate iRODS systems, or Zones. Additional federation Rules are applied on top of the local Rules at each iRODS data grid.
An iRODS workshop on February 2-5, 2009 will bring together users new to iRODS and others already using it in a range of applications. Online registration is free and open through the January 10, 2009 deadline; more information is available at diceresearch.org.
iRODS moves beyond the single-site repository model, which is based on the traditional hard copy paradigm, to implement a new paradigm that harnesses the full power of cyberinfrastructure and the virtual world to free digital data collections from the constraints of space (whether physical, administrative, or disciplinary) and of time, through long-term preservation. This approach gives users an adaptable and extensible system with the integrated capabilities required for the full range of digital data management applications, from highly customizable sharing in data grids, to publication of data in digital libraries, sensor stream aggregation for real-time data systems, and long-term preservation of digital data for use in standard reference collections.
New features in iRODS version 2.0 include:
• iRODS Zone Federation. Each separate iRODS installation, or “iRODS Zone” (one or more iRODS Servers, a single associated iRODS Metadata Catalog, and multiple Clients), can share data and metadata with other Zones.
• Master/Slave iCAT. An iRODS Zone can be configured to run with a single Master iCAT metadata catalog plus optional Slave iCATs synchronized with the Master catalog. This can reduce latency, speeding up metadata queries across wide area networks.
• iRODS Explorer for Windows. This client provides a rich graphical user interface with fast navigation and operations for managing data.
• SRB to iRODS Migration Tool. This preliminary version of a migration tool helps convert an SRB instance to an iRODS one, letting the iRODS system access the data formerly under SRB management without the need to move the physical files.
• Bundling. A new bundling feature gathers large numbers of small files into structured files such as tar files for efficient uploading, downloading, and archiving.
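The idea behind bundling can be illustrated with a minimal sketch. This is not iRODS code and does not use any iRODS API; it only shows, using Python's standard tarfile module, how many small files can be packed into a single tar archive so that one large object is transferred instead of thousands of small ones. The function names bundle_files and unbundle_files are hypothetical and chosen for illustration only.

```python
import io
import tarfile


def bundle_files(files: dict[str, bytes]) -> bytes:
    """Pack many small in-memory files into one tar archive and return its bytes.

    Illustrative only: `files` maps a file name to its contents.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # tar headers must record the payload size
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unbundle_files(bundle: bytes) -> dict[str, bytes]:
    """Recover the original name-to-contents mapping from a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    # A thousand tiny files become a single archive that can be moved in one transfer.
    small_files = {f"sample_{i}.txt": f"record {i}".encode() for i in range(1000)}
    bundle = bundle_files(small_files)
    restored = unbundle_files(bundle)
    print(len(small_files), "files bundled;", "round trip ok:", restored == small_files)
```

Moving one archive rather than many individual files avoids paying per-file overhead on every transfer, which is the motivation the release gives for the bundling feature.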
“The iRODS 2.0 release contains many new features and improvements, large and small, based on user requests and our years of experience with iRODS and the SRB Storage Resource Broker,” said Senior Software Developer and Designer Wayne Schroeder. “In the aggregate these make iRODS a highly capable system that equips users to solve a wide variety of data management problems by making use of various subsets of the features.”
iRODS supports seamless growth from small installations to the largest scales. At UCSD alone, iRODS and the earlier Storage Resource Broker (SRB) technology already manage 1.2 petabytes of data and two hundred million files for 5,000 users, and those numbers are growing.
“We also understand that performance is a very important part of iRODS usability, especially at the larger scales, and in addition to the new federation capability this release also contains important performance enhancements,” said iRODS Software Architect Mike Wan. “We’ve added an efficient mechanism for transferring large files, a bundling mechanism for transferring a large number of small files, and a caching enhancement.”
Other features of interest include:
• a number of new micro-services
• improvements in iRODS's use of Grid Security Infrastructure (GSI), allowing regular iRODS users to authenticate with GSI
• performance improvements in the iRODS FUSE user-level file system capability
• support for Rule-oriented Data Access to Oracle databases
• a new data transfer mode for larger files, RBUDP (Reliable Blast UDP), in addition to the existing sequential (single TCP stream) and parallel (multiple TCP streams) modes
• improvements to the iCAT iRODS Metadata Catalog, including rollback after errors to allow execution of subsequent SQL functions in PostgreSQL
• improvements in testing and installation scripts
iRODS version 2.0 is supported on Linux, Solaris, Macintosh, and AIX platforms.
The iRODS commands are also supported on the Windows operating system, and there is a Windows GUI client. The iRODS Metadata Catalog (iCAT) runs on both the open source PostgreSQL database (which can be installed as part of the iRODS install package) and Oracle. iRODS is also quick and easy to install: just answer a few questions and the install package automatically sets up the system for you.
iRODS was first released in late 2006. Version 1.0 of the software was released under a BSD open source license in January 2008. As a second-generation data grid development effort, iRODS leverages more than 10 years of user-driven experience with the Storage Resource Broker (SRB). With a grant-funded core developer team, the iRODS system is growing rapidly as collaborating projects contribute code to the open source software.
The iRODS team is working with partners in a number of projects to apply the technology, including the Transcontinental Persistent Archives Prototype (TPAP) for the National Archives and Records Administration (NARA), the Ocean Observatories Initiative (OOI), the NSF Temporal Dynamics of Learning Center (TDLC), the NHPRC-supported Distributed Custodial Archival Preservation Environments (DCAPE) project, the French National Library, and many others. Collaborators in the iRODS project include the French Institut National de Physique Nucléaire et de Physique des Particules (CC-IN2P3), the Sustaining Heritage Access through Multivalent ArchiviNg (SHAMAN) project, the UK e-Science Data Management Group at Rutherford Appleton Laboratory, and the High Energy Accelerator Research Organization, KEK, in Japan.
In addition to Moore and Rajasekar, the DICE group includes software architect Mike Wan and senior developer Wayne Schroeder, along with Sheau-Yen Chen, Lucas Gilbert, Chien-Yi Hou, Antoine de Torcy, Paul Tooby, and Bing Zhu. SILS professor Richard Marciano leads the DICE Sustainable Archives and Library Technologies (SALT) lab at UNC.