PetaShare: Enabling Collaborative Research

NSF recently funded LSU with $1 million for the development of PetaShare, which is seen as a system that "might become an important testbed for future Grids, and a leading site in next generation Peta-scale research".
The unbounded increase in the size of data generated by scientific applications necessitates collaboration and sharing among the nation’s education and research institutions. Simply purchasing high-capacity, high-performance storage systems and adding them to the existing infrastructure of the collaborating institutions does not solve the underlying and highly challenging data handling problem. Scientists are compelled to spend a great deal of time and energy on basic data-handling issues, such as where the data is physically located, how to access it, and how to move it to visualization or compute resources for further analysis.
Assistant Professor Tevfik Kosar and his team aim to develop an innovative distributed data archival, analysis, and visualization cyberinfrastructure for data-intensive collaborative research, which they call PetaShare. PetaShare will transparently handle the underlying data sharing, archival, and retrieval mechanisms, and will make data available to scientists for analysis and visualization on demand. This will enable scientists to focus on their primary research problem, assured that the underlying infrastructure will manage the low-level data handling issues. In developing PetaShare, Dr. Kosar and his team plan to take a novel approach to the distributed data sharing and management problem. Unlike existing approaches, PetaShare will treat data storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks, and not simply as a side effect of computation.
The key technologies to be developed in this project include data-aware storage systems and data-aware schedulers, which take over from the user the responsibility of managing data resources and scheduling data tasks, and perform these tasks transparently.
More than twenty-five senior researchers from five Louisiana institutions, with research areas spanning ten different disciplines, are actively involved in this project. The PetaShare development team includes researchers with expertise in distributed data handling and storage, grid computing, high-performance data mining, and visualization. PetaShare aims to bring together development and application groups from different institutions and different disciplines, enabling them to share information and experience and ultimately to create next-generation instrumentation that will accelerate and enhance their research.
The CS faculty involved in the PetaShare project include Tevfik Kosar, Gabrielle Allen, Ed Seidel, S. S. Iyengar, Brygg Ullmer, Bijaya Karki, and Evangelos Triantaphyllou. Non-CS collaborators include Robert Twilley from Oceanography and Coastal Studies, William Wischusen from Biological Sciences, and several others, including researchers from the University of Louisiana at Lafayette, Louisiana Tech, Tulane, and the University of New Orleans.
Dr. Kosar and his team intend to make the emerging PetaShare technology available to all scientists and engineers who deal with large amounts of distributed data. One of the major goals of this project is to make this instrument a generic solution to the data handling problem that scientists face in collaborative research. Several companies in the storage industry have already indicated that such a technology, if developed properly, would have strong potential for commercialization. PetaShare is a $1 million project and was recently recommended for funding by the National Science Foundation.
For more information on PetaShare, you can contact Tevfik Kosar at kosar@lsu.edu