SDSC releases Matrix version 2.0 middleware for Web services interoperability

The San Diego Supercomputer Center (SDSC) at UC San Diego has released version 2.0 of the new Matrix middleware, which allows applications and services based on standards such as the Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP), and World Wide Web Consortium (W3C) XQuery to communicate with data and other resources in Grid environments. More information and the Matrix download are available at http://www.npaci.edu/DICE/SRB/matrix/.

"Matrix is valuable because it provides another way for people to use the SDSC Storage Resource Broker (SRB), enabling WSDL-based applications to communicate with the SRB more easily," said Arcot Rajasekar, director of the Data Grids Technologies group in SDSC's Data and Knowledge Systems (DAKS) program. "This extends the proven integration, speed, and robustness of the SRB, providing greater interoperability in Web services and the Grid."

Matrix may be viewed as a "wrapper" or layer on top of the SRB, SDSC's widely used data management middleware. Matrix coordinates the execution of process-flow pipelines in Grid environments using a Data Grid Language (DGL), which serves data grids much as the Structured Query Language (SQL) serves databases. In this way, SDSC Matrix can create and manage process-flow pipelines, providing dynamic control of SRB and other services and facilitating scientific computing processes.

Scientific computing requires integrating data from various locations with computing and other resources from different locations into complex workflows. Such distributed Grid workflows are subject to uncertainties, dynamic changes in constraints, and failures. The Data Grid Language allows Matrix to describe these processes as pipelines, making it easier to respond to changes in dynamic Grid environments.
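
The pipeline idea described above can be illustrated with a small sketch. This is not DGL and does not use the Matrix or SRB APIs; it is a generic Python illustration, and every name in it (`Step`, `run_pipeline`, `site-a`, `site-b`, the file names) is hypothetical. It assumes a workflow is an ordered list of steps, each with a list of candidate sites, and that a step which fails at one site is retried at the next.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    sites: List[str]                     # candidate sites, in order of preference
    action: Callable[[str, list], list]  # (site, data) -> transformed data


def run_pipeline(steps, data):
    """Run steps in order; if a step fails at one site, try the next site."""
    log = []
    for step in steps:
        for site in step.sites:
            try:
                data = step.action(site, data)
                log.append((step.name, site, "ok"))
                break
            except RuntimeError:
                log.append((step.name, site, "failed"))
        else:
            # No site succeeded for this step, so the whole pipeline stops.
            raise RuntimeError(f"step {step.name!r} failed at every site")
    return data, log


def split(site, files):
    # Split the input into two batches that could be processed in parallel.
    return [files[0::2], files[1::2]]


def analyze(site, batches):
    # Hypothetical failure: pretend that site-a is unavailable.
    if site == "site-a":
        raise RuntimeError("site-a unavailable")
    return [len(batch) for batch in batches]


steps = [
    Step("split", ["site-a"], split),
    Step("analyze", ["site-a", "site-b"], analyze),
]
files = [f"file{i:04d}.dat" for i in range(1000)]
result, log = run_pipeline(steps, files)
print(result)
print(log)
```

Because the workflow is held as data (a list of steps) rather than hard-coded control flow, the runner can change course when a site fails, which is the kind of adaptability the pipeline description is meant to provide.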
"You can look at a scientific computing process as an assembly line," said Arun Jagatheesan, lead developer of Matrix in the Data Grids Technologies group in SDSC's Data and Knowledge Systems (DAKS) program. "But it's not a fixed assembly line, it must adapt during the process." For example, a scientific problem may require taking 1,000 files as input data, splitting them into multiple parallel flows, sending them for execution to several different sites, and adjusting the whole process dynamically depending on the results of intermediate steps. Matrix can coordinate data flows from sensors to analysis pipelines, digital libraries, and persistent archives, and will be particularly valuable in large distributed data-intensive environments. Matrix is written in Java for portability and supported on all platforms with Java runtime, including AIX and other UNIX and Linux platforms, Windows, and others. In the future, the Matrix team plans to extend the middleware to provide uniform access to other services including those based on Open Grid Services Architecture (OGSA) such as GridFTP, logic rules that control execution, and visualization services. The Matrix team, managed by Reagan Moore, co-director of the Data and Knowledge Systems program at SDSC and led by Arun Jagatheesan, includes Allen Ding, Reena Mathew, and Lucas Gilbert. The development is supported by a number of projects including the National Partnership for Advanced Computational Infrastructure (NPACI), the NIH Biomedical Informatics Research Network (BIRN), the NSF Grid Physics Network (GriPhyN), and the NSF Southern California Earthquake Center (SCEC). -- Paul Tooby