With New Policy-Based Automation, Latest Version of Parallel File System Coordinates Tiering of Information for Business or Scientific Use

IBM today announced plans to begin shipping a new version of its General Parallel File System (GPFS) software, which acts like a search engine to identify and migrate files between storage pools and to feed high-speed business intelligence and scientific computing systems. Featuring new policy-based automation capabilities, GPFS is a mainstay of technical computing that is increasingly finding its way into the data center, enhancing financial analytics, retail operations and other commercial applications.
The process of managing data, from where it is placed when it is created, to where it moves in the storage hierarchy based on management parameters, to where it is copied for disaster recovery or document retention, to its eventual archival or deletion, is often referred to as information lifecycle management, or ILM. GPFS tightly integrates this policy-driven ILM functionality into the file system. Using file virtualization technology to analyze and identify data, the high-performance policy engine allows GPFS to apply policy-based file operations to billions of files in hours instead of weeks. In an internal performance benchmark of the pre-release version, for instance, IBM scanned one billion files in less than three hours, a rate on the order of 90,000 files per second. The company is working to improve on those numbers in future tests by further parallelizing policy evaluation.

Recent enhancements in the latest edition of GPFS, Version 3.2, planned for availability October 5, have vastly accelerated the file identification process for managing tiered storage. Additionally, GPFS now supports storage pools that can include tape, enabling the seamless maintenance of ever-growing storage infrastructures.

As the performance of the world's leading supercomputers and business systems continues to increase, the need to manage the massive stores of data generated by these systems is growing at unprecedented rates. More importantly, the scale of these computers, combined with the much larger data sets that parallel file systems support, allows new classes of questions to be answered with computer technology.

Concurrent access at lightning speed

GPFS provides concurrent access at lightning speed to multiple disk drives and storage devices, fulfilling a key requirement of powerful business intelligence and scientific computing applications that analyze vast quantities of often unstructured information, including video, audio, books, transactions, reports and presentations. "We are taking business intelligence to the next level through the analysis of this metadata," said Scott Handy, vice president of marketing and strategy, IBM Power Systems. "A 500-horsepower engine needs the right mix of fuel and air at the right time to operate at top speed. It's the same with large computer systems. When dealing with massive amounts of data to get deeper levels of business insight, systems need the right mix of data at the right time to operate at full speed. GPFS achieves high levels of performance by making it possible to read and write data in parallel, distributed across multiple disks or servers."

Running on both AIX, IBM's UNIX operating system, and Linux, the newest release of GPFS offers improvements in scalability and performance, simplified manageability, monitoring and availability. GPFS provides fast, reliable and flexible access to structured and unstructured data; for example, it can sustain access speeds of more than 130 GB/sec to a single file on a two-petabyte file system. It addresses the needs of the most demanding commercial customers by providing storage consolidation, file virtualization and simplified, high-performance, policy-based file management. The latest version of GPFS includes a number of innovative features, such as:
- Policy-driven automation, a flexible rule-based processing technique that matches the cost, performance or reliability of storage to the value of the data, improving overall performance. Alternatively, it can save the customer money by automatically and transparently moving data (without a path change) to less expensive storage when performance is not critical for that data; illustrative policy rules appear after this list.
- Clustered network file system (NFS), a scalable management feature that enables storage administrators to easily deploy and manage a clustered file-serving solution.
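As a rough illustration of how such tiering rules can be expressed, the sketch below uses the SQL-like policy rule language that GPFS evaluates. The pool names ('gold', 'capacity', 'system'), the file-matching criteria and the thresholds are hypothetical examples, not details taken from this announcement.

    /* Place new database files on the fast pool; everything else lands on the default pool */
    RULE 'place-db' SET POOL 'gold' WHERE UPPER(NAME) LIKE '%.DB'
    RULE 'default' SET POOL 'system'

    /* Once the fast pool is 85% full, migrate its least recently used
       files to the capacity pool until occupancy falls to 70% */
    RULE 'tier-down' MIGRATE FROM POOL 'gold' THRESHOLD(85,70)
        WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
        TO POOL 'capacity'

    /* Delete scratch files that have gone untouched for a year */
    RULE 'expire' DELETE
        WHERE PATH_NAME LIKE '/gpfs/scratch/%'
        AND (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '365' DAYS

In practice, placement rules of this kind are installed on the file system, while migration and deletion rules are evaluated by the parallel policy scan described above; an administrator could preview their effect with a test run of the policy engine (for example, invoking mmapplypolicy against the rules file in test mode) before letting it move any data.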
Handling data-intensive applications

GPFS is designed to meet the needs of data-intensive applications -- such as risk management and other forms of financial analysis, data mining to determine customer buying behavior across massive data sets, engineering design, digital media and entertainment, seismic data processing, weather modeling and scientific research -- by providing a single consolidated view of information across multiple systems. With GPFS, for instance, financial services companies are using analytics grids to process financial data for fraud detection. Retailers are able to analyze daily transactions and set discount policies, optimizing revenue and improving efficiency. Customers have used GPFS to create a scalable NFS file-serving solution capable of supporting hundreds of NFS file servers and petabytes of storage within a single, highly reliable file system. GPFS continues to enable high-end technical and high-performance computing by supporting multiple petabytes of storage and hundreds or thousands of nodes accessing a single file system. For more information on GPFS, please visit its Web site.

GPFS Version 3.2 supports the IBM System p family, including the new POWER6-based IBM System p 570 server, as well as machines based on Intel or AMD processors, such as the IBM System x family. Supported operating systems for GPFS Version 3.2 include AIX Version 5.3 and selected versions of the Red Hat and SUSE Linux distributions.