Supercomputing Advances Open Seismic Data Processing Capabilities

Advances in supercomputing technology are enabling new capabilities in seismic data processing and interpretation. True to Intel cofounder Gordon Moore's famous prediction, first published in 1965, that the number of transistors on a chip would double roughly every two years, ever-greater computational power and speed at ever-lower cost are changing the game in seismic data processing by allowing highly compute-intensive algorithms to run in a manner that is both time- and cost-efficient. In many cases, these algorithms are not new; they were simply impractical or impossible to run on conventional supercomputers at acceptable cost. For example, the performance-to-price ratio of technical computing has improved so dramatically that 3-D prestack depth migration algorithms are quickly becoming standard tools in the processing industry. Only a few years ago, prestack depth migration was used very selectively because its computational requirements made it cost prohibitive for all but a handful of projects. With the advent of low-cost yet powerful cluster systems running the Linux operating system, however, prestack depth migration can now be run relatively quickly and affordably.

Incremental gains in computing performance and decreasing costs per compute cycle are also allowing sophisticated 3-D migrations that propagate the full wavefield in the time domain. As with prestack depth migration, the basic reverse time migration calculations needed to crack the code of difficult geologic structures have been available for some time; the problem has been focusing enough processing power to run them economically. The ability to cluster flexible arrays of computing nodes into powerful distributed-architecture supercomputers now allows off-the-shelf processors to be scaled up efficiently to handle demanding data processing tasks such as proprietary reverse time migration algorithms.

Reverse time migration gives operators a new tool for tackling imaging problems related to geologic complexities such as steeply dipping or overturned structures, large velocity contrasts, and poor seismic illumination below and near salt. As a form of full two-way wave migration, reverse time migration correlates two separate wavefields to derive more robust images and more accurate velocity models. (A schematic sketch of this imaging condition follows the figures below.) The technique is designed to deliver a superior final image compared with conventional depth migrations, which take mathematical shortcuts to minimize computational requirements. The examples shown in Figures 1A and 1B compare one-way standard wave equation migration with two-way reverse time migration of a steeply dipping, overturned reflector with a very large velocity contrast, similar to Gulf of Mexico salt environments.

FIGURE 1A - Steep Dip One-Way Wave Equation Migration
FIGURE 1B - Steep Dip Two-Way Reverse Time Migration
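
To make the idea of correlating two wavefields concrete, the sketch below implements a zero-lag cross-correlation imaging condition for a single shot: a source wavefield is propagated forward in time, the recorded data are propagated backward in time, and the two fields are multiplied and summed at every time step. This is a minimal 2-D acoustic illustration under simplified assumptions (constant density, periodic boundaries, a Ricker wavelet, and invented function and variable names), not any vendor's proprietary reverse time migration code.

```python
import numpy as np

def ricker(t, f0=15.0, t0=0.1):
    """Ricker source wavelet with peak frequency f0 (Hz), delayed by t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def step(p_prev, p_cur, vel, dt, dx):
    """Advance a 2-D acoustic pressure field one time step (2nd-order finite
    differences; periodic boundaries via np.roll are a simplification)."""
    lap = (np.roll(p_cur, 1, 0) + np.roll(p_cur, -1, 0) +
           np.roll(p_cur, 1, 1) + np.roll(p_cur, -1, 1) - 4.0 * p_cur) / dx ** 2
    return 2.0 * p_cur - p_prev + (vel * dt) ** 2 * lap

def rtm_single_shot(vel, src_pos, rec_depth, rec_data, dt, dx):
    """Zero-lag cross-correlation RTM image for one shot (schematic).
    vel: (nz, nx) velocity model; src_pos: (iz, ix) source location;
    rec_data: (nt, nx) recorded traces along a receiver line at rec_depth."""
    nt = rec_data.shape[0]
    nz, nx = vel.shape

    # Forward-propagate the source wavefield and store every time step.
    src_wf = np.zeros((nt, nz, nx))
    p_prev = np.zeros((nz, nx))
    p_cur = np.zeros((nz, nx))
    for it in range(nt):
        p_next = step(p_prev, p_cur, vel, dt, dx)
        p_next[src_pos] += ricker(it * dt)      # inject the source wavelet
        src_wf[it] = p_next
        p_prev, p_cur = p_cur, p_next

    # Back-propagate the recorded data and correlate with the source field.
    image = np.zeros((nz, nx))
    q_prev = np.zeros((nz, nx))
    q_cur = np.zeros((nz, nx))
    for it in reversed(range(nt)):
        q_next = step(q_prev, q_cur, vel, dt, dx)
        q_next[rec_depth, :] += rec_data[it]    # inject receivers in reverse time
        image += src_wf[it] * q_next            # zero-lag cross-correlation
        q_prev, q_cur = q_cur, q_next
    return image
```

In production codes, the stored source wavefield is typically checkpointed or reconstructed rather than held in memory for every time step, and absorbing boundaries replace the periodic wrap-around used in this sketch.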
As with prestack depth migration, reverse time migration algorithms can be applied to existing seismic volumes to achieve better subsurface images and extract new insights from legacy data. Instead of reacquiring seismic data in a different direction to solve a problem such as an illumination deficiency, for example, an operator may be able to apply reverse time migration to obtain an accurate subsurface image at a fraction of the cost of acquiring even a small volume of new data.

The initial applications of reverse time migration are in the Gulf of Mexico, where the technology is being used primarily to image structural features below complex salt formations. Subsalt geology has long challenged traditional seismic imaging techniques, causing distortions and lost data zones in final processed images. In these kinds of applications, full two-way wave equation reverse time migration can provide imaging capabilities geophysicists could only dream about five years ago.

Reverse time migration's time has finally come thanks to high-performance Linux clusters. Clustering multiple commodity-priced central processing units into scalable supercomputer arrays allows massive, complex processing tasks to be performed with efficiency never before possible, making a technology like full two-way wave equation reverse time migration commercially viable. The combination of reverse time migration and Linux clusters provides the most advanced tools available for solving complex imaging problems. Not only do clusters bring unprecedented price-performance to data processing, but their architecture also allows a number of subtasks, or threads, to run simultaneously in parallel, further enhancing speed and efficiency (see the sketch below for a schematic example of this kind of shot-level parallelism). In the world of seismic data processing, the degree to which Linux clustering is changing the economics of running advanced algorithms on large data volumes is nothing short of revolutionary.

As dependable as Moore's Law has proven in computing technology, another unwritten rule has been just as reliable in the geophysics industry: the demands of seismic data processing ultimately always exceed the practical limits of available computing technology. That may always be the case to some degree, since both the information content of each unit of data and the amount of raw data captured per survey continue to grow exponentially. Even so, Linux clusters are focusing huge amounts of computing power on seismic processing jobs at price points never before attainable.

In the past, limits on the amount of computing horsepower that could realistically be dedicated to a given job meant that some particularly cycle-hungry algorithms, while theoretically possible, could not actually be implemented. In fact, some algorithms would have required decades to run on the biggest, most expensive massively parallel machines available only a decade ago. Even for algorithms that could be run, computing constraints often forced compromises in processing workflows in order to deliver results within a manageable time frame and at an acceptable cost. With clusters, every incremental gain in individual microchip power and speed is magnified many times over: every time chip capacity doubles, the same number of processors can deliver twice the performance for only an incremental difference in cost.
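
The parallelism described above maps naturally onto seismic processing because individual shot records can be migrated independently and the partial images stacked. The sketch below distributes hypothetical shots across worker processes on a single machine using Python's standard multiprocessing module; on an actual Linux cluster the same pattern is usually expressed with MPI or a batch queue, and the function names and array sizes here are assumptions for illustration only.

```python
import multiprocessing as mp
import numpy as np

def migrate_shot(shot_id):
    """Stand-in for a per-shot migration kernel (for example, a reverse time
    migration of one shot record). Here it just returns a synthetic image."""
    rng = np.random.default_rng(shot_id)
    return rng.standard_normal((200, 400))      # placeholder (nz, nx) partial image

def migrate_survey(shot_ids, workers=8):
    """Farm independent shots out to worker processes and stack the results."""
    with mp.Pool(processes=workers) as pool:
        partial_images = pool.map(migrate_shot, shot_ids)
    return np.sum(partial_images, axis=0)       # stack the partial images

if __name__ == "__main__":
    final_image = migrate_survey(range(64), workers=8)
    print(final_image.shape)                    # (200, 400)
```

Because each shot is an independent task, adding nodes to a cluster increases throughput almost linearly, which is why the economics improve so directly as commodity processors get faster and cheaper.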
A big part of the beauty of clustering is that once a system is in place, it can be upgraded whenever a new commodity chip is introduced, increasing capacity and speed systemwide. Another huge advantage is scalability: if more computing resources are needed, additional processors can simply be added to an existing cluster. In fact, on-demand services allow users to tap into high-performance resources when and where they need them on an outsourced basis. On-demand clusters are integrated with everything the user needs to get the job done, including 32- or 64-bit processors running either Linux or Windows, management and storage nodes, high-capacity disk storage, high-speed interconnect options, virtual private networks and firewalls. These services are particularly appropriate for smaller companies that need to deploy extra capacity to meet computational requirements with minimal financial and technical risk. Users can contract for on-demand clusters as needed to meet short-term or peak workloads, and pay only for the capacity reserved for the duration of the contract period.

Open-source software is also a distinct driver behind the adoption of Linux clusters. Open source gives users the opportunity to make strategic choices and avoid becoming “locked in” by investments in hardware and software running proprietary UNIX systems. More choices give users more control, more capability and more flexibility to use whatever solutions are most appropriate. And because Linux is a free, open-source operating system, users can copy and modify the base source code as needed. Commercial versions are available from various developers with full support, but numerous Internet sites also offer modifiable base code downloads at no charge. Given these advantages, it is no surprise that sales of Linux servers increased by 60 percent between 2003 and 2004, with Linux servers continuing to capture a rapidly expanding share of the worldwide market. In fact, according to computer hardware analysts, the market for clustered servers has effectively doubled over the past few years.

Looking forward, the movement toward clustered Linux computing is ultimately leading to scalable, distributed networks on an even grander scale by linking multiple clusters into one high-performance grid. Grid computing allows different computing systems to share applications and data, harnessing the power of an almost endless number of processors at geographically dispersed locations, such as field offices or data centers, to let complex processing applications run much faster. Standardized software enables grid computing by analyzing the resources available across a network, then dividing an application and parceling it out to the available clusters and servers on the grid (a toy example of this kind of work division appears below).

One of the key drivers behind grid computing is the ability to fully rationalize information technology infrastructure. Oil and gas companies make substantial investments in IT hardware, and grid computing allows them to fully leverage those investments while maximizing the efficiency of their computing resources. Already, high-performance computing solutions are being introduced that are optimized for both cluster and grid computing. The newest servers integrate high-capacity InfiniBand (an open, multiprotocol industry standard for high-speed interconnect and input/output) and gigabit Ethernet switch technologies to deliver high availability and high performance for cluster and grid supercomputing.
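
As a rough illustration of how grid software can parcel work out to available resources, the sketch below splits a processing job into independent work units and assigns each one to whichever cluster currently has the lowest load relative to its capacity. The cluster names, capacities, and the greedy policy are invented for illustration; real grid middleware also has to handle data movement, failures, security, and priorities that this toy scheduler ignores.

```python
from dataclasses import dataclass
import heapq

@dataclass
class Cluster:
    name: str
    capacity: int          # relative size, e.g., node count
    load: int = 0          # work units assigned so far

def schedule(work_units, clusters):
    """Greedy grid scheduler: assign each work unit to the cluster with the
    lowest load-to-capacity ratio, so larger clusters receive more work."""
    heap = [(0.0, i) for i in range(len(clusters))]   # (weighted load, index)
    heapq.heapify(heap)
    assignment = {}
    for unit in work_units:
        _, i = heapq.heappop(heap)
        assignment[unit] = clusters[i].name
        clusters[i].load += 1
        heapq.heappush(heap, (clusters[i].load / clusters[i].capacity, i))
    return assignment

if __name__ == "__main__":
    grid = [Cluster("houston-datacenter", capacity=64),
            Cluster("field-office", capacity=8)]
    plan = schedule([f"shot-block-{k}" for k in range(18)], grid)
    print(plan)
```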
With high-density processors, the servers can be scaled up or scaled out with minimal space requirements using existing infrastructure. Together with the latest cluster management software, the technology creates powerful, highly available computing resources at very low cost. Instead of proprietary interconnects to manage cluster traffic, high-speed InfiniBand or gigabit Ethernet interconnects between the nodes and shared storage provide a more efficient way to connect high-density clusters, making the technology ideal for compute-intensive processing or database applications, whether in high-performance cluster or grid computing. The InfiniBand switch interfaces with all external data and storage networks, bringing efficiency to large-scale deployments and a tremendous amount of processor flexibility. In addition, the integrated modular design results in an effective, managed infrastructure that centralizes power, storage and network connections while simplifying deployment and maintenance.

The latest server technology also incorporates dual-core capability, allowing multiple applications with different memory requirements to run in parallel. Multicore design is the next big advance in processor technology, placing multiple processing elements on the same processor silicon and chip. Multicore technology introduces new levels of processing density and allows multiple concurrent processors in a single system to achieve new levels of cluster performance.

Especially when it comes to premium algorithms like reverse time migration, the compelling economics of advanced Linux clusters is changing the technical and economic landscape of seismic data processing. Eventually, the same technologies sparking the ongoing revolution in cluster computing for seismic processing will power computing grids for mission-critical business applications. For high-performance computing sectors like geophysics, leveraging low-cost, off-the-shelf component servers and storage systems to link diverse systems into a single virtual grid that can manage computing cycles across an entire enterprise is a highly compelling proposition.

Moore's Law has held true for more than 40 years now, and that will likely continue to be the case for the foreseeable future. Who knows what kind of imaging technologies will evolve from incrementally faster, more powerful and more affordable computing capabilities, but in five years' time the industry could be routinely running full elastic wave equation solutions or other next-generation algorithms. Regardless, seismic data processing has clearly come a long way since processing companies had to rent time on conventional massively parallel supercomputers because purchasing a dedicated machine was cost prohibitive. To put the power concentrated in a single high-density, high-performance cluster in context, consider that one 3,500-node system with dual-processor, 64-bit architecture holds enough capacity to store twice the entire contents of the Library of Congress and still have enough memory left over to save all the audio data seven people will hear in their lifetimes, all at a fraction of the cost of a conventional massively parallel machine. That is the kind of progress Gordon Moore can be proud of.

SC Online would like to thank DANIEL KIM and JIM HAIG for contributing to this article.
DANIEL KIM is president, chief executive officer and chief technology officer of Milpitas, Calif.-based Appro International Inc., a developer of servers, storage subsystems and workstations for the high-performance and enterprise computing markets. He founded the company in 1991. Kim holds an M.B.A. from the University of Missouri-Kansas City and a degree in business administration from Sogang University in Korea.

JIM HAIG is chief executive officer of Staag Imaging LP in Houston, which specializes in complex imaging of marine data. He has 28 years of experience in the oil and gas exploration industry. Haig holds a degree in civil engineering from Purdue University.