DRC to demonstrate Monte Carlo algorithm acceleration

DRC Computer Corporation (Sunnyvale, CA), the leading provider of dynamically reconfigurable coprocessor modules, and Impulse Accelerated Technologies, Inc. (Kirkland, WA), maker of the popular Impulse C optimizing C-to-FPGA compiler, today announced the release of a highly integrated platform support package to complement DRC’s Reconfigurable Processor Unit (RPU). The RPU is also used in the Cray XT5 Hybrid Computing Platform.
DRC will demonstrate Monte Carlo algorithm acceleration at its SC07 booth for visitors involved in image processing, scientific, financial and other areas of high performance computing. SC07, the international conference for high performance computing, networking, storage and analysis, is being held November 10 to 16, 2007 in Reno, Nevada. Impulse C will also be featured at the Cray booth, where a schedule is available.
The DRC RPU plugs into an open processor socket in a multi-way AMD Opteron system. This gives it direct access to adjacent double data-rate (DDR) memory and to any adjacent Opteron processor at full HyperTransport bandwidth (12.8 GBps) with a latency of approximately 75 nanoseconds. The RPU then becomes a resource that the remaining Opteron processor can use to implement application subroutines in hardware. The resulting speedups are typically 10x to 20x and, in some cases, 100x to 1000x, compared with running the same subroutines in software.
The new integration opens this architecture more fully to software developers. Previously, a software developer needed to understand how to write lower-level hardware description language (HDL) routines in order to access the memory and I/O of the RPU. The new integration also creates a powerful co-design methodology: developers can write C code and partition it between the Opteron and the FPGA. In this configuration, the processor handles single-stream, memory-intensive processes, and the FPGA handles processes that contain internal loops which can be unrolled.
Unrolling these loops creates a parallel processing approach that leverages the Xilinx FPGA at the core of the reconfigurable system. As a highly parallel processor, the FPGA is more efficient for many types of high-throughput computation, such as the financial demonstration at the show. Collectively, this new integration enables hardware engineers and software developers to create massively parallel algorithms directly in hardware, which can increase algorithm performance and throughput. These algorithms run at a fraction of the clock rate of standard processors, lowering power consumption. This is achieved with standard C programming tools plus Impulse C, which creates parallel processes and automatically generates optimized hardware, including parallelizing operations and pipelining critical loops.
“We have been working closely with Impulse to find the most effective ways for software developers to increase application performance quickly and easily, and are very pleased to see the positive results of our strong partnership,” said Clay Marr, vice president of sales and marketing at DRC Computer. “Combined, our solution delivers near optimum acceleration in a simple and familiar development environment for the high performance computing market.”
“Steady progress in tools and FPGA platform integration are lowering the barriers to FPGA-acceleration technology,” said David Pellerin, Impulse CTO and co-author of Practical FPGA Programming in C. “Improved access to reconfigurable computing is helping to convert financial, military and scientific design groups worldwide.”
The example shown at the conference is a financial modeling Monte Carlo simulation for options valuation. It is partitioned between software and hardware such that the stock price calculations are accelerated in hardware. Accelerated random number generation can also take advantage of FPGA parallelism. The entire application was developed in the C language.
The first phase of the example, moving the simulation to the FPGA, took just a few days.