NVIDIA Announces CUDA Toolkit 3.2

NVIDIA announced the availability of the CUDA Toolkit 3.2 production release, which provides significant performance increases, new math libraries and advanced cluster management features for developers creating next-generation GPU-accelerated applications.

The CUDA Toolkit includes all the tools, libraries and documentation developers need to build CUDA C/C++ applications, and is the foundation for many other GPU computing language solutions. New features and significant performance enhancements in version 3.2 include:

  • Up to 300-percent performance improvement in CUDA BLAS (CUBLAS) library routines, delivering eight times faster performance than the latest Intel Math Kernel Library (MKL).
  • CUDA FFT (CUFFT) library optimizations delivering 2-20 times faster performance than the latest MKL.
  • New CURAND library for random number generation that is 10-20 times faster than the latest MKL.
  • New CUSPARSE library of sparse matrix routines that delivers 6-30 times faster performance than the latest MKL.
  • A host of additional improvements to GPU debugging and performance analysis tools.

In addition, the CUDA Toolkit 3.2 release adds H.264 encoding and decoding, Tesla Compute Cluster (TCC) integration, cluster management features, and support for the new 6GB NVIDIA Tesla and Quadro GPU products.

NVIDIA is hosting a webinar on Tuesday, Nov. 23 at 10:00 a.m. PT to review the performance enhancements and capabilities of the new CUDA Toolkit.