University at Buffalo, SUNY, Leads Data-Intensive Discovery Initiative Via an Academic, Government and Industry Collaboration

Consortium Leverages Largest Netezza High-Performance Appliance to Speed Scientific Discovery

Netezza Corporation today announced that the Data Intensive Computing Initiative (Di2), an academic, government and industry consortium, has emerged and entered its second phase by deploying Netezza’s largest appliance in open science. The system was deployed this past March 16 and was operational the next day, delivering its first query response against a 1.2 billion-row test database in two seconds and demonstrating both rapid implementation and strong performance as a data-intensive computing platform.

“The University at Buffalo is excited to be leading this collaborative effort,” said Bruce A. Holm, Ph.D., senior vice-provost and executive director of the New York State Center of Excellence for Bioinformatics and Life Sciences (COE). “The University’s commitment to knowledge discovery through computational methods is longstanding. Virtually all science and engineering domains are facing data tsunamis. The latest micro-array, sensor technologies and computational simulations are creating data sets at an unprecedented scale. Exploiting the data-intensive computing capabilities of the Netezza platform with our Di2 partners will help shorten the time to discovery in the science and engineering domains.”

“The Di2 has been a key partner in extending the Netezza data warehouse appliance into the high-performance, data-intensive computing realm,” said Jim Baum, president and CEO of Netezza. “As founding members of the Netezza Developer Network (NDN), the Di2 has helped extend our capabilities in the science and engineering domains. We look forward to continuing to support their efforts to increase the pace of data-intensive discovery.”

“During our testing we performed analytics of micro-array data similar to the real discovery problems the biological sciences are facing. We obtained more than two orders of magnitude performance speedup as compared to traditional high-performance computing (HPC) clusters,” said Vipin Chaudhary, director of the Di2 and associate professor of Computer Science and Engineering in the School of Engineering and Applied Sciences. “The mapping of our algorithm onto Netezza took one day to achieve that performance while the same effort took weeks on the HPC cluster. Data-intensive science and engineering requires multiple HPC architectures and platforms,” Dr. Chaudhary continued. “The Netezza system is a massively parallel device that combines a storage device, FPGA, memory and CPU on each of more than one hundred blades in a standard cabinet. The architecture delivers the query to the data and executes it in parallel across the hundreds of blades simultaneously. The result is incredibly fast response times against data sets containing many terabytes of data.”

“We can now address at scale many of the data-intensive discovery challenges that science is facing,” said Todd C. Scofield, founder and co-director of the Di2, and managing director of Big Data Fast LLC. “We originally focused our efforts on data-intensive disease discovery, but the common discovery needs across the science and engineering domains led us to expand our focus. Many of the algorithms and mathematics across the domains are the same or similar.”

“The Netezza system will enhance our disease and drug discovery research efforts,” said Murali Ramanathan, a leading multiple sclerosis and pharmacogenetics researcher, and associate professor of Pharmaceutical Sciences and Neurology at SUNY at Buffalo. “The cause of many diseases, such as multiple sclerosis and cancer, is a complex problem with many genes and environmental factors involved. These are significant combinatorial problems that need to be solved and require the analysis of data sets of many terabytes. We developed algorithms to help understand some of these gene and environment phenomena. Collaborating with Dr. Chaudhary and Netezza engineers, we are now mapping these algorithms onto the Netezza platform. Once we understand the causes of the diseases,” added Dr. Ramanathan, who is also a member of the Center for Protein Therapeutics and the Jacobs Neurological Institute, “we can identify better drug targets that may lead to better treatments for these devastating diseases.”

“Our simulations of combustors in gas turbine engines are run on some of the largest supercomputing assets available at the Department of Defense and NASA facilities,” said Suresh Menon, professor of Aerospace Engineering, and director of the Computational Combustion Laboratory at the Georgia Institute of Technology. “A single simulation can run for days or weeks, and produce output files (in three-dimensional space and time – and hence, often called 4D datasets) in the many tens of terabytes. Having the ability to find, extract and analyze features of importance in these massive (and multiple) 4D datasets will allow us to understand the complex interactions that are taking place. With this kind of knowledge we can introduce computational predictions and analysis into the design cycle and thereby, cut down the development time/cost for the next generation power and transportation systems.”

“A key element of the Di2 is its industry collaboration with NDN partners and the end users of data-intensive applications,” said Michael Upchurch, COO of Fuzzy Logix, a Di2 and NDN partner. “Our pattern matching and predictive analytics algorithms are historically used by the consumer goods, financial services and web analytics markets. Working with the Di2 we have identified new science and engineering markets for our high performance library of over one hundred algorithms. The Di2 provides access to domain experts who understand the ‘language’ and discovery space of specific science and engineering applications. We are now involved with the Di2’s teams for data-intensive aerospace, genomic and healthcare applications.”