Netezza TwinFin Appliance Used for Data-Intensive Computing Applications

Data Warehouse Appliance Leader Supplies Tools for National Laboratory Research

Netezza Corporation will supply its Netezza TwinFin appliance to the U.S. Department of Energy's (DOE) Pacific Northwest National Laboratory (PNNL), where it will become part of the laboratory's high-performance computing toolbox for data-intensive supercomputing and graph analytics research. PNNL is one of the ten national laboratories managed by DOE's Office of Science and has been operated by Battelle since 1965. PNNL also performs research for other DOE offices, as well as for government agencies, universities and industry, to deliver breakthrough science and technology that addresses today's key national priorities.

“Netezza is proud to count PNNL as a new customer within our federal practice,” said Jeff Kidwell, VP of federal government for Netezza. “For decades the DOE national laboratories have produced some of the most prolific scientific and technological advances in U.S. history, and consistently produce innovation at astonishing levels. For their scientists to determine after careful review that the Netezza technology can help them deliver solutions to some of the most pressing issues we face as a country is the ultimate compliment.”

The fastest computers today run at petascale speeds, performing a million billion (10^15) calculations per second. But combing through large volumes of data, rather than performing large numbers of calculations, calls for different hardware and software. At Supercomputing 2009, computational scientists from PNNL and engineers from Netezza showed how they tackled petascale volumes of Internet traffic data to detect patterns that could indicate cyber attacks from outside the network.

The data-intensive supercomputing group at PNNL used Netezza data warehouse appliances, together with data collected at the boundary between a closed network and the outside world, to determine whether a very large data set could be analyzed efficiently. This so-called perimeter data records the entry or exit of e-mails, Internet searches, data transfers and other connections at thousands of ports of entry to a particular internal network. A common type of cyber attack starts at these ports: the attacker scans them to determine whether they are vulnerable to intrusion.
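
To make the idea of spotting port scans in perimeter data concrete, the Python sketch below flags any outside source that touches an unusually large number of distinct ports on a single internal host. It is a generic distinct-port counting heuristic, not the method PNNL or Netezza described; the record layout (`PerimeterEvent`), the function name `find_port_scanners` and the 100-port threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class PerimeterEvent(NamedTuple):
    """Hypothetical perimeter record: which source reached which port on which host."""
    src_ip: str
    dst_ip: str
    dst_port: int


def find_port_scanners(
    events: Iterable[PerimeterEvent], threshold: int = 100
) -> dict[tuple[str, str], int]:
    """Return (source, target) pairs where the source touched >= threshold distinct ports."""
    # Collect the set of distinct destination ports per (source, target) pair.
    ports_seen: dict[tuple[str, str], set[int]] = defaultdict(set)
    for ev in events:
        ports_seen[(ev.src_ip, ev.dst_ip)].add(ev.dst_port)
    # Keep only pairs whose distinct-port count reaches the (assumed) threshold.
    return {
        pair: len(ports)
        for pair, ports in ports_seen.items()
        if len(ports) >= threshold
    }


# Usage: one source sweeps ports 1-200 on a host; another makes one HTTPS connection.
events = [PerimeterEvent("203.0.113.7", "10.0.0.5", p) for p in range(1, 201)]
events.append(PerimeterEvent("198.51.100.2", "10.0.0.5", 443))
print(find_port_scanners(events))  # {('203.0.113.7', '10.0.0.5'): 200}
```

A real deployment would also bound the count by a time window and tune the threshold to the network; both are omitted here for brevity.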

"If you see the port scans, then you know targeted activity is coming next," said PNNL's senior scientist John Johnson, who's leading the work. "But with such large data sets, the lens we have is like looking at the world through a soda straw. We're the first to perform complex analytics on cyber data sets at this scale."

Netezza engineers designed the data warehouse appliance specifically to analyze petabytes of detailed data significantly faster than existing data warehouse options, at a much lower total cost of ownership. A single unit stores, filters and processes terabytes of records, analyzing only the information relevant to each query. Because Netezza places processing power next to the data, its appliances can speed through processes that would occupy most data warehouse systems for hours or even days, enabling dramatic productivity gains across an organization.
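
Placing processing next to the data means each storage unit filters and partially aggregates its own rows, so only small intermediate results travel to the host that assembles the answer. The Python sketch below illustrates that general idea (often called predicate and aggregation push-down) with in-memory lists standing in for disks. The `Row` layout and the function names are hypothetical, the shards are scanned one after another here rather than in parallel, and this is not Netezza's actual implementation.

```python
from collections import Counter
from typing import Callable

# Hypothetical row layout: (src_ip, dst_port, bytes_sent)
Row = tuple[str, int, int]


def scan_shard(shard: list[Row], predicate: Callable[[Row], bool]) -> Counter:
    """Runs 'next to the data': filter locally, return only a small partial aggregate."""
    partial: Counter = Counter()
    for row in shard:
        if predicate(row):
            partial[row[1]] += row[2]  # total bytes per destination port
    return partial


def query(shards: list[list[Row]], predicate: Callable[[Row], bool]) -> Counter:
    """Host side: merge per-shard partial aggregates instead of shipping raw rows."""
    total: Counter = Counter()
    for shard in shards:
        total.update(scan_shard(shard, predicate))  # Counter.update adds counts
    return total


# Usage: two shards; only rows from one source survive the local filter.
shards = [
    [("203.0.113.7", 22, 100), ("198.51.100.2", 443, 5000)],
    [("203.0.113.7", 22, 50), ("192.0.2.9", 80, 700)],
]
print(query(shards, lambda r: r[0] == "203.0.113.7"))  # Counter({22: 150})
```

The design point is that each shard returns one small counter rather than every matching row, so the volume moved to the host depends on the size of the answer, not the size of the table.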

Combining high-performance computing with the database architecture advances Netezza has developed has the potential to improve performance by orders of magnitude over traditional approaches.

Users of the high-performance computing system could examine real-world network data using analytic dashboards, innovative applications for identifying malicious activity, and multidimensional data cubes that let analysts drill down into the data.
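
Drilling down in a data cube means starting from a coarse aggregate (for example, event counts per port) and then breaking one cell of it out along a finer dimension (for example, per source address). The Python sketch below shows that pattern on a handful of in-memory records. The dimension names, the sample values and the functions `aggregate_by` and `drill_down` are hypothetical and do not describe the PNNL applications.

```python
from collections import Counter
from typing import Iterable

# Hypothetical network event: a mapping from dimension name to value.
Event = dict[str, str]


def aggregate_by(events: Iterable[Event], dims: tuple[str, ...]) -> Counter:
    """Count events grouped by the chosen dimensions (one view of the cube)."""
    return Counter(tuple(ev[d] for d in dims) for ev in events)


def drill_down(events: list[Event], fixed: dict[str, str], dims: tuple[str, ...]) -> Counter:
    """Restrict to one cell of a coarser view (`fixed`), then break it out by finer `dims`."""
    subset = [ev for ev in events if all(ev[k] == v for k, v in fixed.items())]
    return aggregate_by(subset, dims)


events = [
    {"day": "day-1", "port": "22", "src": "203.0.113.7"},
    {"day": "day-1", "port": "22", "src": "203.0.113.8"},
    {"day": "day-1", "port": "443", "src": "198.51.100.2"},
    {"day": "day-2", "port": "22", "src": "203.0.113.7"},
]
print(aggregate_by(events, ("port",)))              # Counter({('22',): 3, ('443',): 1})
print(drill_down(events, {"port": "22"}, ("src",)))  # Counter({('203.0.113.7',): 2, ('203.0.113.8',): 1})
```

A production cube would precompute or index these aggregates rather than rescanning the events for every view.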

The collaboration showed that the Netezza TwinFin appliance can scale linearly on a complex graph analysis that discovers connectivity between two systems via two intermediate connections, using a simulated data set representing multiple years of network traffic. The largest data set was over half a petabyte and contained more than three trillion separate network event records.
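
One way to picture this graph query is as a search for paths from one host to another that pass through intermediate hosts. The Python sketch below assumes that "via two intermediate connections" means two intermediate hosts (three hops) and makes that count a parameter `k`, so the other reading can be tried as well. The record format and the functions `build_graph` and `paths_via_intermediates` are hypothetical. At the scale described, such a query would more plausibly run inside the appliance as repeated joins over the connection table than as in-memory recursion.

```python
from collections import defaultdict
from typing import Iterable


def build_graph(records: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Turn (src, dst) connection records into a directed adjacency map."""
    graph: dict[str, set[str]] = defaultdict(set)
    for src, dst in records:
        graph[src].add(dst)
    return graph


def paths_via_intermediates(
    graph: dict[str, set[str]], source: str, target: str, k: int = 2
) -> list[tuple[str, ...]]:
    """Return intermediates of every simple path source -> h1 -> ... -> hk -> target."""
    results: list[tuple[str, ...]] = []

    def extend(path: list[str]) -> None:
        node = path[-1]
        if len(path) == k + 1:  # source plus k intermediate hosts collected
            if target in graph.get(node, ()):
                results.append(tuple(path[1:]))
            return
        # sorted() keeps the output order deterministic
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path and nxt != target:
                extend(path + [nxt])

    extend([source])
    return results


# Usage: A reaches D through B->C and through E->C; the direct A->D edge is ignored for k=2.
records = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "C"), ("A", "D")]
g = build_graph(records)
print(paths_via_intermediates(g, "A", "D"))  # [('B', 'C'), ('E', 'C')]
```

With `k=1` the same function answers the narrower question of which single host links the two systems.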