From Data to Discovery

By Aaron Dubrow, Texas Advanced Computing Center
TACC-developed EnVision software simplifies remote visualization
We know that supercomputers can crunch incredible amounts of data, but what do all those numbers mean? And how do scientists interpret them? After all, it’s what scientists discover in the data that determines the value of their research.
“Typically, when you do computational simulations on high performance computing [HPC] resources, you come up with a bucket of numbers, and you have to turn those numbers into some graphical form to help you understand what’s going on in the data,” explained Greg S. Johnson, manager of the Scientific Visualization group at the Texas Advanced Computing Center (TACC).
This process, called scientific visualization, transforms massive amounts of raw data into a visual format, showing scientists the structure of the early universe, modeling jet turbulence or translating measurements of simulated blood flow into actual images of a working heart. Visualization allows researchers in practically every field of science to see invisible functions and conceptualize connections among variables that have never been considered, fueling the discoveries that make tomorrow’s headlines.
[Figure: A dynamic EnVision webpage showing a catalogue of visualization snapshots.]
However, most visualization tools have a steep learning curve, and with little training available, it has been difficult for researchers to learn how to explore their data most effectively. The need for a clean, efficient interface for visualization resources led researchers at TACC to begin developing a web-based software tool that simplifies the tasks of data management, analysis and interpretation involved in visualization.
EnVision, the TACC-developed remote visualization tool released on Oct. 22, 2007, addresses the problems of earlier remote visualization software by dramatically simplifying the visualization process, semi-automating data importation and providing powerful, easy-to-use tools. EnVision provides an entry point to researchers who might not be familiar with traditional visualization tools, allowing them to quickly and easily create interactive visualizations. The software will complement existing high-powered visualization packages like ParaView and AVS (Advanced Visual Systems).
“Visualization is a crucial method for data analysis for most computational scientists using HPC systems,” said TACC Director Jay Boisseau. “EnVision will enable TACC to provide high-end visualization resources to TeraGrid researchers across the country, while at the same time making visualization of large-scale data easier than ever before.”
Because the data sets that researchers using HPC systems create are often too large to visualize using standard software and local computational resources, the visualization stage of the supercomputing process typically occurs on specialized systems, optimized to create high-resolution interactive imagery and to allow multiple algorithmic interpretations. In these cases, researchers need access to systems capable of storing terabyte-sized data sets, as well as the computational and rendering capabilities to visualize this data.
Such high-end visualization systems are uncommon (just like their supercomputing counterparts), so scientists in other parts of the country connect to TACC’s visualization resources and tools via the Internet.
“This is remote visualization,” said Greg P. Johnson, TACC visualization specialist and EnVision project lead. “All the rendering, all the computation generating isosurfaces or propagating streamlines, all that is happening on the remote resource. I could take a five-year-old laptop, access EnVision, and interactively render huge datasets at good frame-rates even on older hardware – that is one of EnVision’s biggest strengths.”
Previously, remote access was only available through a terminal interface or a virtual network computing (VNC) desktop, with work executed on applications that were incredibly time-intensive to learn. For these reasons, many scientists made visualization a low priority or delegated the task to their graduate students, a situation EnVision seeks to remedy.
[Figure: Raw data from an HPC simulation.]
“One of the big problems with visualization is that the learning curve for traditional tools is very high – unnecessarily high, in fact,” Greg S. Johnson added. “The goal of EnVision is to make the scientific visualization process so easy that a researcher can pick up the tool without ever having done visualization before. They explain the format of their data, hit a button and get a picture.”
With EnVision, a data import process that may have taken several hours and required custom-built conversion tools can be achieved in a matter of minutes with the assistance of an intuitive online interview provided by the software.
Once the data has been imported into the tool, the user selects the desired visualization method, and here, too, EnVision assists the user. Rather than listing the complete set of supported techniques using terminology that may not be meaningful to the user, EnVision displays thumbnail icons showing only the relevant visualization methods applied to data similar to that of the researcher. Selecting a method is as simple as clicking on the appropriate icon. With the push of a button, the program renders complex, interactive, three-dimensional visualizations that can be rotated, zoomed in on or redrawn. The program even allows users to save high-resolution snapshots of their visualizations to a catalogue for easy recall.
Currently, EnVision offers researchers three of the most widely used visualization features, allowing scientists to represent their data sets with isosurfacing, cutting planes, and streamlines. More features will be added over the coming months.
Dr. Christina Holland, a research scientist associate at The University of Texas Institute for Geophysics, uses visualization tools to help her understand climate change and knows how difficult it can be to use the more advanced software packages. “Doing visualization with previous packages was very difficult,” Dr. Holland said. “I struggled and struggled and eventually got my data into the program. But if your data is in a different format, you have to figure out how to import it and that’s tricky.”
The President’s Information Technology Advisory Committee also recognized the problem of difficult-to-use or outdated software running on supercomputers as one of the big challenges facing the HPC community. “Our preoccupation with peak performance and computing hardware, vital though they are, masks the deeply troubling reality that the most serious technical problems in computational science lie in software, usability, and trained personnel,” the Committee reported. “The result is greatly diminished productivity for both researchers and computing systems.”
[Figure: Visualization of a forced isotropic turbulence simulation. Data courtesy of Bazilevs, Calo and Hughes, Institute for Computational Engineering and Sciences (ICES).]
With EnVision, remote visualization systems now have a current, accessible, web-based tool that will help create closer connections between data analysis and visualization. TACC’s Maverick system (a terascale remote visualization system consisting of a Sun E25K with 128 processors, 512 gigabytes of shared memory, and access to more than 15 terabytes of storage) is the first visualization resource in the TeraGrid to utilize EnVision. However, “EnVision was written to support multiple resources, and as time goes by, we hope to add more resources from TACC and from other sites in the TeraGrid,” Greg P. Johnson said.
This robust, scalable tool is in its first stage of development. Future versions of the software will be augmented with more visualization algorithms, an interactive file transfer system and greater and more diverse supercomputing resources through the TeraGrid.
“I’d like to see us get to the point where visualization is so simple that it enables a researcher to engage in casual inquiry – to ask questions they might not have otherwise asked due to the complexity of the tools,” Greg S. Johnson said. “We’d like EnVision to be a day-to-day tool, much the way a lay person might use a word processor or a mail program.”
“EnVision allows you to take this data set that you’ve created and play with it, just explore it and see what you find,” Dr. Holland said. “You can visualize any kind of data set with EnVision, so I think it’ll be a pretty great tool.”
EnVision can be accessed at www.tacc.utexas.edu/envision and currently supports the Maverick Terascale Visualization System as a resource. Initially, users must have an account and allocation on Maverick in order to use EnVision. TeraGrid users should visit http://teragrid.org/userinfo/access/allocations.php for information on how to apply for access to Maverick. UT and UT System users can apply for access to Maverick via the TACC User Portal: https://portal.tacc.utexas.edu/allocations.php.