GOVERNMENT
UIC Scientists Launch Terra Wide Data Mining Project
CHICAGO, IL -- The wordplay in the title of Robert Grossman's Terra Wide Data Mining Testbed project is the first hint at the scale and scope of the datasets he manages. Tera is the metric prefix meaning one trillion; terra is Latin for the earth. Grossman's Terra Wide project, launched this month at the SC conference in Denver, Colorado, is aimed at remotely exploring globally held terabyte datasets in real time.
Grossman and his University of Illinois at Chicago (UIC) colleague Jason Leigh accessed, correlated and then visualized data drawn from a variety of datasets, including earth science data from the National Center for Atmospheric Research (NCAR), El Niño data from the National Oceanic and Atmospheric Administration (NOAA) and cholera data from the World Health Organization (WHO). The underlying aim of the technology behind the testbed is to give scientists a means to mine and correlate datasets from different organizations in order to make new discoveries. "Researchers may be able to find a correlation between global weather patterns and the spread of diseases by correlating data from NCAR and the WHO," said Grossman.
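As a rough illustration of what correlating datasets from two organizations involves, the sketch below aligns two keyed series on their shared time periods and computes a Pearson correlation coefficient. It is not the testbed's or DataSpace's method. The function names, the monthly keys and every value are hypothetical placeholders, not NCAR or WHO data.

```python
from math import sqrt


def join_on_key(a, b):
    """Align two {key: value} datasets on the keys they share."""
    keys = sorted(a.keys() & b.keys())
    return [a[k] for k in keys], [b[k] for k in keys]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, invented for illustration only.
temperature_anomaly = {"2000-01": 0.1, "2000-02": 0.4, "2000-03": 0.9, "2000-04": 1.3}
reported_cases = {"2000-02": 12, "2000-03": 20, "2000-04": 31, "2000-05": 18}

# Only months present in both datasets take part in the correlation.
xs, ys = join_on_key(temperature_anomaly, reported_cases)
r = pearson(xs, ys)
print(f"{len(xs)} shared months, r = {r:.3f}")
```

The join step matters as much as the statistic: datasets published by different organizations rarely cover the same periods, so they must be aligned on a common key before any correlation is meaningful. A high coefficient on its own would suggest a relationship worth investigating, not a cause.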
The demonstration also showcased PC-based clusters called TeraNodes, now gradually being deployed throughout the world, which will be dedicated to massive computation, data mining or visualization over national and international high performance networks. In coming years, as optical technology transforms networking capabilities, TeraNodes will become the building blocks for an optically connected web of data. At SC, the testbed correlated and visualized WHO and NCAR data that had been replicated onto it. There are TeraNodes in Chicago (at UIC), Amsterdam (at SARA, Holland's supercomputer center), Halifax (Dalhousie University), Denver (the SC show floor), London (Imperial College of Science, Technology and Medicine), Virginia (Virginia Tech and ACCESS DC), Michigan (Internet2), California (UC Davis) and Pennsylvania (University of Pennsylvania).
Given the large and growing scientific and engineering data resources available on the web, there is a growing need for an easy-to-use data web infrastructure. DataSpace, an open-standards-based system for working with data over the web, is Grossman's attempt to provide such an infrastructure. "DataSpace provides a new way for scientists and engineers to work with each other's data," said Grossman. "If organizations publish their data in the DataSpace format, many others could potentially make use of it."
The Terra Wide Data Mining Testbed is an infrastructure built on top of DataSpace for remote analysis, distributed data mining, and real-time exploration of scientific, engineering, defense, business, and other complex data. Tera mining applications are designed to exploit the capabilities provided by emerging domestic and international optical networks so that gigabyte and terabyte datasets can be remotely explored in real time.
Leigh, a scientific visualization expert from UIC's Electronic Visualization Laboratory, and Grossman, head of UIC's National Center for Data Mining, are collaborating to develop such tera mining applications. Their partnership is a natural extension of their research interests. Both work with data-intensive, very-high-bandwidth applications that test even the most advanced networks. Both need to cull specific data from massive datasets stored in widely distributed facilities. Both are seeking a means for researchers to accelerate scientific discovery.
The optical Terra Wide Testbed is now being built in parallel with another UIC-managed project, StarLight(SM). StarLight is an advanced optical infrastructure and proving ground for network services optimized for high-performance applications, with major funding provided by the National Science Foundation. It is being developed by UIC's Electronic Visualization Laboratory, the International Center for Advanced Internet Research (iCAIR) at Northwestern University, and the Mathematics and Computer Science Division at Argonne National Laboratory, in partnership with Canada's CANARIE and Holland's SURFnet.
www.evl.uic.edu/cavern/teranode
www.dataspaceweb.org
www.startap.net/starlight
About EVL
http://www.evl.uic.edu
The Electronic Visualization Laboratory at the University of Illinois at Chicago is the nation's oldest interdisciplinary art and computer science graduate laboratory offering degrees in electronic visualization. Since inventing the CAVE(R) Virtual Reality Theater in 1991, EVL's focus has been the development and deployment of software, hardware, networking and communications tools in support of collaborative tele-immersive virtual-reality applications. EVL receives significant funding from the National Science Foundation to manage projects in support of long-term interconnection and interoperability of advanced international networking.
About NCDM
http://www.ncdm.uic.edu/
The National Center for Data Mining at the University of Illinois at Chicago was established in 1998 to serve as a national resource for high performance and distributed data mining. NCDM is a co-founding member of the Data Mining Group (DMG), which develops the Predictive Model Markup Language (PMML) and related standards. The center also runs two data mining testbeds (the Terabyte Challenge and the Terra Wide Data Mining Testbed) and has an active outreach program. NCDM is supported by the National Science Foundation, U.S. Department of Energy, University of Illinois at Chicago, and its industrial partners.