Identifying Opportunities

Rob Farber pulls peak performance from Ranger for facial recognition: Someday soon, you may use a search engine to locate every frame of home movie in which your grandmother appears. Soldiers may be able to recognize buried bombs from a distance, and doctors may detect early tumors automatically.
Story Highlights:
  • Farber and Trease are using Ranger to import, interpret and database millions of images per second to attain far-faster-than-real-time facial recognition.
  • Using a test set of pictures from YouTube, Farber and Trease correctly identified individuals with 99.8% accuracy, matching 1,998 out of 2,000 faces.
  • By optimizing their code for Ranger’s architecture and minimizing the communications between nodes, Farber was able to reach peak speeds with near-linear scaling across 32,768 cores to achieve some of the best performance currently seen on high-performance computing systems.
  • He will attempt a full-system run on all 62,976 processing cores in the coming months.
This prospect took a great leap forward recently when Rob Farber, senior scientist at the Pacific Northwest National Laboratory (PNNL), demonstrated the ability to create searchable databases based on image recognition over massive amounts of data rather than on text tagging. The research proved the potential of supercomputers not only to simulate virtual experiments, but also to organize, enrich, and improve our present, media-saturated world.

“We want to take camera pictures, individual frames, or moving videos from webcams or YouTube, that don’t have any special annotations, and ask the question: ‘Have we seen this person’s face before?’” Farber explained. “That’s a huge data-volume, low information-content kind of video stream, and it’s completely unstructured.”

Working with Harold Trease, a computational physicist at PNNL, Farber is using Ranger, the massive supercomputing system at the Texas Advanced Computing Center (TACC), to import, interpret, and database millions of images per second to achieve much-faster-than-real-time facial recognition. Using a test set of pictures taken under varying conditions and with known identities, Farber and Trease correctly identified individuals with 99.8% accuracy, matching 1,998 out of 2,000 faces. Perhaps just as importantly, Farber was able to maximize on-node performance on Ranger with near-linear scaling to achieve some of the best performance currently seen on high-performance computers.

Finding Faces

To achieve this feat, Farber and Trease used a series of information measures and video processing techniques to extract and quantify complex visual structures, like faces, from the 0s and 1s of a raw video stream: a particularly difficult problem when the light levels, size, and angle of the faces are constantly changing, as in real-world unstructured video streams.
Using an eight-part process (see sidebar), extraneous information is stripped away, transforming faces into complex but telltale signatures consisting of different types of information such as hue, shape, and the golden ratio. “We apply these methods to isolate faces and generate various entropy measures to form signatures that allow us to identify and differentiate them,” Farber said.

The entropy measures attribute a twenty-dimensional signature to every face; however, comparing and contrasting these complex signatures in such a high-dimensional space is impractical. So Farber first had to convert the identifying data into a more tractable three-dimensional form.

This montage of eight images shows the steps used to extract faces from video image data. The images from upper left to upper right show the original frame, the RGB-to-HSI converted frame, the Sobel edge-detection filtered frame, and the frame with only skin-colored pixels identified. The bottom row contains frames of just the skin-pixel patches that identify the three faces in this frame. These faces are placed into the face database.
“This optimization procedure is based on models of how the brain learns, or backpropagation in our case,” Farber explained, referring to a common method of teaching artificial neural networks how to perform a given task. “If you look at it from an information sense, we’re taking the twenty dimensions of information and forcing them to be represented accurately in only three dimensions. This can be done through a process of unsupervised learning, so an expert does not have to go through and decide what each example means. We can’t do that with a billion frames, so we have the computer train itself.”

Though initially used for facial recognition, Farber noted, “our method can be generalized to identify other objects contained within a digital image. We’re not limited to just faces.” Furthermore, Farber “can utilize pictures that were not taken in visible light and improve our identification accuracy by combining data from multiple sources or by incorporating more or better measures into our signatures.”

A Need for Speed

Farber’s study analyzed approximately six terabytes of data, representing hundreds of YouTube videos, no small feat. But his ultimate goal is much more ambitious. “If you’re looking at all of YouTube, like a Google of the YouTube video space, you’re talking about many orders of magnitude more frames, which is why the scaling behavior of the algorithms plus the computational power of a machine like Ranger is so important,” Farber said.

This image shows a larger view of 22.6 million frames of YouTube data plotted at their respective PCA coordinates. The color represents the sequence number of the source video, ranging from blue (video #1) to red (video #513). The arrows point to just a few of the many clusters and trajectories contained within this very large, rich data set of video events.
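The article does not detail Farber's network, but the idea he describes, using backpropagation to force twenty dimensions of signature information into three, can be sketched as a minimal linear autoencoder. The dimensions below match the article; the linear (rather than multilayer) structure, the learning rate, and the initialization are simplifying assumptions for illustration:

```c
#define IN 20    /* dimensions in each entropy-based face signature */
#define CODE 3   /* dimensions in the compressed representation */

/* One gradient-descent step for a linear autoencoder x -> Ex -> D(Ex).
 * Returns the squared reconstruction error before the update.
 * "Backpropagation" here is the chain rule applied to that error. */
double autoencoder_step(const double x[IN], double E[CODE][IN],
                        double D[IN][CODE], double lr)
{
    double z[CODE] = {0}, xhat[IN] = {0}, err[IN], dz[CODE] = {0};
    for (int k = 0; k < CODE; ++k)          /* encode: z = E x */
        for (int i = 0; i < IN; ++i)
            z[k] += E[k][i] * x[i];
    for (int i = 0; i < IN; ++i)            /* decode: xhat = D z */
        for (int k = 0; k < CODE; ++k)
            xhat[i] += D[i][k] * z[k];
    double loss = 0.0;
    for (int i = 0; i < IN; ++i) {
        err[i] = xhat[i] - x[i];
        loss += err[i] * err[i];
    }
    /* backpropagate: dL/dD = 2 err z^T, dL/dE = 2 (D^T err) x^T */
    for (int k = 0; k < CODE; ++k)
        for (int i = 0; i < IN; ++i)
            dz[k] += D[i][k] * err[i];
    for (int i = 0; i < IN; ++i)
        for (int k = 0; k < CODE; ++k)
            D[i][k] -= lr * 2.0 * err[i] * z[k];
    for (int k = 0; k < CODE; ++k)
        for (int i = 0; i < IN; ++i)
            E[k][i] -= lr * 2.0 * dz[k] * x[i];
    return loss;
}
```

Training is unsupervised in exactly the sense Farber describes: the only teaching signal is how well the three-dimensional code reconstructs the original twenty-dimensional signature, so no expert needs to label examples.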
Because Farber intends to utilize data sets thousands of times larger, he is particularly interested in getting the maximum performance out of the massively parallel supercomputers he uses. Utilizing compiler intrinsic operations that allow direct access to the processor’s SSE assembly instructions, he can coax four flops per clock cycle per core (the theoretical peak performance) from each AMD Opteron Barcelona core on the floating-point-intensive part of his code, while scaling in a near-linear fashion up to 32,768 cores. His approach involves optimizing his code for Ranger’s architecture and minimizing the communications among nodes.

“It takes people who are cognizant of both the algorithms plus the runtime and communications behavior of their algorithms to scale successfully on massively parallel systems,” Farber said. “For Ranger to get to four flops per clock, I had to rewrite some of the code to use the compiler’s SSE intrinsic operations, basically using the assembly language instructions. That really lit Ranger on fire.”

Farber’s performance successes have not gone unnoticed by TACC’s HPC specialists, who are looking to his study for insights into how to make Ranger more effective. “What I find fascinating is how Farber’s code is able to get the maximum speed and performance out of Ranger and apply it to something interesting and socially relevant,” said Lars Koesterke, TACC research associate.

Farber believes his image recognition algorithms will scale to the petascale and beyond, and he will attempt a full-system run on all 62,976 processing cores in the coming months. “We are able to utilize Ranger very efficiently and demonstrate what a wonderful computational instrument it is for the scientific community,” Farber said. “In our case, we see very high performance and near peak rate from the Barcelona cores, plus excellent scaling behavior from the interconnect system.”

The implications of Farber and Trease’s research are far-reaching.
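As a sidebar-style illustration of the SSE intrinsics Farber credits for Ranger's four-flops-per-clock performance: each 128-bit SSE instruction, such as _mm_mul_ps or _mm_add_ps, operates on four packed single-precision floats at once, letting one core retire multiple floating-point operations per cycle. The dot-product kernel below is a generic sketch of that style of programming, not Farber's code; the function name and loop structure are illustrative, and n is assumed to be a multiple of 4:

```c
#include <xmmintrin.h>  /* SSE compiler intrinsics */

/* Dot product of two float arrays, four lanes at a time.
 * Each iteration issues one packed multiply and one packed add,
 * i.e. eight flops per loop trip on a single core. */
float dot_sse(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    /* horizontal sum of the four partial accumulators */
    float out[4];
    _mm_storeu_ps(out, acc);
    return out[0] + out[1] + out[2] + out[3];
}
```

Keeping the inner loop in packed SSE form, rather than letting scalar code use one lane at a time, is the kind of rewrite Farber describes as lighting Ranger on fire.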
Cancer scans, security surveillance, and satellite imaging may all be improved through these real-time image-detection methods and algorithms. Farber’s massively parallel mapping of the problem onto Ranger has also proved a useful template for adapting other computational problems to massively parallel machines. What’s more, his research has the potential to open up video to indexing capabilities, much as search engines like Google have opened up text to searching.

With ever-expanding amounts of data available for analysis, from YouTube videos to deep-space scans to seismic sensor data, methodologies for inputting, processing, connecting, and extracting meaning from digital information are increasingly crucial. And Farber is prepared, already asking the interesting questions: “What can we do in the future, when an exabyte of data is not really that much data to manipulate?”

Aaron Dubrow
Science and Technology Writer
Texas Advanced Computing Center