CEA & Bull announce record performance for image search in large-scale databases

Bull and the CEA - the French Atomic Energy Authority - announce that they have achieved a record performance for image searches in very large-scale databases. Finding one image among 22 million stored in a database now takes just six seconds, i.e. about 3.7 million images per second, 5 times faster than previously. This record result was achieved on a supercomputer designed and supplied by Bull, using the multimedia search software specially developed by CEA LIST as part of the FAME2 project. It opens the way to a vast field of applications ranging from business intelligence to comparison of medical images, from 'data mining' on the Internet to e-business and content management.

Today, Internet search engines carry out image searches using only textual descriptions as criteria (image name or caption, for example). By basing the search on an analysis of the image content, the Piria search engine developed by the CEA provides a much more powerful solution.

CEA LIST leads research into multi-lingual multimedia knowledge engineering, and for several years now has been developing knowledge extraction techniques with the aim of improving the relevance of the results obtained.

The principle underlying content-based image searches involves calculating a visual or coded signature for each image in a database, and classifying these signatures in an index. The query is expressed as an image, and produces a response in the form of similar images. These content-based search techniques, which start with an analysis of pixel values, are intrinsically very demanding in terms of computing resources.
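
The principle of signature, index and query-by-image can be illustrated with a minimal Python sketch. It is not the Piria algorithm, whose signatures and index structure are not described here: the intensity histogram used as a signature, the L1 distance, the exhaustive scan and all names (`signature`, `build_index`, `search`) are assumptions chosen only to make the three steps concrete.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an image: rows of 8-bit grayscale pixel values.
Image = List[List[int]]


def signature(image: Image, bins: int = 4) -> List[float]:
    """Toy visual signature: a normalized intensity histogram (an assumption)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("empty image")
    return [c / total for c in counts]


def build_index(images: Dict[str, Image], bins: int = 4) -> Dict[str, List[float]]:
    """Compute one signature per image and store it under the image name."""
    return {name: signature(img, bins) for name, img in images.items()}


def distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two signatures; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search(index: Dict[str, List[float]], query: Image,
           top_k: int = 2, bins: int = 4) -> List[Tuple[str, float]]:
    """The query is itself an image; return the top_k most similar entries."""
    q = signature(query, bins)
    ranked = sorted(((name, distance(q, sig)) for name, sig in index.items()),
                    key=lambda pair: pair[1])
    return ranked[:top_k]


# Tiny illustrative database of 2x2 images.
database = {
    "dark": [[10, 20], [30, 40]],
    "bright": [[220, 230], [240, 250]],
    "mixed": [[10, 240], [20, 250]],
}
index = build_index(database)
results = search(index, [[15, 25], [35, 45]], top_k=2)
print(results)  # the dark image ranks first, the mixed one second
```

The sketch compares the query against every stored signature, which is only workable for a handful of images. At the scale of millions of images, both the signature computation and the search have to be organised and parallelised far more carefully, which is where the computing cost mentioned above comes from.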
In the FAME2 project, of which the CEA is a part, researchers have had access to significant High-Performance Computing (HPC) resources for the purpose of testing the Piria image search application on an extremely large-scale database. As part of the testing process, the Piria engine code had to be adapted for the parallel architecture of the supercomputer developed by Bull (consisting of 88 Intel Itanium processor cores and 50 terabytes of disc space), enabling integration of the database of 22 million images, occupying some 2.9 terabytes. This initiative was led by the CEA/DAM, and involved close collaboration between the CEA LIST teams and Bull.

The results of this development were presented during the summer of 2007: the 22 million images were indexed in less than one week of processing time, using 48 of the supercomputer's Intel Itanium processor cores. Once the database was indexed, users could submit a query from their browser application and obtain almost instantaneous responses.

A world record performance

The Piria engine completes an image search among 22 million images in just six seconds, i.e. about 3.7 million images per second, 5 times faster than previously. The comparison is with a search for one image among 11 million using the Cortina system, a content-based search engine accessible via the Internet and developed by the University of California at Santa Barbara (UCSB). This benchmark was one of the major challenges that the FAME2 project set itself.

This success demonstrates the power of the image recognition technologies developed by CEA LIST on very large databases occupying several terabytes. These technologies are marketed by the company NewPhenix.