NCSA researchers receive patent for system that finds holes in knowledge bases

By Vince Dixon -- NCSA research programmer Alan Craig and a colleague have received a patent for their method of determining the completeness of a knowledge base by mapping the corpus and locating weak links and gaps between important concepts.

Craig is also the associate director of human-computer interaction for the Institute for Computing in Humanities, Arts and Social Science (I-CHASS). Kalev Leetaru, a former NCSA staffer, is now coordinator of information technology and research for the Cline Center for Democracy. The two were building databases using automatic Web crawling and needed a way of knowing when to stop adding to the collection.

"So this is a method to sort of help figure that out and also direct that system to go looking for more specific pieces of information," Craig said.

Given any collection of information, Craig and Leetaru's method graphs the data, analyzes conceptual distances within the graph and identifies parts of the corpus that are missing important documents. The system then suggests additional concepts that best fill the gaps, creating a link between two related concepts that would not otherwise exist. Leetaru said it helps users complete knowledge bases with information they did not initially know was missing.
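The general idea can be illustrated with a small Python sketch. This is not the patented method, whose details the article does not give. It assumes each document has already been reduced to a set of concept labels, treats the most frequent concepts as the "important" ones, links concepts that share a document, and flags pairs of important concepts that are disconnected or more than a chosen number of hops apart. The function names, the `top_n` and `max_hops` parameters, and the sample corpus are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def build_graph(documents):
    """Link two concepts whenever they appear in the same document."""
    graph = {}
    for concepts in documents:
        for c in concepts:
            graph.setdefault(c, set())
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def distance(graph, start, goal):
    """Breadth-first search: hop count between concepts, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def find_gaps(documents, top_n=2, max_hops=1):
    """Flag pairs of frequent concepts that are far apart or disconnected.

    `documents` is a list of concept sets (an assumed input format).
    """
    counts = Counter(c for doc in documents for c in set(doc))
    important = [c for c, _ in counts.most_common(top_n)]
    graph = build_graph(documents)
    gaps = []
    for a, b in combinations(sorted(important), 2):
        d = distance(graph, a, b)
        if d is None or d > max_hops:
            gaps.append((a, b, d))
    return gaps


# Hypothetical corpus: two well-covered topics with nothing connecting them.
corpus = [
    {"election", "parliament"},
    {"election", "turnout"},
    {"election", "campaign"},
    {"protest", "police"},
    {"protest", "union"},
    {"protest", "strike"},
]
print(find_gaps(corpus))
```

In this toy corpus, "election" and "protest" are each well documented but never connected, so the pair is reported as a gap. A fuller system would then go looking for documents or concepts that bridge the two, which is the step the article describes as suggesting additional concepts.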

"You only know what you know, you don't know what you don't know," Leetaru said. "The idea is essentially a system that can help guide you through the process of trying to figure out where the holes are in your knowledge."

Leetaru said applications for the method are endless. The corpus does not have to be computer-based, and the method is useful in any situation that involves a collection of data that users are not sure is complete.

Leetaru started working for Craig as an undergraduate student at Illinois. Over the years, the two have collaborated on several projects. "Identifying Conceptual Gaps in a Knowledge Base" is the first patent resulting from a portfolio of disclosures Craig and Leetaru have sent to the university's Office of Technology Management.

The researchers hope that many people will find the newly patented system useful in a wide array of fields.

"I think it's great because, for better or for worse, having a patent implies that it is a contribution of knowledge," Craig said. "I think that's a very valuable thing for us to do here at NCSA is to contribute to the world of knowledge."