Razvan Bunescu, an assistant professor of computer science at Ohio University’s Fritz J. and Dolores H. Russ College of Engineering and Technology, has been awarded a $224,000 grant from the National Science Foundation (NSF) to create tools that will automatically extract world knowledge from Wikipedia.
Collaborating with graduate students from Ohio University, and faculty and graduate students from the University of North Texas, Bunescu plans to build computer programs that can process text documents and extract information about the concepts and entities mentioned in text. The result will be a repository of world knowledge that can then be used to improve artificial intelligence applications that process natural language.
The team’s first goal is to tackle one of the largest collections of world knowledge available: Wikipedia. Bunescu wants to create an ontology – in the form of a multilingual “graph” of world knowledge – that maps every element of the online encyclopedia’s information and organizes the data to illustrate relationships between those elements.
The computer programs the team is developing will be able to create a network of such relationships by automatically sifting through the semi-structured information available in the top 10 language editions of Wikipedia. In the resulting semantic network, nodes will correspond to the hundreds of thousands of entities and concepts described in Wikipedia, while edges between nodes will capture their semantic relationships.
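For readers curious what such a semantic network looks like in code, the following is a minimal Python sketch, not the team's actual software. The class name `SemanticGraph`, its methods, and the relation label `located_in` are all hypothetical, and the two facts are entered by hand for illustration rather than extracted from Wikipedia.

```python
from collections import defaultdict


class SemanticGraph:
    """Toy semantic network: nodes are entities or concepts,
    edges are labeled semantic relationships between them.
    (Hypothetical illustration; not the project's real data model.)"""

    def __init__(self):
        # Maps each node to a set of (relation, target node) pairs.
        self._edges = defaultdict(set)

    def add_relation(self, subject, relation, obj):
        """Add a directed, labeled edge: subject --relation--> obj."""
        self._edges[subject].add((relation, obj))
        # Register the target as a node even if it has no outgoing edges yet.
        self._edges.setdefault(obj, set())

    def nodes(self):
        """Return all known nodes in sorted order."""
        return sorted(self._edges)

    def objects(self, subject, relation):
        """Return the nodes reachable from subject via the given relation."""
        return sorted(o for r, o in self._edges.get(subject, ()) if r == relation)


# Hand-written example facts, assumed here purely for illustration.
graph = SemanticGraph()
graph.add_relation("Ohio University", "located_in", "Athens, Ohio")
graph.add_relation("Athens, Ohio", "located_in", "Ohio")

print(graph.objects("Ohio University", "located_in"))
```

A question answering system could, in principle, answer "Where is Ohio University?" by following the `located_in` edge from the "Ohio University" node, rather than returning a list of documents that mention it.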
The graph may be used to provide the much-needed world knowledge in knowledge-intensive applications such as open domain question answering (QA). In open domain QA, users submit natural language questions and the QA system returns pinpointed answers, as opposed to a ranked list of documents.
“Google is able to do an intelligent keyword search, but it cannot yet do this kind of question answering,” Bunescu explains.