Preface
Companies frequently face the challenge of screening a continuously increasing number of (Web) documents and assessing the information they contain with respect to its relevance and novelty. For instance, technology scouts need to discover and monitor new technologies, while investors and stock brokers would like to be informed about recent acquisitions. The systems developed so far for detecting novel information (semi-)automatically in text documents are often very inefficient, because most approaches consider only the relevance, but not the novelty, of text documents. The few existing approaches for novel information detection do not use any semantically structured representation of either the already known or the extracted information.
In this thesis, we present new approaches for detecting and extracting novel, relevant information from unstructured text documents. These approaches exploit an explicit model of the semantics of both the given and the extracted information. Using semantics helps to resolve ambiguities in the language and to specify the exact information need regarding relevance and novelty. The explicit modeling relies on Semantic Web technologies such as the Resource Description Framework (RDF). We assume that all knowledge known to the system is available in the form of an RDF knowledge graph. Hence, novelty and relevance are considered with regard to a knowledge graph.
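As a minimal illustration of what novelty and relevance with regard to a knowledge graph can mean, the following Python sketch models RDF-style statements as subject-predicate-object triples. It is a simplification under stated assumptions and not the approach developed in this thesis: novelty is reduced to exact non-membership in the graph, relevance to a fixed set of predicates of interest, and all identifiers (the `ex:` names, `Triple`, `is_novel`, `is_relevant`, `novel_relevant`) are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """An RDF-style statement; plain strings stand in for IRIs (assumption)."""
    subject: str
    predicate: str
    obj: str


def is_relevant(statement: Triple, relevant_predicates: Set[str]) -> bool:
    # Assumption: the information need is expressed as a set of predicates.
    return statement.predicate in relevant_predicates


def is_novel(statement: Triple, knowledge_graph: Set[Triple]) -> bool:
    # Assumption: a statement is novel if it is not already in the graph.
    return statement not in knowledge_graph


def novel_relevant(extracted: Iterable[Triple],
                   knowledge_graph: Set[Triple],
                   relevant_predicates: Set[str]) -> List[Triple]:
    """Keep only the extracted statements that are both relevant and novel."""
    return [s for s in extracted
            if is_relevant(s, relevant_predicates)
            and is_novel(s, knowledge_graph)]


# Hypothetical example data.
knowledge_graph = {
    Triple("ex:AcmeCorp", "ex:acquired", "ex:BetaLabs"),
}
extracted = [
    Triple("ex:AcmeCorp", "ex:acquired", "ex:BetaLabs"),   # already known
    Triple("ex:AcmeCorp", "ex:acquired", "ex:GammaSoft"),  # relevant and novel
    Triple("ex:AcmeCorp", "ex:locatedIn", "ex:Berlin"),    # novel, not relevant
]
result = novel_relevant(extracted, knowledge_graph, {"ex:acquired"})
```

Here, `result` contains only the statement about the hypothetical acquisition of `ex:GammaSoft`: the first statement is already known to the graph, and the third does not match the stated information need.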
The contributions of this thesis can be summarized as follows:
1. We assess the suitability of existing large knowledge graphs for the task of detecting novel information in text documents.
2. We present an approach for predicting emerging entities and recommending them for a knowledge graph.
3. We present an approach for extracting novel, relevant, semantically structured statements from text documents.
The contributions are presented, applied, and evaluated with the help of several scenarios. The developed approaches are suitable for recommending emerging entities and novel statements, both for the purpose of knowledge graph population and for users who depend on novel information (such as journalists and technology scouts).