Knowledge Graphs (KGs) about real-world entities and their properties are an important asset in many AI applications. Web-scale KGs store almost exclusively positive statements and largely omit negative ones. Because open-world KGs are incomplete, absent statements are considered unknown rather than false. This dissertation makes the case for enriching KGs with informative statements that do not hold, thereby enhancing their usability in applications such as question answering and entity summarization. With potentially billions of candidate negative statements, we tackle four main challenges.
1. Correctness (or plausibility) of negative statements: operating under the Open-World Assumption (OWA), it is not sufficient to check if a candidate negative is not explicitly stated as positive in the KG, since it might be a missing positive. Methods to scrutinize large sets of candidates and prune false positives are crucial.
2. Salience of negative statements: the set of correct negative statements is very large but full of trivial or nonsensical statements, e.g., “A cat cannot store data.” Methods to quantify the informativeness of negatives are necessary.
3. Coverage of subjects: depending on the source of data and methods for retrieving candidates, some subjects or entities in the KG might receive zero candidate negatives. Methods must ensure the ability to discover negatives about almost any existing entity.
4. Complex negative statements: in some cases, expressing a negation requires more than one KG triple. For instance, “Einstein did not receive an education” is a false negative, but “Einstein did not receive an education at a U.S. university” is a true negative. Methods to generate conditioned negatives are needed.
This dissertation tackles these challenges as follows.
1. We first make the case for selective materialization of negative statements about entities in encyclopedic (well-canonicalized) open-world KGs, and formally define three types of negative statements: grounded, universally absent, and conditional negative statements. We present the peer-based negation inference method to compile lists of salient negatives about entities. The method computes relevant peers for a given input entity, and uses their positives to set expectations for the input entity. An expectation that does not hold is an immediate candidate negative, and is then scored using frequency, importance, and unexpectedness metrics.
2. We propose the pattern-based query log extraction method to extract salient negatives from rich textual sources. This method extracts salient negatives about an entity by mining large corpora, namely search engine query logs, using a few handcrafted patterns with negative keywords.
3. We introduce the UnCommonsense method to generate salient negative phrases about everyday concepts in less-canonicalized commonsense KGs. This method is designed to handle negation inference, scrutiny, and ranking of short natural language phrases. It computes comparable concepts for a given target concept, infers candidate negatives from comparing their positives, and scrutinizes these candidates against the KG itself, as well as Language Models (LMs) as an external source of knowledge. Finally, candidates are ranked using semantic-similarity-aware frequency measures.
4. To facilitate exploring our methods and their results, we implement two prototype systems. Wikinegata showcases the peer-based method: users can explore negative statements about 500K entities of 11 classes, adjust different parameters of the peer-based inference method, and query the KG using triple patterns with negated predicates. In the UnCommonsense system, users can closely inspect what the method produces at every step, as well as browse negatives about 8K everyday concepts. Moreover, using the peer-based negation inference method, we create the first large-scale dataset on demographics and outliers in communities of interest, and show its usefulness in use cases such as identifying under-represented groups.
5. We release all datasets and code produced in these projects at https://www.mpi-inf.mpg.de/negation-in-kbs and https://www.mpi-inf.mpg.de/Uncommonsense.
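The core idea of the peer-based negation inference method described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes the peers are already given, represents the KG as a plain dictionary from entity to a set of (predicate, object) pairs, and scores candidates by peer frequency only, leaving out the importance and unexpectedness metrics. The function name `peer_based_negatives` and all entity and predicate names are hypothetical toy data.

```python
from collections import Counter


def peer_based_negatives(target, peers, kg, top_k=3):
    """Rank candidate negatives for `target` by how many peers hold them.

    kg maps an entity to a set of (predicate, object) statements.
    Assumption: peers are supplied by the caller; frequency is the only score.
    """
    if not peers:
        return []
    target_statements = kg.get(target, set())
    # Count how many peers hold each positive statement.
    counts = Counter()
    for peer in peers:
        counts.update(kg.get(peer, set()))
    # An expectation that does not hold for the target is a candidate negative.
    candidates = {
        stmt: n / len(peers)
        for stmt, n in counts.items()
        if stmt not in target_statements
    }
    # Highest peer frequency first; ties broken by the statement itself.
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy KG with invented entities, for illustration only.
toy_kg = {
    "target": {("occupation", "physicist")},
    "peer_1": {("occupation", "physicist"), ("award", "prize_x")},
    "peer_2": {
        ("occupation", "physicist"),
        ("award", "prize_x"),
        ("member_of", "academy_y"),
    },
    "peer_3": {("occupation", "physicist"), ("award", "prize_x")},
}

ranked = peer_based_negatives("target", ["peer_1", "peer_2", "peer_3"], toy_kg)
for statement, score in ranked:
    print(statement, round(score, 2))
```

Under the Open-World Assumption, the candidates returned here are only plausible negatives: a statement missing for the target may simply be a missing positive, which is why the methods above add scrutiny steps before ranking.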