Ebook: Neuro-Symbolic Artificial Intelligence: The State of the Art
Neuro-symbolic AI is an emerging subfield of Artificial Intelligence that brings together two hitherto distinct approaches. “Neuro” refers to the artificial neural networks prominent in machine learning; “symbolic” refers to algorithmic processing on the level of meaningful symbols, prominent in knowledge representation. In the past, these two fields of AI have been largely separate, with very little crossover, but the so-called “third wave” of AI is now bringing them together.
This book, Neuro-Symbolic Artificial Intelligence: The State of the Art, provides an overview of this development in AI. The two approaches differ significantly in terms of their strengths and weaknesses and, from a cognitive-science perspective, there is a question as to how a neural system can perform symbol manipulation and how the representational differences between the two approaches can be bridged. The book presents 17 overview papers, all by authors who have made significant contributions in the past few years, beginning with a historical overview that first appeared in 2016. With just seven months elapsed from the invitation to authors to final copy, the book is as up-to-date as a published overview of this subject can be.
Based on the editors’ own desire to understand the current state of the art, this book reflects the breadth and depth of the latest developments in neuro-symbolic AI, and will be of interest to students, researchers, and all those working in the field of Artificial Intelligence.
Frank van Harmelen, Vrije Universiteit Amsterdam
The history of an entire scientific field as dynamic and permeable as AI can only be captured in metaphors. But even though they are metaphors, such sketches of history do sometimes manage to capture the essence of how a debate in an entire scientific field evolved. One such useful metaphor about the history of AI is that of the “two towers”, the “two tribes”, or even the “two religions”. These two tribes are known under a variety of names: symbolic vs. subsymbolic, reasoning vs. learning, logic vs. data, model-based vs. function-based, but I prefer to use the terms “knowledge driven” vs. “data driven”. For decades, AI was indeed divided into these two tribes, whose members adhered to different religions: either AI systems were based on manipulating symbols (as succinctly expressed through the “physical symbol system hypothesis” by Newell and Simon in their 1976 Turing Award paper), or AI systems were based on statistical patterns to be learned from large amounts of data (an approach that earned Bengio, Hinton and LeCun the 2018 Turing Award). And as with any two proper religions, for many years their proponents did not read each other’s books, did not go to each other’s meetings, and did not take each other’s ideas very seriously.
Another, somewhat different metaphor about the history of AI is the picture of two waves. In this myth, the “first wave” of AI consisted of investigations into knowledge-driven AI, with successes in knowledge-based systems (1980s and 1990s), world-championship-level chess (1997) and conversational assistants such as Siri (from 2010 onwards). The “second wave” of AI consisted of the rise of Deep Learning systems, with impressive successes in machine translation, image recognition and game playing, among many others. If Google Trends is to be believed, the current widespread interest in Deep Learning started to take off in early 2015, although its roots of course stretch back deep into AI history, with the current frenzied levels of interest only reached from early 2018 onwards. The more general term “machine learning” shows a similar trend curve.
But things are moving fast in the field of AI. A generous flow of public funding, a keen interest from industry, and a culture of sharing data and code are all contributing factors to the fast pace of progress. And this fast pace of progress is rapidly shattering both of the above metaphors, both the two-tribes myth and the two-waves myth. Increasingly, the two tribes are now converging on the view that neither purely data-driven nor purely knowledge-driven systems alone hold the key to further progress in AI. And increasingly, the first knowledge-driven wave and the second data-driven wave are being rapidly joined by a third wave of systems that are, in their wide variety, known under the common term of “neuro-symbolic systems”.
However, although the proponents of such neuro-symbolic systems are united in what they agree that AI systems should not be (namely: they should be neither purely knowledge-driven nor purely data-driven), these researchers are not at all united in their view on what AI should be. The literature on neuro-symbolic systems (sometimes also known as “hybrid” systems) is rapidly growing, with dozens of new (and often unreviewed) manuscripts appearing on arXiv every week, but these contributions cover a wide variety of different architectural and algorithmic proposals, with precious little agreement on which architectures suit which purpose, which algorithms should be used when, what the strengths and weaknesses of the different approaches are, how the results should be evaluated, or even what the theoretical limits of the different proposals are.
Neuro-symbolic systems are the offspring of two very different parents, roughly speaking the fields of “knowledge representation” and “machine learning”, and each of these parents has its own strengths and weaknesses. The first of these parents is characterised by strong theoretical foundations, with analyses of expressivity, of computational tractability, and of the trade-offs between the two, and with a compositional theory of how to combine multiple representations. The second parent is characterised by a strong emphasis on empirical results, measured on widely accepted benchmarks and challenge tasks. Conversely, the Knowledge Representation field has never managed to agree on widely shared benchmarks (although exceptions exist in subdisciplines), while much of classical machine learning theory, founded on the statistical learning theory of the ’90s, no longer suffices to explain the “unreasonable effectiveness” of deep learning (with [1] a recent attempt at addressing the situation).
The promising “3rd wave of AI”, the neuro-symbolic systems to which this book is devoted, should aim to combine the strengths of both parents. Coming, as I do, from a Knowledge Representation background, I am particularly concerned that we should strive for a good theoretical understanding of neuro-symbolic systems, beyond purely empirically measuring their capabilities. We should start to make some foundational distinctions: what are the possible interactions between knowledge and learning? Can reasoning be used as a symbolic prior for learning? Can such symbolic priors be used to make data-driven systems more sample-efficient? More explainable? More generalisable outside the training set? Should such symbolic priors be domain-specific, or should they be in the form of very general knowledge capturing only the abstract structure of time, space, causality and other such general principles? Can symbolic constraints be enforced on data-driven systems to make them safer? Or less biased? Or can, vice versa, learning be used to yield symbolic knowledge? And if so, how do we manage the inherent uncertainty that comes with such learned knowledge? And how do we avoid the inevitable bias seeping from the data into the resulting knowledge base? And can such learned knowledge ever capture abstract principles of time, space and causality, or will it be limited to the domain-specific patterns that arise from a given dataset? Or perhaps, and hopefully, the answer is: both directions (knowledge-driven methods supporting data-driven methods and vice versa) are possible and even required.
However, neuro-symbolic systems currently lack a theory that even begins to ask these questions, let alone answer them. All too often, new conference papers and ArXiv manuscripts simply propose a new neuro-symbolic architecture, or a new algorithm, without even discussing which of the above questions (or any others, for that matter) they aim to address. Systematic surveys such as [2] are a good start for such a theory. Our own work on an informal “boxology” of neuro-symbolic systems ([3] and [4]) is an attempt at such a “theory of neuro-symbolic systems”, but it is highly incomplete, and entirely informal. It remains a challenge to put such a theory on a firm mathematical basis (for example, through pre- and post-conditions on the components of neuro-symbolic systems). Can such a theory be closely coupled with fragments of executable code? And most importantly: is the assumption of loosely coupled components perhaps too strong, and will much more tightly coupled neuro-symbolic systems in the end turn out to be the way forward?
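To make the idea of pre- and post-conditions on loosely coupled components slightly more concrete, here is a purely illustrative sketch (my own hypothetical rendering, not taken from the boxology papers): each component declares the kind of representation it consumes and produces, and a composition check verifies that the boxes can legally be wired together.

from dataclasses import dataclass

@dataclass
class Component:
    name: str
    consumes: str   # pre-condition: kind of input representation
    produces: str   # post-condition: kind of output representation

def compose(pipeline):
    # Check that each component's post-condition matches the next
    # component's pre-condition before the pipeline is built.
    for a, b in zip(pipeline, pipeline[1:]):
        if a.produces != b.consumes:
            raise TypeError(f"{a.name} produces '{a.produces}' "
                            f"but {b.name} expects '{b.consumes}'")
    return pipeline

# The classic "learn, then reason" pattern: a data-driven perception
# module feeds a symbolic reasoner via a symbol-extraction step.
compose([
    Component("perception", consumes="data", produces="embeddings"),
    Component("symbol extraction", consumes="embeddings", produces="symbols"),
    Component("reasoner", consumes="symbols", produces="symbols"),
])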
But whatever the answers to these questions will be, we will only make truly scientific and long-lasting progress on AI in general, and on hybrid, neuro-symbolic systems of the “3rd wave of AI” in particular, if we go beyond building systems. We must, in the end, develop a theory of such systems that allows us to predict and understand what neuro-symbolic systems can and can’t do.
References
[1] Berner J, Grohs P, Kutyniok G, et al. The modern mathematics of deep learning. CoRR. 2021;abs/2105.04026. Available from: https://arxiv.org/abs/2105.04026.
[2] Marra G, Dumancic S, Manhaeve R, et al. From statistical relational to neural symbolic artificial intelligence: a survey. CoRR. 2021;abs/2108.11451. Available from: https://arxiv.org/abs/2108.11451.
[3] van Harmelen F, ten Teije A. A boxology of design patterns for hybrid learning and reasoning systems. J Web Eng. 2019;18(1-3):97–124.
[4] van Bekkum M, de Boer M, van Harmelen F, et al. Modular design patterns for hybrid learning and reasoning systems. Appl Intell. 2021;51(9):6528–6546.
The study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as the basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist models of cognition, and models of uncertainty. Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation. In addition, efforts in computer science research have led to the development of cognitive computational systems integrating machine learning and automated reasoning. Such systems have shown promise in a range of applications, including computational biology, fault diagnosis, training and assessment in simulators, and software verification. This joint survey reviews the personal ideas and views of several researchers on neural-symbolic learning and reasoning. The article is organised in three parts: first, we frame the scope and goals of neural-symbolic computation and examine its theoretical foundations. We then describe realisations of neural-symbolic computation, systems, and applications. Finally, we present the challenges facing the area and avenues for further research.
Symbolic systems require hand-coded symbolic representation as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge it acquires is encoded in a subsymbolic representation which is incompatible with symbolic systems. To address the gap between the two fields, one has to solve the Symbol Grounding problem: the question of how a machine can generate symbols automatically. We discuss our recent work called Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when a pair of images representing the initial and the goal states (planning inputs) is given, Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We discuss several key ideas that made Latplan possible, which we hope will extend to many other symbolic paradigms beyond classical planning.
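To make the symbolic half of this pipeline concrete, here is a minimal, hypothetical sketch (not Latplan's actual code) of planning in a learned propositional latent space: once the action model is available, states are simply sets of true propositions and plan search is ordinary graph search. The toy action model below is hand-written where Latplan would learn it from image pairs.

from collections import deque

# Minimal sketch of search in a propositional latent space. Actions are
# precondition/add/delete sets over latent bit indices; these would come
# from the learned action model, but are hand-written here.
actions = {
    "a0": {"pre": {0}, "add": {1}, "del": {0}},
    "a1": {"pre": {1}, "add": {2}, "del": set()},
}

def successors(state):
    for name, a in actions.items():
        if a["pre"] <= state:                      # preconditions hold
            yield name, (state | a["add"]) - a["del"]

def plan(init, goal):
    """Breadth-first search from the latent code of the initial image
    to the latent code of the goal image."""
    frontier, seen = deque([(frozenset(init), [])]), {frozenset(init)}
    while frontier:
        state, path = frontier.popleft()
        if frozenset(goal) <= state:
            return path
        for name, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))

print(plan({0}, {2}))   # -> ['a0', 'a1']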
The tension between deduction and induction is perhaps the most fundamental issue in areas such as philosophy, cognition and artificial intelligence. In this chapter, we survey work that provides evidence for the long-standing and deep connections between logic and learning. After a brief historical prelude, our narrative is then structured in terms of three strands of interaction: logic versus learning, machine learning for logic, and logic for machine learning, but with ample overlap.
Despite their significant success in various domains, data-driven deep neural networks compromise feature interpretability, lack global reasoning capability, and cannot incorporate the external information crucial for complicated real-world tasks. Since structured knowledge can provide rich cues to record human observations and commonsense, it is desirable to bridge symbolic semantics with learned local feature representations. In this chapter, we review works that incorporate different kinds of domain knowledge into the intermediate feature representation. These methods first construct a domain-specific graph that represents related human knowledge. Then, they characterize node representations with neural network features and perform graph convolution to enhance these symbolic nodes via a graph neural network (GNN). Lastly, they map the enhanced node features back into the neural network for further propagation or prediction. Through integrating knowledge graphs into neural networks, one can couple feature learning and graph reasoning under the same supervised loss function and achieve a more effective and interpretable way to introduce structure constraints.
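The following simplified sketch (illustrative only, not any particular system from the chapter) shows these three steps in miniature with PyTorch: symbolic graph nodes are initialized with neural features, one graph-convolution step propagates information along the knowledge-graph edges, and the enhanced node features are mapped back for prediction.

import torch

# Toy knowledge graph over 4 symbolic nodes (e.g. object categories),
# given as an adjacency matrix with self-loops.
A = torch.tensor([[1., 1., 0., 0.],
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [0., 0., 1., 1.]])
A_hat = A / A.sum(dim=1, keepdim=True)        # simple row normalization

X = torch.randn(4, 16)                         # node features from a neural encoder
W = torch.nn.Linear(16, 16)

# One graph-convolution step: aggregate neighbours, then transform.
H = torch.relu(W(A_hat @ X))

# Map the knowledge-enhanced node features back into the main network,
# e.g. fuse them with the original features before the prediction head.
fused = torch.cat([X, H], dim=-1)
logits = torch.nn.Linear(32, 10)(fused)        # per-node class scores
print(logits.shape)                            # torch.Size([4, 10])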
Symbolic reasoning systems based on first-order logics are computationally powerful, and feedforward neural networks are computationally efficient, so unless P=NP, neural networks cannot, in general, emulate symbolic logics. Hence bridging the gap between neural and symbolic methods requires achieving a delicate balance: one needs to incorporate just enough of symbolic reasoning to be useful for a task, but not so much as to cause computational intractability. In this chapter we first present results that make this claim precise, and then use these formal results to inform the choice of a neuro-symbolic knowledge-based reasoning system, based on a set-based dataflow query language. We then present experimental results with a number of variants of this neuro-symbolic reasoner, and also show that this neuro-symbolic reasoner can be closely integrated into modern neural language models.
Tractable Boolean and arithmetic circuits have been studied extensively in AI for over two decades now. These circuits were initially proposed as “compiled objects,” meant to facilitate logical and probabilistic reasoning, as they permit various types of inference to be performed in linear time and a feed-forward fashion like neural networks. In more recent years, the role of tractable circuits has significantly expanded as they became a computational and semantical backbone for some approaches that aim to integrate knowledge, reasoning and learning. In this chapter, we review the foundations of tractable circuits and some associated milestones, while focusing on their core properties and techniques that make them particularly useful for the broad aims of neuro-symbolic AI.
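As a minimal illustration of the core property mentioned above (a toy example, not drawn from the chapter): once a circuit is in tractable form, a single bottom-up pass over its nodes in topological order evaluates it in time linear in its size, mirroring the feed-forward computation of a neural network.

# Each node is ("leaf", value) or (op, child indices); nodes are listed
# in topological order, so one linear pass suffices.
circuit = [
    ("leaf", 0.3),            # 0: Pr(a)
    ("leaf", 0.7),            # 1: Pr(not a)
    ("leaf", 0.6),            # 2: Pr(b)
    ("leaf", 0.4),            # 3: Pr(not b)
    ("mul", [0, 2]),          # 4: a AND b
    ("mul", [1, 3]),          # 5: not a AND not b
    ("add", [4, 5]),          # 6: (a AND b) OR (not a AND not b)
]

def evaluate(circuit):
    vals = []
    for op, arg in circuit:
        if op == "leaf":
            vals.append(arg)
        elif op == "mul":
            v = 1.0
            for c in arg:
                v *= vals[c]
            vals.append(v)
        elif op == "add":
            vals.append(sum(vals[c] for c in arg))
    return vals[-1]

print(evaluate(circuit))   # 0.3*0.6 + 0.7*0.4 = 0.46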
There is a broad consensus that both learning and reasoning are essential to achieve true artificial intelligence. This has put the quest for neural-symbolic artificial intelligence (NeSy) high on the research agenda. In the past decade, neural networks have driven great advances in the field of machine learning. The two most prominent frameworks for reasoning, in turn, are logic and probability. While in the past they were studied by separate communities, a significant number of researchers have been working towards their integration, cf. the area of statistical relational artificial intelligence (StarAI). Generally, NeSy systems integrate logic with neural networks. However, probability theory has already been integrated with both logic (cf. StarAI) and neural networks. It therefore makes sense to consider the integration of logic, neural networks and probabilities. In this chapter, we first consider these three base paradigms separately. Then, we look at the well-established integrations, NeSy and StarAI. Next, we consider the integration of all three paradigms as Neural Probabilistic Logic Programming, and exemplify it with the DeepProbLog framework. Finally, we discuss the limitations of the state of the art, and consider future directions based on the parallels between StarAI and NeSy.
Neural-symbolic models bridge the gap between sub-symbolic and symbolic approaches, both of which have significant limitations. Sub-symbolic approaches, like neural networks, require a large amount of labeled data to be successful, whereas symbolic approaches, like logic reasoners, require a small amount of prior domain knowledge but do not easily scale to large collections of data. This chapter presents a general approach to integrating learning and reasoning that is based on the translation of the available prior knowledge into an undirected graphical model. Potentials on the graphical model are designed to accommodate dependencies among random variables by means of a set of trainable functions, like those computed by neural networks. The resulting neural-symbolic framework can effectively leverage the training data, when available, while exploiting high-level logic reasoning in a certain domain of discourse. Although exact inference is intractable within this model, different tractable models can be derived by making different assumptions. In particular, three models are presented in this chapter: Semantic-Based Regularization, Deep Logic Models and Relational Neural Machines. Semantic-Based Regularization is a scalable neural-symbolic model that does not adapt the parameters of the reasoner, under the assumption that the provided prior knowledge is correct and must be exactly satisfied. Deep Logic Models preserve the scalability of Semantic-Based Regularization, while providing a flexible exploitation of logic knowledge by co-training the parameters of the reasoner during the learning procedure. Finally, Relational Neural Machines offer the fundamental advantages of perfectly replicating the effectiveness of training standard deep architectures from supervised data, and of preserving the generality and expressive power of Markov Logic Networks when considering pure reasoning on symbolic data. The bonding between learning and reasoning is very general, as any (deep) learner can be adopted and any output structure expressed via First-Order Logic can be integrated. However, exact inference within a Relational Neural Machine is still intractable, and different factorizations are discussed to increase the scalability of the approach.
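As a minimal, hypothetical sketch of the Semantic-Based Regularization pattern described above (not the chapter's actual code): the prior knowledge, translated into a differentiable fuzzy-logic constraint, is added to the supervised loss as a penalty term, leaving the reasoner itself without trainable parameters.

import torch

net_a = torch.nn.Linear(8, 1)       # predicts the truth degree of A(x)
net_b = torch.nn.Linear(8, 1)       # predicts the truth degree of B(x)

x = torch.randn(32, 8)                          # a batch of inputs
y_a = torch.randint(0, 2, (32, 1)).float()      # labels available for A only

p_a = torch.sigmoid(net_a(x))
p_b = torch.sigmoid(net_b(x))

# Supervised loss on the labelled predicate.
sup_loss = torch.nn.functional.binary_cross_entropy(p_a, y_a)

# Rule "forall x: A(x) -> B(x)" under product fuzzy logic: penalize the
# degree to which A holds while B does not.
rule_violation = (p_a * (1 - p_b)).mean()

loss = sup_loss + 0.5 * rule_violation          # 0.5: constraint weight
loss.backward()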
The brain uses recurrent spiking neural networks for higher cognitive functions such as symbolic computations, in particular, mathematical computations. We review the current state of research on spike-based symbolic computations of this type. In addition, we present new results which show that surprisingly small spiking neural networks can perform symbolic computations on bit sequences and numbers and even learn such computations using a biologically plausible learning rule. The resulting networks operate in a rather low firing rate regime, where they could not simply emulate artificial neural networks by encoding continuous values through firing rates. Thus, we propose here a new paradigm for symbolic computation in neural networks that provides concrete hypotheses about the organization of symbolic computations in the brain. The employed spike-based network models are the basis for drastically more energy-efficient computer hardware – neuromorphic hardware. Hence, our results can be seen as creating a bridge from symbolic artificial intelligence to energy-efficient implementation in spike-based neuromorphic hardware.
Human-robot interactive decision-making is increasingly ubiquitous, and explainability is an influential factor in determining reliance on autonomy. However, it is not reasonable to trust systems beyond our comprehension, and typical machine learning and data-driven decision-making are black-box paradigms that impede explainability. Therefore, it is critical to establish computationally efficient decision-making mechanisms enhanced by explainability-aware strategies. To this end, we propose Trustworthy Decision-Making (TDM), an explainable neuro-symbolic approach that integrates symbolic planning into hierarchical reinforcement learning. The TDM framework enables subtask-level explainability through causally related and understandable subtasks. Besides, TDM also demonstrates the advantage of integrating symbolic planning and reinforcement learning, reaping the benefits of both worlds. Experimental results validate the effectiveness of the proposed method while improving the explainability of the decision-making process.
Humans have astounding reasoning capabilities. They can learn from very few examples while providing explanations for their decision-making process. In contrast, deep learning techniques, even though robust to noise and very effective at generalizing across several fields including machine vision, natural language understanding, and speech recognition, require large amounts of data and are mostly unable to provide explanations for their decisions. Attaining human-level robust reasoning requires combining sound symbolic reasoning with robust connectionist learning. However, connectionist learning uses low-level representations, such as embeddings, rather than symbolic representations. This challenge constitutes what is referred to as the Neuro-Symbolic gap. The field of study that aims to bridge this gap between the two paradigms has been called neuro-symbolic integration or neuro-symbolic computing. This chapter presents approaches that contribute towards bridging the Neuro-Symbolic gap specifically in the Semantic Web field, namely RDF Schema (RDFS) and EL+ reasoning, and discusses the benefits and shortcomings of neuro-symbolic reasoning.
Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs). These neuro-symbolic models can induce interpretable rules and learn representations from data via back-propagation, while providing logical explanations for their predictions. However, they are restricted by their computational complexity, as they need to consider all possible proof paths for explaining a goal, thus rendering them unfit for large-scale applications. We present Conditional Theorem Provers (CTPs), an extension to NTPs that learns an optimal rule selection strategy via gradient-based optimisation. We show that CTPs are scalable and yield state-of-the-art results on the CLUTRR dataset, which tests the systematic generalisation of neural models by learning to reason over smaller graphs and evaluating on larger ones. Finally, CTPs show better link prediction results on standard benchmarks in comparison with other neural-symbolic models, while being explainable. All source code and datasets are available online at https://github.com/uclnlp/ctp.
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks. Different methods for integrating neural language models and knowledge graphs are discussed. The situations in which this combination is most appropriate are characterized, including quantitative evaluation and qualitative error analysis on a variety of commonsense question answering benchmark datasets.
Deep learning has proven effective for various application tasks, but its applicability is limited by the reliance on annotated examples. Self-supervised learning has emerged as a promising direction to alleviate the supervision bottleneck, but existing work focuses on leveraging co-occurrences in unlabeled data for task-agnostic representation learning, as exemplified by masked language model pretraining. In this chapter, we explore task-specific self-supervision, which leverages domain knowledge to automatically annotate noisy training examples for end applications, either by introducing labeling functions for annotating individual instances, or by imposing constraints over interdependent label decisions. We first present deep probabilistic logic (DPL), which offers a unifying framework for task-specific self-supervision by composing probabilistic logic with deep learning. DPL represents unknown labels as latent variables and incorporates diverse self-supervision using probabilistic logic to train a deep neural network end-to-end using variational EM. Next, we present self-supervised self-supervision (S4), which adds to DPL the capability to learn new self-supervision automatically. Starting from an initial seed of self-supervision, S4 iteratively uses the deep neural network to propose new self-supervision. These are either added directly (a form of structured self-training) or verified by a human expert (as in feature-based active learning). Experiments on real-world applications such as biomedical machine reading and various text classification tasks show that task-specific self-supervision can effectively leverage domain expertise and often match the accuracy of supervised methods with a tiny fraction of human effort.
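The labeling-function side of task-specific self-supervision can be pictured with the following simplified sketch (hypothetical functions and data, and a crude majority vote standing in for DPL's probabilistic-logic treatment of the latent labels).

import re

# Hypothetical labeling functions encoding domain knowledge for a toy
# relation-extraction task: each votes +1, -1, or 0 (abstain) per sentence.
def lf_keyword(sentence):
    return 1 if re.search(r"\binhibits\b", sentence) else 0

def lf_negation(sentence):
    return -1 if re.search(r"\bno (effect|interaction)\b", sentence) else 0

def lf_distant_supervision(sentence, known_pair=("aspirin", "cox-1")):
    return 1 if all(word in sentence for word in known_pair) else 0

LFS = [lf_keyword, lf_negation, lf_distant_supervision]

def weak_label(sentence):
    """Aggregate labeling-function votes into a noisy training label.
    DPL instead treats the label as a latent variable and reasons about
    it with probabilistic logic; majority vote is the crude baseline."""
    score = sum(lf(sentence) for lf in LFS)
    return None if score == 0 else int(score > 0)

print(weak_label("aspirin inhibits cox-1 activity"))   # 1
print(weak_label("no interaction was observed"))        # 0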
Understanding complex machine learning models such as deep neural networks through explanations is crucial in various applications. Many explanations stem from the model perspective and may not necessarily communicate effectively why the model is making its predictions at the right level of abstraction. For example, assigning importance weights to individual pixels in an image can only express which parts of that particular image are important to the model, whereas humans may prefer an explanation that accounts for the prediction in terms of concept-based thinking. In this work, we review the emerging area of concept-based explanations. We start by introducing concept explanations, including the class of Concept Activation Vectors (CAVs), which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real-world applications.
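A minimal sketch of the CAV construction described above, assuming intermediate-layer activations have already been collected (random placeholders stand in for them here): the CAV is the normal of a linear classifier separating concept examples from random examples in activation space, and a directional derivative along it gives a TCAV-style sensitivity.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder activations of some intermediate layer (64-d) for images
# showing the concept (e.g. "striped") and for random counterexamples.
concept_acts = rng.normal(loc=1.0, size=(100, 64))
random_acts = rng.normal(loc=0.0, size=(100, 64))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # the concept activation vector

# TCAV-style sensitivity: how does the class logit change when the
# activation moves in the concept direction? The gradient of the logit
# w.r.t. the activation is a random placeholder here.
grad_logit = rng.normal(size=64)
sensitivity = grad_logit @ cav
print(sensitivity > 0)   # does the concept push the class logit up?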
This chapter will introduce ABL (ABductive Learning), a new paradigm which integrates machine learning and logical reasoning in a balanced loop, enabling them to work together in a mutually beneficial way.
The recent availability of large-scale data combining multiple data modalities has opened various research and commercial opportunities in Artificial Intelligence (AI). Machine Learning (ML) has achieved important results in this area, mostly by adopting a sub-symbolic distributed representation. It is now generally accepted that such purely sub-symbolic approaches can be data-inefficient and struggle with extrapolation and reasoning. By contrast, symbolic AI is based on rich, high-level representations, ideally built from human-readable symbols. Despite being more explainable and more successful at reasoning, symbolic AI usually struggles when faced with incomplete knowledge, inaccurate or large data sets, and combinatorial knowledge.
Neurosymbolic AI attempts to benefit from the strengths of both approaches combining reasoning with complex representation of knowledge and efficient learning from multiple data modalities. Hence, neurosymbolic AI seeks to ground rich knowledge into efficient sub-symbolic representations and to explain sub-symbolic representations and deep learning by offering high-level symbolic descriptions for such learning systems. Logic Tensor Networks (LTN) are a neurosymbolic AI system for querying, learning and reasoning with rich data and abstract knowledge. LTN introduces Real Logic, a fully differentiable first-order language with concrete semantics such that every symbolic expression has an interpretation that is grounded onto real numbers in the domain. In particular, LTN converts Real Logic formulas into computational graphs that enable gradient-based optimization. This chapter presents the LTN framework and illustrates its use on knowledge completion tasks to ground the relational predicates (symbols) into a concrete interpretation (vectors and tensors). It then investigates the use of LTN on semi-supervised learning, learning of embeddings and reasoning. LTN has been applied recently to many important AI tasks, including semantic image interpretation, ontology learning and reasoning, and reinforcement learning, which use LTN for supervised classification, data clustering, semi-supervised learning, embedding learning, reasoning and query answering. The chapter presents some of the main recent applications of LTN before analyzing results in the context of related work and discussing the next steps for neurosymbolic AI and LTN-based AI models.
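As a small, hypothetical sketch of the Real Logic idea (not the LTN library's API): constants are grounded as vectors, a predicate as a differentiable function into [0, 1], and quantified formulas as aggregations of instance-level truth degrees, so that satisfaction of the whole knowledge base can be maximized by gradient descent.

import torch

# Groundings: constants as trainable embeddings, a binary predicate
# Likes(x, y) as a small network producing a truth degree in [0, 1].
people = torch.nn.Embedding(4, 8)
likes = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1), torch.nn.Sigmoid())

def truth_likes(i, j):
    pair = torch.cat([people.weight[i], people.weight[j]], dim=-1)
    return likes(pair)

# Observed facts as formulas to satisfy: Likes(0, 1) and Likes(1, 0).
facts = [truth_likes(0, 1), truth_likes(1, 0)]

# "forall x: Likes(x, x)" with the universal quantifier grounded as a
# mean aggregator over all constants (one common aggregation choice).
reflexive = torch.stack([truth_likes(i, i) for i in range(4)]).mean()

# Maximize the satisfaction of all formulas (product aggregation).
sat = torch.stack(facts).prod() * reflexive
loss = 1 - sat
loss.backward()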