Frank van Harmelen, Vrije Universiteit Amsterdam
The history of an entire scientific field as dynamic and permeable as AI can only be captured in metaphors. But even though they are metaphors, such sketches of history do sometimes manage to capture the essence of how a debate in an entire scientific field evolved. One such useful metaphor about the history of AI is that of the “two towers”, the “two tribes”, or even the “two religions”. These two tribes are known under a variety of names: symbolic vs. subsymbolic, reasoning vs. learning, logic vs. data, model-based vs. function-based, but I prefer to use the terms “knowledge driven” vs. “data driven”. For decades, AI was indeed divided into these two tribes, whose members adhered to different religions: either AI systems were based on manipulating symbols (as succinctly expressed through the “physical symbol system hypothesis” by Newell and Simon in their 1976 Turing Award paper), or AI systems were based on statistical patterns to be learned from large amounts of data (an approach that earned Bengio, Hinton and LeCun the 2018 Turing Award). And as with any two proper religions, for many years their proponents did not read each other’s books, did not go to each other’s meetings, and did not take each other’s ideas very seriously.
Another, somewhat different metaphor about the history of AI is the picture of two waves. In this myth, the “first wave” of AI consisted of investigations into knowledge-driven AI, with successes in knowledge-based systems (1980s and 1990s), world-championship-level chess (1997) and conversational assistants such as Siri (from 2010 onwards). The “second wave” of AI consisted of the rise of Deep Learning systems, with impressive successes in machine translation, image recognition and game playing, among many others. If Google Trends is to be believed, the current widespread interest in Deep Learning started to take off in early 2015 and reached its current frenzied levels only from early 2018 onwards, although its roots of course stretch back deep into AI history. The more general term “machine learning” shows a similar trend curve.
But things are moving fast in the field of AI. A generous flow of public funding, a keen interest from industry, and a culture of sharing data and code are all contributing to the fast pace of progress. And this fast pace of progress is rapidly shattering both of the above metaphors: the two-tribes myth as well as the two-waves myth. Increasingly, the two tribes are converging on the view that neither purely data-driven nor purely knowledge-driven systems alone hold the key to further progress in AI. And increasingly, the first knowledge-driven wave and the second data-driven wave are being joined by a third wave of systems that are, in their wide variety, known under the common term of “neuro-symbolic systems”.
However, although the proponents of such neuro-symbolic systems agree on what AI systems should not be (namely: they should be neither purely knowledge-driven nor purely data-driven), these researchers are not at all united in their view on what AI should be. The literature on neuro-symbolic systems (sometimes also known as “hybrid” systems) is rapidly growing, with dozens of new (and often unreviewed) manuscripts appearing on arXiv every week, but these contributions cover a wide variety of different architectural and algorithmic proposals, with precious little agreement on which architectures suit which purpose, which algorithms should be used when, what the strengths and weaknesses of the different approaches are, how the results should be evaluated, or even what the theoretical limits of the different proposals are.
Neuro-symbolic systems are the offspring of two very different parents, roughly speaking the fields of “knowledge representation” and “machine learning”, and each of these parents has its own strengths and weaknesses. The first of these parents is characterised by strong theoretical foundations, with analyses of expressivity, of computational tractability, of the trade-offs between the two, and with a compositional theory of how to combine multiple representations. The second parent is characterised by a strong emphasis on empirical results, measured on widely accepted benchmarks and challenge tasks. Conversely, the Knowledge Representation field has never managed to agree on widely shared benchmarks (although exceptions exist in subdisciplines), while much of classical machine learning theory, founded on the statistical learning theory of the 1990s, no longer suffices to explain the “unreasonable effectiveness” of deep learning (see [1] for a recent attempt at addressing this situation).
The promising “3rd wave of AI”, the neuro-symbolic systems to which this book is devoted, should aim to combine the strengths of both parents. Coming, as I do, from a Knowledge Representation background, I am particularly concerned that we should strive for a good theoretical understanding of neuro-symbolic systems, beyond purely empirically measuring their capabilities. We should start to make some foundational distinctions: what are the possible interactions between knowledge and learning? Can reasoning be used as a symbolic prior for learning? Can such symbolic priors be used to make data-driven systems more sample efficient? More explainable? More generalisable outside the training set? Should such symbolic priors be domain-specific, or should they be in the form of very general knowledge capturing only the abstract structure of time, space, causality and other such general principles? Can symbolic constraints be enforced on data-driven systems to make them safer? Or less biased? Or can, vice versa, learning be used to yield symbolic knowledge? And if so, how do we manage the inherent uncertainty that comes with such learned knowledge? And how do we avoid the inevitable bias seeping from the data into the resulting knowledge base? And can such learned knowledge ever capture abstract principles of time, space and causality, or will it be limited to the domain-specific patterns that arise from a given dataset? Or perhaps, and hopefully, the answer is: both directions (knowledge-driven methods supporting data-driven methods and vice versa) are possible and even required.
However, neuro-symbolic systems currently lack a theory that even begins to ask these questions, let alone answer them. All too often, new conference papers and arXiv manuscripts simply propose a new neuro-symbolic architecture, or a new algorithm, without even discussing which of the above questions (or any others, for that matter) they aim to address. Systematic surveys such as [2] are a good start for such a theory. Our own work on an informal “boxology” of neuro-symbolic systems ([3] and [4]) is an attempt at such a “theory of neuro-symbolic systems”, but it is highly incomplete, and entirely informal. It remains a challenge to put such a theory on a firm mathematical basis (for example, through pre- and post-conditions on the components of neuro-symbolic systems). Can such a theory be closely coupled with fragments of executable code? And most importantly: is the assumption of loosely coupled components perhaps too strong, and will much more tightly coupled neuro-symbolic systems in the end turn out to be the way forward?
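To make the idea of pre- and post-conditions on loosely coupled components slightly more concrete, the following is a minimal illustrative sketch in Python. It is not taken from the boxology papers [3, 4]; every name in it (Component, compose, perceive, reason, the 0.5 threshold, the toy "bird" rule) is a hypothetical choice made only for this illustration. A stand-in for a learned component turns confidence scores into symbolic facts, a stand-in for a reasoner derives new facts from them, and each component checks its own contract at run time.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Component:
    """A box in a loosely coupled pipeline, annotated with a contract (hypothetical)."""
    name: str
    run: Callable[[Any], Any]
    pre: Callable[[Any], bool]   # must hold for the input
    post: Callable[[Any], bool]  # must hold for the output

    def __call__(self, x: Any) -> Any:
        if not self.pre(x):
            raise ValueError(f"{self.name}: pre-condition violated")
        y = self.run(x)
        if not self.post(y):
            raise ValueError(f"{self.name}: post-condition violated")
        return y


def compose(first: Component, second: Component) -> Component:
    """Chain two components; the inner contracts are still checked at run time."""
    return Component(
        name=f"{first.name} -> {second.name}",
        run=lambda x: second(first(x)),
        pre=first.pre,
        post=second.post,
    )


def is_scores(x: Any) -> bool:
    """Sub-symbolic side: a mapping from labels to confidences in [0, 1]."""
    return isinstance(x, dict) and all(
        isinstance(k, str) and isinstance(v, float) and 0.0 <= v <= 1.0
        for k, v in x.items()
    )


def is_facts(x: Any) -> bool:
    """Symbolic side: a finite set of atomic facts."""
    return isinstance(x, frozenset) and all(isinstance(f, str) for f in x)


def apply_rules(facts: frozenset) -> frozenset:
    """Toy reasoner: birds fly unless they are known to be penguins."""
    derived = set(facts)
    if "bird" in derived and "penguin" not in derived:
        derived.add("can_fly")
    return frozenset(derived)


# Stand-in for a learned component: thresholding replaces a trained model.
perceive = Component(
    name="perceive",
    run=lambda scores: frozenset(k for k, v in scores.items() if v >= 0.5),
    pre=is_scores,
    post=is_facts,
)
reason = Component(name="reason", run=apply_rules, pre=is_facts, post=is_facts)

pipeline = compose(perceive, reason)
print(sorted(pipeline({"bird": 0.9, "penguin": 0.1})))  # ['bird', 'can_fly']
```

Such run-time checks are of course far weaker than the mathematical theory called for above: they say nothing about expressivity, tractability or uncertainty, and they presuppose exactly the loose coupling between components that is being questioned here.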
But whatever the answers to these questions will be, we will only make truly scientific and long-lasting progress on AI in general, and on hybrid, neuro-symbolic systems of the “3rd wave of AI” in particular, if we go beyond building systems. We must, in the end, develop a theory of such systems that allows us to predict and understand what neuro-symbolic systems can and can’t do.
References
[1] Berner J, Grohs P, Kutyniok G, et al. The modern mathematics of deep learning. CoRR. 2021;abs/2105.04026. Available from: https://arxiv.org/abs/2105.04026.
[2] Marra G, Dumancic S, Manhaeve R, et al. From statistical relational to neural symbolic artificial intelligence: a survey. CoRR. 2021;abs/2108.11451. Available from: https://arxiv.org/abs/2108.11451.
[3] van Harmelen F, ten Teije A. A boxology of design patterns for hybrid learning and reasoning systems. J Web Eng. 2019;18(1-3):97–124.
[4] van Bekkum M, de Boer M, van Harmelen F, et al. Modular design patterns for hybrid learning and reasoning systems. Appl Intell. 2021;51(9):6528–6546.