Ebook: ECAI 2023
Artificial intelligence, or AI, now affects the day-to-day life of almost everyone on the planet and remains a perennial hot topic in the news.
This book presents the proceedings of ECAI 2023, the 26th European Conference on Artificial Intelligence, and of PAIS 2023, the 12th Conference on Prestigious Applications of Intelligent Systems, held from 30 September to 4 October 2023 and on 3 October 2023 respectively in Kraków, Poland. Since 1974, ECAI has been the premier venue for presenting AI research in Europe, and this annual conference has become the place for researchers and practitioners of AI to discuss the latest trends and challenges in all subfields of AI, and to demonstrate innovative applications and uses of advanced AI technology. ECAI 2023 received 1896 submissions – a record number – of which 1691 were retained for review, ultimately resulting in an acceptance rate of 23%. The 390 papers included here cover topics including machine learning, natural language processing, multi-agent systems, vision, and knowledge representation and reasoning. PAIS 2023 received 17 submissions, of which 10 were accepted after a rigorous review process. Those 10 papers cover topics ranging from fostering better working environments, behavior modeling and citizen science to large language models and neuro-symbolic applications, and are also included here.
Presenting a comprehensive overview of current research and developments in AI, the book will be of interest to all those working in the field.
This volume contains the proceedings of the Twenty-sixth European Conference on Artificial Intelligence (ECAI 2023), held from September 30th to October 4th, 2023. Since 1974, the European Conference on Artificial Intelligence, organised by the European Association for Artificial Intelligence (EurAI), has been the premier venue for presenting AI research in Europe. ECAI is the place for researchers and practitioners of Artificial Intelligence (AI) to gather and to discuss the latest trends and challenges in all subfields of AI as well as to demonstrate innovative applications and uses of advanced AI technology. ECAI was originally organised as a biennial event, but is now a yearly event, co-organised with the International Joint Conference on Artificial Intelligence (IJCAI) every four years.
ECAI 2023 had a record number of submissions, with 1896 paper submissions of which 1691 were retained for review. The selection process was competitive, with an acceptance rate of 24%. The largest numbers of submissions were in the broad areas of Machine Learning (28%), Natural Language Processing and Multi-Agent Systems (each with 12%), and Vision and Knowledge Representation and Reasoning (each with 8%).
ECAI 2023 coordinated with IJCAI 2023 to allow authors of papers rejected from that conference to resubmit them together with their original reviews and a statement describing how the paper had been revised; extended abstracts from AAMAS 2023 could also be submitted to the conference. We received 108 such submissions, of which 32 were accepted, resulting in a somewhat higher acceptance rate than for regular submissions.
This ECAI edition featured five keynote speakers: Emma Brunskill (Stanford), Ryan Cotterell (ETH), Gal Chechik (Bar-Ilan), Ariel Procaccia (Harvard), and Shannon Vallor (Edinburgh), as well as a contribution by Luc Steels on the occasion of his EurAI Distinguished Service Award 2022.
ECAI 2023 included 10 tutorials, selected by the Tutorial Chairs Jacek Mańdziuk and Bart Bogaerts, and 21 workshops, selected by the Workshop Chairs Tom Lenaerts and Paolo Turrini. Following tradition, ECAI 2023 incorporated the 12th Conference on Prestigious Applications of Intelligent Systems (PAIS 2023), chaired by Claudia Goldman and Martin Atzmueller. The 10 papers from PAIS are included in this volume.
ECAI 2023 had a special focus on starting researchers, who had the opportunity to participate in several dedicated events, including the 10th Starting AI Researcher Symposium (STAIRS), chaired by Katrien Beuls and Malte Schilling, a Doctoral Consortium, chaired by Nadin Kokciyan and Piotr Skowron, and the traditional Meeting with an EurAI Fellow.
We would like to thank the authors, the Area Chairs (ACs), Senior Program Committee members (SPCs), and regular Program Committee members (PCs) for their work, which unfortunately spanned the summer holidays.
We also want to thank all chairs, who did an excellent job in their selection tasks and were prompt, active, and responsive throughout the organisation of the conference. We are also indebted to the more than 1,000 ECAI 2023 authors who volunteered to join a reviewing pool and could be called upon to assist in reviewing a paper. This allowed us to recruit competent reviewers at short notice and to ensure the quality of the reviews.
We are particularly grateful to the workflow chairs Roxana Rădulescu and Roy Fairstein for providing essential help throughout our ECAI journey, from writing scripts to managing the review process, assisting with communication with the authors and chairs, and planning the conference program. We are also obliged to Tito Cruz, who helped with assigning papers to reviewers, selecting accepted papers, and planning the technical sessions. Without their help we would have been lost.
We are grateful for all the assistance from EurAI in providing us with resources, support and advice. Special thanks to Carles Sierra, EurAI Chair, for always being available to us and helping us with all of our requests.
Kobi Gal
ECAI 2023 Program Chair
Ann Nowé
ECAI 2023 General Chair
Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification. We learn non-Markovian finite task specifications as finite-state ‘task automata’ from episodes of agent experience within environments with unknown dynamics. First, we learn a product MDP, a model composed of the specification’s automaton and the environment’s MDP (both initially unknown), by treating it as a partially observable MDP and employing a hidden Markov model learning algorithm. Second, we efficiently distil the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our automaton enables a task to be decomposed into sub-tasks, so an RL agent can later synthesise an optimal policy more efficiently. It is also an interpretable encoding of high-level task features, so a human can verify that the agent’s learnt tasks have no misspecifications. Finally, we also take steps towards ensuring that the automaton is environment-agnostic, making it well-suited for use in transfer learning.
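As an illustration of the structure referred to in this abstract, the following sketch composes a known deterministic finite automaton with a known (here deterministic) MDP into a product MDP. All names and the tiny example are hypothetical; the paper instead learns this product from episodes of agent experience.

```python
# Illustrative sketch only: composing a known DFA with a known MDP into a
# product MDP. The paper *learns* this product from data; the deterministic
# transition structure below is a simplification for readability.
from itertools import product

def product_mdp(mdp_states, actions, mdp_trans, dfa_trans, labeling, dfa_states):
    """mdp_trans[(s, a)] -> s', dfa_trans[(q, label)] -> q', labeling[s] -> label."""
    states = list(product(mdp_states, dfa_states))
    trans = {}
    for (s, q), a in product(states, actions):
        if (s, a) in mdp_trans:
            s_next = mdp_trans[(s, a)]
            q_next = dfa_trans[(q, labeling[s_next])]
            trans[((s, q), a)] = (s_next, q_next)
    return states, trans

# Tiny hypothetical task: reach 'goal' after visiting 'key'.
mdp_states = ["start", "key", "goal"]
actions = ["right"]
mdp_trans = {("start", "right"): "key", ("key", "right"): "goal"}
dfa_states = ["q0", "q1", "q_acc"]
labeling = {"start": "start", "key": "key", "goal": "goal"}
dfa_trans = {("q0", "start"): "q0", ("q0", "key"): "q1", ("q0", "goal"): "q0",
             ("q1", "start"): "q1", ("q1", "key"): "q1", ("q1", "goal"): "q_acc",
             ("q_acc", "start"): "q_acc", ("q_acc", "key"): "q_acc", ("q_acc", "goal"): "q_acc"}
states, trans = product_mdp(mdp_states, actions, mdp_trans, dfa_trans, labeling, dfa_states)
```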
An opinion diffusion scenario is considered in which two marketers compete to diffuse their own opinions over a social network. In particular, they implement social proof marketing approaches that naturally give rise to a strategic setting, where it is crucial to find the appropriate order in which to target the individuals to whom incentives to adopt their opinions are provided. The setting is extensively studied from the theoretical and empirical viewpoints, by considering strategies defined in a compact way, such as those obtained by selecting individuals according to their degree of centrality in the underlying network. In addition to depicting a clear picture of the complexity issues arising in the setting, several compact strategies are empirically compared on real-world social networks. Results suggest that the effectiveness of compact strategies is moderately influenced by the characteristics of the network, with some centrality measures naturally emerging as good candidates for defining heuristic approaches for marketing campaigns.
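A minimal sketch of what a compact, centrality-based targeting strategy looks like in practice (not the paper's implementation): rank individuals by a centrality measure, here degree centrality, and target them in that order.

```python
# Minimal sketch: order the individuals to target by a centrality measure.
import networkx as nx

def targeting_order(graph, centrality=nx.degree_centrality):
    scores = centrality(graph)                       # node -> centrality score
    return sorted(graph.nodes, key=scores.get, reverse=True)

G = nx.karate_club_graph()                           # stand-in for a real social network
print(targeting_order(G)[:5])                        # five most central targets first
```

Swapping in `nx.betweenness_centrality` or `nx.pagerank` gives the other compact strategies the abstract alludes to.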
Hierarchical Text Classification (HTC) has recently gained traction given its ability to handle complex label hierarchies. It has found applications in domains such as e-commerce, customer care, and medicine, among other real-world applications. Existing HTC models either encode the label hierarchy separately and mix it with the text encoding, or use the label hierarchy structure to guide the text encoder. The two approaches capture different characteristics of the label hierarchy and are complementary to each other. In this paper, we propose Hierarchical Text Classification using Contrastive Learning Informed Path guided hierarchy (HTC-CLIP), which learns hierarchy-aware text representations and text-informed path-guided hierarchy representations using contrastive learning. During the training of HTC-CLIP, we learn two different sets of class probability distributions, and during inference we use the pooled output of both probabilities for each class to get the best of both representations. Our results show that the two previous approaches can be effectively combined into one architecture to achieve improved performance. Tests on two public benchmark datasets showed an improvement of 0.99–2.37% in Macro-F1 score using HTC-CLIP over the existing state-of-the-art models.
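The inference-time pooling mentioned above can be pictured with the following hedged sketch; the actual HTC-CLIP pooling may differ, and the arrays here are placeholders for the two heads' per-class probabilities.

```python
# Hedged sketch of the pooling idea: average two per-class probability
# distributions produced by the two representations and predict the argmax.
import numpy as np

def pooled_prediction(probs_text_view: np.ndarray, probs_path_view: np.ndarray) -> np.ndarray:
    """Both arrays have shape (batch, num_classes) with rows summing to 1."""
    pooled = (probs_text_view + probs_path_view) / 2.0
    return pooled.argmax(axis=1)

p1 = np.array([[0.7, 0.2, 0.1]])
p2 = np.array([[0.4, 0.5, 0.1]])
print(pooled_prediction(p1, p2))   # -> [0]
```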
OWL and SHACL are two prominent W3C standards for managing RDF graphs, the data model of the Web. They are used for different purposes and make different assumptions about the completeness of data: SHACL is used for expressing integrity constraints on complete data, while OWL allows inferring implicit facts from incomplete data; SHACL reasoners perform validation, while OWL reasoners do logical inference. Integrating these two tasks into one uniform approach is a relevant but challenging problem. The SHACL standard envisions graph validation in combination with OWL entailment, but it does not provide technical guidance on how to realize this. To address this problem, we propose a new intuitive semantics for validating SHACL constraints with OWL 2 QL ontologies based on a suitable notion of the chase. We propose an algorithm that rewrites a set of recursive SHACL constraints (with stratified negation) and an OWL 2 QL ontology into a stand-alone set of SHACL constraints that preserves validation for every input graph, which can in turn be evaluated using an off-the-shelf SHACL validator. We show that validation in this setting is EXPTIME-complete in combined complexity, but only PTIME-complete in data complexity, i.e., if the constraints and the ontology are fixed.
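The practical upshot of the rewriting result is that, once the OWL 2 QL ontology has been compiled into stand-alone SHACL shapes, any off-the-shelf validator can be used. The sketch below shows that final step with pySHACL; the file names are hypothetical and the rewriting itself is the paper's contribution, not shown here.

```python
# Sketch of the final validation step only: the ontology has already been
# compiled away into stand-alone SHACL shapes (file name hypothetical).
from pyshacl import validate

conforms, results_graph, results_text = validate(
    data_graph="data.ttl",                  # the RDF graph to validate
    shacl_graph="rewritten_shapes.ttl",     # output of the rewriting algorithm
)
print(conforms)
print(results_text)
```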
The Collaborative Research Center for Everyday Activity Science & Engineering (CRC EASE) aims to enable robots to perform environmental interaction tasks with close to human capacity. It therefore employs a shared ontology to model the activity of both kinds of agents, empowering robots to learn from human experiences. To properly describe these human experiences, the ontology will strongly benefit from incorporating characteristics of neuronal information processing which are not accessible from a behavioral perspective alone. We therefore propose the analysis of human neuroimaging data for evaluation and validation of concepts and events defined in the ontology model underlying most of the CRC projects. In an exploratory analysis, we employed an Independent Component Analysis (ICA) on functional Magnetic Resonance Imaging (fMRI) data from participants who were presented with the same complex video stimuli of activities as robotic and human agents in different environments and contexts. We then correlated the activity patterns of brain networks represented by derived components with timings of annotated event categories as defined by the ontology model. The present results demonstrate a subset of common networks with stable correlations and specificity towards particular event classes and groups, associated with environmental and contextual factors. These neuronal characteristics will open up avenues for adapting the ontology model to be more consistent with human information processing.
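The core correlation step of such an analysis can be illustrated with the following sketch (not the study's actual pipeline): build a boxcar regressor from the annotated event timings of one ontology event class and correlate it with an ICA component's time course. All timings and dimensions are hypothetical.

```python
# Illustrative sketch: correlate an ICA component time course with a boxcar
# regressor built from annotated event timings of one ontology event class.
import numpy as np
from scipy.stats import pearsonr

def event_regressor(n_scans, tr, events):
    """events: list of (onset_s, duration_s) pairs for one event class."""
    reg = np.zeros(n_scans)
    for onset, duration in events:
        start = int(onset // tr)
        stop = int(np.ceil((onset + duration) / tr))
        reg[start:stop] = 1.0
    return reg

n_scans, tr = 200, 2.0
component_timecourse = np.random.randn(n_scans)             # stand-in for an ICA component
grasping_events = [(10.0, 4.0), (60.0, 6.0), (150.0, 5.0)]  # hypothetical timings
r, p = pearsonr(component_timecourse, event_regressor(n_scans, tr, grasping_events))
print(f"r = {r:.3f}, p = {p:.3g}")
```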
In the real world, the execution of the actions planned for an agent is never guaranteed to succeed, as they can fail in a number of unexpected ways that are not explicitly captured in the planning model. Based on these observations, we introduce the task of finding plans for classical planning that are resilient to action execution failures. We refer to this problem as Resilient Planning and to its solutions as K-resilient plans; such plans guarantee that an agent will always be able to reach its goals (possibly by replanning alternative sequences of actions) as long as no more than K failures occur along the way. We also present RESPLAN, a new algorithm for Resilient Planning, and we compare its performance to methods based on compiling Resilient Planning to Fully-Observable-Non-Deterministic (FOND) planning.
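The notion of a K-resilient plan can be captured informally by the recursive check below. This is a simplified sketch of the definition, not the RESPLAN algorithm; `plan_from` is a hypothetical classical planner, and the failure model (a failed action leaves the state unchanged and becomes unusable) is only one of several possible choices.

```python
# Simplified sketch of K-resilience (not RESPLAN). `plan_from(state, goal, banned)`
# is a hypothetical classical planner returning a plan as a list of
# (action, successor_state) pairs, or None if no plan avoids the banned actions.
def is_k_resilient(state, goal, k, banned, plan_from):
    plan = plan_from(state, goal, banned)
    if plan is None:
        return False
    if k == 0:
        return True                       # no further failures can occur
    for action, next_state in plan:
        # Assumed failure model: the action leaves the state unchanged and can
        # no longer be used; the agent must recover with at most k-1 failures.
        if not is_k_resilient(state, goal, k - 1, banned | {action}, plan_from):
            return False
        state = next_state                # otherwise the action succeeded
    return True
```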
Rent division consists in simultaneously computing an allocation of rooms to agents and a payment, starting from an individual valuation of each room by each agent. When agents have budget limits, it is known that envy-free solutions do not necessarily exist. We propose two solutions to overcome this problem. In the first one, we relax envy-freeness to account for budget disparities. In the second one, we allow fractional allocations, in which agents may change rooms during the duration of the lease.
Transformer-based language models demonstrate exceptional performance in Natural Language Processing (NLP) tasks but remain susceptible to backdoor attacks involving hidden input triggers. Trojan injection via hardware bit-flips poses a significant threat to contemporary language models. However, previous research overlooks practical hardware considerations, such as DRAM and cache memory structures, resulting in unrealistic attacks that demand the manipulation of an excessive number of parameters and bits. In this paper, we present TrojBits, a novel approach requiring minimal bit-flips to effectively insert Trojans into real-world Transformer language model systems. This is achieved through a three-module framework designed to efficiently target Transformer-based language models, consisting of Vulnerable Parameters Ranking (VPR), Hardware-aware Attack Optimization (HAO), and Vulnerable Bits Pruning (VBP). Within the VPR module, we are the first to employ gradient-guided Fisher information to identify the most susceptible Transformer parameters, specifically in the word embedding layer. The HAO module then redistributes these parameters across multiple triggers, conforming to hardware constraints by incorporating a regularization term in the Trojan optimization methodology. Finally, the VBP module aims to reduce the number of bit-flips by discarding less significant bits. We evaluate TrojBits on two representative NLP models, BERT and XLNet, on three classification tasks (SST2, OffensEval, and AG’s News). Our results demonstrate that TrojBits successfully achieves the inference-time attack with only 64 parameters out of 116 million and 90 bit-flips, while maintaining the model performance.
Image geolocalization is receiving increasing attention due to its importance in several applications, such as image retrieval, criminal investigations and fact-checking. Previous works focused on several instances of image geolocalization including place recognition, GPS coordinates estimation and country recognition. In this paper, we tackle an even more challenging problem, which is recognizing the city where an image has been taken. Due to the vast number of cities in the world, we cast the problem as a verification problem, whereby the system has to decide whether a certain image has been taken in a given city or not. In particular, we present a system that, given a query image and a small set of images taken in a target city, decides whether the query image was shot in the target city. To allow the system to handle images taken in cities that were not used during training, we use a Siamese network with a Vision Transformer backbone. The experiments we ran prove the validity of the proposed system, which outperforms solutions based on state-of-the-art techniques, even in the challenging case of images shot in different cities of the same country.
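A hedged sketch of the verification step described above (not the paper's model): embed the query image and the small reference set with a shared ViT encoder and threshold the mean cosine similarity. The backbone choice and the threshold value are placeholders.

```python
# Hedged sketch: Siamese city verification by cosine similarity of ViT embeddings.
import torch
import torch.nn.functional as F
from torchvision.models import vit_b_16

encoder = vit_b_16(weights=None)
encoder.heads = torch.nn.Identity()       # use the class-token representation as the embedding
encoder.eval()

@torch.no_grad()
def same_city(query_img, city_imgs, threshold=0.5):
    """query_img: (3,224,224); city_imgs: (N,3,224,224), the small reference set."""
    q = F.normalize(encoder(query_img.unsqueeze(0)), dim=-1)    # (1, D)
    refs = F.normalize(encoder(city_imgs), dim=-1)              # (N, D)
    return (refs @ q.T).mean().item() >= threshold
```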
Dung’s Argumentation Framework (AF) has been extended in several directions. An interesting extension, among others, is the Epistemic AF (EAF) which allows representing the agent’s belief by means of epistemic constraints. In particular, an epistemic constraint is a propositional formula over labeled arguments (e.g. in(a), out(c)) extended with the modal operators K and M that intuitively state that the agent believes that a given formula is certainly or possibly true, respectively. In this paper, focusing on EAF, we investigate the complexity of the possible and necessary variants of three canonical problems in abstract argumentation: verification, existence, and non-empty existence. Moreover, we explore the relationship between EAF and incomplete AF (iAF), an extension of AF where arguments and attacks may be uncertain. Our complexity analysis shows that the verification problem in iAF can be naturally reduced to the verification in EAF, while it turns out that a similar result cannot hold for the necessary (non-empty) existence problem.
Federated Learning (FL) is essential for building global models across distributed environments. However, it is significantly vulnerable to data and model poisoning attacks that can critically compromise the accuracy and reliability of the global model. These vulnerabilities become more pronounced in heterogeneous environments, where clients’ data distributions vary broadly, creating a challenging setting for maintaining model integrity. Furthermore, malicious attacks can exploit this heterogeneity, manipulating the learning process to degrade the model or even induce it to learn incorrect patterns. In response to these challenges, we introduce RFCL, a novel Robust Federated aggregation method that leverages CLustering and cosine similarity to select similar cluster models, effectively defending against data and model poisoning attacks even amidst high data heterogeneity. Our experiments assess RFCL’s performance under various numbers of attackers and degrees of non-IID data. The findings reveal that RFCL outperforms existing robust aggregation methods and demonstrates the capability to defend against multiple attack types.
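A minimal sketch of the aggregation idea (the details of RFCL may differ): cluster the flattened client updates by cosine similarity and average only the majority cluster, on the assumption that honest clients form the larger, mutually similar group.

```python
# Minimal sketch of cosine-similarity-based robust aggregation.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def robust_aggregate(client_updates):
    """client_updates: (n_clients, n_params) array of flattened model updates."""
    norm = client_updates / np.linalg.norm(client_updates, axis=1, keepdims=True)
    distance = 1.0 - norm @ norm.T                        # pairwise cosine distances
    labels = AgglomerativeClustering(
        n_clusters=2, metric="precomputed", linkage="average"  # 'affinity' in scikit-learn < 1.2
    ).fit_predict(distance)
    majority = np.argmax(np.bincount(labels))             # assume an honest majority
    return client_updates[labels == majority].mean(axis=0)
```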
We present a new supervised learning technique for the Variational AutoEncoder (VAE) that allows it to learn a causally disentangled representation and generate causally disentangled outcomes simultaneously. We call this approach Causally Disentangled Generation (CDG). CDG is a generative model that accurately decodes an output based on a causally disentangled representation. Our research demonstrates that adding supervised regularization to the encoder alone is insufficient for achieving a generative model with CDG, even for a simple task. Therefore, we explore the necessary and sufficient conditions for achieving CDG within a specific model. Additionally, we introduce a universal metric for evaluating the causal disentanglement of a generative model. Empirical results from both image and tabular datasets support our findings.
Unsupervised Representation Learning on graphs is gaining traction due to the increasing abundance of unlabelled network data and the compactness, richness, and usefulness of the representations generated. In this context, the need to consider fairness and bias constraints while generating the representations has been well-motivated and studied to some extent in prior works. One major limitation of most of the prior works in this setting is that they do not aim to address the bias generated due to connectivity patterns in the graphs, such as varied node centrality, which leads to a disproportionate performance across nodes. In our work, we aim to address this issue of mitigating bias due to inherent graph structure in an unsupervised setting. To this end, we propose CAFIN, a centrality-aware fairness-inducing framework that leverages the structural information of graphs to tune the representations generated by existing frameworks. We deploy it on GraphSAGE (a popular framework in this domain) and showcase its efficacy on two downstream tasks – Node Classification and Link Prediction. Empirically, CAFIN consistently reduces the performance disparity across popular datasets (varying from 18 to 80% reduction in performance disparity) from various domains while incurring only a minimal cost of fairness.
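One simple way to picture a centrality-aware fairness term of this kind is sketched below. This is an illustrative regularizer, not CAFIN itself: it penalizes a gap in per-node loss between low- and high-centrality nodes, with the centrality split and weighting chosen arbitrarily here.

```python
# Illustrative centrality-aware penalty (not CAFIN): discourage a gap in
# per-node loss between low- and high-centrality nodes.
import torch

def centrality_gap_penalty(per_node_loss, centrality, quantile=0.5):
    """per_node_loss, centrality: tensors of shape (num_nodes,)."""
    threshold = torch.quantile(centrality, quantile)
    low, high = centrality <= threshold, centrality > threshold
    return (per_node_loss[low].mean() - per_node_loss[high].mean()).abs()

# total_loss = task_loss + lambda_fair * centrality_gap_penalty(loss_per_node, degree_centrality)
```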
We suggest a simple Gaussian mixture model for data generation that complies with Feldman’s long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
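The kind of comparison described can be reproduced in miniature with the sketch below; all parameters (number of subpopulations, Zipf-like frequencies, choice of linear and memorizing classifiers) are hypothetical stand-ins for the paper's setup.

```python
# Sketch: a long-tailed Gaussian mixture, compared under a linear classifier
# and a memorizing nonlinear classifier (1-nearest neighbour).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_components, dim, n = 50, 10, 5000
freqs = 1.0 / np.arange(1, n_components + 1)          # Zipf-like subpopulation frequencies
freqs /= freqs.sum()
means = rng.normal(scale=3.0, size=(n_components, dim))
subpop = rng.choice(n_components, size=n, p=freqs)    # which subpopulation each point comes from
X = means[subpop] + rng.normal(size=(n, dim))
y = subpop % 2                                        # class label tied to the subpopulation

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
print("linear   :", LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte))
print("nonlinear:", KNeighborsClassifier(n_neighbors=1).fit(Xtr, ytr).score(Xte, yte))
```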
We define contrastive explanations that are suited to tree-based classifiers. In our framework, contrastive explanations are based on the set of (possibly non-independent) Boolean characteristics used by the classifier and are at least as general as contrastive explanations based on the set of characteristics of the instances considered at the start. We investigate the computational complexity of computing contrastive explanations for Boolean classifiers (including tree-based ones) when the Boolean conditions used are not independent. Finally, we present and empirically evaluate an algorithm for computing minimum-size contrastive explanations for random forests.
In this paper, we study the Maximum Vertex-weighted b-Matching (MVbM) problem on bipartite graphs in a new game-theoretical environment. In contrast to other game-theoretical settings, we consider the case in which the value of the tasks is public and common to every agent, so that the private information of every agent consists of the edges connecting them to the set of tasks. In this framework, we study three mechanisms. Two of these mechanisms, namely MBFS and MDFS, are optimal but not truthful, while the third one, MAP, is truthful but sub-optimal. Although these mechanisms are induced by known algorithms, we show that MBFS and MDFS are the best possible mechanisms in terms of Price of Anarchy and Price of Stability, while MAP is the best truthful mechanism in terms of approximation ratio. Furthermore, we characterize the Nash Equilibria of MBFS and MDFS and retrieve sets of conditions under which MBFS acts as a truthful mechanism, which highlights the differences between MBFS and MDFS. Finally, we extend our results to the case in which agents’ capacity is part of their private information.
Due to the large number of submissions that more and more conferences receive, an automated way of distributing the submitted papers among reviewers has become necessary. We model the peer-reviewing matching problem as a bilevel programming (BP) formulation. Our model consists of a lower-level problem describing the reviewers’ perspective and an upper-level problem describing the editors’. Every reviewer is interested in minimizing their overall effort, while the editors are interested in finding an allocation that maximizes the quality of the reviews and best follows the reviewers’ preferences. To the best of our knowledge, the proposed model is the first to formulate the peer-reviewing matching problem with two objective functions, one describing the reviewers’ viewpoint and the other describing the editors’ viewpoint. We demonstrate that both the upper-level and lower-level problems are feasible and that our BP model admits a solution under mild assumptions. After studying the properties of the solutions, we propose a heuristic to solve our model and compare its performance with the relevant state-of-the-art methods. Extensive numerical results show that our approach can find fairer solutions with competitive quality and less effort from the reviewers. (Our code is available at https://github.com/Galaxy-ZRX/Bilevel-Review.)
We consider bidding games, a class of two-player zero-sum graph games. The game proceeds as follows. Both players have bounded budgets. A token is placed on a vertex of a graph, in each turn the players simultaneously submit bids, and the higher bidder moves the token, where we break bidding ties in favor of Player 1. Player 1 wins the game iff the token visits a designated target vertex. We consider, for the first time, poorman discrete-bidding in which the granularity of the bids is restricted and the higher bid is paid to the bank. Previous work either did not impose granularity restrictions or considered Richman bidding (bids are paid to the opponent). While the latter mechanisms are technically more accessible, the former is more appealing from a practical standpoint. Our study focuses on threshold budgets, which is the necessary and sufficient initial budget required for Player 1 to ensure winning against a given Player 2 budget. We first show existence of thresholds. In DAGs, we show that threshold budgets can be approximated with error bounds by thresholds under continuous-bidding and that they exhibit a periodic behavior. We identify closed-form solutions in special cases. We implement and experiment with an algorithm to find threshold budgets.
Neural networks (NNs) have various applications in AI, but explaining their decisions remains challenging. Existing approaches often focus on explaining how changing individual inputs affects NNs’ outputs. However, an explanation that is consistent with the input-output behaviour of an NN is not necessarily faithful to the actual mechanics thereof. In this paper, we exploit relationships between multi-layer perceptrons (MLPs) and quantitative argumentation frameworks (QAFs) to create argumentative explanations for the mechanics of MLPs. Our SpArX method first sparsifies the MLP while maintaining as much of the original structure as possible. It then translates the sparse MLP into an equivalent QAF to shed light on the underlying decision process of the MLP, producing global and/or local explanations. We demonstrate experimentally that SpArX can give more faithful explanations than existing approaches, while simultaneously providing deeper insights into the actual reasoning process of MLPs.
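The sparsification step referred to above can be pictured with the following hedged sketch. SpArX's actual procedure may use a different criterion; this simply keeps the largest-magnitude weights of a layer so that the resulting network maps more directly onto a QAF.

```python
# Minimal sketch of MLP sparsification by weight magnitude (illustrative only).
import numpy as np

def sparsify(weight_matrix, keep_fraction=0.2):
    """Zero out all but the top `keep_fraction` of weights by magnitude."""
    w = weight_matrix.copy()
    k = int(np.ceil(keep_fraction * w.size))
    threshold = np.sort(np.abs(w), axis=None)[-k]     # k-th largest magnitude
    w[np.abs(w) < threshold] = 0.0
    return w

layer = np.random.randn(8, 8)
print(np.count_nonzero(sparsify(layer)))   # roughly 20% of the 64 entries remain
```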
Interpretation methods for learned models used in natural language processing (NLP) applications usually provide support for local (specific) explanations, such as quantifying the contribution of each word to the predicted class, but they typically ignore the potential interactions among those word tokens. Unlike currently popular methods, we propose a deep model that uses feature attribution and the identification of dependencies to learn interpretable representations that support the creation of hierarchical explanations. In addition, hierarchical explanations provide a basis for visualizing how words and phrases are combined at different levels of abstraction, which enables end-users to better understand the prediction process of a deep network. Our study uses multiple well-known datasets to demonstrate the effectiveness of our approach, and provides both automatic and human evaluation.
In recent years, representations from brain activity patterns and pre-trained language models have been linked to each other based on neural fits to validate hypotheses about language processing. Nonetheless, open questions remain about what intrinsic properties of language processing these neural fits reflect and whether they differ across neural fit approaches, brain networks, and models. In this study, we use parallel sentence and functional magnetic resonance imaging data to perform a comprehensive analysis of four paradigms (masked language modeling, pragmatic coherence, semantic comparison, and contrastive learning) representing linguistic hypotheses about sentence processing. We include three sentence embedding models for each paradigm, resulting in a total of 12 models, and examine differences in their neural fit to four different brain networks using regression-based neural encoding and Representational Similarity Analysis (RSA). Among the different models tested, GPT-2, SkipThoughts, and S-RoBERTa yielded the strongest correlations with language network patterns, whereas contrastive learning-based models resulted in overall low neural fits. Our findings demonstrate that neural fits vary across brain networks and models representing the same linguistic hypothesis (e.g., GPT-2 and GPT-3). More importantly, we show the need for both neural encoding and RSA as complementary methods to provide full understanding of neural fits. All code used in the analysis is publicly available: https://github.com/lcn-kul/sentencefmricomparison.
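The RSA component of such an analysis can be sketched as follows (not the study's exact code): build representational dissimilarity matrices (RDMs) for the brain responses and the model embeddings of the same sentences, then correlate them. The data here are random stand-ins; the real analysis uses the parallel sentence and fMRI data described above.

```python
# Sketch of Representational Similarity Analysis (RSA) for a neural-fit score.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """responses: (n_sentences, n_features); returns condensed 1 - Pearson RDM."""
    return pdist(responses, metric="correlation")

n_sentences = 100
brain = np.random.randn(n_sentences, 500)      # stand-in for fMRI response patterns
model = np.random.randn(n_sentences, 768)      # stand-in for sentence embeddings
rho, p = spearmanr(rdm(brain), rdm(model))
print(f"neural fit (Spearman rho) = {rho:.3f}")
```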
Our goal in this paper is to significantly decrease the compiled size of a given Boolean instance with a large representation, while preserving as much information about the instance as possible. We achieve this by assigning values to a subset of the variables in such a way that the resulting instance has a much smaller representation than the original one, and its number of solutions is almost as high as that of the original instance. We call the set of variable instantiations that we make the selective backbone of the solutions that we keep. Large selective backbones allow for smaller representations, but also eliminate more solutions. We compare different methods of computing the selective backbone that offer the best compromise.
Multi-agent active search requires autonomous agents to choose sensing actions that efficiently locate targets. In a realistic setting, agents must also consider the costs that their decisions incur. Previously proposed active search algorithms simplify the problem by ignoring uncertainty in the agent’s environment, using myopic decision making, and/or overlooking costs. In this paper, we introduce an online active search algorithm to detect targets in an unknown environment by making adaptive cost-aware decisions regarding the agent’s actions. Our algorithm uses an online lookahead planner that combines principles from Monte Carlo Tree Search, Thompson sampling, and Pareto-optimal confidence bounds for decentralized multi-agent multi-objective optimization in an unknown environment. We analyze the algorithm’s performance in simulation to show its efficacy in cost-aware active search.