Ebook: Computational Models of Argument
This book presents the proceedings of COMMA 2024, the 10th biennial International Conference on Computational Models of Argument, held from 18 to 20 September 2024 in Hagen, Germany. The COMMA conference series provides a dedicated forum for the presentation and discussion of the latest advancements in this interdisciplinary field, covering basic research, systems and innovative applications and nurturing the steady growth of interest in computational-argumentation research worldwide.
A total of 63 submissions was received for the conference, and after a thorough review process 26 were accepted as full papers, with a further 12 accepted as demos accompanied by an extended abstract, and 2 as full papers with accompanying demo, resulting in an acceptance rate of 53% for full papers and 63% for papers and demo abstracts combined.
In addition to these 40 papers, extended abstracts of the 3 invited talks are also included here: Semantics in Argumentation: Classifications and Challenges, by Leila Amgoud; Expanding the Scope of Bayesian Argumentation, by Ulrike Hahn; and The Long Road to Trustworthy Natural Language Argumentation, by Serena Villata.
The book provides a fascinating overview of current research and innovations, and will be of interest to all those working in the field.
2024 marks a milestone for COMMA, bringing as it does the tenth installment of our conference, and a coming-of-age eighteen years since the first edition. The conference series has its roots in an EU 6th framework project, Argumentation Service Platform with Integrated Components, ASPIC, which, at its conclusion, founded the conference series with the inaugural edition hosted by the University of Liverpool in 2006. Since then, the biennial International Conference on Computational Models of Argument (COMMA) has provided a dedicated forum for presentation and discussion of the latest advancements in this interdisciplinary field, covering basic research, systems and innovative applications. The conference has nurtured and facilitated the steady growth of interest in computational argumentation research worldwide that has gone hand in hand with the development of the conference itself and of related activities by its underpinning community. Since the second edition, organized by IRIT in Toulouse in 2008, plenary invited talks by world-leading researchers and a software demonstration session became an integral part of the conference programme. The third edition, organized in 2010 by the University of Brescia in Desenzano del Garda, saw the addition of a best student paper award. The same year, the new journal Argument and Computation, closely related to the COMMA activities, was started. Since the fourth edition, organized by the Vienna University of Technology in 2012, an Innovative Application Track and a section for Demonstration Abstracts were included in the proceedings. At the fifth edition, co-organized in 2014 by the Universities of Aberdeen and Dundee in Pitlochry, the main conference was preceded by the first Summer School on Argumentation: Computational and Linguistic Perspectives. The same year also saw the launch of the first International Competition on Computational Models of Argumentation (ICCMA). 
Since COMMA 2016, hosted by the University of Potsdam, the COMMA proceedings have been Open Access. This COMMA was also the first that included additional satellite workshops in the programme. COMMA 2018 was hosted by the Institute of Philosophy and Sociology of the Polish National Academy of Sciences in Warsaw, Poland. It included an industry afternoon bringing together businesses, NGOs, academics and students interested in practical applications of argument technologies in industry. COMMA 2020 was organised in Italy for the second time, by the University of Perugia, but, due to the COVID pandemic, was run fully online. It was preceded by the 4th Summer School on Argumentation: Computational and Linguistic Perspectives (SSA 2020), and featured a demonstrations session and three satellite workshops: the International Workshop on Systems and Algorithms for Formal Argumentation (SAFA), initiated at COMMA 2016; a new Workshop on Argument Visualization; and the peripatetic workshop Computational Models of Natural Argument (CMNA), one of the longest-running events in the community, which has previously colocated with conferences such as IJCAI, ECAI and ICAIL and joined COMMA for its 20th edition. COMMA 2022 was once again an in-person event, for the third time in the UK, this time hosted by Cardiff University. It was preceded by the 5th Summer School on Argumentation, with a focus on “Explainability Perspective” and by four workshops: CMNA 2022, the 21st Workshop on Computational Models of Natural Argument; SAFA 2022, the 4th International Workshop on Systems and Algorithms for Formal Argumentation; as well as ArgXAI 2022, the 1st International Workshop on Argumentation for eXplainable AI, and ArgML 2022, the 1st International Workshop on Argumentation & Machine Learning.
For its tenth edition in 2024, the COMMA conference returns for the second time to Germany, this time to FernUniversität in Hagen. We are delighted to be able to present an exciting programme of invited talks, long paper presentations (including some in the innovative applications track) and demonstrations. Our invited talks come from three leading lights in argumentation. Leila Amgoud is Research Director (DR1) at CNRS (Centre National de la Recherche Scientifique), Deputy Director of IRIT (Toulouse Institute of Computer Science) since January 2021, Member of the ADRIA team at IRIT, EurAI Fellow since 2014, and PI of an ANITI chair on argumentation. She has been working on formal models of argumentation since the 1990s and is drawing upon that track record to talk to us about different classifications of argumentation semantics. Ulrike Hahn is professor of psychology at the School of Psychological Sciences at Birkbeck, University of London, where she is Director of the Centre for Cognition, Computation and Modelling. One of the UK’s most prominent psychologists, she has a long track record of research at the boundary between cognitive science and argumentation with a particular focus on Bayesian argumentation and notions of argument quality, which form the central themes of her keynote. Serena Villata is a senior researcher (Directrice de recherche) in computer science at the CNRS, Chair in Artificial Intelligence at the Interdisciplinary Institute for Artificial Intelligence 3IA Côte d’Azur and Deputy Scientific Director of the 3IA Côte d’Azur Institute. Her work at the interface between computational models of argument and computational linguistics, and argument mining in particular, has established her as a leading figure in the field, from which vantage point she reviews themes of argument quality, fallaciousness and trustworthiness. Setting the tone of the conference, the invited talks are just one part of a strong programme.
From the 63 submissions we received, we accepted 26 full papers, with a further 12 accepted as demos with an accompanying extended abstract (2 pages), and 2 accepted as full papers with an accompanying demo. We would like to thank our programme committee of 83 members drawn from 18 countries and supplemented by a further 10 additional reviewers, who have worked to ensure a selective and robust quality in our programme. In addition to the main programme, the conference also benefits as usual from a range of satellite events. The now well-established summer school returns, with its 6th edition covering tutorials from academic and industrial leaders in the area, and including again a session from the Online Handbook of Argumentation for AI, OHAAI, as well as the doctoral consortium. The conference is also complemented by three co-located workshops: the 2nd International Workshop on Argumentation for eXplainable AI (ArgXAI 2024); the 5th International Workshop on Systems and Algorithms for Formal Argumentation (SAFA 2024); and the 24th Workshop on Computational Models of Natural Argument (CMNA 2024). This diverse and intellectually stimulating programme is due to the hard work not only of the programme committee and reviewers, but also of the workshop chair, Sarah Gaggl; summer school chairs, AnneMarie Borg and Jesse Heyninck; and Elfia Bezou Vrakatseli and Andreas Xydis from the OHAAI committee. Of course the conference would not exist at all without the COMMA steering committee; the proceedings would not exist at all without the work of Eimear Maguire who has helped tirelessly with assembly; and the conference would not run at all without the superb local organising committee in Hagen. To all of these, we would like to extend our deepest gratitude.
Finally, it is with a heavy heart that we note that this is the first COMMA at which we shall not be joined by one of the community’s most widely published, most extensively cited and most highly respected scholars. Trevor Bench-Capon passed away on 20th May 2024 and as colleague, collaborator, mentor or friend to many of us, he will be sorely missed. As a way of marking his role in founding, supporting and driving forward both the community and the COMMA conference itself, we will be naming the best student paper prize in his honour: the Trevor Bench-Capon Best Student Paper Award. Given Trevor’s commitment to unwavering support and encouragement of research students, and his natural ability to inculcate the highest standards of research excellence, it is fitting that the Trevor Bench-Capon prize will recognise the very best of student research in the field.
Chris Reed (Programme chair)
Matthias Thimm (Local chair)
Tjitze Rienstra (Demo chair)
Dundee, Hagen and Maastricht, July 2024
A Bayesian approach to argumentation has, arguably, made great strides in illuminating long-standing questions about argument quality. In particular, the Bayesian framework allows nuanced evaluation of content-based differences in argument strength for individual arguments, both for arguments about facts and for practical arguments. It also provides a principled approach to summary evaluation, in the final reckoning, in contexts of multiple arguments, both mutually supporting and competing. What it has not done, however, is contribute systematically to an understanding of the dialectical process of argumentation, either in dyads or across large collectives. The talk reviews the literature on argument quality, but then focusses on recent work within the Bayesian framework to address this dialectical challenge.
Argument(ation) Mining (AM) is the dimension of computational argumentation that aims at automatically processing natural language arguments and reasoning upon them. More precisely, argument mining aims at extracting, classifying and analysing natural language arguments and their relations from text, with the final goal of providing machine-processable structured data for computational models of argument. In this keynote talk, I will first introduce this research area, highlighting the main successful tasks and open issues in identifying argumentative structures from different kinds of texts (e.g., clinical trials, online user generated content, news articles). Then, I will present a key challenge that combines argument mining with formal computational models of argumentation, i.e., the assessment of the trustworthiness of natural language arguments, with a focus on fallacious argumentation, with the aim of showing how these methods can be used to automatically identify fallacious arguments in political debates. Fallacies have played a prominent role in argumentation since antiquity due to their contribution to argumentation in critical thinking education. Their role is even more crucial nowadays as contemporary argumentation technologies face challenging tasks such as misleading and manipulative information detection in news articles and political discourse, and counter-narrative generation. I will conclude with some thoughts on the challenge of automatic generation of counter-arguments to fight online disinformation and hate speech.
Product reviews represent a valuable source of information for both (potential) customers and sellers. Usually, reviews come in pairs (score, motivation), where the motivation is a piece of unstructured text explaining the score given to a product. For reviews, this setting is ideal to combine a quantitative assessment of a product with a qualitative explanation. Aggregating the numerical scores might be uninformative while parsing large quantities of text might be challenging.
Automated argument analysis can help in this process, and we previously developed an argument-based quality analysis pipeline that helps identify the most significant items from a corpus of reviews. Given that the pipeline is effective but time-consuming, this work sets out to improve its computational efficiency. Next to optimisation by conventional methods, we investigate the effect of reducing the number of text chunks that are used to build the argumentation graph.
We find that conventional methods significantly improve the computation time, which allows us to analyse much larger datasets of real-world reviews. When the number of tokens is scaled down, accuracy remains similar compared to the original version of the pipeline. However, we find that this does not necessarily result in a computation time reduction.
Various approaches have been proposed for efficient computation in abstract argumentation. Among them, neural networks have made it possible to solve various decision problems, notably those related to the (credulous or skeptical) acceptability of arguments. In this work, we push this study further in various ways. First, relying on the state-of-the-art approach AFGCN, we show how we can improve the performance of Graph Convolutional Networks (GCNs) regarding both runtime and accuracy. Then, we show that it is possible to improve the efficiency of the approach even more by modifying the architecture of the network, using Graph Attention Networks (GATs) instead.
In this paper, we introduce a novel Argument Mining task based on the existing task of Argument Structure Parsing (ASP). Our new task, which we call ASG Parsing, is the task of generating Argument Summary Graphs (ASGs) from dialogical argumentative text. We release a dataset containing ASGs, a type of graphical summary for argumentative dialogues, in which the nodes are summaries of statements and the edges are the argumentative relations between them (support or attack). We approach the problem with two different LLM-based solutions: (a) a pipeline system involving two models separately fine-tuned for summarisation and stance detection; and (b) an end-to-end system based on the TANL (Translation between Augmented Natural Languages) framework [1]. We show that the TANL approach outperforms the pipeline approach across the board. We also show that, for all systems, performance degrades as the depth of the graphs increases.
Dialectical Classical Logic Argumentation (D-Cl-Arg) formalises maxi-consistent non-monotonic reasoning under the practical assumption that agents have bounded resources for classical inference, and that agents do not typically check arguments’ premises for subset minimality and consistency. However, D-Cl-Arg still satisfies all rationality postulates. Moreover D-Cl-Arg accommodates uses of argument characteristic of dialectical practice. This paper extends D-Cl-Arg to accommodate further dialectical uses of argument; in particular unrestricted rebuts on the deductively derived conclusions of arguments, and Occam Razor defeats that dialectically demonstrate that an argument makes use of redundant premises. We show that all rationality postulates are still satisfied, while relaxing constraints on preference relations that were previously required to prove rationality.
An important open challenge in the area of computational argumentation is the automatic reconstruction of natural language enthymemes. Such argumentative figures are commonly used in natural language human discourse to improve the naturalness and efficiency of speech. They also represent a major challenge when developing computational argumentation systems that need to work with natural language data, since enthymemes bring irregularity to the representations proposed in classical models of argumentation. In this paper, we propose a new framework based on the theory of argumentation schemes aimed at automatically reconstructing natural language enthymemes. The proposed framework consists of a two-module pipeline: (i) scheme classification, and (ii) enthymeme reconstruction. We validate the proposed framework by comparing its performance to a baseline pipeline that does not take the argumentation scheme theory into account. We evaluate the framework by analysing the validity of the complete reconstructed arguments, establishing a new set of baselines that can be used as reference for future work in this direction.
We evaluate the ability of two large language models (LLMs) to perform argumentative reasoning. We experiment with argument mining (AM) and argument pair extraction (APE), and evaluate the LLMs’ ability to recognize arguments under progressively more abstract input and output (I/O) representations (e.g., arbitrary label sets, graphs, etc.). Unlike the well-known evaluation of prompt phrasings, abstraction evaluation retains the prompt’s phrasing but tests reasoning capabilities. We find that scoring-wise the LLMs match or surpass the SOTA in AM and APE, and under certain I/O abstractions LLMs perform well, even beating chain-of-thought; we call this symbolic prompting. However, statistical analysis of the LLMs’ outputs when subject to small, yet still human-readable, alterations in the I/O representations (e.g., asking for BIO tags as opposed to line numbers) showed that the models are not performing reasoning. This suggests that LLM applications to some tasks, such as data labelling and paper reviewing, must be done with care.
Already in Dung’s seminal paper introducing Abstract Argumentation Frameworks (AFs), several connections to seemingly unrelated reasoning formalisms have been illustrated. In this work, we continue this trend and establish a connection between abstract argumentation frameworks and boolean networks (BNs). BNs, in a nutshell, mimic simple binary-valued systems, where for each point in time, the value of each bit (component) depends only on the other components’ values of the previous point in time of the network. This formalism is widely used to formally analyze biological processes, where from simple rules complex behavior emerges. We show that stable extensions of an arbitrary AF correspond to single state attractors of its canonically corresponding BN, the complete extensions correspond to a distinctive 2-state attractor, and the admissible sets correspond to the seeds of the BN. We thereby lay the groundwork for a fruitful exchange of ideas between the two research areas.
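The stable-extension half of this correspondence can be illustrated with a small brute-force sketch. It assumes that the canonical BN gives each argument the update rule "true at the next step if and only if no attacker is true now"; the abstract does not spell out the construction, so this rule, the function names and the example framework are illustrative assumptions only.

```python
from itertools import product

def bn_update(state, arguments, attacks):
    """One synchronous step of the assumed BN: an argument becomes true
    exactly when none of its attackers is true in the current state."""
    return {
        a: all(not state[b] for (b, target) in attacks if target == a)
        for a in arguments
    }

def fixed_points(arguments, attacks):
    """Single-state attractors: states mapped to themselves by bn_update."""
    found = []
    for bits in product([False, True], repeat=len(arguments)):
        state = dict(zip(arguments, bits))
        if bn_update(state, arguments, attacks) == state:
            found.append(frozenset(a for a in arguments if state[a]))
    return found

def stable_extensions(arguments, attacks):
    """Conflict-free sets that attack every argument outside the set."""
    found = []
    for bits in product([False, True], repeat=len(arguments)):
        s = {a for a, bit in zip(arguments, bits) if bit}
        conflict_free = not any(a in s and b in s for (a, b) in attacks)
        attacks_rest = all(
            any((b, a) in attacks for b in s) for a in arguments if a not in s
        )
        if conflict_free and attacks_rest:
            found.append(frozenset(s))
    return found

# Hypothetical example: a and b attack each other, b also attacks c.
arguments = ["a", "b", "c"]
attacks = {("a", "b"), ("b", "a"), ("b", "c")}
print(set(fixed_points(arguments, attacks)) == set(stable_extensions(arguments, attacks)))
```

Both enumerations are exponential in the number of arguments, so the sketch is only suitable for checking the correspondence on toy frameworks.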
Proposals for strategies for dialogical argumentation often focus on situations where one of the agents wins the dialogue and the other agent loses. Yet in real-world argumentation, it is common for agents to not view a dialogue as a zero-sum game. Rather, the agents may enter into a dialogue with divergent but not diametrically opposing views on what is important (e.g. a doctor trying to persuade a patient to give up smoking when the patient would like to be healthy but gets some pleasure from smoking). Furthermore, there may be multiple persuasion goals (e.g. reduce smoking of cigarettes to 20 per day, or 10 per day, or 5 per day, or 1 per day, or 0 per day, where the doctor prefers 0 per day most, and 20 per day least, whereas the patient might prefer 5 per day most, and 20 per day and 0 per day least). In order to develop persuasive chatbots that support this kind of behaviour change application, this paper presents dialogue protocols, and a strategy for a chatbot, to optimize choice of moves.
Abstract dialectical frameworks (ADFs) have been introduced as a formalism for modeling and evaluating argumentation, allowing for general logical acceptance conditions of arguments. Different criteria used to settle the acceptance of arguments are called semantics. Two-valued semantics of ADFs reflect the ‘black-and-white’ character of classical logic in non-monotonic frameworks. Stable semantics of ADFs were introduced to exclude cycles of self-justification of arguments among two-valued models. The stable semantics faces the challenge of potential non-existence of stable models. However, one might still want to draw conclusions even in case that an ADF has no two-valued models or stable models. Recently, the notions of semi-two-valued semantics and semi-stable semantics were introduced for ADFs. In the current work, we study the computational complexity of these two novel semantics. We show that the complexity of the semi-stable semantics is in general one level up in the polynomial hierarchy, compared to the stable semantics. We study the prominent reasoning tasks of credulous and skeptical reasoning, as well as the verification problem.
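To make the notion of a two-valued model concrete, the following sketch enumerates the interpretations in which every argument's truth value agrees with its acceptance condition. Encoding acceptance conditions as Python callables, and the two example frameworks, are assumptions made for illustration; stable and semi-stable semantics are not implemented here.

```python
from itertools import product

def two_valued_models(conditions):
    """Return all interpretations v with v[s] == conditions[s](v) for every s.

    `conditions` maps each argument name to a callable that evaluates its
    acceptance condition under a full true/false interpretation.
    """
    names = sorted(conditions)
    models = []
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(conditions[s](v) == v[s] for s in names):
            models.append(v)
    return models

# Hypothetical ADF 1: a and b support each other (a cycle of self-justification).
mutual = {"a": lambda v: v["b"], "b": lambda v: v["a"]}
# Hypothetical ADF 2: c is accepted exactly when c is rejected.
liar = {"c": lambda v: not v["c"]}

print(two_valued_models(mutual))
print(two_valued_models(liar))
```

The first framework has a model in which a and b are both true only because they justify each other, which is the kind of model the stable semantics is meant to exclude. The second has no two-valued model at all, which is the situation that motivates the semi-two-valued and semi-stable semantics.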
Most existing computational tools for assumption-based argumentation (ABA) focus on so-called flat frameworks, disregarding the more general case. Here, we study an instantiation-based approach for reasoning in possibly non-flat ABA. For complete-based semantics, an approach of this kind was recently introduced, based on a semantics-preserving translation between ABA and bipolar argumentation frameworks (BAFs). Admissible semantics, however, require us to consider an extension of BAFs which also makes use of premises of arguments (pBAFs). We explore basic properties of pBAFs which we require as a theoretical underpinning for our proposed instantiation-based solver for non-flat ABA under admissible semantics. As our empirical evaluation shows, depending on the ABA instances, the instantiation-based solver is competitive against an ASP-based approach implemented in the style of state-of-the-art solvers for hard argumentation problems.
Abstract argumentation is an important research area in AI. It is mainly about the acceptability of arguments in an argumentation framework. The classical notion of defense has not fully reflected some useful information implicitly encoded by the interaction relation between arguments. In this paper, instead of using arguments and attacks as first-class citizens, a novel notion of attack-defense is adopted as a first-class citizen, based on which a theory of attack-defense framework and attack-defense semantics are established, where an attack-defense is a triple (x,y,z), meaning that an argument x defends an argument z against an attacker y. Attack-defense semantics can be used not only to identify the impact of arguments in some odd cycles, and remove some “useless” defenses, but also to capture new types of equivalence that cannot be represented by the existing notions of equivalence of argumentation frameworks. In addition, we show that an attack-defense framework and attack-defense semantics can represent some knowledge that cannot be represented in Dung-style argumentation, e.g., some context-sensitive knowledge in a dialogue.
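As a small illustration of the triple (x,y,z), the sketch below derives attack-defense triples from a Dung-style attack relation, reading "x defends z against y" as "x attacks y and y attacks z". This reading and the function name are assumptions; the paper's formal definition may impose further conditions.

```python
def attack_defenses(attacks):
    """List triples (x, y, z) such that x attacks y and y attacks z."""
    return sorted(
        (x, y, z)
        for (x, y) in attacks
        for (y2, z) in attacks
        if y == y2
    )

# Hypothetical framework: a and b attack each other, b also attacks c.
attacks = {("a", "b"), ("b", "a"), ("b", "c")}
print(attack_defenses(attacks))
```

In this example a defends itself against b, b defends itself against a, and a defends c against b.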
This paper develops a measure of the influence of individual arguments in abstract argumentation frameworks. By applying ideas from power indices in coalitional game theory, the proposed measure—called admissibility impact value—quantifies the impact that individual arguments have on the set of admissible extensions of a given argumentation framework. It improves on existing impact measures in that it is more fine-grained and sensitive to small differences in the attack relations of argumentation frameworks. Special consideration is given to well-founded frameworks, where the improvements are particularly pronounced.
We consider scenarios where a group of agents wish to simplify a given abstract argumentation framework—specifying a set of arguments and the attacks between them—by eliminating cycles in the attack-relation on the basis of their preferences over arguments. They do so by first aggregating their individual preferences into a collective preference order and then removing any attacks involved in a cycle that go against that order. Our analysis integrates insights from formal argumentation and social choice theory. We obtain sweeping impossibility results for essentially all standard methods of preference aggregation, showing that no Condorcet method and no positional scoring rule can uphold the fundamental principle expressing that views held by every single member of the group must be respected. But we also find that so-called representative-agent rules do offer this guarantee.
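The two-step procedure can be sketched as follows. The sketch uses Borda scores as the aggregation method purely for illustration: Borda is a positional scoring rule, which is one of the families the paper shows cannot guarantee the unanimity principle. It also assumes that an attack (a, b) "goes against the order" when the target b is collectively ranked strictly above the attacker a. All names here are hypothetical.

```python
def borda_scores(profiles, arguments):
    """Aggregate rankings (lists from most to least preferred) into Borda scores."""
    scores = {a: 0 for a in arguments}
    for ranking in profiles:
        for position, a in enumerate(ranking):
            scores[a] += len(ranking) - 1 - position
    return scores

def reachable(start, goal, attacks):
    """Depth-first search: is there a directed attack path from start to goal?"""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(t for (s, t) in attacks if s == node)
    return False

def simplify(arguments, attacks, profiles):
    """Drop attacks that lie on a cycle and point from a lower-ranked
    argument to a strictly higher-ranked one."""
    scores = borda_scores(profiles, arguments)
    kept = set()
    for (a, b) in attacks:
        in_cycle = reachable(b, a, attacks)       # the attack closes a path back to a
        against_order = scores[b] > scores[a]     # target is collectively preferred
        if not (in_cycle and against_order):
            kept.add((a, b))
    return kept

arguments = ["a", "b", "c"]
attacks = {("a", "b"), ("b", "a"), ("b", "c")}
profiles = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(simplify(arguments, attacks, profiles))
```

Because ties in the collective scores leave attacks untouched, this sketch does not always eliminate every cycle; a strict collective order would be needed for that.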
Online reviews now have a considerable influence on consumer choices. However, little work has focused on what features of review platforms influence review quality. We present a novel approach to identify the features that encourage quality reviews. By interpreting reviews as arguments for or against the product, an argument scheme can be used to simulate the emergent reliability of reviews resulting from different setups of the online review platform. Our results show that if the most recent, helpful, or polarised reviews are promoted over quality, then good quality reviews will almost never be shown to users.
Incomplete argumentation frameworks (IAFs) are abstract argumentation frameworks that encode qualitative uncertainty by distinguishing between certain and uncertain arguments and attacks. In a completion of an IAF, each uncertain argument or attack is either added (made certain) or removed. Given a completion, the acceptability of an argument is determined by its justification status. For arguments in an IAF that do not have the same justification status in each completion, it is interesting to study which uncertain arguments and attacks are relevant, in the sense that adding or removing them can lead to a different justification status. We propose algorithms based on Answer Set Programming for enumerating relevant arguments and attacks under grounded and complete semantics.
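The idea of relevance can be illustrated by brute force over completions. The sketch below is not the Answer Set Programming approach proposed in the paper: it enumerates every completion in plain Python, uses grounded semantics only, and adopts a simplified notion of relevance (toggling a single uncertain element in some completion changes whether the query argument is accepted). Encoding uncertain arguments as strings and uncertain attacks as tuples is an assumption of this sketch.

```python
from itertools import combinations

def grounded(arguments, attacks):
    """Grounded extension as the least fixed point of the defense function."""
    extension = set()
    while True:
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for (b, t) in attacks if t == a)
        }
        if defended == extension:
            return extension
        extension = defended

def accepted(query, choice, args, attacks):
    """Build the completion selected by `choice` and test the query in it."""
    present = set(args) | {e for e in choice if isinstance(e, str)}
    chosen = set(attacks) | {e for e in choice if isinstance(e, tuple)}
    usable = {(s, t) for (s, t) in chosen if s in present and t in present}
    return query in grounded(present, usable)

def relevant_elements(query, args, attacks, uncertain):
    """Uncertain elements whose addition flips the query's status somewhere."""
    uncertain = list(uncertain)
    relevant = set()
    for size in range(len(uncertain) + 1):
        for subset in combinations(uncertain, size):
            choice = frozenset(subset)
            before = accepted(query, choice, args, attacks)
            for e in uncertain:
                if e not in choice and accepted(query, choice | {e}, args, attacks) != before:
                    relevant.add(e)
    return relevant

# Hypothetical IAF: b certainly attacks a; c, d and the attack c -> b are uncertain.
result = relevant_elements(
    "a", args={"a", "b"}, attacks={("b", "a")},
    uncertain=["c", "d", ("c", "b")],
)
print(result)
```

Here a is accepted only in completions that contain both c and the attack from c to b, so those two elements are relevant while the isolated argument d is not.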
Dialogue protocols define how a dialogue may proceed and the moves its participants can make within it. The scope of this work covers protocol switching, and addresses the current gap in the area of illicit protocol switches in dialogue. Over the course of a dialogue the participants’ goals and strategies may change in response to the other participants within the dialogue. Enabling agents to switch between protocols gives them the flexibility to address these changes and make use of them. This paper introduces protocol switching using Dialogue as a Service (DaaS), a platform for building multi-agent dialogue systems. DaaS can be used to create a wide range of multi-agent dialogue systems due to its few restrictions and inherent flexibility, which is illustrated through the use of two examples from the literature. Protocol switches can be both licit and illicit; however, current research has only focused on implementing licit protocol switches. An approach to facilitating and managing illicit protocol switches is demonstrated herein.
Truth discovery networks evaluate the trustworthiness of sources (e.g., websites) and their claims (e.g., the severity of a virus). Intuitively, the more trustworthy the sources of a claim, the more believable the claim and vice versa. Singleton noted that bipolar abstract argumentation could be a natural way to reason about these networks. We explain how this idea can be implemented naturally by quantitative bipolar argumentation frameworks (QBAFs) that we call TD-QBAFs. While most applications of QBAFs result in a (nearly) acyclic structure, TD-QBAFs have bi-directional edges and can feature complex cycles. The stability (convergence behaviour) of QBAFs in cyclic graphs is currently not well understood. While pathological examples of divergent QBAFs have been constructed, the problems seemed unlikely to occur in practice. However, convergence problems seem to be the rule rather than the exception for TD-QBAFs. We demonstrate how common QBAF semantics can fail to converge for very simple TD-QBAFs and discuss some of the potential causes. While this shows limitations of existing semantics, we also discuss how some previously proposed ideas can be used to mitigate the problems and demonstrate their effectiveness empirically.
The field of explainable AI has grown exponentially in recent years. Within this landscape, argumentation frameworks have been shown to be helpful abstractions of some AI models towards providing explanations thereof. While existing work on argumentative explanations and their properties has focused on static settings, we focus on dynamic settings whereby the (AI models underpinning the) argumentation frameworks need to change. Specifically, for a number of notions of explanations drawn from abstract argumentation frameworks under extension-based semantics, we address the following questions: (1) Are explanations robust to extension-preserving changes, in the sense that they are still valid when the changes do not modify the extensions? (2) If not, are these explanations pseudo-robust in that they can be tractably updated? In this paper, we frame these questions formally. We consider robustness and pseudo-robustness w.r.t. ordinary and strong equivalence and provide several results for various extension-based semantics.
In this study, we introduce KIALOPRIME, a novel large-scale dataset comprising 5,687 argument discussion graphs with a total of 1,088,801 supporting, attacking, and neutral argument relations, derived from the structured debates of the online discussion platform Kialo.com. This dataset facilitates in-depth analysis of argument structures and the dynamics of discourse, serving as a substantial resource for computational argumentation research. We explore argument inference through traditional sequence classification and a modern approach based on generative reasoning, employing an open-source mixture-of-experts LLM to interpret and enrich each argument pair with high-quality synthetic elaborations about the argumentative interaction. We achieve baseline results of F1 .899 and .840 within discussions and F1 .908 and .840 across discussions for the argument relation and elaboration classification models, respectively. While the elaboration-based model scores slightly lower on the classification task, we highlight areas of improvement to better capture the hidden complexities of argumentative text. These initial findings are promising as they not only establish robust benchmarks for future studies but also demonstrate the potential for using generative reasoning to provide a more insightful analysis of argument relations.