Ebook: HHAI 2023: Augmenting Human Intellect
Artificial intelligence (AI) has been much in the news recently, with some commentators expressing concern that AI might eventually replace humans. But many developments in AI are designed to enhance and supplement the performance of humans rather than replace them, and a novel field of study, with new approaches and solutions to the development of AI, has arisen to focus on this aspect of the technology.
This book presents the proceedings of HHAI2023, the 2nd International Conference on Hybrid Human-Artificial Intelligence, held from 26-30 June 2023 in Munich, Germany. The HHAI international conference series focuses on the study of artificially intelligent systems that cooperate synergistically, proactively, responsibly and purposefully with humans, amplifying rather than replacing human intelligence, and invites contributions from various fields, including AI, human-computer interaction, the cognitive and social sciences, computer science, and philosophy, among others. A total of 78 submissions were received for the main conference track, and most papers were reviewed by at least three reviewers. The overall final acceptance rate was 43%, with 14 contributions accepted as full papers, 14 as working papers, and 6 as extended abstracts. The papers presented here cover topics including interactive hybrid agents; hybrid intelligence for decision support; hybrid intelligence for health; and values such as fairness and trust in hybrid intelligence. This year we further accepted 17 posters and 4 demos, as well as 8 students to the first HHAI doctoral consortium. The authors of 4 working papers and 2 doctoral consortium submissions opted not to publish their submissions here, so as to allow a later full submission, resulting in a total of 57 papers included in these proceedings.
Addressing all aspects of AI systems that assist humans, and emphasizing the need for adaptive, collaborative, responsible, interactive, and human-centered artificial intelligence systems that can leverage human strengths and compensate for human weaknesses while taking social, ethical, and legal considerations into account, the book will be of interest to all those working in the field.
With artificial intelligence (AI) systems entering our working and leisure environments with increasing adaptation and learning capabilities, new opportunities arise for developing hybrid (human-AI) intelligence (HI) systems, opening up new ways of collaborating. However, there is not yet a structured way of specifying design solutions for collaboration in HI systems, and best practices are not shared across application domains. We address this gap by investigating the generalization of specific design solutions into design patterns that can be shared and applied in different contexts. We present a human-centered, bottom-up approach for the specification of design solutions and their abstraction into team design patterns. We apply the proposed approach to four concrete HI use cases and show the successful extraction of team design patterns that are generalizable, providing reusable design components across various domains. This work advances previous research on team design patterns and on designing applications of HI systems.
Automatically assigning tasks to people is challenging because human performance can vary across tasks for many reasons. This challenge is further compounded in real-life settings in which no oracle exists to assess the quality of human decisions and task assignments made. Instead, we find ourselves in a “closed” decision-making loop in which the same fallible human decisions we rely on in practice must also be used to guide task allocation. How can imperfect and potentially biased human decisions train an accurate allocation model? Our key insight is to exploit weak prior information on human-task similarity to bootstrap model training. We show that the use of such a weak prior can improve task allocation accuracy, even when human decision-makers are fallible and biased. We present both theoretical analysis and empirical evaluation over synthetic data and a social media toxicity detection task. Results demonstrate the efficacy of our approach.
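As a rough illustration of the idea of bootstrapping allocation from a weak prior, the following sketch shows one possible formulation in Python; the similarity prior, the shrinkage rule, and all numbers are assumptions for illustration, not the paper's algorithm.

```python
# Minimal sketch (assumptions, not the paper's method): use a weak human-task
# similarity prior to bootstrap task allocation when only fallible, noisy
# human decisions are available as a training signal.
import numpy as np

rng = np.random.default_rng(0)

n_workers, n_tasks, dim = 5, 200, 8
worker_emb = rng.normal(size=(n_workers, dim))   # e.g., derived from past task history
task_emb = rng.normal(size=(n_tasks, dim))       # e.g., derived from task features

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Weak prior: cosine similarity between worker and task embeddings.
prior = np.array([[cosine(w, t) for t in task_emb] for w in worker_emb])

# Observed (noisy, possibly biased) agreement of each worker with peer
# consensus on previously handled tasks, plus how often it was observed.
observed_agreement = rng.uniform(0.4, 0.9, size=(n_workers, n_tasks))
counts = rng.integers(0, 5, size=(n_workers, n_tasks))

# Shrink noisy observations toward the prior when evidence is scarce.
weight = counts / (counts + 3.0)
score = weight * observed_agreement + (1 - weight) * (prior + 1) / 2

# Allocate each task to the worker with the highest estimated score.
allocation = score.argmax(axis=0)
print(allocation[:10])
```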
Interactive machine learning (ML) adds a human-in-the-loop aspect to an ML system. Even though input from human users is a central part of the concept, the uncertainty caused by human feedback is often not considered in interactive ML. The assumption that the human user always provides correct feedback typically does not hold in real-world scenarios. This is especially important when the cognitive workload of the human is high, for instance in online learning from streaming data where there are time constraints on providing the feedback. We present experiments of interactive online ML with human participants and compare the results to simulated experiments in which humans are always correct. We found that combining the two interactive learning paradigms, active learning and machine teaching, resulted in better performance compared to machine teaching alone. The results also showed an increased discrepancy between the experiments with human participants and the simulated experiments when the cognitive workload was increased. The findings suggest the importance of taking uncertainty caused by human factors into consideration in interactive ML, especially in situations that require a high cognitive workload from the human.
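To make the combination of the two paradigms concrete, here is a minimal sketch of an online loop in which a model queries a simulated, fallible human when it is uncertain (active learning) and the human volunteers corrections for perceived errors (machine teaching); the thresholds, noise level, and data are assumptions, not the study's protocol.

```python
# Minimal sketch (assumed setup): online learning from a stream, combining
# active learning (query when uncertain) with machine teaching (human-initiated
# corrections), with fallible simulated human feedback.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
classes = np.unique(y)
model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X[:10], y[:10], classes=classes)  # warm start

rng = np.random.default_rng(0)
P_HUMAN_CORRECT = 0.85  # the human teacher is right only 85% of the time

for x, label in zip(X[10:], y[10:]):
    x = x.reshape(1, -1)
    proba = model.predict_proba(x)[0]
    pred = classes[proba.argmax()]
    noisy_label = label if rng.random() < P_HUMAN_CORRECT else 1 - label

    if proba.max() < 0.6:
        # Active learning: the model is uncertain and asks for a label.
        model.partial_fit(x, [noisy_label])
    elif pred != noisy_label:
        # Machine teaching: the human notices a (perceived) error and corrects it.
        model.partial_fit(x, [noisy_label])

print("final accuracy:", model.score(X, y))
```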
In AI-assisted decision-making, a central promise of putting a human in the loop is that they should be able to complement the AI system by adhering to its correct and overriding its mistaken recommendations. In practice, however, we often see that humans tend to over- or under-rely on AI recommendations, meaning that they either adhere to wrong or override correct recommendations. Such reliance behavior is detrimental to decision-making accuracy. In this work, we articulate and analyze the interdependence between reliance behavior and accuracy in AI-assisted decision-making, which has been largely neglected in prior work. We also propose a visual framework to make this interdependence more tangible. This framework helps us interpret and compare empirical findings, as well as obtain a nuanced understanding of the effects of interventions (e.g., explanations) in AI-assisted decision-making. Finally, we infer several interesting properties from the framework: (i) when humans under-rely on AI recommendations, there may be no possibility for them to complement the AI in terms of decision-making accuracy; (ii) when humans cannot discern correct from wrong AI recommendations, no such improvement can be expected either; (iii) interventions may lead to an increase in decision-making accuracy that is solely driven by an increase in humans’ adherence to AI recommendations, without any improved ability to discern correct from wrong ones. Our work emphasizes the importance of measuring and reporting both effects on accuracy and reliance behavior when empirically assessing interventions.
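The interdependence can be made tangible with a small worked calculation; the probabilities below are illustrative assumptions, not values or notation from the paper.

```python
# Minimal sketch (illustrative numbers): decompose AI-assisted decision
# accuracy into adherence to correct vs. wrong AI recommendations.
def joint_accuracy(p_ai_correct, adherence_correct, adherence_wrong,
                   p_human_correct_when_overriding):
    # Adhering to a correct recommendation yields a correct decision;
    # overriding any recommendation succeeds only as often as the human alone.
    correct_branch = p_ai_correct * (
        adherence_correct + (1 - adherence_correct) * p_human_correct_when_overriding
    )
    wrong_branch = (1 - p_ai_correct) * (
        (1 - adherence_wrong) * p_human_correct_when_overriding
    )
    return correct_branch + wrong_branch

# Under-reliance: many correct recommendations are overridden.
print(joint_accuracy(0.8, adherence_correct=0.5, adherence_wrong=0.5,
                     p_human_correct_when_overriding=0.7))
# Blind over-reliance: adherence regardless of correctness.
print(joint_accuracy(0.8, adherence_correct=1.0, adherence_wrong=1.0,
                     p_human_correct_when_overriding=0.7))
```

The second call shows that adherence alone can raise accuracy to the AI's own level without the human discerning correct from wrong recommendations at all.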
Artificial Intelligence (AI) offers organizations unprecedented opportunities. However, one of the risks of using AI is that its outcomes and inner workings are not intelligible. In industries where trust is critical, such as healthcare and finance, explainable AI (XAI) is a necessity. However, the implementation of XAI is not straightforward, as it requires addressing both technical and social aspects. Previous studies on XAI primarily focused on either technical or social aspects and lacked a practical perspective. This study empirically examines the XAI-related aspects faced by developers, users, and managers of AI systems during the development process of an AI system. To this end, a multiple case study was conducted in two Dutch financial services companies using four use cases. Our findings reveal a wide range of aspects that must be considered during XAI implementation, which we grouped and integrated into a conceptual model. This model helps practitioners make informed decisions when developing XAI. We argue that the diversity of aspects to consider necessitates an XAI “by design” approach, especially for high-risk use cases in high-stakes industries such as finance, public services, and healthcare. As such, the conceptual model offers a taxonomy for the method engineering of XAI-related methods, techniques, and tools.
Automated decision-making is one of the fundamental functions of smart home technologies. With the increasing availability of Artificial Intelligence (AI) and Internet of Things (IoT) technologies, those functions are becoming increasingly sophisticated. While many studies have been conducted on optimizing algorithms to improve the accuracy of predictions, less attention has been paid to how humans interact with algorithmic systems. This involves questions such as to what degree humans are involved in the algorithmic decision-making process and how we can design meaningful interactions between humans and systems relying on decision-making algorithms. With these questions in mind, our paper presents a literature review on the current state of decision-making algorithms in smart homes. Based on an analysis of 49 selected papers, we present a systematic investigation of the application areas and the functions that decision-making algorithms currently take on in smart homes. Focusing on two main application areas – energy management and healthcare – our paper sheds light on the current deployment of decision-making algorithms in smart homes and identifies the current intentions of involving humans in the loop. Against the background of facilitating human-in-the-loop as an interaction paradigm, we aim to expose the design challenges for human-in-the-loop decision-making algorithms in smart homes, which can pave the way for developing more effective human-machine hybrid intelligent systems in smart homes in the future.
For personal assistive technologies to effectively support users, they need a user model that records information about the user, such as their goals, values, and context. Knowledge-based techniques can model the relationships between these concepts, enabling the support agent to act in accordance with the user’s values. However, user models require updating over time to accommodate changes and continuously align with what the user deems important. In our work, we propose and investigate the use of human-agent alignment dialogues for establishing whether user model updates are needed and acquiring the necessary information for these updates. In this paper, we perform an exploratory qualitative focus group study in which we investigate participants’ opinions about written examples of alignment dialogues, as a foundation for their design. Transcripts were analyzed using thematic analysis. A main theme that emerged concerns the potential impact of agent utterances on the user’s feelings about themselves and about the agent.
The emergence of generative design (GD) has introduced a new paradigm for co-creation between human experts and AI systems. Empirical findings have shown promising outcomes such as augmented human cognition and highly creative design products. However, barriers still remain that prevent individuals from perceiving and adopting AI, entering into collaboration with AI, and sustaining it over time. It is even more challenging for creative design industries to adopt and trust AI, as these professionals value individual style and expression and therefore require highly personalized and specialized AI assistance. In this paper, we present a holistic hybrid intelligence (HI) approach for individual experts to train and personalize their GD assistants on the fly. Our contribution to human-AI interaction is three-fold: i) a programmable common language between human and AI to represent the expert’s design goals to the generative algorithm, ii) a human-centered continual training loop to seamlessly integrate AI-training into the expert’s task workflow, and iii) a hybrid intelligence narrative to address the psychological willingness to spend time and effort training such a virtual assistant. This integral approach enables individuals to directly communicate design goals to AI and seeks to create a psychologically safe space for adopting, training and improving AI without the fear of job replacement. We concretize these constructs through a newly developed Hybrid Intelligence Technology Acceptance Model (HI-TAM). We used mixed methods to empirically evaluate this approach through the lens of HI-TAM with 8 architectural professionals working individually with a GD assistant to co-create floor plan layouts of office buildings. We believe that the proposed approach enables individual professionals, even non-technical ones, to adopt and trust AI-enhanced co-creative tools.
Human knowledge is growing exponentially, providing vast and sometimes conflicting evidence to support decision making in the realm of complex problems. To fight knowledge fragmentation, collective intelligence leverages groups of experts (possibly from diverse domains) that jointly provide solutions. However, to promote beneficial outcomes and avoid herding, it is necessary to (i) elicit diverse responses and (ii) suitably aggregate them into a collective solution. To this end, AI can help deal with large knowledge bases, as well as reason on expert-provided knowledge to support decision-making. A hybrid human-artificial collective intelligence can leverage the complementarity of expert knowledge and machine processing to deal with complex problems. We discuss how such a hybrid human-artificial collective intelligence can be deployed to support decision processes, and we present case studies in two different domains: general medical diagnostics and climate change adaptation management.
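As a toy illustration of step (ii), the sketch below aggregates expert judgments with a confidence- and diversity-weighted average; the rule and the numbers are assumptions chosen for illustration, not the method used in the case studies.

```python
# Minimal sketch (assumed aggregation rule): combine diverse expert judgments
# into a collective answer while avoiding herd-like dominance of near-identical
# responses.
import numpy as np

# Expert probabilities for a binary question (e.g., "diagnosis X applies").
judgments = np.array([0.9, 0.85, 0.88, 0.3, 0.4])   # three similar, two dissenting
confidence = np.array([0.8, 0.6, 0.7, 0.9, 0.5])    # self-reported confidence

# Diversity weight: experts far from the crowd mean carry extra weight,
# so dissenting voices are not drowned out by a homogeneous majority.
diversity = 1.0 + np.abs(judgments - judgments.mean())
weights = confidence * diversity

collective = float(np.average(judgments, weights=weights))
print(round(collective, 3))
```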
Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.
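The following sketch illustrates one generic way such a layout could be obtained (t-SNE on neuron-to-neuron correlation distances); it is a minimal, assumed example on random data, not one of the specific layout methods compared in the paper.

```python
# Minimal sketch (assumed layout method, toy data): place neurons of one DNN
# layer in 2D so that neurons with similar activation patterns sit close
# together, then color the layout by the activations for a single input.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 64))   # (n_inputs, n_neurons) from one hidden layer

# Distance between two neurons: 1 - correlation of their activation profiles.
corr = np.corrcoef(activations.T)
dist = np.clip(1.0 - corr, 0.0, None)      # guard against tiny negative values

# 2D layout placing similarly behaving neurons near each other.
layout = TSNE(metric="precomputed", init="random", random_state=0).fit_transform(dist)

# Topographic activation map for a single input: color neurons by activation.
one_input = activations[0]
plt.scatter(layout[:, 0], layout[:, 1], c=one_input, cmap="viridis", s=60)
plt.colorbar(label="activation")
plt.title("Topographic activation map (toy data)")
plt.show()
```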
Designing cooperative AI-systems that do not automate tasks but rather aid human cognition is challenging and requires human-centered design approaches. Here, we introduce AI-aided brainstorming for solving guesstimation problems, i.e., estimating quantities from incomplete information, as a testbed for human-AI interaction with large language models (LLMs). In a think-aloud study, we found that humans decompose guesstimation questions into sub-questions and often replace them with semantically related ones. If they fail to brainstorm related questions, they often get stuck and do not find a solution. Therefore, to support this brainstorming process, we prompted a large language model (GPT-3) with successful replacements from our think-aloud data. In follow-up studies, we tested whether the availability of this tool improves participants’ answers. While the tool successfully produced human-like suggestions, participants were reluctant to use it. From our findings, we conclude that for human-AI interaction with LLMs to be successful, AI-systems must complement rather than mimic a user’s associations.
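As an illustration of the prompting idea, the sketch below builds a few-shot prompt that asks an LLM to suggest a semantically related replacement question; the instruction text and example pairs are assumptions, not the study's actual prompt or data.

```python
# Minimal sketch (assumed prompt format and examples): constructing a few-shot
# prompt for an LLM to brainstorm a related, easier-to-estimate question.
FEW_SHOT_EXAMPLES = [
    ("How many piano tuners are there in Chicago?",
     "How many households in Chicago own a piano?"),
    ("How much does a cloud weigh?",
     "What is the volume of a typical cumulus cloud?"),
]

def build_prompt(question: str) -> str:
    lines = ["Replace the question with a related, easier-to-estimate question.", ""]
    for original, replacement in FEW_SHOT_EXAMPLES:
        lines += [f"Question: {original}", f"Related question: {replacement}", ""]
    lines += [f"Question: {question}", "Related question:"]
    return "\n".join(lines)

print(build_prompt("How many golf balls fit into a school bus?"))
# The resulting prompt would then be sent to an LLM completion endpoint
# (e.g., GPT-3) to obtain a suggested replacement question.
```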
Changing one’s behavior is difficult, so many people look towards technology for help. However, most current behavior change support systems are inflexible in that they support one type of behavior change and do not reason about how that behavior is embedded in larger behavior patterns. To allow users to flexibly decide what they desire to change, a system needs to represent and reason about that desire. Moreover, we argue that reasoning about the context of a behavior could improve an agent’s support. Therefore, we propose a formal framework for a reasoning agent to represent and reason about the personal behavioral context of desired user changes. This framework models an individual’s possible and current behavior, their desire for change, as well as other relevant changes that a system could use to support a desired change. In a user survey we show that people feel these other relevant changes would be useful in more flexibly supporting their desired change in behavior. This work provides a foundation for more flexible personalized behavior change support.
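A minimal sketch of how such a behavioral context and desired change might be represented as a data structure is given below; the class names and fields are illustrative assumptions, not the paper's formal framework.

```python
# Minimal sketch (assumed representation): a user's behavior, its context, the
# desired change, and other relevant changes a support agent could reason over.
from dataclasses import dataclass, field

@dataclass
class Behavior:
    name: str
    context: dict          # e.g., {"time": "evening", "location": "home"}

@dataclass
class DesiredChange:
    target: Behavior                       # the behavior the user wants to change
    direction: str                         # "increase" or "decrease"
    related_changes: list = field(default_factory=list)  # other changes that could help

snacking = Behavior("late-night snacking", {"time": "evening", "location": "home"})
change = DesiredChange(
    target=snacking,
    direction="decrease",
    related_changes=[DesiredChange(Behavior("evening walk", {"time": "evening"}),
                                   "increase")],
)
print(change.target.name, "->", change.direction)
```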
Machine Learning (ML) decision-making algorithms are now widely used in predictive decision-making, for example, to determine whom to admit or to whom to grant a loan. Their wide usage and consequential effects on individuals have led the ML community to question and raise concerns about how the algorithms differently affect different people and communities. In this paper, we study fairness issues that arise when decision-makers use models (proxy models) that deviate from the models that depict the physical and social environment in which the decisions are situated (intended models). We also highlight the effect of obstacles on individuals' access to and utilization of the models. To this end, we formulate an Equity Framework that considers equal access to the model, equal outcomes from the model, and equal utilization of the model, and consequently achieves equity and higher social welfare than current fairness notions that aim for equality. We show how the three main aspects of the framework are connected and provide questions to guide decision-makers towards equitable decision-making. We show how failure to consider access, outcome, and utilization would exacerbate proxy gaps, leading to an infinite inequity loop that reinforces structural inequities through inaccurate and incomplete ground truth curation. We therefore recommend a more critical look at model design and its effect on equity, and a shift towards predictive decision-making models that achieve equity.
Conversational agents (CAs, aka chatbots) for behavioral interventions have great potential to improve patient engagement and provide solutions that can benefit human health. In this study, we examined the potential efficacy of chatbots in assisting with the resolution of specific barriers that people frequently encounter in behavioral interventions aimed at increasing physical activity (PA). To do this, six common barriers (i.e., things that stand in the way of increasing PA, such as stress and fatigue) were targeted, and we drew on domain knowledge (i.e., psychological theories and behavior change techniques) to design six interventions, each aimed at tackling one of these barriers. These interventions were then incorporated into consultative conversations, which were subsequently integrated into a chatbot. A user study was conducted with a non-clinical sample (n=77) in which all participants were presented with three randomly but equally distributed chatbot interventions and a control condition. Each intervention conversation addressed a specific barrier to PA, while the control conversation did not address any barrier. The outcome variables were beliefs in PA engagement, attitudes toward the effectiveness of each intervention in resolving the barrier, and the overall chatbot experience. The results showed a significant increase in beliefs in PA engagement in most intervention groups compared to the control group, positive attitudes toward the effectiveness of the interventions in reducing their respective barriers to PA, and a positive chatbot experience. The results demonstrate that theory-grounded interventions delivered by chatbots can effectively help people overcome specific barriers to PA, thereby increasing their beliefs in PA engagement. These promising findings indicate that chatbot interventions can be an accessible and widely applicable solution for promoting PA in a larger population.
In many practical applications, machine learning models are embedded into a pipeline involving a human actor that decides whether to trust the machine prediction or take a default route (e.g., classify the example herself). Selective classifiers have the option to abstain from making a prediction on an example they do not feel confident about. Recently, the notion of the value of a machine learning model has been introduced as a way to jointly consider the benefit of a correct prediction, the cost of an error, and that of abstaining. In this paper, we study how active learning of selective classifiers is affected by the focus on value. We show that the performance of the state-of-the-art active learning strategies drops significantly when we evaluate them based on value rather than accuracy. Finally, we propose a novel value-aware active learning strategy that outperforms the state-of-the-art ones when the cost of incorrect predictions substantially outweighs that of abstaining.
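A minimal sketch of such a value computation is shown below; the gain and cost figures are assumed for illustration and are not those used in the paper.

```python
# Minimal sketch (assumed costs): the "value" of a selective classifier trades
# off the gain of a correct prediction against the cost of an error and the
# cost of abstaining.
import numpy as np

def value(y_true, y_pred, abstained, gain=1.0, cost_error=5.0, cost_abstain=0.5):
    y_true, y_pred, abstained = map(np.asarray, (y_true, y_pred, abstained))
    correct = (~abstained) & (y_pred == y_true)
    wrong = (~abstained) & (y_pred != y_true)
    return (gain * correct.sum()
            - cost_error * wrong.sum()
            - cost_abstain * abstained.sum()) / len(y_true)

# When errors are much costlier than abstentions, abstaining on hard examples
# raises value even though it lowers coverage.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])
print(value(y_true, y_pred, abstained=np.zeros(6, dtype=bool)))            # never abstain
print(value(y_true, y_pred, abstained=np.array([0, 0, 1, 0, 0, 1], bool)))  # abstain on errors
```

The two calls show how a model that looks identical under accuracy can differ sharply under value, which is why value-agnostic active learning strategies can fall short.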
We investigate how the use of haptic feedback through electrical muscle stimulation (EMS) can improve collision-avoidance in a robot teleoperation scenario.
Background:
Collision-free robot teleoperation requires extensive situation awareness by the operator. This is difficult to achieve purely visually when obstacles can exist outside of the robot’s field of view. Therefore, feedback from other sensory channels can be beneficial.
Method:
We compare feedback modalities in the form of auditory, haptic and bi-modal feedback, notifying users about incoming obstacles outside their field of view, and moving their arms in the direction to avoid the obstacle. We evaluate the different feedback modalities alongside a unimodal visual feedback baseline in a user study (N = 9), where participants are controlling a robotic arm in a virtual reality environment. We measure objective performance metrics in terms of the number of collisions and errors, as well as subjective user feedback using the NASA-TLX and the short version of the User Experience Questionnaire.
Findings:
Unimodal EMS and bi-modal feedback outperformed the baseline and unimodal auditory feedback when it comes to hedonic user experience (p < .001). EMS outperformed the baseline with regards to pragmatic user experience (p = .018). We did not detect significant differences in the performance metrics (collisions and errors). We measured a strong learning effect when investigating the collision count and time.
Key insights:
The use of EMS is promising for this task. Two of the nine participants reported experiencing some level of discomfort. The modality is best utilized for nudging rather than extended movement.
Confidence signals are often used in human interactions to communicate the likelihood of a decision being correct. Similarly, confidence may also be used to indicate the reliability of advice given by an AI. While previous work on explainable AI (XAI) has explored the effect of AI confidence on AI-advice adoption and the joint accuracy of the human-AI team, most studies use AI assistants that exceed human performance. It is unclear how displaying confidence interacts with the accuracy of the AI. We conduct a comprehensive investigation of the effect of displaying AI confidence on two factors: 1) the accuracy of AI-assisted decision making, and 2) reliance on the AI’s assistance. We conduct two behavioral experiments, one where participants were shown AI confidence and another where no confidence ratings were shown. Our work goes beyond the typical focus on high-accuracy AI assistants. In both experiments, participants were assisted by one of three AI classifiers of varying accuracy. Our results demonstrate that displaying AI confidence increases joint accuracy when people are assisted by a classifier that is better than humans on average. Conversely, when assisted by a classifier with performance worse than an average human, joint accuracy was better when no AI confidence was displayed. However, for the adoption of AI advice we observed the opposite pattern: people rely more on a higher-accuracy classifier that does not display confidence than on one that does, and rely more on a lower-accuracy classifier that does display AI confidence than on one that does not.
e-Health data is sensitive, and consenting to its collection, processing, and sharing involves compliance with legal requirements, ethical standards, and appropriate digital tools. We explore two legal-ethical challenges: 1) What are the scope and requirements of digital health data consent? 2) What are the legal-ethical reasons for obtaining consent beyond the GDPR’s legal basis, and how might such consent be obtained? We then propose human-centered solutions to help navigate standards of ethical and legal consent across the EU, purposefully addressing use cases in which humans struggle to manage consent in the absence of clear guidelines. These solutions – including ISO standards, ontologies, consent mechanisms, value-centered privacy assistants, and layered dynamic consent platforms – complement and aid humans in upholding ethical and rigorous consent.
Knowledge is essential for organizations’ growth, as it allows them to solve problems, make decisions, innovate, and stay competitive. Within organizations there is, on the one hand, explicit knowledge that is easy to capture, represent, and share. On the other hand, there is tacit knowledge possessed and acquired by individuals during their activities. Unlike explicit knowledge, tacit knowledge is difficult to capture and formalize. Organizations have devoted more interest and effort to representing, sharing, and reasoning from explicit knowledge. For tacit personal knowledge, however, they rely on methods such as meetings, mentoring, question answering, or interviews, which are of limited use for capitalizing on personal knowledge.
This study elaborates on the construction of interpersonal activity graphs for representing, sharing, and reasoning on organizations’ tacit knowledge possessed by individuals. The resulting graph is based on an extended activity theory framework and an ontology for common semantics. The proposed representation captures tacit knowledge in graph form, making it shareable while offering means to reason over and query it.
In the healthcare sector in particular, the shortage of skilled workers is a major problem that will become even more acute in the future as a result of demographic change. One way to counteract this trend is to use intelligent systems to reduce the workload of healthcare professionals. AI-based clinical decision support systems (AICDSS) have already proven their worth in this area, while simultaneously improving medical care. More recently, AICDSS have also been characterized by their ability to leverage the increasing availability of clinical data to assist healthcare professionals and patients in a variety of situations based on structured and unstructured data. However, the need to access large amounts of data while adhering to strict privacy regulations, together with the dependence on user adoption, has highlighted the need to further adapt the implementation of AICDSS to integrate with existing healthcare routines. A subproject of the ViKI pro research project investigates how AICDSS can be successfully integrated into professional care planning practice using a user-centered design thinking approach. This paper presents the design of the ViKI pro AICDSS and the challenges related to privacy, user acceptance, and the underlying data. It also describes the development of an AI-based cloud technology for data processing and exchange using federated learning, and the development of an explainable AI algorithm for recommending care interventions. The core of the AICDSS is a human-in-the-loop system for data validation, in which the output of the AI model is continuously verified by skilled personnel to ensure continuous improvement in accuracy and transparent interaction between AI and humans.
Knowledge graphs are important in human-centered AI because of their ability to reduce the need for large labelled machine-learning datasets, facilitate transfer learning, and generate explanations. However, knowledge-graph construction has evolved into a complex, semi-automatic process that increasingly relies on opaque deep-learning models and vast collections of heterogeneous data sources to scale. The knowledge-graph lifecycle is not transparent, accountability is limited, and there are no accounts of, or indeed methods to determine, how fair a knowledge graph is in the downstream applications that use it. Knowledge graphs are thus at odds with AI regulation, for instance the EU’s upcoming AI Act, and with ongoing efforts elsewhere in AI to audit and debias data and algorithms. This paper reports on work in progress towards designing explainable (XAI) knowledge-graph construction pipelines with human-in-the-loop and discusses research topics in this space. These were grounded in a systematic literature review, in which we studied tasks in knowledge-graph construction that are often automated, as well as common methods to explain how they work and their outcomes. We identified three directions for future research: (i) tasks in knowledge-graph construction where manual input remains essential and where there may be opportunities for AI assistance; (ii) integrating XAI methods into established knowledge-engineering practices to improve stakeholder experience; as well as (iii) evaluating how effective explanations genuinely are in making knowledge-graph construction more trustworthy.
The aim of this paper is to discuss the motivation and the methodology used to construct a survey that gathers data on the moral preferences of users in an ever-growing digital world, in order to implement an exoskeleton software (i.e., EXOSOUL) that will be able to protect and support users in such a world. Although we are primarily interested in presenting and discussing the adopted methodology, in Section 5 we present the preliminary results of the survey.
In our society there is a growing and constant interaction between human agents and artificial agents, such as algorithms, robots, platforms, and ICT systems in general. The spread of these technologies poses new ethical challenges beyond the existing ones, for two main reasons. First, the sheer number of interactions between human agents and artificial ones involves an overwhelming range of ethical aspects. Secondly, and most importantly, the progressive self-sufficiency and autonomy that increasingly sophisticated systems are acquiring seem to deprive human beings of one of their most defining ethical aspects: their autonomy, as human decisions and actions are increasingly impacted by the autonomy of these systems. In line with this perspective, the EXOSOUL multidisciplinary project has the goal of creating a software exoskeleton that helps users to interact with artificial agents according to their ethical preferences. In this work, we aim to investigate how to collect human agents’ ethical preferences. In Section 1 we present the EXOSOUL project and in Section 2 the motivation for this paper. Sections 3 and 4 illustrate the new approach, while in Section 5 we provide the preliminary results. Section 6 concludes and presents the work to be done in the future.
We present a prototype hybrid prediction market and demonstrate the avenue it represents for meaningful human-AI collaboration. We build on prior work proposing artificial prediction markets as a novel machine learning algorithm. In an artificial prediction market, trained AI agents (bot traders) buy and sell outcomes of future events. Classification decisions can be framed as outcomes of future events, and accordingly, the price of an asset corresponding to a given classification outcome can be taken as a proxy for the system’s confidence in that decision. By embedding human participants in these markets alongside bot traders, we can bring together insights from both. In this paper, we detail pilot studies with prototype hybrid markets for the prediction of replication study outcomes. We highlight challenges and opportunities, share insights from semi-structured interviews with hybrid market participants, and outline a vision for ongoing and future work.
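As an illustration of how a market price can serve as a confidence proxy, the sketch below implements a two-outcome logarithmic market scoring rule (LMSR) market in which bot and human trades move the price; the use of LMSR here is an assumption for illustration, and the prototype's actual market mechanism may differ.

```python
# Minimal sketch (LMSR pricing assumed): the price of the "replicates" asset
# acts as a confidence proxy for the market's collective prediction.
import math

class LMSRMarket:
    """Two-outcome logarithmic market scoring rule market."""
    def __init__(self, liquidity=10.0):
        self.b = liquidity
        self.q = {"yes": 0.0, "no": 0.0}   # outstanding shares per outcome

    def _cost(self):
        return self.b * math.log(sum(math.exp(v / self.b) for v in self.q.values()))

    def price(self, outcome):
        e = {k: math.exp(v / self.b) for k, v in self.q.items()}
        return e[outcome] / sum(e.values())

    def buy(self, outcome, shares):
        # Cost of moving the market; traders (bots or humans) pay this amount.
        cost_before = self._cost()
        self.q[outcome] += shares
        return self._cost() - cost_before

market = LMSRMarket()
market.buy("yes", 5)    # a bot trader betting the study replicates
market.buy("no", 2)     # a human participant disagreeing
print("P(replicates) ≈", round(market.price("yes"), 3))
```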