Ebook: Challenges of Trustable AI and Added-Value on Health
Artificial Intelligence (AI) in healthcare promises to improve the accuracy of diagnosis and screening, support clinical care, and assist in various public health interventions such as disease surveillance, outbreak response, and health system management. But the increasing importance of AI in healthcare means that trustworthy AI is vital to achieve the beneficial impacts on health anticipated by both health professionals and patients.
This book presents the proceedings of the 32nd Medical Informatics Europe Conference (MIE2022), organized by the European Federation for Medical Informatics (EFMI) and held from 27 to 30 May 2022 in Nice, France. The theme of the conference was Challenges of Trustable AI and Added-Value on Health. Over 400 submissions were received from 43 countries, and were reviewed in a thorough process by at least three reviewers before being assessed by an SPC co-chair, with papers requiring major revision undergoing further review. Included here are 147 full papers (acceptance rate 54%), 23 short papers and 79 posters from the conference. Topics covered include the usual sub-domains of biomedical informatics: decision support and clinical information systems; clinical research informatics; knowledge management and representation; consumer health informatics; natural language processing; public health informatics; and privacy, ethical and societal aspects, but also innovative approaches to the collection, organization and analysis of data and knowledge related to health and wellbeing, as well as theoretical and applied contributions to AI methods and algorithms.
Providing an overview of the latest developments in medical informatics, the book will be of interest to all those involved in the development and provision of healthcare today.
The 32nd Medical Informatics Europe Conference, MIE2022, was held in Nice, France, from 27 to 30 May 2022. The Conference was hosted by the European Federation for Medical Informatics (EFMI) and organized by the “MCO Congress”. The Scientific Programme Committee was chaired by Professor Brigitte Séroussi, Sorbonne University, French Association of Medical Informatics (AIM). The overarching theme of MIE2022 was “Challenges of Trustable AI and Added-value on Health” stressing the increasing importance of Artificial Intelligence (AI) in healthcare on the one hand and the need for a trustworthy AI on the other hand, in order to reach the full expected impact of AI on health from both health-professional and patient-centered perspectives.
Developed in the 1970s, the first AI systems were essentially knowledge-based decision support systems. Despite their good performance, these first knowledge-based systems were never routinely used on real patients, and because they were able to explain their reasoning process, most of them turned out to be more beneficial for teaching than for clinical practice. After some “winters”, AI is back, with new machine learning methods that promise to improve the accuracy of diagnosis and screening, support clinical care, and assist various public health interventions such as disease surveillance, outbreak response, and health system management. Naturally, as these new AI systems emerge, concerns arise about the level of control that should be conceded and about the pace at which AI methods are introduced. One concern in particular arises from the fact that these new AI systems often work as “black boxes”, and are unable to explain their results. Explainability is critical, however, to respond to patient and practitioner narrative exchanges. It is also a problem that practitioners, who are responsible for their decisions, cannot easily follow the proposals of AI systems that they disagree with.
Throughout this publication, readers will find innovative approaches to the collection, organization, and analysis of data and knowledge related to health and wellbeing, as well as theoretical and applied contributions to AI methods and algorithms. Papers covering the usual subdomains of biomedical informatics (decision support systems, clinical information systems, clinical research informatics, knowledge management and representation, consumer health informatics, natural language processing, bioinformatics, public health informatics, privacy, ethical and societal aspects, etc.) are also offered. The Proceedings are published as an e-book by IOS Press in the series Studies in Health Technology and Informatics (HTI), providing open access for ease of use and browsing without the loss of any of the advantages of indexing and citation, in the major Scientific Literature Databases, such as Medline and Scopus.
Brigitte Séroussi, Patrick Weber, Ferdinand Dhombres, Cyril Grouin, Jan-David Liebe, Sylvia Pelayo, Andrea Pinna, Bastien Rance, Lucia Sacchi, Adrien Ugon, Arriel Benis, Parisis Gallos
Chronic exposure to environmental arsenic has been linked to a number of human diseases affecting multiple organ systems, including cancer. The greatest concern for chronic exposure to arsenic is contaminated groundwater used for drinking, as it is the main contributor to the amount of arsenic present in the body. An estimated 40% of households in Nova Scotia (Canada) use water from private wells, and there is a concern that exposure to arsenic may be associated with cancer. In this preliminary study, we aim to gain insights into the association of the pathogenicity and carcinogenicity of environmental metals with prostate cancer. We use toenails as a novel biomarker for capturing long-term exposure to arsenic, and have performed toxicological analysis to generate data about differential profiles of arsenic species and the metallome (entirety of metals) for both healthy individuals and individuals with a history of cancer. We have applied feature selection and machine learning algorithms to arsenic species and metallomics profiles of toenails to investigate the complex association between environmental arsenic (as a carcinogen) and prostate cancer. We present machine learning based models to ultimately predict the association of environmental arsenic exposure in cancer cases.
The acceptance of artificial intelligence (AI) systems by health professionals is crucial to obtain a positive impact on the diagnosis pathway. We evaluated user satisfaction with an AI system for the automated detection of findings in chest x-rays, after five months of use at the Emergency Department. We collected quantitative and qualitative data to analyze the main aspects of user satisfaction, following the Technology Acceptance Model. We selected the intended users of the system as study participants: radiology residents and emergency physicians. We found that both groups of users shared a high satisfaction with the system’s ease of use, while their perception of output quality (i.e., diagnostic performance) differed notably. The perceived usefulness of the application yielded positive evaluations, focusing on its utility to confirm that no findings were omitted, and also presenting distinct patterns across the two groups of users. Our results highlight the importance of clearly differentiating the intended users of AI applications in clinical workflows, to enable the design of specific modifications that better suit their particular needs. This study confirmed that measuring user acceptance and recognizing the perception that professionals have of the AI system after daily use can provide important insights for future implementations.
Artificial intelligence (AI) for radiology has the potential to handle an ever-increasing volume of imaging examinations. However, the implementation of AI for clinical practice has not lived up to expectations. We suggest that a key problem with AI projects in radiology is that high expectations associated with new and unproven AI technology tend to scale the projects in ways that challenge their anchoring in local practice and their initial purpose of serving local needs. Empirically, we focus on the procurement of an AI solution for radiology practice at a large health trust in Norway where it was intended that AI technology would be used to process the screening of images more effectively. Theoretically, we draw on the information infrastructure literature, which is concerned with scaling innovative technologies from local settings, with a limited number of users, to broad-use contexts with many users.
In this paper, we present an approach to improve the accuracy and reliability of ECG classification. The proposed method combines features analysis of linear and non-linear ECG dynamics. Non-linear features are represented by complexity measures of assessment of ordinal network non-stationarity. We describe the basic concept of ECG partitioning and provide an experiment on PQRST complex data. The results demonstrate that the proposed technique effectively detects abnormalities via automatic feature extraction and improves the state-of-the-art detection performance on one of the standard collections of heartbeat signals, the ECG5000 dataset.
Synthetic data has been used more and more in the last few years. While its applications are various, measuring its utility and privacy is seldom an easy task. Since there are different methods of evaluating these issues, which depend on data types, use cases and purpose, a generic method for evaluating utility and privacy does not exist at the moment. We therefore compiled the most recent methods for evaluating privacy and utility into a single executable that creates a report of the similarities and potential privacy breaches between two datasets, whether or not they are synthetic. We catalogued 24 different methods, from qualitative to quantitative, with column-wise or table-wise evaluations. We hope this resource can help scientists and industries more easily get a better grasp of the synthetic data they have and produce, and that it offers a better basis for creating a new, broader method for evaluating dataset similarities.
Sharing observational and interventional health data within a common data space enables university hospitals to leverage such data for biomedical discovery and to move towards a learning health system.
To describe the AP-HP Health Data Space (AHDS) and the IT services supporting piloting, research, innovation and patient care.
Built on three pillars – governance and ethics, technology and valorization – the AHDS and its major component, the Clinical Data Warehouse (CDW), have been developed since 2015.
The AP-HP CDW was made available at scale to both AP-HP healthcare professionals and public or private partners in January 2017. Supported by a secured, high-performance institutional cloud and an ecosystem of tools, mostly open source, the AHDS integrates a massive amount of healthcare data collected during care and research activities. As of December 2021, the AHDS operates the electronic data capture for almost 840 clinical trials sponsored by AP-HP, and the CDW enables the processing of health data from more than 11 million patients and has generated more than 200 secondary data marts from IRB-authorized research projects. During the Covid-19 pandemic, the AHDS had to evolve quickly to support administrative professionals and caregivers heavily involved in the reorganization of both patient care and biomedical research.
The AP-HP Data Space is a key facilitator for data-driven evidence generation and for making the health system more efficient and personalized.
Correct performance assessment is crucial for evaluating modern artificial intelligence algorithms in medicine, such as deep-learning based medical image segmentation models. However, there is no universal metric library in Python for standardized and reproducible evaluation. Thus, we propose our open-source, publicly available Python package MISeval: a metric library for Medical Image Segmentation Evaluation. The implemented metrics can be intuitively used and easily integrated into any performance assessment pipeline. The package utilizes modern DevOps strategies to ensure functionality and stability. MISeval is available from PyPI (miseval) and GitHub: https://github.com/frankkramer-lab/miseval.
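As an illustration of the kind of metric such a library standardizes, the sketch below computes two widely used segmentation overlap scores, the Dice similarity coefficient and intersection over union, on flattened binary masks. It is a minimal standard-library sketch and does not use or reproduce the MISeval API; the function names and the toy masks are assumptions made for this example only.

```python
def confusion_counts(truth, pred):
    """Count true positives, false positives and false negatives
    over two flat binary masks (lists of 0/1) of equal length."""
    if len(truth) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, fp, fn


def dice(truth, pred):
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(truth, pred)
    denom = 2 * tp + fp + fn
    # Two empty masks agree perfectly by convention.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(truth, pred):
    """Intersection over union (Jaccard index): TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(truth, pred)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


# Hypothetical 2x4 masks, flattened row by row.
truth = [0, 1, 1, 0,
         0, 1, 1, 0]
pred = [0, 1, 1, 1,
        0, 0, 1, 0]

print(dice(truth, pred))  # 3 TP, 1 FP, 1 FN -> 6/8 = 0.75
print(iou(truth, pred))   # 3 / 5 = 0.6
```

Computing both scores from the same confusion counts keeps them consistent with each other, which is one reason a shared metric implementation helps reproducibility.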
The frequency of potential drug-drug interactions (DDI) in published studies on real world data varies considerably depending on the methodological framework. Contextualization of DDI has a proven effect in limiting false positives. In this paper, we experimented with the application of various DDI context elements to see their impact on the frequency of potential DDIs measured on the same set of prescription data collected in EDSaN, the clinical data warehouse of Rouen University Hospital. Depending on the context applied, the frequency of daily prescriptions with potential DDI ranged from 0.89% to 3.90%. Substance-level analysis accounted for 48% of false positives because it did not account for some drug-related attributes. Consideration of the patient’s context could eliminate up to an additional 29% of false positives.
Automatic classification of ECG signals has long been a research area, with large progress having been made recently. However, these advances have been achieved with increasingly complex models at the expense of model interpretability. In this research, a new model based on multivariate autoregressive model (MAR) coefficients combined with a tree-based model to classify bundle branch blocks is proposed. The advantage of the presented approach is that it builds a lightweight model which, combined with post-hoc interpretability, can bring new insights into important cross-lead dependencies that are indicative of the diseases of interest.
Medical assistance to stroke patients must start as early as possible; however, several changes have impacted healthcare services during the Covid-19 pandemic. This research aimed to identify the stroke onset-to-door time during the Covid-19 pandemic considering the different paths a patient can take until receiving specialized care. It is a retrospective study based on process mining (PM) techniques applied to 221 electronic healthcare records of stroke patients during the pandemic. The results are two process models representing the patient’s path and performance, from the onset of the first symptoms to admission to specialized care. PM techniques have discovered the patient journey in providing fast stroke assistance.
Alterations to the brainstem can hamper cognitive functioning, including audiovisual and behavioral disintegration, leading individuals with Autism Spectrum Disorder (ASD) to face challenges in social interaction. In this study, a process pipeline for the diagnosis of ASD has been proposed, based on geometrical and Zernike moments features extracted from the brainstem of ASD subjects. The subjects considered for this study are obtained from the publicly available database ABIDE (300 ASD and 300 typically developing (TD)). The distance regularized level set (DRLSE) method has been used to segment the brainstem region from the midsagittal view of MRI data. Similarity measures were used to validate the segmented images against the ground truth images. Geometrical and Zernike moments features were extracted from the segmented images. The significant features were used to train a support vector machine (SVM) classifier to perform classification between ASD and TD subjects. The similarity results show high matching between the DRLSE-segmented brainstem and the ground truth, with high similarity index scores of Pearson Heron-II (PH II) = 0.9740 and Sokal and Sneath-II (SS II) = 0.9727. The SVM classifier achieved 70.53% accuracy in classifying ASD and TD subjects. Thus, the process pipeline proposed in this study is able to achieve good accuracy in the classification of ASD subjects.
Burnout in healthcare professionals (HCPs) is a multi-factorial problem. There are limited studies utilizing machine learning approaches to predict HCPs’ burnout during the COVID-19 pandemic. A survey consisting of demographic characteristics and work system factors was administered to 450 HCPs during the pandemic (participation rate: 59.3%). The highest performing machine learning model had an area under the receiver operating characteristic curve of 0.81. The eight key features that best predicted burnout were excessive workload, inadequate staffing, administrative burden, professional relationships, organizational culture, values and expectations, intrinsic motivation, and work-life integration. These findings provide evidence for resource allocation and implementation of interventions to reduce HCPs’ burnout and improve the quality of care.
Venous leg ulcers and diabetic foot ulcers are the most common chronic wounds. Their prevalence has been increasing significantly over the last years, consuming scarce care resources. This study aimed to explore the performance of detection and classification algorithms for these types of wounds in images. To this end, algorithms of the YoloV5 family of pre-trained models were applied to 885 images containing at least one of the two wound types. The YoloV5m6 model provided the highest precision (0.942) and a high recall value (0.837). Its mAP_0.5:0.95 was 0.642. While the latter value is comparable to the ones reported in the literature, precision and recall were considerably higher. In conclusion, our results on good wound detection and classification may reveal a path towards (semi-) automated entry of wound information in patient records. To strengthen the trust of clinicians, we are currently incorporating a dashboard where clinicians can check the validity of the predictions against their expertise.
Artificial intelligence (AI) in medicine is a very topical issue. As far as the attitudes and perspectives of the different stakeholders in healthcare are concerned, there is still much to be explored.
Our aim was to determine attitudes and aspects towards acceptance of AI applications from the perspective of physicians in university hospitals.
We conducted individual exploratory expert interviews. Low fidelity mockups were used to show interviewees potential application areas of AI in clinical care.
In principle, physicians are open to the use of AI in medical care. However, they are critical of some aspects such as data protection or the lack of explainability of the systems.
Although some trends in attitudes e.g., on the challenges or benefits of using AI became clear, it is necessary to conduct further research as intended by the subsequent PEAK project.
The electrification of the transportation sector is seen as a main pathway to reduce CO2 emissions and mitigate the earth’s climate change. Currently, Electric Vehicles (EVs) are entering the market fast. Although EVs have not been used as ambulances yet, the transition to the new type of vehicle is a matter of time. Thus, in this paper we discuss a number of research questions related to the efficient deployment of electric ambulances, focusing on the Artificial Intelligence (AI) point of view and we propose a framework for developing online algorithms that schedule the charging of electric ambulances and their assignment to patients.
In many countries, the management of cancer patients must be discussed in multidisciplinary tumor boards (MTBs). These meetings have been introduced to provide a collaborative and multidisciplinary approach to cancer care. However, the benefits of MTBs are now being challenged because there are many cases and not enough time to discuss all of them. During the evaluation of the guideline-based clinical decision support system (CDSS) of the DESIREE project, we found that for some clinical cases, the system did not produce recommendations. We assumed that these cases were complex clinical cases and needed deeper MTB discussions. In this work, we trained and tested several machine learning and deep learning algorithms on a labelled sample of 298 breast cancer patient summaries, to predict the complexity of a breast cancer clinical case. XGBoost and the multi-layer perceptron were the models with the best result, with an F1 score of 83%.
Adverse drug reactions are a major public health issue. The increasing availability of medico-administrative databases offers major opportunities to detect real-life pharmacovigilance signals. We have recently adapted a pharmacoepidemiological method, the WCE (Weighted Cumulative Exposure) statistical model, to high-dimensional data. This model makes it possible to describe the temporal relationship between the prescription of a drug and the appearance of a side effect without any a priori hypothesis. Unfortunately, this method faces a computational time problem. The objective of this paper is to describe the implementation of the WCE statistical model using Graphics Processing Unit (GPU) programming as a tool to obtain the spectrum of adverse drug reactions from medico-administrative databases. The process is divided into three steps: pre-processing of care pathways using the Python library “pandas”, calculation of temporal co-variables using the Python library “KeOps”, and estimation of the model parameters using the Python library “PyTorch” – standard in deep learning. Programming the WCE method by distributing the heaviest portions (notably spline calculation) on the GPU makes it possible to speed up this method by a factor of 1000 using a computer graphics card and by up to 10,000 with a GPU server. This implementation makes it possible to use WCE on all the drugs on the market to study their spectrum of adverse effects, to highlight new vigilance signals and thus to have a global vigilance tool on medico-administrative databases. This is a proof of concept for the use of this technology in epidemiology.
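To make the idea of a weighted cumulative exposure concrete, the sketch below evaluates the usual form of the quantity, WCE(u) = sum over t <= u of w(u - t) * X(t), where X(t) is the exposure at time t and w weights each past exposure by how long ago it occurred. This is an illustrative standard-library sketch only: in the method described above the weight function is estimated from the data with splines and the heavy computation runs on a GPU, whereas the `halving_weight` function, the exposure values and all names below are hypothetical.

```python
def weighted_cumulative_exposure(exposures, weight, u):
    """Return WCE(u) = sum_{t=0..u} weight(u - t) * exposures[t].

    exposures: list of exposure values indexed by time step (e.g. days)
    weight:    function mapping a lag (time since exposure) to a weight
    u:         time step at which the cumulative exposure is evaluated
    """
    return sum(weight(u - t) * exposures[t] for t in range(u + 1))


def halving_weight(lag):
    """Hypothetical weight: an exposure counts half as much per elapsed step."""
    return 0.5 ** lag


# Hypothetical daily exposure indicator (1 = drug dispensed that day).
exposures = [1, 0, 1, 1]

wce_day_2 = weighted_cumulative_exposure(exposures, halving_weight, 2)
print(wce_day_2)  # 0.25 (day 0) + 0 (day 1) + 1 (day 2) = 1.25
```

The sum has to be recomputed for every patient, time point and candidate weight function, which suggests why the spline-based version is costly and benefits from GPU parallelism.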
The emergency department is a key component of the health system, where the management of crowding situations is crucial to the well-being of patients. This study proposes a new machine learning methodology and a queuing network model to measure and optimize crowding through a congestion indicator, which indicates the level of saturation in real time.
Cancer recurrence is the diagnosis of a second clinical episode of cancer after the first was considered cured. Identifying patients who had experienced cancer recurrence is an important task as it can be used to compare treatment effectiveness, measure recurrence-free survival, and plan and prioritize cancer control resources. We developed BERT-based natural language processing (NLP) contextual models for identifying cancer recurrence incidence and the recurrence time based on the records in progress notes. Using two datasets containing breast and colorectal cancer patients, we demonstrated the advantage of the contextual models over the traditional NLP models by overcoming the laborious and often unscalable tasks of composing keywords in a specific disease domain.
Type 2 diabetes mellitus is a metabolic disorder of glucose management, whose prevalence is increasing inexorably worldwide. Adherence to therapies, along with a healthy lifestyle can help prevent the onset of disease. This preliminary study proposes the use of explainable artificial intelligence techniques with the aim of (i) characterizing diabetic patients through a set of easily interpretable rules and (ii) providing individualized recommendations for the prevention of the onset of the disease through the generation of counterfactual explanations, based on minimal variations of biomarkers routinely collected in primary care. The results of this preliminary study parallel findings from the literature as differences in biomarkers between patients with and without diabetes are observed for fasting blood sugar, body mass index, and high-density lipoprotein levels.
Parkinson’s disease (PD) is a common neurodegenerative disorder that severely impacts quality of life as the condition progresses. Early diagnosis and treatment is important to reduce burden and costs. Here, we evaluate the diagnostic potential of the Non-Motor symptoms (NMS) questionnaire by the International Parkinson and Movement Disorder Society based on patient-completed answers from a large single-center prospective study. In this study data from 489 study participants consisting of a PD group, a healthy control (HC) group and patients with differential diagnosis (DD) have been recorded with a smartphone-based system. Evaluation of the study data has shown a significant difference in NMS between the representative groups. Cross-validation of Machine Learning based classification achieves balanced accuracy scores of 88.7% in PD vs. HC, 72.1% in PD vs. DD and 82.6% when discriminating between all movement disorders (PD + DD) and the HC group. The results indicate potentially high feature importance of a simple self-administered questionnaire that could support early diagnosis.
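Balanced accuracy, the score reported above, is the mean of the per-class recalls, which makes it less sensitive to unequal group sizes than plain accuracy. The sketch below shows the calculation on a small made-up example; the labels and predictions are hypothetical and are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        # Indices of all samples whose true class is `label`.
        idx = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced example: 4 PD cases, 2 healthy controls (HC).
y_true = ["PD", "PD", "PD", "PD", "HC", "HC"]
y_pred = ["PD", "PD", "PD", "HC", "HC", "PD"]

print(accuracy(y_true, y_pred))           # 4 of 6 correct
print(balanced_accuracy(y_true, y_pred))  # (3/4 + 1/2) / 2 = 0.625
```

In this toy case plain accuracy looks better than balanced accuracy because the larger class is classified more reliably than the smaller one.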
Machine learning algorithms are becoming increasingly prevalent in the field of medicine, as they offer the ability to recognize patterns in complex medical data. Especially in this sensitive area, the active usage of what is mostly a black box is a controversial topic. We aim to highlight how an aggregated and systematic feature analysis of such models can be beneficial in the medical context. For this reason, we introduce a grouped version of the permutation importance analysis for evaluating the influence of entire feature subsets in a machine learning model. In this way, expert-defined subgroups can be evaluated in the decision-making process. Based on these results, new hypotheses can be formulated and examined.
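One plausible way to realize a grouped permutation importance is sketched below: all columns of a feature group are shuffled together with the same row permutation, and the importance of the group is the average increase in prediction error. This is a minimal standard-library sketch under stated assumptions, not the authors' implementation; the joint shuffling, the squared-error score, the toy model and the group names are all choices made for this example.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Average error increase when each feature group is permuted jointly.

    predict: function mapping one row (list of floats) to a prediction
    groups:  dict mapping a group name to a list of column indices
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = {}
    for name, columns in groups.items():
        increases = []
        for _ in range(n_repeats):
            # One row permutation shared by every column of the group,
            # so relationships inside the group are preserved.
            order = list(range(len(X)))
            rng.shuffle(order)
            shuffled = []
            for i, row in enumerate(X):
                new_row = list(row)
                for c in columns:
                    new_row[c] = X[order[i]][c]
                shuffled.append(new_row)
            permuted = mean_squared_error(y, [predict(r) for r in shuffled])
            increases.append(permuted - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical fitted model that uses columns 0 and 1 and ignores column 2."""
    return row[0] + row[1]


data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [toy_model(row) for row in X]

# Hypothetical expert-defined groups of feature columns.
groups = {"labs": [0, 1], "noise": [2]}
scores = grouped_permutation_importance(toy_model, X, y, groups)
print(scores)
```

Here the "noise" group receives an importance of zero because the toy model never reads that column, while permuting the "labs" group degrades the predictions.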
In 2022, the Medical Informatics Europe conference created a special topic called “Challenges of trustable AI and added-value on health”, which was centered around the theme of eXplainable Artificial Intelligence. Unfortunately, two opposing views remain for biomedical applications of machine learning: accepting the use of reliable but opaque models vs. requiring models to be explainable. In this contribution, we discuss these two opposing approaches and illustrate the differences between them with examples.