Ebook: Caring is Sharing – Exploiting the Value in Data for Health and Innovation
Modern information and communication technologies make it easier for individuals to be involved in their own health and social care. They also facilitate contact between individuals and service providers and deliver more efficient tools for healthcare staff. Artificial Intelligence (AI) promises further benefits in the future, including greater effectiveness and decision support.
This book presents the proceedings of the 33rd Medical Informatics Europe Conference, MIE2023, held in Gothenburg, Sweden, from 22 to 25 May 2023. The theme of MIE2023 was ‘Caring is Sharing – Exploiting Value in Data for Health and Innovation’, stressing the increasing importance of sharing digital-health data and the related challenges. The sharing of health data is developing rapidly, both in Europe and beyond, so the conference focused on enabling the trustworthy sharing of data to improve health. Topics covered include healthcare, community care, self-care, public health, and the innovation and development of future-proof digital-health solutions. The almost 300 papers, divided into 10 chapters, also cover important advances in the subdomains of biomedical informatics, among them decision support systems, clinical information systems, clinical research informatics, knowledge management and representation, consumer health informatics, natural language processing, public health informatics, and privacy, ethical and societal aspects.
Describing innovative approaches to the collection, organization, analysis, and data-sharing related to health and wellbeing, the book contributes to the expertise required to take medical informatics to the next level, and will be of interest to all those working in the field.
The 33rd Medical Informatics Europe Conference, MIE2023, was held in Gothenburg, Sweden, from 22 to 25 May 2023. The Conference was hosted by the European Federation for Medical Informatics (EFMI) and organized by the Swedish Medical Informatics Association (SFMI). The Scientific Programme Committee was chaired by Associate Professor Maria Hägglund, Uppsala University and Uppsala University Hospital.
The overarching theme of MIE2023 was “Caring is Sharing – Exploiting Value in Data for Health and Innovation”, stressing the increasing importance of sharing digital-health data and the challenges related to this. The theme is closely connected to the rapid development of health-data sharing in Europe and globally, so the focus was on the opportunities provided by health informatics and research to enable the trustworthy sharing of health data to improve human health. This includes healthcare, community care, self-care, public health, and the innovation and development of future-proof digital health solutions. Modern information and communication technologies make it easier for individuals to be involved in their own health and social care, facilitate contact between individuals and service providers, and provide more efficient tools for healthcare staff. Furthermore, artificial intelligence (AI) promises to be of benefit in the future, bringing more effectiveness in some situations and providing decision support.
The COVID-19 pandemic has not only accelerated the implementation and adoption of eHealth throughout Europe and globally, but has also shown how weak infrastructure and obstructive data-sharing regulations can hinder effective public health interventions and innovation. The European Union has addressed this in its European Health Data Space (EHDS) legislative proposal. The EHDS will require all EU member states to step up the digitalization of healthcare, for both the primary and secondary use of health data. These proceedings contribute to this work by providing the expertise and cutting-edge research required to implement the EHDS proposal.
Throughout this publication, readers will find innovative approaches to the collection, organization, analysis, and especially the sharing of data and knowledge related to health and wellbeing. The included papers also cover important advances in the subdomains of biomedical informatics: decision support systems, clinical information systems, clinical research informatics, knowledge management and representation, consumer health informatics, natural language processing, public health informatics, and privacy, ethical and societal aspects, among others.
The proceedings are published as an open-access e-book in the series Studies in Health Technology and Informatics (HTI) of IOS Press, which allows easy use and browsing without losing the advantages of indexing and citation in the largest scientific literature databases, such as Medline and Scopus.
The Editors,
Maria Hägglund, Madeleine Blusi, Stefano Bonacina, Lina Nilsson, Inge Cort Madsen, Anne Moen, Lars Lindsköld, Arriel Benis, Parisis Gallos
Uppsala, 02.04.2023
Research on real-world data is becoming increasingly important. The current restriction to clinical data in Germany gives only a limited view of the patient. To gain comprehensive insights, claims data can be added to the existing knowledge. However, standardized transfer of German claims data into OMOP CDM is currently not possible. In this paper, we evaluated how well OMOP CDM covers the source vocabularies and data elements of German claims data. We point out the need to extend vocabularies and mappings to support research on German claims data.
New technologies such as devices, apps, smartphones, and sensors not only enable people to self-monitor their health but also to share their health data with healthcare professionals. Data collection and dissemination occur across a wide variety of environments and settings, tracking everything from biometric data to mood and behavior; such data have been termed Patient Contributed Data (PCD). In this work, we created a PCD-enabled patient journey to shape a connected health model for Cardiac Rehabilitation (CR) in Austria. We then highlighted the potential benefit of PCD: a postulated increase in CR uptake and improved patient outcomes through apps in a home-based setting. Finally, we addressed the related challenges and policy barriers that hinder the implementation of connected health for CR in Austria and identified actions to be taken.
Standardized order sets are a pragmatic type of clinical decision support that can improve adherence to clinical guidelines by providing a list of recommended orders for a specific clinical context. We developed a structure that facilitates the creation of order sets and makes them interoperable, to increase their usability. Orders contained in the electronic medical records of different hospitals were identified and grouped into categories of orderable items, each with a clear definition. These clinically meaningful categories were mapped to FHIR resources to ensure interoperability. We used this structure to implement the relevant user interface in the Clinical Knowledge Platform. The use of standard medical terminologies and the integration of clinical information models such as FHIR resources are key factors for creating reusable decision support systems. Content authors should be given a clinically meaningful system that they can use without ambiguity.
Process mining is a relatively new method that connects data science and process modelling. In recent years, a series of applications to healthcare production data have been presented in process discovery, conformance checking and enhancement. In this paper, we apply process mining to clinical oncological data to study survival outcomes and chemotherapy treatment decisions in a real-world cohort of small cell lung cancer patients treated at Karolinska University Hospital (Stockholm, Sweden). The results highlight the potential role of process mining in oncology for studying prognosis and survival outcomes with longitudinal models extracted directly from routine clinical data.
Adherence to recombinant human growth hormone (r-hGH; somatropin, [Saizen®], Merck Healthcare KGaA, Darmstadt, Germany) treatment is fundamental to achieving positive growth outcomes in children with growth disorders and to improving quality of life and cardiometabolic risk in adult patients with GH deficiency. Pen injector devices are commonly used to deliver r-hGH, but, to the authors’ knowledge, none is currently digitally connected. Since digital health solutions are rapidly becoming valuable tools for supporting treatment adherence, combining a pen injector with a digital ecosystem that monitors adherence is an important advance. Here, we present the methodology and first results of a participatory workshop that assessed clinicians’ perceptions of such a digital solution, the aluetta™ smartdot™ (Merck Healthcare KGaA, Darmstadt, Germany), which combines the aluetta™ pen injector and a connected device, both components of a comprehensive digital health ecosystem to support pediatric patients receiving r-hGH treatment. The aim is to highlight the importance of collecting clinically meaningful and accurate real-world adherence data to support data-driven healthcare.
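A coverage evaluation of this kind can be summarized as the share of distinct source codes, and of their occurrences in the records, that map to a standard OMOP concept. The following Python sketch is only an illustration: the mapping dictionary and codes are hypothetical, and it follows the OMOP convention that concept_id 0 means no matching standard concept.

```python
from collections import Counter

def vocabulary_coverage(source_codes, mapping):
    """Return (code coverage, record coverage) for a list of source codes.

    source_codes: codes as they occur in the claims records (with repeats).
    mapping: hypothetical dict from source code to an OMOP standard concept_id;
             0 or a missing key means the code has no standard mapping.
    """
    counts = Counter(source_codes)
    mapped = {code for code in counts if mapping.get(code, 0) != 0}
    code_coverage = len(mapped) / len(counts)
    record_coverage = sum(counts[c] for c in mapped) / sum(counts.values())
    return code_coverage, record_coverage
```

For example, the codes ["A", "A", "B", "C"] with the placeholder mapping {"A": 123, "B": 0} cover one of three distinct codes but half of all records.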
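One way to make such categories machine-readable is to let each orderable-item category determine which FHIR request resource it produces. The Python sketch below is an assumption-laden illustration, not the platform's actual structure: the category names are hypothetical, and it models an order-set item as a FHIR R4 ActivityDefinition whose kind element names the request resource (for example MedicationRequest or ServiceRequest).

```python
# Hypothetical orderable-item categories mapped to FHIR R4 request resource types.
CATEGORY_TO_FHIR = {
    "medication": "MedicationRequest",
    "laboratory": "ServiceRequest",
    "imaging": "ServiceRequest",
    "nursing": "ServiceRequest",
    "diet": "NutritionOrder",
    "device": "DeviceRequest",
    "communication": "CommunicationRequest",
}

def to_activity_definition(category, title):
    """Build a minimal FHIR R4 ActivityDefinition for one order-set item.

    ActivityDefinition.kind names the request resource that the item
    produces when the order set is applied.
    """
    try:
        kind = CATEGORY_TO_FHIR[category]
    except KeyError:
        raise ValueError(f"unknown order category: {category!r}") from None
    return {
        "resourceType": "ActivityDefinition",
        "status": "draft",
        "title": title,
        "kind": kind,
    }
```

Keeping the category-to-resource table separate from the builder lets content authors work with clinical categories while the FHIR mapping stays in one place.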
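The abstract does not name the discovery algorithm used. As a hedged illustration, the directly-follows graph, a basic building block of many process discovery techniques, can be computed from an event log in which each patient is one case. The event-log layout and activity names below are assumptions.

```python
from collections import Counter, defaultdict

def directly_follows(events):
    """Count directly-follows relations in an event log.

    events: iterable of (case_id, activity, timestamp) tuples, e.g. one case
    per patient. Returns a Counter mapping (activity_a, activity_b) to how
    often b directly followed a within the same case.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))
    dfg = Counter()
    for trace in traces.values():
        trace.sort()  # order each case's events by timestamp
        activities = [activity for _, activity in trace]
        dfg.update(zip(activities, activities[1:]))
    return dfg
```

For instance, a log with hypothetical events ("diagnosis", "chemotherapy", "follow-up") for one patient and ("diagnosis", "follow-up") for another yields three distinct edges, each seen once.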
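Adherence can be quantified in several ways, and the abstract does not specify which measure the connected device reports. A minimal sketch, assuming a once-daily regimen and a hypothetical list of injection dates exported from the device, is the share of days in a period with at least one recorded injection:

```python
from datetime import date

def daily_adherence(injection_dates, start: date, end: date) -> float:
    """Share of days in [start, end] with at least one recorded injection.

    Assumes a once-daily regimen; duplicate entries on the same day count
    once and injections outside the period are ignored.
    """
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days + 1
    injected_days = {d for d in injection_dates if start <= d <= end}
    return len(injected_days) / n_days
```

For example, injections on 8 distinct days between date(2023, 1, 1) and date(2023, 1, 10) give an adherence of 0.8.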
Data sharing provides benefits in terms of transparency and innovation, and privacy concerns in this context can be addressed by anonymization techniques. In our study, we evaluated anonymization approaches that transform structured data in a real-world scenario, a chronic kidney disease cohort study, and checked the replicability of research results via the overlap of 95% confidence intervals (CIs) in two differently anonymized datasets with different degrees of protection. The calculated 95% CIs overlapped for both anonymization approaches, and visual comparison showed similar results. Thus, in our use case, research results were not relevantly affected by anonymization, which adds to the growing evidence for utility-preserving anonymization techniques.
Although interest in machine learning studies is growing significantly, especially in medicine, the gap between study results and clinical relevance is more pronounced than ever. Reasons for this include data quality and interoperability issues. Hence, we examined site- and study-specific differences in publicly available standard electrocardiogram (ECG) datasets, which in theory should be interoperable thanks to a consistent 12-lead definition, sampling rate, and measurement duration. The focus is on whether even slight study peculiarities can affect the stability of trained machine learning models. To this end, we investigate the performance of modern network architectures as well as unsupervised pattern detection algorithms across different datasets. Overall, this is intended to examine how well machine learning results from single-site ECG studies generalize.
Type 2 diabetes is a lifelong health condition, and as it progresses, a range of comorbidities can develop. The prevalence of diabetes has increased gradually, and 642 million adults are expected to be living with diabetes by 2040. Early and appropriate interventions for managing diabetes-related comorbidities are important. In this study, we propose a machine learning (ML) model for predicting the risk of developing hypertension in patients who already have Type 2 diabetes. We used the Connected Bradford dataset, consisting of 1.4 million patients, as our main dataset for data analysis and model building. The data analysis showed that hypertension is the most frequent observation among patients with Type 2 diabetes. Since hypertension is an important predictor of clinically poor outcomes such as heart, brain, kidney, and other diseases, early and accurate prediction of the risk of hypertension in patients with Type 2 diabetes is crucial. We trained Naïve Bayes (NB), Neural Network (NN), Random Forest (RF), and Support Vector Machine (SVM) models and then combined them in an ensemble to assess potential performance improvements. The ensemble method gave the best classification performance, with accuracy and kappa values of 0.9525 and 0.2183, respectively. We conclude that predicting the risk of developing hypertension in patients with Type 2 diabetes using ML is a promising stepping stone toward preventing Type 2 diabetes progression.
FHIR is a widely accepted interoperability standard for exchanging medical data, but transforming data from primary health information systems into FHIR is usually challenging and requires advanced technical skills and infrastructure. There is a critical need for low-cost solutions, and Mirth Connect, an open-source tool, provides this opportunity. We developed a reference implementation that transforms data from CSV (the most common data format) into FHIR resources using Mirth Connect, without requiring advanced technical resources or programming skills. The reference implementation was tested successfully for both quality and performance, and it enables healthcare providers to reproduce and improve the approach to transform raw data into FHIR resources. To ensure replicability, the channel, mappings, and templates used are publicly available on GitHub (https://github.com/alkarkoukly/CSV-FHIR-Transformer).
The European Health Data Space (EHDS) proposal aims to establish a set of rules and governance frameworks to promote the use of electronic health data for both primary and secondary purposes. This study analyses the implementation status of the EHDS proposal in Portugal, particularly the points concerning the primary use of health data. The proposal was screened for points that give member states direct responsibility for implementing actions, and a literature review and interviews were conducted to assess the implementation status of these policies in Portugal. This study found that Portugal is well advanced in implementing policies concerning the rights of natural persons in relation to the primary use of their personal health data, but also identified challenges, including the lack of a common interoperability framework for the exchange of electronic health data.
With recent advances in machine learning, synthetic health data have become a promising technique to address the time-consuming process of accessing and using electronic medical records for research and innovation. However, the utility and governance of synthetic health data have not been extensively studied. Following the PRISMA guidelines, we conducted a scoping review to understand the status of evaluation and governance of synthetic health data. The results showed that when synthetic health data are generated with proper methods, the risk of privacy leaks has been low and data quality is comparable to that of real data. However, synthetic health data have been generated on a case-by-case basis rather than at scale. Furthermore, regulations, ethics, and data sharing for synthetic health data have mostly not been made explicit, although common principles for sharing such data do exist.
Reproducibility imposes special requirements at every stage of a project, including reproducible analysis workflows that follow best practices for code style and a reproducible process for creating the manuscript. Available tools include version control systems such as Git and document creation tools such as Quarto or R Markdown. However, a reusable project template that covers the entire process, from performing the data analysis to writing the manuscript, in a reproducible manner is still lacking. This work aims to fill this gap by presenting an open-source template for reproducible research projects that uses a containerized framework both for developing and running the analysis and for summarizing the results in a manuscript. The template can be used immediately without any customization.
Interest in the application of AI in medicine has increased sharply over the past decade, with most of the change occurring in the past five years. Most recently, deep learning algorithms applied to the prediction and classification of cardiovascular diseases (CVD) from computed tomography (CT) images have shown promising results. This notable and exciting advancement is, however, associated with challenges related to the findability (F), accessibility (A), interoperability (I), and reusability (R) of both data and source code. The aim of this work is to identify recurring missing FAIR-related features and to assess the level of FAIRness of the data and models used to predict or diagnose cardiovascular diseases from CT images. We evaluated the FAIRness of data and models in published studies using the RDA (Research Data Alliance) FAIR Data Maturity Model and the FAIRshake toolkit. The findings showed that although AI is anticipated to bring groundbreaking solutions to complex medical problems, the findability, accessibility, interoperability and reusability of data, metadata and code remain a prominent challenge.
Availability and accessibility are important preconditions for using real-world patient data across organizations. To facilitate the analysis of data collected at a large number of independent healthcare providers, syntactic and semantic uniformity must be achieved and verified. In this paper, we present a data transfer process implemented with the Data Sharing Framework that ensures only valid, pseudonymized data are transferred to a central research repository and that provides feedback on success or failure. Our implementation is used within the CODEX project of the German Network University Medicine to validate COVID-19 datasets at patient-enrolling organizations and to transfer them securely as FHIR resources to a central repository.
Electrodermal activity (EDA) reflects sympathetic nervous system activity through sweating-related changes in skin conductance. Decomposition analysis is used to deconvolve EDA into slowly varying tonic activity and fast varying phasic activity. In this study, we used machine learning models to compare the performance of two EDA decomposition algorithms in detecting emotions elicited by amusing, boring, relaxing, and scary stimuli. The EDA data were obtained from the publicly available Continuously Annotated Signals of Emotion (CASE) dataset. We first pre-processed the EDA data and deconvolved them into tonic and phasic components using two decomposition methods, cvxEDA and BayesianEDA. Next, 12 time-domain features were extracted from the phasic component. Finally, we applied logistic regression (LR) and support vector machine (SVM) classifiers to evaluate the performance of each decomposition method. Our results imply that the BayesianEDA decomposition method outperforms cvxEDA. The mean of the first derivative discriminated all considered emotion pairs with statistical significance (p<0.05). SVM detected emotions better than the LR classifier. Using BayesianEDA and SVM, we achieved an average 10-fold cross-validation accuracy, sensitivity, specificity, precision, and F1-score of 88.2%, 76.25%, 92.08%, 76.16%, and 76.15%, respectively. The proposed framework can be used to detect emotional states for the early diagnosis of psychological conditions.
The aim of this study was to map Korean national health insurance claims codes for laboratory tests to SNOMED CT. The source codes were 4,111 claims codes for laboratory tests, and the target codes came from the International Edition of SNOMED CT released on July 31, 2020. We used rule-based automated and manual mapping methods, and the mapping results were validated by two experts. Of the 4,111 codes, 90.5% were mapped to concepts in the procedure hierarchy of SNOMED CT. Of these, 51.4% were mapped exactly to SNOMED CT concepts, and 34.8% were mapped to SNOMED CT concepts as one-to-one mappings.
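The replicability check described above can be sketched as follows: estimate the same quantity in each anonymized dataset, compute its 95% CI, and test whether the intervals overlap. This Python sketch assumes the estimate is a simple mean with a normal-approximation CI; the study's actual estimates and CI methods may differ.

```python
import math
import statistics

def mean_ci95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se

def ci_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

Usage: ci_overlap(mean_ci95(values_in_dataset_a), mean_ci95(values_in_dataset_b)), where the two value lists hold the same variable from the two anonymized datasets.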
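A common design for probing such generalization is a train-on-one, test-on-all matrix: each model is trained on one dataset's training split and evaluated on every dataset's test split. The sketch below leaves training and scoring as hypothetical callables, so it describes the evaluation layout rather than the networks or algorithms used in the study.

```python
def cross_dataset_matrix(datasets, train, score):
    """Train on each dataset and evaluate on every dataset's held-out split.

    datasets: dict name -> ((x_train, y_train), (x_test, y_test)).
    train: callable (x, y) -> model; score: callable (model, x, y) -> float.
    Returns {(train_name, test_name): score}. Diagonal entries are in-dataset
    results; off-diagonal entries show how a model from one site transfers.
    """
    results = {}
    for train_name, ((x_train, y_train), _) in datasets.items():
        model = train(x_train, y_train)
        for test_name, (_, (x_test, y_test)) in datasets.items():
            results[(train_name, test_name)] = score(model, x_test, y_test)
    return results
```

A large drop from diagonal to off-diagonal scores would indicate that site- or study-specific peculiarities affect model stability.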
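The abstract does not state how the four models were combined; the sketch below assumes simple majority (hard) voting. It also shows how accuracy and Cohen's kappa are computed. Kappa corrects observed agreement for agreement expected by chance, so on imbalanced data a high accuracy can coexist with a modest kappa.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists by majority vote.

    predictions: one list of predicted labels per model, all the same length.
    Ties go to the label first seen in model order (Counter.most_common).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

def accuracy_and_kappa(y_true, y_pred):
    """Accuracy and Cohen's kappa for two equally long label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1:
        return observed, float("nan")  # kappa is undefined when chance agreement is 1
    return observed, (observed - expected) / (1 - expected)
```

For example, y_true = [1, 1, 1, 0] and y_pred = [1, 1, 0, 0] give an accuracy of 0.75 but a kappa of only 0.5.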
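The reference implementation itself is a Mirth Connect channel (see the repository above). Purely to illustrate the kind of row-to-resource mapping such a channel performs, here is a Python sketch that turns CSV rows into minimal FHIR R4 Patient resources; the column names are hypothetical.

```python
import csv
import io

def csv_to_patients(csv_text):
    """Convert CSV rows into minimal FHIR R4 Patient resources (as dicts).

    Assumes hypothetical columns: id, family, given, gender, birth_date.
    """
    patients = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        patients.append({
            "resourceType": "Patient",
            "id": row["id"],
            "name": [{"family": row["family"], "given": [row["given"]]}],
            "gender": row["gender"],         # FHIR codes: male | female | other | unknown
            "birthDate": row["birth_date"],  # FHIR date format YYYY-MM-DD
        })
    return patients
```

The resulting dictionaries can be serialized with json.dumps and posted to a FHIR server; validation against the server's profiles is outside this sketch.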
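The RDA FAIR Data Maturity Model and FAIRshake define their own indicators and scoring schemes. As a deliberately simplified sketch (not the official scoring), assessment results can be tallied per FAIR principle to show where features are missing; the indicator identifiers below are hypothetical.

```python
from collections import defaultdict

def fair_scores(assessments):
    """Fraction of satisfied indicators per FAIR principle.

    assessments: iterable of (indicator_id, principle, passed), where principle
    is one of "F", "A", "I", "R". A simplified tally, not the RDA
    maturity-level scoring. Principles without indicators are omitted.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for _, principle, ok in assessments:
        totals[principle] += 1
        passed[principle] += bool(ok)
    return {p: passed[p] / totals[p] for p in "FAIR" if totals[p]}
```

For example, [("F-1", "F", True), ("F-2", "F", False), ("A-1", "A", True)] gives {"F": 0.5, "A": 1.0}.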
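The Data Sharing Framework and the CODEX project rely on their own validation and pseudonymization components, which are not described here. The generic pattern (validate locally, pseudonymize, then report success or failure) can be sketched in Python as follows; the required fields are hypothetical, and keyed hashing with HMAC-SHA256 is an assumed pseudonymization technique.

```python
import hashlib
import hmac

REQUIRED_FIELDS = ("patient_id", "birth_year", "covid_test_result")  # hypothetical schema

def validate(record):
    """Return a list of problems; an empty list means the record is valid."""
    return [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]

def pseudonymize(record, secret_key):
    """Replace the patient identifier with a keyed HMAC-SHA256 pseudonym.

    Without the site's secret key the pseudonym cannot feasibly be linked back
    to the identifier, while the same identifier always yields the same pseudonym.
    """
    digest = hmac.new(secret_key, record["patient_id"].encode("utf-8"), hashlib.sha256)
    result = {k: v for k, v in record.items() if k != "patient_id"}
    result["pseudonym"] = digest.hexdigest()
    return result

def prepare_transfer(record, secret_key):
    """Validate locally, pseudonymize only valid records, and report the outcome."""
    problems = validate(record)
    if problems:
        return {"status": "failed", "problems": problems}
    return {"status": "ok", "record": pseudonymize(record, secret_key)}
```

Only records with status "ok" would be sent on; the "failed" result carries the feedback for the data-providing site.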
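The twelve features are not listed in the abstract. As an illustration, a few common time-domain features of a phasic EDA segment, including the mean of the first derivative mentioned above, could be computed like this; the feature names are assumptions, and the derivative is a finite-difference approximation.

```python
import statistics

def phasic_features(signal, fs):
    """A few time-domain features of a phasic EDA segment.

    signal: list of phasic samples (at least two); fs: sampling rate in Hz.
    The first derivative is approximated by successive differences times fs.
    """
    deriv = [(b - a) * fs for a, b in zip(signal, signal[1:])]
    return {
        "mean": statistics.mean(signal),
        "std": statistics.stdev(signal),
        "min": min(signal),
        "max": max(signal),
        "mean_first_derivative": statistics.mean(deriv),
        "mean_abs_first_derivative": statistics.mean(abs(d) for d in deriv),
    }
```

Feature vectors like this, one per segment, would then be passed to the LR and SVM classifiers.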
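The mapping rules themselves are not described in the abstract. A minimal rule-based step consistent with the automated-plus-manual workflow is normalized exact matching of test names against target concept descriptions, with unmatched codes queued for manual mapping. The codes and concept identifiers below are placeholders, not real claims codes or SNOMED CT IDs.

```python
import re

def normalize(term):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    term = re.sub(r"[^\w\s]", " ", term.lower())
    return " ".join(term.split())

def rule_based_map(source_terms, target_index):
    """Map source test names to target concepts by normalized exact match.

    source_terms: dict of claims code -> test name.
    target_index: dict of normalized description -> concept id (hypothetical).
    Returns (mapped codes, codes left for manual mapping).
    """
    mapped, unmapped = {}, []
    for code, name in source_terms.items():
        concept = target_index.get(normalize(name))
        if concept is None:
            unmapped.append(code)
        else:
            mapped[code] = concept
    return mapped, unmapped
```

The unmapped list is what expert reviewers would work through manually.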
Each epidemic and pandemic is accompanied by an infodemic. The infodemic during the COVID-19 pandemic was unprecedented. Accessing accurate information was difficult and misinformation harmed the pandemic response, the health of individuals and trust in science, governments and societies. WHO is building a community-centered information platform, the Hive, to deliver on the vision of ensuring that all people everywhere have access to the right information, at the right time, in the right format in order to make decisions to protect their health and the health of others. The platform provides access to credible information, a safe space for knowledge-sharing, discussion, and collaborating with others, and a forum to crowdsource solutions to problems. The platform is equipped with many collaboration features, including instant chats, event management, and data analytics tools to generate insights. The Hive platform is an innovative minimum viable product (MVP) that seeks to leverage the complex information ecosystem and the invaluable role communities play to share and access trustworthy health information during epidemics and pandemics.
Laboratory data must be interoperable so that the results of a lab test can be compared accurately across healthcare organizations. To achieve this, terminologies such as LOINC (Logical Observation Identifiers Names and Codes) provide unique identification codes for laboratory tests. Once standardized, the numeric results of laboratory tests can be aggregated and represented in histograms. Because of the characteristics of real-world data (RWD), outliers and abnormal values are common, but these cases should be treated as exceptions and excluded from the analysis. This work analyses two methods that automate the selection of histogram limits to sanitize the generated distributions of lab test results within the TriNetX Real World Data Network: Tukey’s box-plot method and a “Distance to Density” approach. With clinical RWD, the generated limits are generally wider for Tukey’s method and narrower for the second method, and both depend strongly on the values chosen for the algorithm’s parameters.
The COVID-19 pandemic has underscored the need to set up, conduct and analyze high-quality epidemiological studies within a very short time frame to provide timely evidence on factors influencing the pandemic, e.g. COVID-19 severity and disease course. The comprehensive research infrastructure developed to run the German National Pandemic Cohort Network within the Network University Medicine is now maintained within NUKLEUS, a generic clinical epidemiology and study platform. NUKLEUS is being operated and extended to allow efficient joint planning, execution and evaluation of clinical and clinical-epidemiological studies. We aim to provide high-quality biomedical data and biospecimens and to make the results widely available to the scientific community by implementing findability, accessibility, interoperability and reusability, i.e. by following the FAIR guiding principles. Thus, NUKLEUS might serve as a role model for the FAIR and fast implementation of clinical-epidemiological studies within University Medical Centers and beyond.
Access to high-quality historical patient data from hospitals may facilitate the development of predictive models and data analysis experiments. This study presents a design for a data-sharing platform based on all possible criteria for the Medical Information Mart for Intensive Care (MIMIC-IV) and its emergency department module (MIMIC-IV-ED). Tables containing columns of medical attributes and outcomes were reviewed by a team of five medical informatics experts, who fully agreed on connecting the tables through subject_id, hadm_id, and stay_id as foreign keys. The tables of the two databases were considered along the intra-hospital patient transfer path with various outcomes. Using these constraints, queries were generated and applied to the platform's backend. The proposed user interface was designed to retrieve records based on various entry criteria and to present the output as a dashboard or a graph. This design is a step toward developing a platform for studies of patient trajectories, medical outcome prediction, or studies that require heterogeneous data entries.
Endometriosis is a complex, poorly understood female health condition that can markedly reduce a woman’s quality of life. The gold-standard diagnostic method for endometriosis is invasive laparoscopic surgery, which is costly, time-consuming, and carries risks for the patient. We argue that the need for a non-invasive diagnostic procedure, higher quality of patient care and reduced diagnostic delay can be met through research into innovative computational solutions. To leverage computational and algorithmic techniques, enhanced data recording and sharing are vital. We discuss the potential benefits of personalised computational healthcare for both clinicians and patients, including a reduction of the lengthy average time to diagnosis (currently around 8 years).
Semantic interoperability, i.e., the ability to automatically interpret shared information in a meaningful way, is one of the most important requirements for analysing data from different sources. In clinical and epidemiological studies, the target of the National Research Data Infrastructure for Personal Health Data (NFDI4Health), interoperability of data collection instruments such as case report forms (CRFs), data dictionaries and questionnaires is critical. Retrospective integration of semantic codes into study metadata at item level is important, as ongoing or completed studies contain valuable information that should be preserved. We present a first version of a Metadata Annotation Workbench to support annotators in dealing with a variety of complex terminologies and ontologies. User-driven development with users from the fields of nutritional epidemiology and chronic diseases ensured that the service fulfills the basic requirements of semantic metadata annotation software for these NFDI4Health use cases. The web application can be accessed with a web browser, and the source code is available under the open-source MIT license.
Management of multimorbidity in patients with mild dementia and mild cognitive impairment introduces additional challenges. The CAREPATH project provides an integrated care platform to assist healthcare professionals, patients and their informal caregivers in the day-to-day management of care plans for this patient population. This paper introduces an HL7 FHIR-based interoperability approach for exchanging care plan actions and goals with patients and for collecting feedback and adherence information from them. In this way, seamless information exchange between healthcare professionals, patients and their informal caregivers is achieved, supporting patients in their self-care management journey and increasing their adherence to their care plans despite the burdens of mild dementia.
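Tukey's box-plot method sets the limits at Q1 - k * IQR and Q3 + k * IQR, where IQR = Q3 - Q1 is the interquartile range and k (conventionally 1.5) is the parameter the resulting limits depend on. The Python sketch below applies it to a list of numeric results; the “Distance to Density” approach is not described in enough detail here to sketch.

```python
import statistics

def tukey_limits(values, k=1.5):
    """Histogram limits from Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values. Larger k keeps more extreme results.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def sanitize(values, k=1.5):
    """Keep only the values that fall within Tukey's fences."""
    lo, hi = tukey_limits(values, k)
    return [v for v in values if lo <= v <= hi]
```

For example, sanitize([1, 2, 3, 4, 5, 6, 7, 8, 100]) drops the outlying 100, since the fences are (-5.0, 15.0).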
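The key-based linkage described above can be illustrated with SQLite. The tables below are simplified, hypothetical subsets of MIMIC-IV (admissions, icustays) and MIMIC-IV-ED (edstays) with only a few columns; because ED stays and ICU stays are different entities, the sketch links them through hadm_id rather than stay_id.

```python
import sqlite3

SCHEMA = """
CREATE TABLE admissions (subject_id INT, hadm_id INT, hospital_expire_flag INT);
CREATE TABLE edstays    (subject_id INT, hadm_id INT, stay_id INT, disposition TEXT);
CREATE TABLE icustays   (subject_id INT, hadm_id INT, stay_id INT, first_careunit TEXT);
"""

# Trajectory: ED stay -> hospital admission -> ICU stay, joined on shared keys.
TRAJECTORY_QUERY = """
SELECT e.subject_id, e.stay_id AS ed_stay_id, i.stay_id AS icu_stay_id,
       i.first_careunit, a.hospital_expire_flag
FROM edstays AS e
JOIN admissions AS a ON a.subject_id = e.subject_id AND a.hadm_id = e.hadm_id
JOIN icustays  AS i ON i.subject_id = a.subject_id AND i.hadm_id = a.hadm_id
ORDER BY e.subject_id
"""

def connect_empty():
    """Create an in-memory database with the simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def ed_to_icu_trajectories(conn):
    """Return one row per ED stay that led to an admission with an ICU stay."""
    return conn.execute(TRAJECTORY_QUERY).fetchall()
```

ED stays without a hospital admission (hadm_id NULL) drop out of the inner joins, which matches a query for ED-to-ICU transfer paths.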
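As an illustration of FHIR-based care plan exchange, the sketch below builds a minimal FHIR R4 CarePlan with one activity and a goal. It is not the CAREPATH profile: the contained Goal, the identifiers and the texts are assumptions, and FHIR R5 restructures CarePlan.activity, so the shape shown applies to R4 only.

```python
def care_plan_with_goal(patient_id, goal_text, activity_text):
    """Build a minimal FHIR R4 CarePlan that references a contained Goal.

    The Goal is contained so the example is self-contained; a real system
    would more likely store it as a separate resource.
    """
    subject = {"reference": f"Patient/{patient_id}"}
    goal = {
        "resourceType": "Goal",
        "id": "goal1",
        "lifecycleStatus": "active",
        "description": {"text": goal_text},
        "subject": subject,
    }
    return {
        "resourceType": "CarePlan",
        "status": "active",
        "intent": "plan",
        "subject": subject,
        "contained": [goal],
        "goal": [{"reference": "#goal1"}],  # local reference to the contained Goal
        "activity": [{
            "detail": {
                "status": "scheduled",
                "description": activity_text,
            }
        }],
    }
```

Adherence feedback could, for example, be reflected by updating activity.detail.status to "completed"; how CAREPATH actually collects feedback is not shown here.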