Ebook: Digital Personalized Health and Medicine
Digital health and medical informatics have grown in importance in recent years, and have now become central to the provision of effective healthcare around the world.
This book presents the proceedings of the 30th Medical Informatics Europe conference (MIE). This edition of the conference, part of a series hosted by the European Federation for Medical Informatics (EFMI) since the 1970s, was due to be held in Geneva, Switzerland, in April 2020, but as a result of measures to prevent the spread of the COVID-19 pandemic, the conference itself had to be cancelled.
Nevertheless, because this collection of papers offers a wealth of knowledge and experience across the full spectrum of digital health and medicine, it was decided to publish the submissions accepted in the review process and confirmed by the Scientific Program Committee, and these appear here as planned. The 232 papers are themed under six section headings: biomedical data, tools and methods; supporting care delivery; health and prevention; precision medicine and public health; human factors and citizen centered digital health; and ethics, legal and societal aspects. A seventh section deals with the Swiss Personalized Health Network, and an eighth section includes the 125 posters accepted for the conference.
Offering an overview of current trends and developments in digital health and medical informatics, the book provides a valuable information resource for researchers and health practitioners alike.
This volume presents the proceedings of the 30th Medical Informatics Europe conference (MIE), organized in Geneva, Switzerland, in April 2020. This collection of papers offers a wealth of knowledge and experience across the full spectrum of digital health and medicine.
MIE conferences have been hosted by the European Federation for Medical Informatics (EFMI) since the 1970s. Over those decades, we have been privileged to share in a growing community of expertise in health and care informatics and to see the field advance from a few pioneers to a diverse international network.
The overarching concept of Digital Personalized Health and Medicine is elaborated under the six programme themes as the structure of this volume:
- Biomedical data, tools and methods
- Supporting care delivery
- Health and prevention
- Precision medicine and public health
- Human factors and citizen centered digital health
- Ethics, legal and societal aspects
The topics of the conference proceedings demonstrate that crucial scientific work is still in progress across this scientific continuum and that much existing knowledge is yet to be widely adopted in routine practice. For example, recent years have seen unprecedented leaps forward in data science and machine learning, yet important work still continues on basic prerequisites such as data quality and computable semantic interoperability. Further excellent work continues in human-computer interaction, policy, workforce development, ethics and regulation. Increasing attention is being paid to citizen and patient concerns, whether it be trust in computerised clinical guidance, privacy and consent or issues of safety and accountability.
Helpfully, the Learning Health System is increasingly seen as a unifying concept for basic and translational health and care informatics, incorporating standards for representing biomedical knowledge, machine learning, precision medicine, clinical decision support systems, quality improvement and behaviour change.
We commend this body of papers to readers as a powerful educational resource.
MIE 2020 was organized in partnership with the World Health Organization (WHO), the International Telecommunication Union (ITU), the State of Geneva (Geneva), the University of Geneva (UNIGE) and the University Hospitals of Geneva (HUG). The local team would particularly like to acknowledge the work of Dr Vasiliki Foufi PhD and Christophe Gaudet-Blavignac BSc CS MMed. The editors would like to express thanks to our doctoral student assistants, Taiwo Adedeji MSc and Obinwa Ozonze MSc.
Louise B. Pape-Haugaard, Philip Scott
Editors
Christian Lovis, Inge Cort Madsen, Patrick Weber, Per Hostrup Nielsen
Scientific Programme Committee
Portsmouth/Aalborg, 10 March 2020
Research involving humans is a highly regulated field that is currently undergoing rapid change due to developments in eHealth and mHealth. While a patient's data and samples must be thoroughly protected, they are also an invaluable source for fundamental and cutting-edge research. Processes are in place to obtain a patient's consent for the use of their data and samples for research, but these approaches could be more flexible, user-friendly and modern. There is a high demand among all parties for a unified, yet differentiated, dynamic and personalised eConsent. An Android app has been developed that brings any existing consent form to mobile devices and integrates the process into existing hospital IT using established data standards such as FHIR and the ResearchStack open source framework. The app was user-tested and shown to work in a hospital setting. A lack of eIdentification and legal drawbacks were identified as the main obstacles to immediate implementation.
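The abstract mentions FHIR as the integration standard but gives no implementation detail; the following minimal Python sketch only illustrates what storing a captured consent as an HL7 FHIR R4 Consent resource could look like. The server URL, patient reference, codes and policy text are placeholders, not details from the paper (whose app is Android/ResearchStack-based).

```python
# Minimal sketch (not the authors' implementation): submitting a captured consent
# as an HL7 FHIR R4 Consent resource to a hospital FHIR endpoint.
import requests

FHIR_BASE = "https://fhir.example-hospital.org/r4"  # hypothetical endpoint

consent = {
    "resourceType": "Consent",
    "status": "active",
    "scope": {"coding": [{"system": "http://terminology.hl7.org/CodeSystem/consentscope",
                          "code": "research"}]},
    "category": [{"coding": [{"system": "http://loinc.org", "code": "59284-0"}]}],
    "patient": {"reference": "Patient/example-123"},        # placeholder subject
    "dateTime": "2020-03-10T12:00:00Z",
    "policyRule": {"text": "Hospital research consent policy (placeholder)"},
}

response = requests.post(f"{FHIR_BASE}/Consent", json=consent,
                         headers={"Content-Type": "application/fhir+json"})
response.raise_for_status()
print("Consent stored with id:", response.json().get("id"))
```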
The cryptographic method Secure Multi-Party Computation (SMPC) could facilitate data sharing between health institutions by making it possible to perform analyses on a “virtual data pool”, providing an integrated view of data that is actually distributed – without any of the participants having to disclose their private data. One drawback of SMPC is that specific cryptographic protocols have to be developed for every type of analysis that is to be performed. Moreover, these protocols have to be optimized to provide acceptable execution times. As a first step towards a library of efficient implementations of common methods in health data sciences, we present a novel protocol for efficient time-to-event analysis. Our implementation utilizes a common technique called garbled circuits and was implemented using a widespread SMPC programming framework. We further describe optimizations that we have developed to reduce the execution times of our protocol. We experimentally evaluated our solution by computing Kaplan-Meier estimators over a vertically distributed dataset while measuring performance. By comparing the SMPC results with a conventional analysis on pooled data, we show that our approach is practical and scalable.
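The SMPC protocol itself requires a garbled-circuit framework and is not reproduced here; the short sketch below, under that caveat, only shows the conventional Kaplan-Meier estimator computed on pooled plaintext data, i.e. the baseline the SMPC results are compared against. Data values are illustrative.

```python
# Sketch of the conventional (non-SMPC) Kaplan-Meier estimator on pooled data,
# as used for the plaintext comparison; toy follow-up data only.
from collections import Counter

def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event observed, 0 = censored."""
    n_at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events + censorings remove subjects from the risk set
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve

print(kaplan_meier([2, 3, 3, 5, 8, 8, 9], [1, 1, 0, 1, 1, 0, 0]))
```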
Healthcare 4.0 demands that healthcare data be shaped into a common standardized and interoperable format to achieve more efficient data exchange. Most techniques addressing this domain deal only with specific cases of data transformation, translating healthcare data into ontologies, which often results in clinical misinterpretations. Ontology alignment techniques are currently used to match different ontologies based on specific string and semantic similarity metrics, yet very little systematic analysis has been performed on which semantic similarity techniques behave better. For that reason, in this paper we investigate which semantic similarity technique is most efficient, based on an existing approach that can transform any healthcare dataset into HL7 FHIR by translating both into ontologies and matching them based on syntactic and semantic similarities.
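The specific similarity metrics compared in the paper are not named in the abstract; as a rough illustration of the general idea, the sketch below scores candidate alignments between a source data element and FHIR-derived labels by combining a syntactic string similarity with a placeholder semantic score. Weights, labels and the token-overlap "semantic" measure are assumptions for the example only.

```python
# Illustrative scoring of candidate alignments: weighted combination of a
# syntactic similarity (character-level ratio) and a placeholder semantic score.
from difflib import SequenceMatcher

def syntactic_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def semantic_sim(a: str, b: str) -> float:
    # Placeholder: a WordNet/UMLS- or embedding-based measure would be plugged
    # in here; token overlap keeps the sketch self-contained.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align(source_label, fhir_labels, w_syn=0.5, w_sem=0.5):
    scored = [(w_syn * syntactic_sim(source_label, f) +
               w_sem * semantic_sim(source_label, f), f) for f in fhir_labels]
    return max(scored)

print(align("patient birth date",
            ["Patient.birthDate", "Patient.gender", "Observation.value"]))
```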
The aim of this study was to develop a simple method to map the French International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) to the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). We sought to map these terminologies forward (ICD-10 to ICD-10-CM) and backward (ICD-10-CM to ICD-10) and to assess the accuracy of both mappings. We used several terminology resources, such as the Unified Medical Language System (UMLS) Metathesaurus, BioPortal, the latest available version of the French ICD-10 and several official mapping files between different versions of the ICD-10. We first retrieved existing partial mappings between the ICD-10 and the ICD-10-CM. Then, we automatically matched the ICD-10 with the ICD-10-CM using our different reference mapping files. Finally, we used manual review and natural language processing (NLP) to match labels between the two terminologies. We assessed the accuracy of both methods with a manual review of a random dataset from the result files. The overall matching rate was between 94.2% and 100%. The backward mapping was better than the forward one, especially regarding exact matches. In both cases, the NLP step was highly accurate. When no experts from the ontology or NLP fields are available for multilingual ontology matching, this simple approach enables secondary reuse of Electronic Health Record (EHR) and billing data for research purposes in an international context.
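As a hedged sketch of the label-matching step only (the UMLS- and mapping-file-based steps are not reproduced, and the normalisation rules here are invented for illustration): remaining codes can be matched when their normalised labels are identical across the two code systems.

```python
# Illustrative label matching between two code->label dictionaries after
# accent stripping and punctuation normalisation.
import re
import unicodedata

def normalise(label: str) -> str:
    label = unicodedata.normalize("NFKD", label.lower())
    label = "".join(c for c in label if not unicodedata.combining(c))  # strip accents
    return re.sub(r"[^a-z0-9 ]+", " ", label).strip()

def match_by_label(icd10: dict, icd10cm: dict) -> dict:
    index = {}
    for code, label in icd10cm.items():
        index.setdefault(normalise(label), code)
    return {code: index[normalise(label)]
            for code, label in icd10.items() if normalise(label) in index}

icd10 = {"A00.0": "Cholera due to Vibrio cholerae 01, biovar cholerae"}
icd10cm = {"A00.0": "Cholera due to Vibrio cholerae 01, biovar cholerae"}
print(match_by_label(icd10, icd10cm))  # {'A00.0': 'A00.0'}
```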
The acquisition of medical images from multiple medical institutions has become important for high-quality clinical studies. In recent years, electronic data submission has enabled image data to be transmitted to independent institutions more quickly and easily than before. However, the selection, anonymization and transmission of medical images still require human resources in the form of clinical research collaborators. In this study, we developed an image collection system that works with an electronic data capture (EDC) system. In this image collection system, medical images are selected based on EDC input information, the patient ID is anonymized to a subject ID issued by the EDC, and the selected anonymized images are transferred to the research institute without human intervention. At the research institute, clinical information registered in the EDC and clinical images collected by the image collection system are managed under the same subject ID and can be used for clinical studies. In October 2019, our image collection system was introduced at 13 medical institutions and has begun collecting medical images from the in-hospital picture archiving and communication systems (PACS) of those institutions.
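Assuming the images are DICOM objects and using the pydicom library (neither is stated in the abstract), the anonymisation step described above could look like the sketch below: the hospital patient ID is replaced by the EDC-issued subject ID before the images leave the institution. The tag selection is deliberately simplified.

```python
# Hedged sketch of ID anonymisation on a DICOM file with pydicom.
import pydicom

def anonymise(dicom_path: str, subject_id: str, out_path: str) -> None:
    ds = pydicom.dcmread(dicom_path)
    ds.PatientID = subject_id          # EDC-issued study subject ID
    ds.PatientName = subject_id        # remove directly identifying name
    if "PatientBirthDate" in ds:
        ds.PatientBirthDate = ""       # blank other identifying attributes
    ds.save_as(out_path)

# anonymise("ct_slice_001.dcm", "SUBJ-0042", "ct_slice_001_anon.dcm")
```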
We describe here the evolution of annotation guidelines for major clinical named entities, namely Diagnosis, Findings and Symptoms, on a corpus of approximately 1,000 German discharge letters. Due to their intrinsic opaqueness and complexity, clinical annotation tasks require continuous guideline tuning, beginning with the initial definition of crucial entities and continuing with the iterative evolution of the guidelines based on empirical evidence. We describe the rationales for adaptation, with a focus on several metrical criteria and task-centered clinical constraints.
The development of vascular collaterals in a lesion area is one of the key factors that determine not only the choice of treatment for ischemic stroke (IS) patients, but also outcome and therapy effectiveness. The main method for examining vessel ramification is CT angiography (CTA). CTA analysis may be improved by incorporating filters designed to extract more features about vessels and quantify their level of development. This work proposes the use of radiomics methods in the analysis of a vesselness measure calculated from CTA images. The vesselness measurement is based on analysis of the Hessian matrix, with a few modifications dictated by practical aspects of the problem. The developed algorithm was implemented as a filter that generates a new 3D image in which every voxel carries the probability of belonging to a vessel-like structure. Further analysis of the distribution of vesselness in the lesion area and in the intact contralateral area was conducted with methods from the open library PyRadiomics. A set of radiomics features was calculated. Preliminary analysis of a sample of 30 IS patients showed significant differences between afflicted and intact hemispheres.
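A loosely related sketch, not the authors' exact filter: a standard Hessian-based vesselness map can be obtained with the Frangi filter in recent versions of scikit-image and then summarised per region of interest. The random volume, region coordinates and summary statistics below are placeholders, and the PyRadiomics feature extraction used in the paper is not reproduced.

```python
# Hessian-based vesselness on a 3D volume (Frangi filter) and a simple
# comparison of its distribution in two regions of interest.
import numpy as np
from skimage.filters import frangi

cta = np.random.rand(64, 64, 64)                # placeholder for a CTA volume
vesselness = frangi(cta, sigmas=(1, 2, 3), black_ridges=False)

lesion_mask = np.zeros_like(cta, dtype=bool)
lesion_mask[10:30, 10:30, 10:30] = True         # illustrative lesion region
contralateral_mask = np.zeros_like(cta, dtype=bool)
contralateral_mask[34:54, 10:30, 10:30] = True  # illustrative mirrored region

for name, mask in [("lesion", lesion_mask), ("contralateral", contralateral_mask)]:
    values = vesselness[mask]
    print(name, "mean vesselness:", values.mean(),
          "90th percentile:", np.percentile(values, 90))
```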
Nursing Minimum Data Sets (NMDS) are intended to systematically describe nursing care. Until now, NMDS have been populated with nursing data by manual data ascertainment, which is inefficient. The objective of this work was to evaluate an automated mapping pipeline for transforming nursing data into an NMDS. We used LEP Nursing 3 data as source data and the Austrian and German NMDS as target formats. Based on a human expert mapping between LEP and NMDS, an automated data mapping algorithm was developed and implemented as an automatic mapping pipeline. The results show that most LEP nursing interventions can be matched to the NMDS-AT and G-NMDS and that a fully automated mapping process from LEP Nursing 3 data to NMDS-AT performs effectively and very efficiently. The approach shown can also be used to map different nursing classifications and to automatically transform point-of-care nursing data into nursing minimum data sets.
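A minimal sketch of the general idea, with invented codes and table contents (the actual LEP and NMDS-AT items are not given in the abstract): point-of-care intervention codes are looked up in an expert-curated mapping table and aggregated into NMDS items, with unmapped codes flagged for review.

```python
# Toy mapping pipeline: LEP intervention codes -> NMDS-AT items via an
# expert-curated lookup table, aggregated per patient and day.
from collections import defaultdict

EXPERT_MAPPING = {                       # illustrative entries only
    "LEP_MOBILISATION": "NMDS_AT_MOBILITY_SUPPORT",
    "LEP_WOUND_CARE": "NMDS_AT_WOUND_MANAGEMENT",
}

def to_nmds(lep_records):
    """lep_records: iterable of (patient_id, date, lep_code)."""
    nmds, unmapped = defaultdict(int), []
    for patient_id, date, code in lep_records:
        target = EXPERT_MAPPING.get(code)
        if target:
            nmds[(patient_id, date, target)] += 1
        else:
            unmapped.append(code)        # flagged for manual expert review
    return dict(nmds), unmapped

records = [("P1", "2020-03-01", "LEP_MOBILISATION"), ("P1", "2020-03-01", "LEP_UNKNOWN")]
print(to_nmds(records))
```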
The main goal of this paper is to develop a spell checker module for clinical text in Russian. The described approach combines string distance measures with machine learning embedding techniques. Our overall precision is 0.86, lexical precision 0.975 and error precision 0.74. We developed the spell checker as part of a medical text mining tool that addresses the problems of misspelling, negation, experiencer and temporality detection.
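The following sketch only illustrates the two-stage idea under stated assumptions: correction candidates are generated with a string-distance measure, then ranked. The embedding-based ranking used by the authors is replaced here by a corpus-frequency placeholder, and the toy lexicon is invented.

```python
# Candidate generation by string similarity, followed by a placeholder ranking.
import difflib
from collections import Counter

lexicon_counts = Counter({"пневмония": 120, "пневмонит": 7, "гипертония": 85})  # toy lexicon

def correct(token: str, cutoff: float = 0.75) -> str:
    candidates = difflib.get_close_matches(token, lexicon_counts, n=5, cutoff=cutoff)
    if not candidates:
        return token                     # no sufficiently close candidate
    # Placeholder ranking: corpus frequency; an embedding similarity to the
    # sentence context would replace this in the described method.
    return max(candidates, key=lambda c: lexicon_counts[c])

print(correct("пневмания"))   # -> "пневмония"
```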
Adverse drug reactions (ADRs) are frequent and associated with significant morbidity, mortality and costs. Their early detection in the hospital context is therefore vital. Automatic tools could be developed that take into account both structured and textual data. In this paper, we present the methodology followed for the manual annotation and automatic classification of discharge letters from a tertiary hospital. The results show that ADRs and causal drugs are explicitly mentioned in the discharge letters and that machine learning algorithms are efficient for the automatic detection of documents containing mentions of ADRs.
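The abstract does not name the algorithms used; as an assumed baseline only, a document-level classifier for "contains an ADR mention" could be set up along these lines with scikit-learn. The example letters and labels are invented.

```python
# Illustrative bag-of-words baseline for ADR document classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

letters = [
    "Rash appeared after starting amoxicillin; the drug was stopped.",   # ADR
    "Patient admitted for elective knee replacement, uneventful stay.",  # no ADR
]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(letters, labels)
print(clf.predict(["Hepatotoxicity attributed to methotrexate, treatment discontinued."]))
```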
Studies have shown that mental health conditions and comorbidities such as dementia, diabetes and cardiovascular diseases are risk factors for dialysis patients. Extracting accurate and timely information associated with these risk factors from patient health records is important not only for dialysis patient management, but also for real-world evidence generation. We present HERALD, a natural language processing (NLP) system for extracting information related to risk factors of dialysis patients from free-text progress notes in an electronic dialysis patient management system. By converting semi-structured notes into complete sentences before feeding them into the NLP module, the HERALD system achieved 99%, 83% and 80% accuracy in identifying dementia, diabetes and infarction, respectively.
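To make the pre-processing idea concrete, the toy sketch below rewrites semi-structured fields as complete sentences before a downstream check; the field names, templates and keyword-based detection are simplified placeholders, not HERALD's actual components.

```python
# Toy version of the "semi-structured fields -> complete sentences" step.
TEMPLATES = {
    "comorbidities": "The patient has the following comorbidities: {}.",
    "mental_state": "Mental state assessment: {}.",
}
RISK_TERMS = {"dementia", "diabetes", "infarction"}

def to_sentences(note_fields: dict) -> str:
    return " ".join(TEMPLATES[k].format(v) for k, v in note_fields.items() if k in TEMPLATES)

def detect_risk_factors(text: str) -> set:
    return {term for term in RISK_TERMS if term in text.lower()}

note = {"comorbidities": "type 2 diabetes, old myocardial infarction",
        "mental_state": "early-stage dementia suspected"}
text = to_sentences(note)
print(text)
print(detect_risk_factors(text))
```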
Radiology reports describe the findings of a radiologist in an imaging examination, produced for another clinician in order to answer a clinical indication. Sometimes the report does not fully answer the question asked, despite guidelines for the radiologist. In this article, a system that automatically controls the quality of reports is described. It maps the free text onto MeSH terms and checks whether the anatomy and disease terms in the indication and the conclusion of a report match. The agreement between manual checks by experienced radiologists and the system is high, with automatic checks requiring only a fraction of the time. Being able to quality-control all reports has the potential to improve report quality and thus limit misunderstandings, avoid time lost requesting more information and possibly prevent medical mistakes.
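Under stated assumptions (the MeSH mapping is mocked by a tiny dictionary and the matching rule is simplified), the consistency check described above amounts to comparing the anatomy and disease terms found in the indication with those found in the conclusion:

```python
# Toy indication/conclusion consistency check over mocked MeSH terms.
MESH_LOOKUP = {                     # surface form -> (MeSH term, axis); illustrative only
    "chest": ("Thorax", "anatomy"), "lung": ("Lung", "anatomy"),
    "pneumonia": ("Pneumonia", "disease"), "embolism": ("Pulmonary Embolism", "disease"),
}

def extract_terms(text: str, axis: str) -> set:
    return {mesh for word, (mesh, a) in MESH_LOOKUP.items() if a == axis and word in text.lower()}

def report_is_consistent(indication: str, conclusion: str) -> bool:
    for axis in ("anatomy", "disease"):
        asked, answered = extract_terms(indication, axis), extract_terms(conclusion, axis)
        if asked and not (asked & answered):
            return False            # the indication's terms are not addressed
    return True

print(report_is_consistent("Suspected pneumonia",
                            "Right lower lobe consolidation consistent with pneumonia"))
```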
Drug information systems, prescription support software and drug decision support systems need to reason on drug properties. Combined pharmaceutical products need to be considered specifically because they may require specific processing. They also need to be identified to automate the population of databases with up-to-date property values. We defined a set of digital filters designed for the identification of antibiotics in a public database. Four different filters are proposed, to be combined to extract the relevant information. An evaluation was conducted to combine the filters and retrieve information about combined antibiotics, with success. However, the information provided in the structured files of the French drug database is limited, and the information provided in the HTML files suffers from a lack of quality. Hence, reuse of these data should be performed very cautiously.
Modern biomedical research is increasingly data-driven. To create the required big datasets, health data needs to be shared or reused, which often leads to privacy challenges. Data anonymization is an important protection method in which data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and reliable tools. In this work, we tackle the problem of achieving reliability. Privacy models often involve mathematical definitions using real numbers, which are typically approximated by floating-point numbers when implemented in software. We study the effect on the privacy guarantees provided and present a reliable computing framework based on fractional and interval arithmetic for improving the reliability of implementations. Extensive evaluations demonstrate that reliable data anonymization is practical and can be achieved with minor impacts on execution times and data utility.
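A toy illustration of the underlying issue (not the paper's framework or privacy model): the same threshold comparison can give different answers with floating-point and with exact rational arithmetic, which is why exact fractional arithmetic improves reliability. The risk model and numbers below are invented for the example.

```python
# Floating-point vs exact rational arithmetic in a privacy threshold check.
from fractions import Fraction

group_sizes = [10, 5]                 # equivalence-class sizes in an anonymised dataset
threshold_float = 0.3                 # maximum allowed sum of per-class re-identification risks
threshold_exact = Fraction(3, 10)

risk_float = sum(1 / k for k in group_sizes)            # 0.30000000000000004
risk_exact = sum(Fraction(1, k) for k in group_sizes)   # exactly 3/10

print(risk_float <= threshold_float)  # False: rounding error flips the decision
print(risk_exact <= threshold_exact)  # True: the dataset actually meets the requirement
```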
Blood lactate concentration is a reliable risk indicator of deterioration in critical care, but monitoring it requires frequent blood sampling, and lactate measurement is an invasive procedure that can increase the risk of infections. Yet there is no clinical consensus on the frequency of measurements. In response, we investigate whether machine learning algorithms can be used to predict blood lactate concentration from ICU health records. We evaluate the performance of different prediction algorithms using a multi-centre critical care dataset containing 13,464 patients. Furthermore, we analyse the impact of missing-value handling methods on prediction performance for each algorithm. Our experimental analysis shows promising results, establishing a baseline for further investigation of this problem.
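As a hedged sketch only (the models, features and imputation strategies evaluated in the paper are not specified in the abstract), a regression pipeline with explicit missing-value handling could be set up as follows on synthetic stand-in data:

```python
# Toy regression pipeline with median imputation of missing values.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                     # toy stand-ins for vitals/labs
X[rng.random(X.shape) < 0.2] = np.nan             # simulate 20% missing values
y = 1.5 + 0.8 * np.nan_to_num(X[:, 0]) + rng.normal(scale=0.3, size=500)  # toy "lactate"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(SimpleImputer(strategy="median"),
                      GradientBoostingRegressor(random_state=0))
model.fit(X_train, y_train)
print("R^2 on held-out toy data:", round(model.score(X_test, y_test), 3))
```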
The present work provides a real-world case of the process of connecting a hospital, the 12 de Octubre University Hospital in Spain, to the TriNetX research network, transforming a compilation of disparate sources into a single harmonized repository that is automatically refreshed every day. It describes the different integration phases: terminology core datasets, specialized sources and, eventually, automatic refresh. It also explains the work performed on semantic normalization of the clinical terminologies involved, as well as the resulting benefits that the InSite platform services have enabled in the form of research opportunities for the hospital.
Word embeddings have become the predominant token-level representation scheme for various clinical natural language processing (NLP) tasks. More recently, character-level neural language models, exploiting recurrent neural networks, have again received attention because they achieve similar performance on various NLP benchmarks. We investigated to what extent character-based language models can be applied to the clinical domain and whether they are able to capture reasonable lexical semantics using this maximally fine-grained representation scheme. We trained a long short-term memory network on an excerpt from a table of de-identified 50-character-long problem list entries in German, each assigned to an ICD-10 code. We modelled the task as a time series of one-hot encoded single-character inputs. After the training phase, we retrieved the top 10 most similar character-induced word embeddings related to a clinical concept via a nearest-neighbour search and evaluated the expected interconnected semantics. The results showed that traceable semantics were captured at a syntactic level above single characters, addressing the idiosyncratic nature of clinical language. The results support recent work on general language modelling that raised the question of whether token-based representation schemes are still necessary for specific NLP tasks.
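A minimal PyTorch sketch of the modelling setup described above, with assumed hyper-parameters and two invented entries in place of the actual problem list: an LSTM over one-hot encoded characters trained to predict the next character. The nearest-neighbour analysis of induced embeddings is not reproduced here.

```python
# Character-level LSTM language model over one-hot encoded characters.
import torch
import torch.nn as nn
import torch.nn.functional as F

entries = ["akuter myokardinfarkt", "chronische niereninsuffizienz"]   # toy examples
vocab = sorted({c for e in entries for c in e})
stoi = {c: i for i, c in enumerate(vocab)}

class CharLM(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(vocab_size, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):                      # x: (batch, time, vocab) one-hot
        h, _ = self.lstm(x)
        return self.out(h)                     # logits for the next character

model = CharLM(len(vocab))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(50):                            # tiny training loop on the toy data
    for entry in entries:
        ids = torch.tensor([stoi[c] for c in entry])
        x = F.one_hot(ids[:-1], num_classes=len(vocab)).float().unsqueeze(0)
        logits = model(x).squeeze(0)
        loss = F.cross_entropy(logits, ids[1:])
        optimiser.zero_grad(); loss.backward(); optimiser.step()
print("final loss:", float(loss))
```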
The objective of this study is to develop a method for clinical abbreviation disambiguation using deep contextualized representations and cluster analysis. We employed the pre-trained BioELMo language model to generate a contextualized word vector for the abbreviation within each instance. Principal component analysis was then conducted on the word vectors to reduce their dimensionality. K-Means cluster analysis was conducted for each abbreviation, and the sense of a cluster was assigned by majority vote of the annotations. Our method achieved an average accuracy of around 95% on 74 abbreviations. Simulation showed that each cluster required the annotation of 5 samples to determine its sense.
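A sketch of the clustering and majority-vote step only: here the BioELMo contextual embeddings are replaced by random vectors and the annotations are invented, since loading the language model is out of scope for an illustration.

```python
# PCA dimensionality reduction, K-Means clustering, and majority-vote sense labelling.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 1024))            # stand-in for contextual embeddings of "RA"
annotations = {0: "rheumatoid arthritis", 5: "right atrium", 12: "rheumatoid arthritis"}

reduced = PCA(n_components=50).fit_transform(vectors)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

sense_of_cluster = {}
for cluster_id in set(clusters):
    votes = Counter(sense for idx, sense in annotations.items() if clusters[idx] == cluster_id)
    sense_of_cluster[cluster_id] = votes.most_common(1)[0][0] if votes else "unlabelled"
print(sense_of_cluster)
```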
Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is unfeasible, and automatic natural language processing methods are essential for efficiently exploiting these data. Within this, normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance for successfully extracting knowledge from clinical narratives. In this paper we present sieve-based models combined with heuristics and word embeddings and report the results of our participation in the 2019 n2c2 (National NLP Clinical Challenges) shared task on clinical concept normalization.
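The general shape of a sieve-based normaliser can be sketched as a cascade in which each sieve only handles mentions the previous ones could not resolve. The vocabulary, abbreviation list and similarity fallback below are illustrative placeholders (the paper's final sieve uses word embeddings rather than string similarity).

```python
# Sieve cascade for concept normalisation: exact match -> abbreviation -> fuzzy fallback.
from difflib import get_close_matches

VOCAB = {"myocardial infarction": "C0027051", "diabetes mellitus": "C0011849"}  # toy vocabulary
ABBREVIATIONS = {"mi": "myocardial infarction", "dm": "diabetes mellitus"}

def sieve_exact(mention):        return VOCAB.get(mention)
def sieve_abbreviation(mention): return VOCAB.get(ABBREVIATIONS.get(mention, ""))
def sieve_fuzzy(mention):
    close = get_close_matches(mention, VOCAB, n=1, cutoff=0.8)   # embeddings in the paper
    return VOCAB[close[0]] if close else None

def normalise(mention: str):
    mention = mention.lower().strip()
    for sieve in (sieve_exact, sieve_abbreviation, sieve_fuzzy):
        concept = sieve(mention)
        if concept:
            return concept
    return None                                                   # left for manual review

print([normalise(m) for m in ("MI", "diabetes melitus", "headache")])
```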
Chronic fatigue syndrome (CFS) is a long-term illness with a wide range of symptoms and condition trajectories. To improve the understanding of these, automated analysis of large amounts of patient data holds promise. Routinely documented assessments are useful for large-scale analysis; however, the relevant information is mainly in free text. As a first step towards extracting symptom and condition trajectories, natural language processing (NLP) methods are useful for identifying important textual content and relevant information. In this paper, we propose an agnostic NLP method for extracting segments of patients' clinical histories in CFS assessments. Moreover, we present initial results on the advantage of using these segments to quantify and analyse the presence of certain clinically relevant concepts.
The ever-growing use of information and communication technologies in the past decades and the proliferation of mobile devices for monitoring vital signs and physical activity are driving the emergence of a new healthcare paradigm. More recently, citizens have become more aware of the need to monitor environmental health indicators and of their direct impact on personal health. This article proposes and describes the development of a clinico-environmental system for personal monitoring. The result is ContinuousCare, a personal healthcare information system that integrates personal smart devices with air quality monitors. The solution helps citizens better understand their health and body activity in an environmental context, aids doctors with analysis tools and makes valuable data available to external systems.