Ebook: MEDINFO 2019: Health and Wellbeing e-Networks for All
Combining and integrating cross-institutional data remains a challenge for both researchers and those involved in patient care. Patient-generated data can contribute valuable information to healthcare professionals by enabling monitoring under normal life conditions and by helping patients play a more active role in their own care.
This book presents the proceedings of MEDINFO 2019, the 17th World Congress on Medical and Health Informatics, held in Lyon, France, from 25 to 30 August 2019. The theme of this year’s conference was ‘Health and Wellbeing: E-Networks for All’, stressing the increasing importance of networks in healthcare on the one hand, and the patient-centered perspective on the other. Over 1100 manuscripts were submitted to the conference and, after a thorough review process by at least three reviewers and assessment by a scientific program committee member, 285 papers and 296 posters were accepted, together with 47 podium abstracts, 7 demonstrations, 45 panels, 21 workshops and 9 tutorials. All accepted paper and poster contributions are included in these proceedings. The papers are grouped under four thematic tracks: interpreting health and biomedical data, supporting care delivery, enabling precision medicine and public health, and the human element in medical informatics. The posters are divided into the same four groups.
The book presents an overview of state-of-the-art informatics projects from multiple regions of the world; it will be of interest to anyone working in the field of medical informatics.
The Proceedings of MEDINFO 2019 (the 17th World Congress of Medical and Health Informatics, held in Lyon, France) illustrate how informatics scholars from all over the world are pursuing projects related to the theme of Health and Wellbeing e-Networks for All. Throughout this publication, readers will find innovative approaches to the collection, organization, analysis, and sharing of data and knowledge related to health and wellbeing. The articles in these proceedings not only document the state-of-the-art in our field worldwide, but also remind readers of how they can build on past discoveries and get motivated to pursue new paths in the future.
Every two years the medical and health informatics community assembles to discuss the latest findings in informatics, to meet old friends, and to make new ones. New generations come together to showcase their research, hear from their peers as well as from those who contribute to their training, and start to pave their way to be our future leaders. Those of us who have been attending these meetings for a long time continue to be amazed by the energy of newcomers, the wisdom and determination of informatics pioneers, the drive of foundational and practice-oriented informaticians, and the incredible amount of work that it takes to organize this conference.
These Proceedings are a small but critical part of what MEDINFO is all about. They feature articles and abstracts describing research, training, and service functions that informaticians all over the world are designing, implementing, and evaluating. They represent the wide range of informatics development in different regions. The way these Proceedings are edited is emblematic of our field: the collaboration needed to produce a top-quality publication, even in the absence of a dedicated staff or a large budget, is only possible because reaching our destination is more important than the few barriers that present themselves along the way. The Proceedings are the output of a large group of volunteers who receive a small token of appreciation and are rewarded by knowing that, because of their contributions, authors disseminate their work beyond regional boundaries and readers within and outside our field can learn and reuse tools, data, and knowledge.
It was our great pleasure and honor to work with Associate Editors Todd Lingren and Scott McGrath to produce these Proceedings. They coordinated a large group of informatics trainees from multiple institutions (Table 1) in order to ensure that the contents were easy to read and formatted uniformly. We thank the whole editorial team as well as the Scientific Program team for making the editing role a fun and enjoyable one. And we thank our readers for understanding the difficulties in editing a large number of accepted papers for clarity, grammar, and style in such a short amount of time.
These Proceedings feature state-of-the-art informatics projects from multiple regions of the world. Enjoy learning how informatics is changing the way we approach health and wellbeing for all.
Table 1 – Editorial Committee Members
Marcy Antonio / University of Arizona, USA
Jacqueline Brixey / University of Southern California, USA
Melissa M. Das / MaineHealth, USA
Smruti Deoghare / University of Cincinnati, USA
Grace Gao / University of Minnesota, USA
Mattias Georgsson / Blekinge Institute of Technology, Sweden
Zhe He / Florida State University, USA
Renee Marie Hendricks / Veterans Health Administration, USA
Felix Holl / Ulm University, Germany
Kate Fultz Hollis / Oregon Health & Science University, USA
Hyunggu Jung / University of Washington, USA
Tian Kang / Columbia University, USA
Emily Kawaler / New York University, USA
David Leander / Dartmouth Geisel School of Medicine, USA
Tiffany I. Leung / Maastricht University, The Netherlands
Lacey Lewis / Veterans Health Administration, USA
Elizabeth A. Lindemann / University of Minnesota, USA
Todd Lingren / University of Cincinnati, USA
Satish M. Mahajan / Veterans Health Administration, USA
Vincent Major / New York University, USA
Scott McGrath / University of Nebraska Omaha, USA
Melanie Meyer / University of Massachusetts, USA
Sean Mikles / University of North Carolina Chapel Hill, USA
Elliot Mitchell / Columbia University, USA
Vickie Nguyen / University of Texas Health Science Center at Houston, USA
Florence Odekunle / Rutgers University, USA
Adebowale Ojo / Centers for Disease Control and Prevention, USA
Lisiane Pruinelli / University of Minnesota, USA
Satyajeet Raje / IBM, USA
Doug Redd / George Washington University, USA
Lincoln Sheets / University of Missouri, USA
Bryan Steitz / Vanderbilt University, USA
Vignesh Subbian / University of Arizona, USA
Lina Sulieman / Vanderbilt University, USA
Andreas Triantafyllidis / Centre for Research and Technology Hellas (CERTH), Greece
Elizabeth Umberfield / University of Michigan, USA
Jacob P. VanHouten / Vanderbilt University Medical Center, USA
Lois Walters-Threat / American Nurses Credentialing Center, USA
Wei Wei / University of Pittsburgh Medical Center, USA
Rafeek Adeyemi Yusuf / University of Texas Health Science Center at Houston, USA
Ling Zheng / Monmouth University, USA
Lucila Ohno-Machado, MD, PhD
Brigitte Seroussi, MD, PhD
Co-Chairs, MedInfo2019 Editorial Committee
Eliciting semantic similarity between concepts remains a challenging task. Approaches based on embedding vectors have recently gained in popularity as they efficiently capture semantic relationships. The underlying idea is that two words with close meanings appear in similar contexts. In this study, we propose a new neural network model, named MeSH-gram, which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead of words. Trained on the publicly available PubMed/MEDLINE corpus, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and several context window sizes. A deeper comparison is performed with twenty existing models. The Spearman’s rank correlations between human scores and computed similarities show that MeSH-gram (i) outperforms the skip-gram model and (ii) is comparable to the best methods, which need more computation and external resources.
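As a rough illustration of the idea this abstract builds on: the skip-gram model is trained on (target, context) pairs drawn from a window around each token, and in MeSH-gram these tokens would be MeSH descriptors rather than words. The sketch below, using invented descriptor IDs, shows only the pair-extraction step, not the neural training itself:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as in the skip-gram model."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

# Hypothetical MeSH descriptor sequence for a single citation (IDs illustrative)
mesh = ["D003924", "D007333", "D008279"]
print(skipgram_pairs(mesh, window=2))
```

In MeSH-gram, each citation contributes the descriptors assigned to it; the pairs then feed the usual skip-gram objective.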
Kidney transplantation is recommended for patients with End-Stage Renal Disease (ESRD). However, complications such as graft rejection are hard to predict due to donor and recipient variability. This study discusses the role of machine learning (ML) in predicting graft rejection following kidney transplantation by reviewing the available related literature. PubMed, DBLP, and Scopus databases were searched to identify studies that utilized ML methods to predict outcomes following kidney transplants. Fourteen studies, covering 109,317 kidney transplant patients, were included. We extracted five different ML algorithms from the reviewed studies. Decision Tree (DT) algorithms showed slightly higher performance: the overall mean Area Under the Curve (AUC) for DT (79.5% ± 0.06) was higher than for Artificial Neural Networks (ANN) (78.2% ± 0.08). For predicting graft rejection, ANN and DT were the ML models with the highest accuracy and AUC.
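AUC, the metric this review compares across models, can be computed directly from predicted scores as the Mann–Whitney rank statistic: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal sketch with made-up labels and scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Each concordant (pos, neg) pair counts 1; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels (1 = rejection) and model scores
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # → 0.75
```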
Hospital systems frequently implement quality measures to quantify healthcare processes and patient outcomes. One such measure that has previously been used is the Surgical Care Improvement Project (SCIP) quality measure of perioperative beta blocker continuation, SCIP-Card-2. The SCIP-Card-2 measure requires resource-intensive medical chart abstraction, limiting its application to a small sample of eligible patients. This paper describes a natural language processing (NLP) system for automatic extraction of SCIP-Card-2 quality measures in clinical text notes.
Supported by the European Commission under Horizon 2020, mHealth4Afrika is co-designing and validating a modular, multilingual, state-of-the-art health information system addressing primary healthcare requirements in resource constrained environments. mHealth4Afrika has co-designed a comprehensive range of functionality and medical programs in partnership with Ministries of Health, district health officers, clinic managers and primary healthcare workers from urban, rural and deep rural health facilities in Ethiopia, Kenya, Malawi and South Africa. This paper provides insights into how mHealth4Afrika is leveraging HL7 FHIR to support standards-based data exchange and interoperability between Electronic Medical Records and DHIS2. This work is currently being validated in the field.
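To illustrate the kind of standards-based payload such an exchange involves, below is a minimal sketch of an HL7 FHIR R4 Observation built with the standard library. The LOINC code shown (8867-4, heart rate) is a common textbook example and the patient reference is illustrative; neither is taken from mHealth4Afrika:

```python
import json

# Minimal FHIR R4 Observation a client could POST to a FHIR server.
# Codes and identifiers below are illustrative only.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "8867-4",          # LOINC: Heart rate
            "display": "Heart rate",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {
        "value": 72,
        "unit": "beats/minute",
        "system": "http://unitsofmeasure.org",  # UCUM units
        "code": "/min",
    },
}

payload = json.dumps(observation, indent=2)
print(payload)
```

A resource like this can be validated against the FHIR specification before being exchanged between an EMR and an aggregation platform such as DHIS2.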
This paper addresses the task of answering consumer health questions about medications. To better understand the challenge and needs in terms of methods and resources, we first introduce a gold standard corpus for Medication Question Answering created using real consumer questions. The gold standard (https://github.com/abachaa/Medication_QA_MedInfo2019) consists of six hundred and seventy-four question-answer pairs with annotations of the question focus and type and the answer source. We first present the manual annotation and answering process. In the second part of this paper, we test the performance of recurrent and convolutional neural networks in question type identification and focus recognition. Finally, we discuss the research insights from both the dataset creation process and our experiments. This study provides new resources and experiments on answering consumers’ medication questions and discusses the limitations and directions for future research efforts.
Non-compliance situations happen when patients do not follow their prescriptions and take actions that lead to potentially harmful situations. Although such situations are dangerous, patients usually do not report them to their physicians. Hence, it is necessary to study other sources of information. We propose to study online health fora. The purpose of our work is to explore online health fora with supervised classification and information retrieval methods in order to identify messages that contain drug non-compliance. The supervised classification method detects non-compliance with up to 0.824 F-measure, while the information retrieval method detects non-compliance with up to 0.529 F-measure. For some fine-grained categories and new data, it shows up to 0.65–0.70 precision.
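The F-measure figures quoted above combine precision and recall into a single harmonic mean. For readers unfamiliar with the metric, a minimal computation from raw counts (the counts below are illustrative, not the study's):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative confusion counts for a non-compliance classifier
p, r, f = precision_recall_f1(tp=70, fp=20, fn=10)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```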
Clinical research studies often leverage various heterogeneous data sources, including patient electronic health records, online surveys, and genomic data. We introduce a graph-based data integration and query tool called Carnival. We demonstrate its powerful ability to unify data from these disparate data sources to create datasets for two studies: prevalence and incidence case/control matches for coronary artery disease, and controls for Marfan syndrome. We conclude with future directions for Carnival development.
Assessing a patient’s risk of an impending suicide attempt has been hampered by limited information about dynamic factors that change rapidly in the days leading up to an attempt. The storage of patient data in electronic health records (EHRs) has facilitated population-level risk assessment studies using machine learning techniques. Until recently, most such work has used only structured EHR data and excluded the unstructured text of clinical notes. In this article, we describe our experiments on suicide risk assessment, modelling the problem as a classification task. Given the wealth of text data in mental health EHRs, we aimed to assess the impact of using this data in distinguishing periods prior to a suicide attempt from those not preceding such an attempt. We compare three different feature sets, one structured and two text-based, and show that inclusion of text features significantly improves classification accuracy in suicide risk assessment.
The aim of this study was to build a proof of concept demonstrating that big data technology could improve drug safety monitoring in a hospital and could help pharmacovigilance professionals make data-driven, targeted hypotheses on adverse drug events (ADEs) due to drug-drug interactions (DDIs). We developed an automatic DDI detection system based on treatment data and laboratory tests from the electronic health records stored in the clinical data warehouse of Rennes academic hospital. We also used OrientDB, a graph database, to store information from five drug knowledge databases, and Spark to analyze potential interactions between drugs taken by hospitalized patients. Then, we developed a machine learning model to identify the patients in whom an ADE might have occurred because of a DDI. The DDI detection system worked efficiently and computation time was manageable. The system could be routinely employed for monitoring.
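The core pairwise check such a system performs can be sketched as follows. The two interaction pairs below are well-known textbook examples; a real system, as described, would draw its interaction knowledge from five drug knowledge databases stored in the graph database:

```python
from itertools import combinations

# Illustrative interaction knowledge base (unordered drug pairs); a production
# system would load these from curated drug knowledge databases.
KNOWN_DDI = {
    frozenset({"warfarin", "aspirin"}),          # bleeding risk
    frozenset({"simvastatin", "clarithromycin"}),  # CYP3A4 inhibition
}

def potential_ddis(drugs):
    """Return every pair of co-prescribed drugs with a known interaction."""
    return [tuple(sorted(pair))
            for pair in map(frozenset, combinations(set(drugs), 2))
            if pair in KNOWN_DDI]

print(potential_ddis(["warfarin", "aspirin", "metformin"]))
```

Flagged pairs would then be cross-checked against laboratory results to decide whether an ADE may actually have occurred.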
Suicide is a growing public health concern in online communities. In this paper, we analyze online communications on the topic of suicide in the social networking platform Reddit. We combine lexical text characteristics with semantic information to identify comments with features of suicide attempts and methods. Then, we develop a set of machine learning methods to automatically extract suicide methods and classify the user comments. Our classification methods’ performance varied across suicide experiences, with F1-scores up to 0.92 for “drugs” and greater than 0.82 for “hanging” and “other methods”. Our exploratory analysis reveals that the most frequently reported suicide methods are drug overdose, hanging, and wrist-cutting.
In electrocardiography (ECG), the main focus is the classification of the heart’s electric activity, and deep learning has proven its value over the years, exhibiting great performance when classifying heartbeats. Following these assumptions, we propose a deep learning model based on a ResNet architecture with 1D convolutional layers to classify beats into one of four classes: normal, atrial premature contraction, premature ventricular contraction, and others. Experimental results with the MIT-BIH Arrhythmia Database confirmed that the model performs well, obtaining an accuracy of 96% when using stochastic gradient descent (SGD) and 83% when using adaptive moment estimation (Adam); SGD also obtained F1-scores over 90% for the four proposed classes. A larger dataset was created and used as unforeseen data to test the trained model, showing that further tests are needed to improve its accuracy.
We report initial experiments analyzing social media through an NLP annotation tool applied to web posts about medications of current interest (baclofen, levothyroxine, and vaccines) and summaries of product characteristics (SPCs). We conducted supervised experiments on a subset of messages annotated by experts as positive or negative for misuse; F-scores ranged from 0.62 to 0.91. We also annotated both SPCs and another set of posts to compare MedDRA annotations in each source. A pharmacovigilance expert checked the output and confirmed that entities not found in SPCs might express drug misuse or unknown ADRs.
A method is described to use SNOMED CT’s history mechanism as a means to compute how the formal and linguistic intensions of its concepts change over versions. As a result of this, it is demonstrated that the intended principle of concept permanence is not always adhered to. It is shown that the evolution of formal intensions can be monitored fully automatically and that the proposed procedure includes a method to suggest missing subsumers in a concept’s transitive closure set by identifying mistakes that have been made in the past. Changes in linguistic intensions were found to be much more labor-intensive to identify. It is suggested that this could be improved if the history mechanism would come with more detailed motivations for change than the current and insufficiently used annotation to the effect that a fully specified name ‘fails to comply with the current editorial guidance’.
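The transitive closure set mentioned above can be computed from direct is-a links alone. The sketch below uses a hypothetical three-concept hierarchy and omits the history-mechanism analysis that is the paper's actual contribution:

```python
def transitive_closure(parents):
    """All ancestors for each concept, given direct is-a parent links
    (assumes an acyclic hierarchy, as in SNOMED CT)."""
    closure = {}
    def ancestors(c):
        if c not in closure:
            closure[c] = set()
            for p in parents.get(c, ()):
                closure[c].add(p)
                closure[c] |= ancestors(p)
        return closure[c]
    for c in parents:
        ancestors(c)
    return closure

# Hypothetical mini-hierarchy: concept -> direct parents
isa = {"C3": ["C2"], "C2": ["C1"], "C1": []}
print(transitive_closure(isa))
```

Comparing such closure sets across two versions of a terminology is one way to surface subsumers that were silently added or lost, which is the kind of change the proposed procedure flags.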
Unstructured electronic health records are valuable resources for research. Before they are shared with researchers, protected health information needs to be removed from these unstructured documents to protect patient privacy. The main steps involved in removing protected health information are accurately identifying sensitive information in the documents and removing the identified information. To keep the documents as realistic as possible, the removal step is often followed by replacement of the identified sensitive information with surrogates. In this study, we present an algorithm to generate surrogates for unstructured electronic health records. We used this algorithm to generate realistic surrogates for a Health Science Alliance corpus constructed specifically for the development of automated de-identification systems.
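One practical detail of surrogate replacement is that each substitution changes the character offsets of everything after it; applying annotated spans right-to-left keeps the earlier offsets valid. A minimal sketch with synthetic PHI and surrogates (not the authors' algorithm, which also ensures surrogates are realistic and internally consistent):

```python
def apply_surrogates(text, annotations):
    """Replace annotated PHI spans with surrogates.

    `annotations` is a list of (start, end, surrogate) tuples on the
    original text; spans are applied right-to-left so earlier offsets
    stay valid as the text length changes.
    """
    for start, end, surrogate in sorted(annotations, reverse=True):
        text = text[:start] + surrogate + text[end:]
    return text

# Synthetic note with two identified spans: a name and a date
note = "Seen by Dr. Smith on 2019-03-02."
spans = [(12, 17, "Jones"), (21, 31, "2019-04-15")]
print(apply_surrogates(note, spans))
```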
Personalized medicine implies reducing the invasiveness of therapeutic procedures. Although interventional radiology has proven a very interesting alternative to surgical procedures, it still raises concerns due to the irradiation dose received by the medical team (and by the patient). We propose a novel concept that significantly reduces the irradiation dose during the phases where tools inserted in the patient have to be tracked with respect to previously acquired images. This implies inserting a miniaturized X-ray detector in the tip of the tools and reducing the dose with a “rotating collimator”. We demonstrate that real-time processing of the signals allows accurate localization of the tool tip, with a dose reduction of at least ten times.
The W3C project, “Linking Open Drug Data” (LODD), linked several publicly available sources of drug data together. So far, French data, like marketed drugs and their summary of product characteristics, were not integrated and remained difficult to query. In this paper, we present Romedi (Référentiel Ouvert du Médicament), an open dataset that links French data on drugs to international resources. The principles and standard recommendations created by the W3C for sharing information were adopted. Romedi was connected to the Unified Medical Language System and DrugBank, two central resources of the LODD project. A SPARQL endpoint is available to query Romedi and services are provided to annotate textual content with Romedi terms. This paper describes its content, its services, its links to external resources, and expected future developments.
Semantic standards and human language technologies are key enablers for semantic interoperability across heterogeneous document and data collections in clinical information systems. Data provenance is receiving increasing attention, and it is especially critical where clinical data are automatically extracted from original documents, e.g. by text mining. This paper demonstrates how the output of a commercial clinical text-mining tool can be harmonised with FHIR, the leading clinical information model standard. Character ranges that indicate the origin of an annotation and machine-generated confidence values were identified as crucial elements of data provenance with which to enrich text-mining results. We have specified and requested the necessary extensions to the FHIR standard and demonstrated how, as a result, important metadata describing the processes generating FHIR instances from clinical narratives can be embedded.
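The provenance elements named above, character ranges and confidence values, can be pictured as metadata travelling with each extracted finding. The field names and concept identifier below are illustrative assumptions, not the actual FHIR extensions the authors specified:

```python
# Sketch of data provenance carried alongside a text-mined finding: the
# character range of the evidence span in the source note and the
# extractor's machine-generated confidence. All names are illustrative.
finding = {
    "code": "C0027051",            # UMLS-style concept identifier (illustrative)
    "display": "myocardial infarction",
    "provenance": {
        "sourceDocument": "note-123",
        "charRange": [104, 125],   # start/end offsets of the evidence span
        "confidence": 0.93,        # machine-generated confidence value
    },
}

# The offsets let a reviewer jump back to the exact evidence text.
note_text = "x" * 104 + "myocardial infarction" + "..."
start, end = finding["provenance"]["charRange"]
print(note_text[start:end])
```

Embedding exactly this kind of metadata in FHIR instances is what the requested standard extensions make possible.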
Metadata matching is an important step towards integrating heterogeneous healthcare data and facilitating secondary use. MDRCupid supports this step by providing a configurable metadata matching toolbox incorporating lexical and statistical matching approaches. The matching configuration can be adapted to different purposes by manually selecting algorithms and their weights or by using the optimization module with corresponding training data. The toolbox can be accessed as a web service via programming or user interface. For every selected metadata element, the metadata elements with the highest similarity scores are presented to the user and can be manually confirmed via the user interface, while the programming interface uses a similarity threshold to select corresponding elements. An HL7 FHIR ConceptMap is used to save the matches. Manually confirmed matches may be used as new training data for the optimizer to improve the matching parameters further.
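A lexical matcher of the kind included in such a toolbox can be sketched with the standard library's SequenceMatcher; the metadata element names and the 0.6 threshold below are illustrative assumptions, and MDRCupid additionally combines several algorithms with learned weights:

```python
from difflib import SequenceMatcher

def best_matches(element, candidates, threshold=0.6):
    """Rank candidate metadata elements by lexical similarity (0..1),
    keeping only those above a similarity threshold."""
    scored = [(c, SequenceMatcher(None, element.lower(), c.lower()).ratio())
              for c in candidates]
    return sorted(((c, s) for c, s in scored if s >= threshold),
                  key=lambda x: -x[1])

# Hypothetical metadata elements from two data dictionaries
print(best_matches("patient_birth_date",
                   ["birth_date", "date_of_birth", "admission_date"]))
```

As described in the abstract, results above the threshold would be selected automatically via the programming interface, while lower-scoring candidates would be presented for manual confirmation.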
Media outlets play crucial roles in disseminating health information. Previous studies have examined how health journalism is practiced by reliable and unreliable media outlets. However, most of the existing works are conducted over a relatively small set of samples. In this study, we investigate a large collection (about 30 thousand) of health-related news articles which were published by 29 reliable and 20 unreliable media outlets and identify several differences in health journalism practice. Our analysis shows that there are significant structural, topical, and semantic disparities in the way reliable and unreliable media outlets conduct health journalism. We argue, in this age of ‘fake news’, these findings will be useful to combat online health disinformation.
With the growing interdisciplinarity of cancer treatment and increasing amounts of data and patients, it is getting increasingly difficult for physicians to capture a patient’s medical history as a basis for adequate treatment and to compare different medical histories of similar patients to each other. Furthermore, in order to tackle the etiological mechanisms of cancer, it is crucial to identify patients exhibiting a different disease course than their corresponding cohort. Several timeline visualizations have already been proposed. However, the functions and design of such visualizations are always use case dependent. We constructed a cohort timeline prototype mock-up for a specific oncological use case involving multiple myeloma, where the chronological monitoring of various parameters is crucial for patient diagnosis and treatment. Our proposed cohort timeline is a synthesis between elements described in the literature and our own approaches regarding function and design.
A significant part of medical knowledge is stored as unstructured free text. However, clinical narratives are known to contain duplicated sections because clinicians copy/paste parts of a former report into a new one. In this study, we evaluate the duplications found within patient records in more than 650,000 French clinical narratives. We adapted a method to identify duplicated zones efficiently in a reasonable time. We evaluated the potential impact of duplications in two use cases: the presence of (i) treatments and/or (ii) relative dates. We identified an average duplication rate of 33%. We found that 20% of the documents contained drugs mentioned only in duplicated zones and that 1.45% of the documents contained mentions of relative dates in duplicated zones, which could potentially lead to erroneous interpretation. We suggest the systematic identification and annotation of duplicated zones in clinical narratives for information extraction and temporal-oriented tasks.
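A cheap proxy for detecting copy/paste duplication (not the authors' adapted method) is to look for word n-grams shared between an earlier and a later report:

```python
def duplicated_zones(doc_a, doc_b, n=5):
    """Find word n-grams of doc_b that already occur in doc_a,
    a rough proxy for copy/paste duplication between two reports."""
    def ngrams(words):
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    a, b = doc_a.split(), doc_b.split()
    shared = ngrams(a) & ngrams(b)
    rate = len(shared) / max(1, len(ngrams(b)))
    return shared, rate

# Synthetic example: a follow-up note reusing the start of an earlier one
old = "patient stable under treatment with aspirin 100 mg daily since last visit"
new = "patient stable under treatment with aspirin 100 mg daily new chest pain today"
zones, rate = duplicated_zones(old, new)
print(f"{len(zones)} shared 5-grams, duplication rate {rate:.2f}")
```

Marking such zones lets downstream extraction distinguish drug or date mentions that appear only in duplicated text from those in newly written content.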
With the proliferation of digital communication in healthcare, the reuse of laboratory test data yields valuable insights into clinical and scientific issues, essentially enabled by semantic standardization using the LOINC coding system. In order to extend the currently limited potential for analysis, which is mainly caused by structural peculiarities of LOINC, an algorithmic transformation of relevant content into an OWL ontology was performed, which includes LOINC Terms, Parts, and Hierarchies. For extended analysis capabilities, the comprehensive SNOMED CT ontology is added by transferring its contents and the recently published LOINC-related mapping data into OWL ontologies.
These formalizations offer rich, computer-processable content and allow the inference of additional structures and relationships, especially when used together. Consequently, various forms of reuse are facilitated; an application demonstrating the dynamic visualization of fractional hierarchy structures for user-supplied laboratory data has already been implemented. By providing element-wise aggregation via superclasses, an adaptable graph representation is obtained for studying categorizations.