Ebook: Decision Support Systems and Education
Medical informatics has revolutionized healthcare in recent years, and one of the major challenges now faced by health professionals everywhere is the further improvement of healthcare by making more effective use of the data from biomedical informatics, not least for education and decision support.
This book presents the 52 full papers (accepted from 95 initial submissions) delivered at the Special Topic Conference of the European Federation for Medical Informatics (EFMI STC 2018), held in Zagreb, Croatia, on 15 and 16 October 2018. The EFMI STC is one of Europe's leading conferences for the sharing of current professional and scientific knowledge in health informatics processes, and the topics covered here have been broadly divided into two sections: decision support and education.
Offering an overview of current medical informatics research, this book will undoubtedly prove invaluable for the professional development of healthcare practitioners, as well as contributing to knowledge sustainability within the field of medical informatics.
The current volume presents accepted full papers from the Special Topic Conference of the European Federation for Medical Informatics (EFMI STC 2018), held from 15 to 16 October 2018 in Zagreb, Croatia. By the assigned deadline, we had received 95 submissions from which, after review, the Scientific Programme Committee (SPC) accepted 52 as full papers to be included in this volume of proceedings. The SPC would like to present these scientific outcomes from the EFMI STC 2018 Conference to the academic community.
EFMI STC 2018 is the latest annual conference in the series of special topic conferences organized by EFMI, and focusing each year on a specific topic or topics of interest to the biomedical and health informatics community. The conference focuses on improving healthcare by means of decision support systems which utilize data from biomedical informatics implementations across the entire spectrum: from clinical informatics and health informatics to public health informatics as applied in the healthcare domain. Decision making is also of importance in the area of health management, where organizational issues can play an important role in the implementation of biomedical and health informatics applications. An additional topic of this dual-spread conference is the educational aspect of biomedical and health informatics. Curriculum development, educational implementations, recommendations, evaluation, and accreditation are discussed, and professional development and skills certification processes are presented. We treat the field of biomedical informatics in a very broad framework, examining the implications of decision making in the clinical domain and in health management, while providing a bridge to the educational processes to ensure knowledge sustainability and improved professional development for all those working in the field of healthcare. Data, informatics, decision making and education foster and empower health professionals and informaticians, enabling them to improve healthcare for the benefit of patients. The EFMI Working Groups engaged for this content are: EDU (Education) and IDeS (Information and Decision Support in Biomedicine and Health Care).
This volume incorporates only the full papers accepted for oral presentation at the conference. It should be noted that the proceedings are published in the internationally indexed series of Studies in Health Technology and Informatics (SHTI) of IOS Press.
The editors would like to thank the members of the SPC, the local organizing committee, and, in particular, all reviewers, who performed in an outstandingly professional and objective way throughout the process of refereeing the submitted scientific work, producing a high-quality publication for a successful scientific event.
Zagreb, 30 August 2018.
The Editors:
John Mantas
Zdenko Sonicki
Mihaela Crişan-Vida
Kristina Fišter
Maria Hägglund
Aikaterini Kolokathi
Mira Hercigonja-Szekeres
Systematic reviews are widely used as a tool for decision making to establish new clinical guidelines. Reviews can be time-consuming, potentially leaving authors with thousands of citations to screen. Software tools for assisting reviewers in this process are available; however, only a few use text mining techniques to reduce screening time. In this work, we introduce Twister, a web-based tool for semi-automated literature reviews with broad research questions. We discuss how two text mining techniques can be used to (a) extract data elements from clinical abstracts and (b) cluster citations based on key phrase extraction to help reviewers reduce screening time. We present the overall system architecture, design considerations and system implementation.
Diagnoses recorded on the problem list are increasingly being used for decision support applications. To obtain insight into the adequacy of the clinical user interface for capturing what the clinician has in mind, and for reconstructing the clinical reality of the patient, we analyzed in the database of an EHR system the transactions that resulted from managing the problem list. Our findings indicate (1) that caution is required when using the evolution of the problem list for determining comorbidity or ongoing disease, and (2) that similarities or differences in problem list annotation sequences do not always correspond with similarities or differences in disease courses. It remains to be investigated whether automatically identifiable subsets of problem list evolution patterns exist from which ground truth can reliably be inferred, or whether clinicians need more education in how problem list user interfaces should be used to avoid erroneous interpretations by clinical decision support applications.
Secondary use of clinical structured data takes an important place in healthcare research. It was first described by Fayyad as “knowledge discovery in databases”. Feature extraction is an important phase but has received little attention. The objectives of this paper are: 1) to propose an updated representation of data reuse in healthcare, 2) to illustrate methods and objectives of feature extraction, and 3) to discuss the place of domain-specific knowledge.
Material and methods: an updated representation is proposed. Then, a case study consists of automatically identifying acute renal failure and discovering risk factors, by secondary use of structured data. Finally, a literature review published by Meystre et al. is analyzed.
Results: 1) we propose a description of data reuse in 5 phases. Phase 1 is data preprocessing (cleansing, linkage, terminological alignment, unit conversions, deidentification), which enables the construction of a data warehouse. Phase 2 is feature extraction. Phase 3 is statistical and graphical mining. Phase 4 consists of expert filtering and reorganization of statistical results. Phase 5 is decision making. 2) The case study illustrates how time-dependent features can be extracted from laboratory results and drug administrations, using domain-specific knowledge. 3) Among the 200 papers cited by Meystre et al., the first and last authors were affiliated to health institutions in 74% of cases (68% for methodological papers, and 79% for applied papers).
Discussion: feature extraction has a major impact on the success of data reuse. Specific knowledge-based reasoning takes an important place in feature extraction, which requires tight collaboration between computer scientists, statisticians, and health professionals.
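As an illustration of the kind of time-dependent feature extraction described in this abstract, the following is a minimal Python sketch. It is not the authors' method: the function name, the 1.5 ratio, the 7-day window and the sample measurements are all assumptions chosen only to show how a binary feature might be derived from timestamped laboratory results.

```python
from datetime import datetime, timedelta

def creatinine_rise_flag(measurements, ratio=1.5, window_days=7):
    """Return True if any value is >= ratio * the lowest earlier value in the window.

    measurements: list of (timestamp, value) pairs.
    ratio and window_days are illustrative assumptions, not values from the paper.
    """
    ordered = sorted(measurements)  # sort by timestamp
    window = timedelta(days=window_days)
    for i, (t_late, v_late) in enumerate(ordered):
        # Earlier measurements that fall inside the look-back window
        earlier = [v for t, v in ordered[:i] if t_late - t <= window]
        if earlier and v_late >= ratio * min(earlier):
            return True
    return False

# Made-up example values for one hypothetical patient
labs = [
    (datetime(2018, 1, 1), 80.0),
    (datetime(2018, 1, 3), 85.0),
    (datetime(2018, 1, 5), 130.0),
]
flag = creatinine_rise_flag(labs)
```

The resulting flag is one column of a feature table; in practice the rule itself would come from domain experts, which is the point the abstract makes about knowledge-based reasoning.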
Background: Unstructured health documents (e.g. discharge summaries) represent an important and unavoidable source of information.
Methods: A semantic annotator identified all the concepts present in the health documents from the clinical data warehouse of the Rouen University Hospital.
Results: 2,087,784,055 annotations were generated from a corpus of about 11.9 million documents, with an average of 175 annotations per document. SNOMED CT, NCIt and MeSH were the top 3 terminologies that reported the most annotations.
Discussion: As expected, the most general terminologies with the most translated concepts were those with the most concepts identified.
Many epidemiological studies now rely on the reuse of large healthcare administrative databases. In those studies, most of the time is consumed in managing data and performing basic statistical analyses, and is no longer available for complex statistical and medical analysis; therefore the potential of such databases is sometimes underexploited. The objective of this work is to build SAF4SUHAD, a statistical analysis framework for secondary use of healthcare administrative databases, using literature-based specifications. A literature review was performed on PubMed in four different medical domains: caesarean deliveries, cholecystectomies, hip replacement surgeries and bariatric surgeries. We identified 22 papers reporting analyses of large databases. They reported epidemiological indicators (e.g. mean age), which were abstracted to features (e.g. univariate description of a quantitative variable), and then implemented through 32 functions available to the user in the R programming language. For instance, a function will draw a histogram, compute the mean with confidence interval, quantiles, etc. Those functions comprise 4 functions for data management, 9 for univariate analysis, 8 for bivariate analysis, 11 for multivariate analysis, and many other intermediate functions. Those functions were successfully used to analyze a French database of 250 million discharge summaries. The set of ready-to-use R functions defined in this work could make repetitive tasks more reliable and refocus efforts on expert analysis.
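The "univariate description of a quantitative variable" mentioned above can be sketched as follows. This is an illustration in Python rather than R, and it does not reproduce any SAF4SUHAD function: the name describe_quantitative, the returned keys and the normal-approximation confidence interval (z = 1.96) are assumptions made for this example.

```python
import math
import statistics

def describe_quantitative(values, z=1.96):
    """Summarise a quantitative variable: mean, approximate 95% CI, quartiles.

    The interval uses a normal approximation (an assumption of this sketch).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    half_width = z * sd / math.sqrt(n)       # half-width of the interval
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "q1": q1,
        "median": median,
        "q3": q3,
    }

# Made-up ages, for illustration only
summary = describe_quantitative([30, 40, 50, 60, 70])
```

Wrapping such a summary in one reusable function is the design idea the abstract describes: the repetitive part is written once, and the analyst's time goes to interpretation.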
Managing multimorbidity entails processing distributed, dynamic and heterogeneous data using diverse analytics tools. We present KITE, a Cloud-based infrastructure allowing the aggregation and processing of health data using a dynamic set of analytical components. We showcase KITE in the context of the ProACT project, aiming at advancing home-based integrated care through IoT, analytics and a behavior change framework. We validate the viability of the infrastructure through an application of Bayesian networks to give a probabilistic representation of older individuals based on a variety of factors.
This paper describes a secure data collection infrastructure involving standardized electronic medical record (EMR) storage and Private Set Intersection, a secure data collection technology based on Bloom filters. The objective of this infrastructure is to facilitate rapid secondary use of exported EMR data in cross-patient or cross-institutional analyses based on the Standardized Structured Medical Information eXchange (SS-MIX), Japan's domestic standard for EMR exporting. The design of the infrastructure and its underlying concepts are described herein. In an experimental test, an intersection operation involving approximately 1 million records was completed within a minute; this result is expected to be representative of the system in actual use. In forthcoming work, we plan to verify the system performance using larger data sets.
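To make the Bloom filter idea concrete, here is a minimal Python sketch of a filter used for an approximate set intersection between two sites. It only shows the membership-test mechanics: a real Private Set Intersection protocol adds cryptographic protections that are not shown, and the class, the parameters (65,536 bits, 5 hashes) and the identifiers are assumptions for illustration, not the infrastructure described in the paper.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter; sizes and hash count are illustrative assumptions."""

    def __init__(self, size_bits=1 << 16, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def probable_intersection(local_ids, remote_filter):
    """IDs held locally that the remote filter reports as present.

    False positives are possible; false negatives are not.
    """
    return {pid for pid in local_ids if pid in remote_filter}

# Hypothetical patient identifiers at two sites
site_a = {"P001", "P002", "P003"}
site_b = {"P002", "P003", "P004"}

filter_a = BloomFilter()
for pid in site_a:
    filter_a.add(pid)

common = probable_intersection(site_b, filter_a)
```

Site A shares only the filter, not its identifier list, and site B tests its own identifiers against it. Every true match is guaranteed to be found, while the false-positive rate depends on the filter size and the number of hash functions.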
Unplanned hospital readmissions are a burden to the healthcare system and to the patients. To lower the readmission rates, machine learning approaches can be used to create predictive models, with the intention to provide actionable information for caregivers. According to the German Diagnosis Related Groups (G-DRG) system, for every stay in a German hospital, data are collected for the subsequent reimbursement calculations. After statistical evaluation, these data are summarised in the yearly updated Case Fee Catalogue, which contains not only the weights for the reimbursement calculations, but also the expected length of stay values. The aim of the present paper was to evaluate potential enhancements of the prediction accuracy of our 30-day readmission prediction model by utilising additional information from the Case Fee Catalogue. A bagged ensemble of 25 regression trees was applied to §21 datasets from five independent German hospitals from 2013 to 2017, resulting in 422,597 cases. The overall model showed an area under the receiver operating characteristic curve of 0.812. Three of the top five features ranked by out-of-bag feature importance emerged from the Case Fee Catalogue. We conclude that additional information from the Case Fee Catalogue can enhance the accuracy of 30-day readmission prediction.
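The two techniques named in this abstract, bagging of regression trees and evaluation by area under the ROC curve, can be sketched in plain Python. This is a deliberately simplified illustration under stated assumptions: each tree is reduced to a depth-1 stump on a single feature, the toy data are made up, and none of the function names come from the paper.

```python
import random

def fit_stump(rows):
    """Fit a depth-1 regression tree on (x, y) pairs by minimising squared error."""
    xs = sorted({x for x, _ in rows})
    overall = sum(y for _, y in rows) / len(rows)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in rows if x <= thr]
        right = [y for x, y in rows if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]  # (threshold, left mean, right mean)

def predict_stump(stump, x):
    thr, lm, rm = stump
    if thr is None:  # bootstrap sample had a single distinct x value
        return lm
    return lm if x <= thr else rm

def fit_bagged(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up toy data: (feature value, readmitted within 30 days)
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
model = fit_bagged(rows)
scores = [predict_bagged(model, x) for x, _ in rows]
toy_auc = auc(scores, [y for _, y in rows])
```

The averaged stump output acts as a risk score, and the AUC measures how often a readmitted case is scored above a non-readmitted one. A realistic model would use full trees over many features and evaluate on held-out data.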
Standard data warehouses have been implemented in many hospitals. They have enormous potential to improve performance measurement and health care quality. Accessing, organizing, and using these data to optimize clinical coding are evolving challenges for hospital systems. This paper describes the development of a coding data warehouse based on the Entity-Attribute-Value (EAV) model, which we created by importing data from the clinical data warehouse (CDW) of a public hospital. In particular, it focuses on the design, implementation, and evaluation of the warehouse. Moreover, it defines the rules to convert a conceptual model of coding into an EAV logical model and its implementation using Integrating Biology and the Bedside (i2b2). We evaluate it using single-criterion and multi-criteria data searches and then calculate the precision of our model. The results show that the coding data warehouse provides, with good accuracy, an association of diagnostic codes and medical acts that is closer to the patient's clinical landscape. Doctors without knowledge of coding rules could use this information to optimize and improve diagnostic coding.
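The Entity-Attribute-Value layout mentioned above stores one row per (entity, attribute, value) triple instead of one column per attribute. The sketch below shows the general idea only; it is not the i2b2 schema or the conversion rules defined in the paper, and the stay identifier and codes are hypothetical placeholders.

```python
from collections import defaultdict

def to_eav(entity_id, record):
    """Flatten a wide record into EAV triples, skipping empty attributes."""
    return [(entity_id, attribute, value)
            for attribute, value in record.items()
            if value is not None]

def pivot(eav_rows):
    """Rebuild one dictionary per entity from EAV triples."""
    entities = defaultdict(dict)
    for entity_id, attribute, value in eav_rows:
        entities[entity_id][attribute] = value
    return dict(entities)

# Hypothetical hospital stay with placeholder codes
rows = to_eav("stay-1", {
    "diagnosis_code": "DX-001",
    "procedure_code": "ACT-001",
    "comment": None,
})
wide = pivot(rows)
```

Because attributes are data rather than columns, new kinds of codes can be added without changing the table structure, which is one common reason for choosing an EAV design.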
Introduction: Since the late 1990s, research and administrative institutions have been developing health data warehouses and increasingly reusing claims data. The impact of these changes is not yet completely quantified. Our objective was to compare the change in the number of patients included per study between observational and interventional studies over a 20-year period starting in 1995.
Materials and methods: We extracted all abstracts from studies published in three leading medical journals over the period 1995–2014 (18,107 studies). Then, we divided our study into two steps. First, we constructed an SVM-based predictive model to categorize each abstract into “observational”, “interventional” or “other” studies. In a second step, we built an algorithm based on regular expressions to automatically extract the number of included patients.
Results: During the investigated period, the median number of enrolled patients per study increased for interventional studies, from 282 in 1995–1999 to 629 in 2010–2014. Over the same period, the median number of patients increased more for observational studies, from 368 in 1995–1999 to 2078 in 2010–2014.
Discussion: The routine storage of an increasing amount of data (from data warehouses or claims data) has had an impact in recent years on the number of patients included in observational studies. The recent development of “randomized registry trials” combining, on the one hand, an intervention and, on the other hand, the identification of the outcome through data reuse, may also have an impact, over the next decade, on the number of patients included in randomized clinical trials.
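The second step of the method, extracting the number of included patients with regular expressions, might look like the following Python sketch. The pattern is an assumption written for this example and is far simpler than anything a production extractor would need; it is not the authors' algorithm.

```python
import re

# Illustrative pattern: a number (with optional thousands separators)
# followed by a word denoting study participants.
PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:patients|participants|subjects)\b",
    re.IGNORECASE,
)

def extract_patient_counts(abstract):
    """Return every patient count found in the text, in order of appearance."""
    return [int(m.group(1).replace(",", "")) for m in PATTERN.finditer(abstract)]

# Made-up sentence for illustration
counts = extract_patient_counts(
    "We enrolled 2,078 patients; 368 participants were excluded."
)
```

A rule like this returns every candidate number, so a further step (for example, keeping the largest count or the first one in the methods section) would be needed to pick the enrolment figure.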
Data taxonomy facilitates data valuation. The origin-based data taxonomy contains four types of data (provided, observed, derived and inferred) and 10 subcategories. In this paper, we report the results of a multivocal literature review around the origin-based data taxonomy. The review results are used to refine the definitions of the types of data for the purpose of data valuation within health care. Furthermore, we exemplify how the types of data can be recognized in health care (e.g., patient medication, alerting about risk patients, patient logistics, remote monitoring) to realize data valuation based on the proposed data taxonomy.
Common data models (CDMs) have enabled the simultaneous analysis of disparate and large data sources. A literature review identified three relevant CDMs: the Observational Medical Outcomes Partnership (OMOP) was the most cited, followed by Sentinel and then the Patient Centered Outcomes Research Institute (PCORI). We tested these three CDMs against fifteen pre-defined criteria for a diabetes cohort study use case, assessing the benefit (good diabetes control), risk (hypoglycaemia) and cost effectiveness of recently licensed medications. We found that all three CDMs have a useful role in planning collaborative research and in enhancing the analysis of data across jurisdictions. However, the number of pre-defined criteria achieved by these three CDMs varied: OMOP met 14/15, Sentinel 13/15, and PCORI 10/15. None met the privacy level we specified, and most of the other gaps related to clinical and cost outcome data.
Ontologies are an important big-data analytics tool. Historically, code lists were created by domain experts and mapped between different coding systems. Ontologies allow us to develop better representations of clinical concepts and data, and facilitate better data extracts from routine clinical data. They also make the process of case identification and key outcome measures transparent. We describe a process we have operationalised in our research. We use ontologies to resolve the semantics of complex health care data. The use of the method is demonstrated through a pregnancy case identification method. Pregnancy data are recorded in different coding systems and stored in different general practice systems; and pregnancy has its own complexities in that not all pregnancies proceed to term, they have different lengths, and they involve multiple providers of health care.
There is a growing interest in identifying, weighing and accounting for the impact of health determinants that lie outside of the traditional healthcare system, yet there is a remarkable paucity of data and sources to sustain these efforts. Decision support systems would greatly benefit from leveraging models which are able to extend and use such cross-domain knowledge. This paper describes an approach to identify and explore related social and clinical terms based on large corpora of unstructured data. Using word embedding techniques on relevant sources of knowledge, we have identified terms that appear close together in the high-dimensional space. In particular, having created a model with cross-domain knowledge on the social determinants of health, we have been able to demonstrate that it is possible to surface terms in this domain when querying for related clinical terms, thereby creating a bridge between the social and clinical determinants of health. This is a promising approach with significant applicability in decision support efforts in healthcare.
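Terms that "appear close together in the high-dimensional space" are typically found by ranking embedding vectors by cosine similarity. The Python sketch below illustrates that query step only. The three-dimensional vectors and the term names are made-up toy values, not output from the model described in the abstract, and real embeddings would have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_terms(query, embeddings, k=2):
    """Return the k terms whose vectors are most similar to the query term."""
    q = embeddings[query]
    scored = [(cosine(q, vec), term)
              for term, vec in embeddings.items() if term != query]
    return [term for _, term in sorted(scored, reverse=True)[:k]]

# Toy vectors invented for this example
toy_embeddings = {
    "diabetes": [0.9, 0.1, 0.3],
    "insulin": [0.8, 0.2, 0.1],
    "food_insecurity": [0.7, 0.1, 0.6],
    "fracture": [0.0, 1.0, 0.1],
}
neighbours = nearest_terms("diabetes", toy_embeddings)
```

In this toy setup a clinical query returns both a clinical neighbour and a social one, which mirrors the bridging behaviour the abstract reports for a model trained on cross-domain text.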
The varied phenotypes of obstructive sleep apnea (OSA) pose critical challenges, resulting in missed or delayed diagnosis. In this work, we applied k-modes, aiming to identify groups of OSA patients, based on demographic, physical examination, clinical history, and comorbidity characterization variables (n = 41) collected from 318 patients. Missing values were imputed with k-nearest neighbours (k-NN) and a chi-square test was performed. Thirteen variables were entered into the cluster analysis, resulting in three clusters. Cluster 1 comprised middle-aged men, Cluster 3 the oldest men, and Cluster 2 mainly middle-aged women. Cluster 3 weighed the most, whereas Cluster 1 weighed the least. The same pattern was observed for increased neck circumference. The percentages of variables driving sleepiness, congestive heart failure, arrhythmias and pulmonary hypertension were very low (<20%) and mild OSA was the most common severity level. Our results suggest that it is possible to phenotype OSA patients in an objective way, and that different (although not considered innovative) visualizations improve the recognition of this common sleep pathology.
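k-modes is the categorical counterpart of k-means: distance is the number of mismatching attributes, and each cluster centre is the per-attribute mode. The following minimal Python sketch shows the algorithm on made-up records. The deterministic initialisation (first k distinct records) and the variable values are assumptions of this example, not the study's configuration.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, max_iter=20):
    """Cluster categorical tuples; returns (assignment list, list of modes)."""
    # Simple deterministic initialisation: the first k distinct records
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    assignment = None
    for _ in range(max_iter):
        # Assign each record to the closest mode
        new_assignment = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each mode as the most frequent value per attribute
        for j in range(len(modes)):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return assignment, modes

# Made-up patient records: (sex, age group, weight category)
patients = [
    ("male", "middle", "obese"),
    ("female", "middle", "normal"),
    ("male", "middle", "obese"),
    ("female", "middle", "normal"),
    ("male", "old", "obese"),
    ("female", "middle", "overweight"),
]
labels, centres = k_modes(patients, k=2)
```

The modes themselves are readable cluster profiles, which is what makes descriptions such as "middle-aged men" in the abstract possible. Results depend on initialisation, so several starts are normally compared.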
African American children are more than twice as likely as white American children to die after surgery, and have increased risk for longer hospital stays, post-surgical complications, and higher hospital costs. Prior research into disparities in pediatric surgery outcomes has not considered interactions between patient-level Clinical Risk Factors (CRFs) and population-level Social, Economic, and Environmental Factors (SEEFs) primarily due to the lack of integrated data sets. In this study, we analyze correlations between SEEFs and CRFs and correlations between CRFs and surgery outcomes. We used a dataset from a cohort of 460 surgical cases who underwent surgery at a children's hospital in Memphis, Tennessee in the United States. The analysis was conducted on 23 CRFs, 9 surgery outcomes, and 10 SEEFs and demographic variables. Our results show that population-level SEEFs are significantly associated with both patient-level CRFs and surgery outcomes. These findings may be important in the improved understanding of health disparities in pediatric surgery outcomes.
Parenteral nutrition represents a well-established but highly sensitive process associated with several patient conditions. It typically involves assessments of many parameters and a wide range of observations in order to come up with the best possible parenteral solution for a patient. Different calculation tables are used to determine correct ratios of nutritional elements which would later be administered. This work focuses on providing a process map for parenteral nutrition in children using the combination of Petri nets and openEHR methodology to create an overview for the decision-making process.
Objective: Daily assessment of the acid-base balance (ABB) in blood is one of the important elements of multi-parameter patient monitoring at intensive care units (ICUs). The present work aims to determine the effectiveness and validity of the integral homeostasis index IHx calculated from ABB blood test data for the assessment and prognosis of children with critical traumatic conditions.
Methods: 345 patients were studied. IHx was calculated and the data were subjected to statistical evaluation. An Arden-Syntax-based clinical decision support (CDS) platform was used. One purpose of the study was to incorporate the platform into the ICU IT landscape of the hospital, and the second purpose was to develop a CDS module for the calculation of IHx and present the results in real time to the attending physician.
Results: Integral homeostasis index IHx calculations as well as their prompt assessment permit better and more rapid treatment of children with severe traumatic injury.
There are many drug databases, but poor data quality can sometimes lead to patients receiving the wrong medication. It is therefore very important to provide good-quality drug information, to supply structured information, and to build useful relations between drug-related information and the particularities of the patient's status. Children are the most sensitive to drug dosage and to certain substances, which is why pediatrics was our first choice for this research. To support this, we propose an ontology built from online drug prospectuses. We start by extracting the prospectus information from web pages, investigate the sections of the prospectus structure (indications, contraindications, dosage, etc.) and use the information in an ontology that we integrate into a pediatric application. In the background of the application, this solution provides the correct matching between the patient and the treatment, extracting for the physician only the best prescription options for the current case. The application allows the physician to select the appropriate drug and decide the best treatment in terms of correct substances and dosage for a given child. We use the prospectus-based solution because the application is in Romanian and other resources are not well provided in that language. In the future, the solution will be adapted to other languages and other databases, as the model can be generalized to different languages. The application is improved by the new ontology module and helps physicians to give good treatment, considering all relationships, constraints or antagonistic situations that may occur when providing a treatment.
Introduction: Due to the high volumes of data routinely recorded through hospital information systems, data reuse has become important in recent years. A data warehouse was developed in the Lille University Hospital to reuse anesthesia data. At the moment, it is mainly used for clinical research, by offering extraction of data tables to answer a clinical question. In this article, we try to identify contexts of data reuse other than the one currently provided by the data warehouse, in comparison with those in the literature.
Material and methods: A semi-structured interview grid was designed to address respondents' experience with clinical data reuse, the various contexts in which the data are reused, the information systems they currently use to do so, and the difficulties they encounter. A semi-inductive thematic analysis process was performed to identify meaningful semantic units and group them into thematic categories.
Results: Ten anesthetists were interviewed; three main contexts emerged: research and knowledge discovery, evaluation of professional practices, and organizational management. Data are accessed through complicated administrative procedures and clinicians have to perform tasks beyond their competencies.
Discussion: Difficulties encountered when searching for data express the need for easy and continuous access to data.
Digestive endoscopies, like all medical procedures in France, are coded with the CCAM. This task is performed by physicians, is time-consuming, and requires good knowledge of the terminology in addition to medical knowledge. The method presented here offers automatic coding of endoscopic procedures from free-text reports. Using a supervised learning method, the reports are coded with an average precision and recall of 0.92 on a corpus of 1,639 texts.