
Ebook: German Medical Data Sciences: A Learning Healthcare System

Advances in digital and information technology have meant that medical informatics and its associated fields are of ever-increasing importance in the modern healthcare environment.
This book presents selected papers from the 63rd annual conference of the German Society of Medical Information Sciences, Biometry, and Epidemiology, GMDS 2018, held in Osnabrück, Germany, in September 2018. The society encompasses not only medical informatics, biometry and epidemiology, but also medical bioinformatics, systems biology and health data management. The title of this year’s conference is “The Learning Health System: Research Based, Innovative, Connecting”, and 38 full papers of the 164 oral presentations and 65 posters delivered at the conference are included here. A wide variety of scientific topics are covered, including standards to enable the interoperable interchange of information; metadata management; record linkage; IT issues for health care networks; interprofessional teaching and training; eHealth legislation; analysis of miRNAs and RNA-Seq data, among others.
The contributors are all specialists in their field, and this book disseminates some of the innovative ideas which are urgently needed to meet the challenges facing a constantly developing digital healthcare environment.
The 63rd Annual Conference of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS) builds upon the motto “The Learning Health System: research based, innovative, connecting” and thereby highlights the unique combination of disciplines under the roof of the GMDS and their contribution to a Learning Health System (LHS): medical informatics, biometry, epidemiology, medical bioinformatics, systems biology and health data management. This volume bears witness to the breadth of these contributions and demonstrates their input to the learning cycle of an LHS: from data to knowledge, from knowledge to performance, and finally from performance back to data.
Friedman CP, Rubin JC, Sullivan KJ. Toward an Information Infrastructure for Global Health Improvement. Yearb Med Inform. 2017;26:16–23.
In light of an LHS, it is clear that digitised healthcare is not an aim in itself but a necessity and an instrument that ultimately serves the patient.
Yet there is still much to learn about a Learning Health System. Exchange with colleagues from within Europe and abroad is essential for gaining new insights. This idea motivated us to invite dedicated international guests to the conference to give keynote speeches and introductions to workshops. It also guided our intention to continue the practice of an open call for full papers in English, a two-stage peer-review process and the publication of accepted papers in the series Studies in Health Technology and Informatics, which was first established for GMDS2017 in Oldenburg. This allows authors to present their achievements to an international audience beyond the GMDS Annual Conference. In addition, full papers in German are published in the open access journal German Medical Science – Medical Informatics, Biometry and Epidemiology (MIBE), and abstracts in German or English are published by German Medical Science via www.egms.de.
This volume comprises 38 full papers out of the total of 164 oral presentations and 65 posters at this year's GMDS conference. The papers are sorted alphabetically by the topic of the session in which they were presented.
We wish to thank all colleagues who submitted their work to the conference and particularly those who submitted full papers. We are grateful to the many reviewers who dedicated their time and knowledge to make this congress possible.
GMDS2018 takes place in Osnabrück on the site of the HealthCampus, an institution founded by the two local universities that brings together major regional hospitals and health and social care institutions to form an emerging Learning Health System. It is meant to build up trust and stimulate the dialogue between science and practice.
With this in mind, GMDS2018 adds another building block to establishing LHS-inspired virtuous circles of learning.
Ursula Hübner
Ulrich Sax
Hans-Ulrich Prokosch
Bernhard Breil
Harald Binder
Antonia Zapf
Brigitte Strahwald
Tim Beißbarth
Niels Grabe
Anke Schöler
Clinical trials are the foundation of evidence-based medicine, and their computerized support has been a recurring theme in medical informatics. One challenging aspect is the representation of eligibility criteria in a machine-readable format to automate the identification of suitable participants. In this study, we investigate the capability of expressing trial eligibility criteria via the search functionality specified in HL7 FHIR, an emerging standard for exchanging healthcare information electronically that also defines a set of operations for searching health record data. Using a randomly sampled subset of 303 eligibility criteria from ClinicalTrials.gov, we achieved a 34% success rate in representing them with the FHIR search semantics. While limitations are present, the FHIR search semantics are a viable tool for supporting preliminary trial eligibility assessment.
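The mapping from an eligibility criterion to FHIR search parameters can be illustrated with a small sketch. The criterion, the date and the SNOMED code below are illustrative examples, not taken from the study's sample; `birthdate` and `code` are standard FHIR search parameters on the Patient and Condition resources.

```python
from datetime import date

def age_to_birthdate_param(min_age, today=None):
    """Translate 'age >= min_age' into a FHIR Patient search parameter.

    FHIR date searches use the 'le' prefix (less than or equal), so a
    minimum age becomes an upper bound on the birth date.
    """
    today = today or date.today()
    cutoff = date(today.year - min_age, today.month, today.day)
    return f"birthdate=le{cutoff.isoformat()}"

# 'Age >= 18' evaluated on 2018-09-01:
print("Patient?" + age_to_birthdate_param(18, today=date(2018, 9, 1)))
# -> Patient?birthdate=le2000-09-01

# A diagnosis criterion maps to a token search on Condition
# (system|code; the SNOMED code here is illustrative):
print("Condition?code=http://snomed.info/sct|44054006")
```

A real implementation would additionally have to combine such parameters per patient and handle criteria that have no FHIR search equivalent, which is where the reported 34% ceiling comes from.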
The use of mobile devices for house-to-house, interviewer-administered survey data collection is becoming common practice alongside paper-based data collection tools. Although electronic data capture is expected to improve the efficiency of data collection and the quality of the data, there is limited evidence on the cost-effectiveness of these technologies. This project aims to develop an online pre-implementation survey cost estimator to support the planning and decisions of implementing agencies. Costs that scale with sample size were estimated using a parametric cost-estimating technique. In this article, we use the Introduction, State of the art, Concept, Implementation, Lessons learned (ISCIL) format to present the overall development process of this online cost estimator.
A low level of patient health literacy is a major reason for worse prognosis or reduced therapy adherence. Health information booklets are a major tool for improving patients' health literacy. This paper presents a computer-based readability analysis of patient information booklets from the cardiovascular domain. The study relies on 34 English booklets, mostly on heart disease, prevention and procedures, and compares five well-established readability instruments. On average, according to the Flesch-Kincaid formula, readers of the assessed booklets need at least a 9th-grade (U.S.) education; according to the Gunning-Fog metric, they would need an 11th-grade education. The presented study demonstrates the feasibility of a fully automated text-processing tool-chain for patient information booklets. The results show that readability metrics should be interpreted carefully and interchanged only with caution.
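The two grade-level formulas mentioned above can be computed directly from word, sentence and syllable counts; the counts in the example below are invented for illustration and show why the two metrics can disagree by one or two grades on the same text.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw text counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    """Gunning-Fog index; 'complex' words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# An invented text with 100 words in 5 sentences, 150 syllables,
# 10 of the words being complex:
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(gunning_fog(100, 5, 10), 2))            # 12.0
```

Both formulas reward short sentences, but they weight word difficulty differently (syllables per word vs. share of polysyllabic words), which is one source of the 9th- vs. 11th-grade discrepancy reported above.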
To improve data quality, clinical cohort studies require ongoing centralized data monitoring. Results of data quality monitoring are often reported in a generic format, without individual feedback for the different types of users within a research consortium. Several R packages make it possible to analyze and communicate information in interactive, web-based forms with an easy-to-use approach. This article describes a pilot, applied in a German infectious disease cohort, that uses interactive feedback and gamification to report data quality and ultimately increase the quality of research outcomes.
Secondary use of healthcare data depends on the availability of provenance data for assessing its quality, reliability and trustworthiness. Usually, instance-level data communicated via HL7 interfaces carry only limited metadata about the software systems involved and the persons or organizations bearing responsibility for those systems. This paper proposes a strategy for capturing the interoperable provenance data needed by data stewards for assessing healthcare data reused in a research context. At a realistic level of granularity, even system-level metadata will support a data steward trying to trace the origins, or provenance, of healthcare data that have been transferred to the research context. These metadata are extracted from the 3LGM2 system, which is used for modelling hospital information systems. Based on the W3C provenance specification, interrelated activities, entities and agents can be integrated, stored in RDF triple stores, and thus queried and visualized.
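As a minimal sketch of the W3C PROV approach, system-level provenance can be expressed as triples over `prov:Entity`, `prov:Activity` and `prov:Agent`. The identifiers below (`ex:labResult42` and friends) are hypothetical, and a plain Python list of tuples stands in for an RDF triple store.

```python
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical system-level provenance: an HL7 export activity, run by a
# laboratory system, generates a lab-result entity.
triples = [
    ("ex:labResult42", "rdf:type",               f"{PROV}Entity"),
    ("ex:hl7Export",   "rdf:type",               f"{PROV}Activity"),
    ("ex:labSystem",   "rdf:type",               f"{PROV}Agent"),
    ("ex:labResult42", f"{PROV}wasGeneratedBy",  "ex:hl7Export"),
    ("ex:labResult42", f"{PROV}wasAttributedTo", "ex:labSystem"),
    ("ex:hl7Export",   f"{PROV}wasAssociatedWith", "ex:labSystem"),
]

def origins(entity, graph):
    """Trace which agents an entity is attributed to (a data steward's
    basic provenance question)."""
    return [o for s, p, o in graph
            if s == entity and p == f"{PROV}wasAttributedTo"]

print(origins("ex:labResult42", triples))  # -> ['ex:labSystem']
```

In the approach described above, such triples would be derived from the 3LGM2 model rather than written by hand, and queried with SPARQL against a real triple store instead of a list comprehension.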
Annotation with semantic codes helps to overcome interoperability issues for electronic documentation in medicine. However, the manual annotation process is laborious and semantic codes are ambiguous. We developed a publicly accessible web service to alleviate these drawbacks with a sophisticated and fast search mechanism based on more than 330,000 semantic code suggestions. These suggestions are derived from semantically annotated data elements contained in the Portal of Medical Data Models manually curated by medical professionals. Integrating this suggestion service can support the manual annotation process and strengthen uniform coding. Integration is demonstrated for two separate data model editors. Usage statistics show over 5,500 suggestion requests per month for semantic annotation of items. The web service may also prove helpful for automatic semantic coding.
Metadata management is an important task in medical informatics and strongly affects the value gained from existing health information data. Data warehouse solutions like Informatics for Integrating Biology and the Bedside (i2b2) are common tools for identifying patient cohorts and analyzing collected clinical data while respecting patient privacy. The Resource Description Framework (RDF) is designed for highly interoperable ontology representation in various formats, facilitating ontology and metadata management. Our approach combines the benefits of i2b2 and RDF by importing the easy-to-edit RDF ontology into the research-oriented i2b2 software. We do so by using a SPARQL Protocol and RDF Query Language (SPARQL) interface, which enables queries over RDF data, and by developing a Java program that generates i2b2-specific SQL insert statements. To demonstrate the feasibility of our solution, we transcribe our lung-disease-specific ontology to RDF and import it into our i2b2 data warehouse.
Whenever medical data is integrated from multiple sources, it is regarded as good practice to separate data from information about its meaning, such as designations, definitions or permissible values (in short: metadata). However, the ways in which applications work with metadata are imperfect: many applications do not support fetching metadata from externalized sources such as metadata repositories. In order to display human-readable metadata in any application, we propose not to change the application itself, but to provide a library that changes the user interface. The goal of this work is to provide a way to “inject” the meaning of metadata keys into the web-based frontend of an application, making it “metadata aware”.
Collaboration in medical research is becoming common, especially for collecting relevant cases across institutional boundaries. If the data, which is usually very heterogeneously formalized and structured, can be integrated, such a collaboration can facilitate research. An absolute prerequisite for this is an extensive description of the formalization and exact meaning of every data element contained in a dataset. This information is commonly known as metadata. Various research networking projects tackle this challenge by developing concepts and IT tools. The Samply Metadata Repository (Samply.MDR) is a solution for managing and publishing such metadata in a standardized and reusable way. In this article we present the structure and features of Samply.MDR as well as its flexible usability, giving an overview of its application in various projects.
To exchange data across several sites, or to interpret it at a later point in time, it is necessary to create a general understanding of the data. As standard practice, this understanding is achieved through metadata. These metadata are usually stored in relational databases, so-called metadata repositories (MDR). Typical functions of such an MDR include storage, administration and other specific metadata functionalities such as finding relations among data elements. This results in a multitude of connections between the data elements, which can be described as highly interconnected graphs. Using alternative databases, such as graph databases, for modelling and visualisation has already proven beneficial in previous studies. The objective of this work is to evaluate how the on-board techniques of a graph database support matching and mapping. Different cancer-related datasets were entered, and algorithms for metadata management were applied.
Metadata repositories (MDR) are databases for data elements that can be utilized in research as well as in medical care. These data elements are not the actual patient data (facts), but a complete definition of the variables or characteristics used, including coding, unit of measurement, data type and other aspects. The aim of the project described here was to have possible application scenarios for MDRs evaluated by a larger group of experts. The focus was not on specific software, but on the community's basic expectations of such a database of data elements. To achieve this goal, a questionnaire was designed that contained questions on general aspects of setting up a registry for data elements in biomedical research as well as more specific points regarding necessary functionalities, desired contents, tools for community work and the quality of data elements. One of the main results was that users attach more importance to the quality of the content than to the efficiency of implementing their documentation concepts. At the same time, they consider the effort involved in using existing software systems too high compared with the benefits, and they have concerns about the use of their designs by third parties.
The German Centre for Lung Research (DZL) is an association of Germany's leading university and non-university institutions dedicated to lung research. Institutes and disease areas within the DZL manage their own data in several databases and registers using different software tools. The aim of our data integration effort is to provide a single central data warehouse frontend in which all patient-related data are combined and made accessible. A two-stage survey was used to determine the data collections suitable for data integration. Integration was performed via extract-transform-load (ETL) steps using custom software. The original software (e.g. eCRF) used by the data collections did not need any modifications. The survey yielded 68 data collections. By January 2018, 20 collections had been successfully integrated; 10 collections were withdrawn by their owners, while the integration of 38 was delayed. Data discovery, the process of finding existing data collections in a large research network, proved to be the most underestimated step. From a technical point of view, data integration proved to be of minor complexity compared with the effort required for harmonization and mapping of data elements and management of a common terminology.
A record linkage algorithm tries to identify records that belong to the same individual. We analyze the matching behavior of the approach used in the E-PIX matching tool on the very limited attribute set of first name, last name, date of birth and sex. Our benchmark set contains almost 37,000 records from the PopGen biobank. We develop a model that allows us to predict the clerical review workload for data sets growing by a factor of 10 or more, without needing a data set of that size. Based on this model, we show two parameter sets with comparable detection rates for true duplicates, of which only one scales well on growing data sets. Our model provides realistic example records for each predicted match of an upscaled data set, enabling us to identify the parameters that need to be adjusted to improve the quality of the matching candidates. We also show that unreviewed merging of records is prone to homonym errors on data sets with 200,000 records and the limited attribute set above, even though the merged record pairs are obviously different on clerical review.
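The homonym problem on such a limited attribute set is easy to illustrate: a deterministic match key over first name, last name, date of birth and sex cannot distinguish two different persons who happen to share these attributes. The records below are invented, and E-PIX itself uses a more sophisticated probabilistic approach; the sketch only shows why unreviewed merging becomes risky as data sets grow.

```python
def match_key(rec):
    """Deterministic key over the limited attribute set (normalized)."""
    return (rec["first"].strip().lower(), rec["last"].strip().lower(),
            rec["dob"], rec["sex"])

# Two distinct individuals who coincidentally share all four attributes:
anna_1 = {"first": "Anna", "last": "Meyer", "dob": "1980-04-12", "sex": "F"}
anna_2 = {"first": "Anna", "last": "Meyer", "dob": "1980-04-12", "sex": "F"}

# An unreviewed merge based on this key would wrongly unite their records,
# which is exactly the homonym error that grows with data set size.
print(match_key(anna_1) == match_key(anna_2))  # True
```

The probability of such coincidental collisions rises with the number of records, which is why the abstract reports homonym errors emerging at around 200,000 records on this attribute set.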
This paper examines the relevance of genetic pedigree data in the context of medical research platforms. By surveying currently available tools for visualizing and analyzing this data type, and by considering possible use cases that could make use of the combination of individual patient data and pedigree data, we show the advantages of integrating this data type into a medical research platform. In a practical step, a procedure for integrating pedigree data into tranSMART was created. Furthermore, a tool to analyze and visualize pedigree data in combination with other patient data was implemented in SmartR, a dynamic analysis tool inside tranSMART. Finally, we address limitations and future development strategies of the tool.
Optical navigation systems help surgeons find their way through the complex anatomy of a patient. However, such systems are failure-prone, time-consuming and difficult to use because of complicated technical requirements such as the placement of optical markers and their intraoperative registration. The BIOPASS project therefore provides an innovative localisation system for markerless navigation in endoscopic surgery to support medical decision making. This system comprises several machine learning classifiers that recognise anatomical structures visible in the endoscopic images. To verify the data provided by these classifiers and to alert medical staff to surgical risk situations, we developed a new ontology-based software called OntoSun. Our software improves the precision and the sustainable traceability of the classifiers' results and also provides warning messages that increase situational awareness during surgical interventions.
Linking information across databases fosters new research in the medical sciences. Recent European privacy regulations recommend encrypting the personal identifiers used for linking. Bloom-filter-based encodings are an increasingly popular method for record linkage. However, basic Bloom filter encodings are prone to cryptographic attacks, so methods for hardening them against such attacks are required. In this paper, a new hardening method for privacy-preserving record linkage (PPRL) is presented. Protection against attacks is increased by using a Markov-chain-based language model of identifier bigrams during encryption. Based on real-world mortality data, we compare unencrypted and state-of-the-art PPRL methods with the results of the proposed hardening method.
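For context, here is a minimal sketch of the basic, unhardened Bloom filter encoding that such attacks target (the paper's Markov-chain hardening is not shown): the bigrams of an identifier are hashed into a bit array, and encoded records are compared via the Dice coefficient. The parameters m and k and the hashing scheme are illustrative choices.

```python
import hashlib

def bigrams(name):
    """Character bigrams of a name, padded with boundary markers."""
    padded = f"_{name.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def bloom_encode(name, m=64, k=3):
    """Basic (unhardened) Bloom filter encoding of a name's bigrams:
    each bigram sets k bit positions in an m-bit array."""
    bits = [0] * m
    for gram in bigrams(name):
        for seed in range(k):
            h = hashlib.sha1(f"{seed}:{gram}".encode()).hexdigest()
            bits[int(h, 16) % m] = 1
    return bits

def dice(a, b):
    """Dice coefficient between two bit vectors, the usual PPRL similarity."""
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

# Similar names share bigrams and therefore bit positions, so their
# encodings remain comparable without revealing the plaintext:
print(round(dice(bloom_encode("meyer"), bloom_encode("meier")), 2))
```

Because bit-position frequencies mirror bigram frequencies in natural language, such basic encodings leak information; a language-model-based hardening step, as proposed above, aims to flatten exactly this statistical signal.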
The workflow-oriented dissemination of electronic patient data is a central goal of IT deployment in hospitals. Against this background, the present study examines two research questions: (1) are there differences in the availability of electronic patient data (AEPD) between different clinical workflows and data types, and (2) which structural and organizational factors determine AEPD? Based on a Germany-wide hospital survey, AEPD was assessed along six clinical workflows. While AEPD was lowest for ward rounds, discharge showed the highest AEPD, with pre- and post-surgery processes ranging in between. Of the data types analyzed, patient demographics and observation findings obtained the highest AEPD scores; electrophysiological results, checklists and warnings were less commonly available electronically and received lower scores. Multiple linear regression analysis yielded a significant model explaining 34.4% of the variance in AEPD. Large hospitals and those with professional information management, a strong health-IT-related innovation culture and a nursing informatics officer have higher AEPD scores and thus better clinical information logistics at their command.