Ebook: German Medical Data Sciences: Shaping Change – Creative Solutions for Innovative Medicine
Healthcare systems have been in a state of flux for a number of years now due to increasing digitalization, and medicine itself is facing new challenges. How to make the most of artificial intelligence, whether digitalization can help to strengthen patient orientation, and how to deal with data quality and completeness are all issues which require attention, creativity and research.
This book presents the proceedings of the 64th annual conference of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS 2019), held in Dortmund, Germany, from 8 - 11 September 2019. The theme of this year’s conference is Shaping Change – Creative Solutions for Innovative Medicine, and the papers presented here focus on active participation in shaping change while ensuring that good scientific practice, evidence and regulation are not lost as a result of innovation. The book is divided into 8 sections: biostatistics; healthcare IT; interoperability - standards, classification, terminology; knowledge engineering and decision support; medical bioinformatics and systems biology; patient centered care; research infrastructure; and sociotechnical systems / usability and evaluation of healthcare IT.
The book will be of interest to all those facing the challenges posed by the ongoing revolution in medicine and healthcare.
This year, 263 contributions were submitted for the GMDS Annual Meeting: four full papers for MIBE, 69 full papers for the GMDS series in Studies in Health Technology and Informatics, and 192 abstracts. All submissions were reviewed in a two-stage peer review process. A total of 760 reviews, some of them very extensive, were carried out by 280 experts. In a meta-review, the preliminary results were communicated to the authors together with instructions for revising their contributions. After the revision, a second, shortened meta-review took place. In the end, one contribution was accepted for publication in MIBE and 38 for publication in the GMDS series. A detailed presentation of the results of the peer review process is shown in Figure 1.
Figure 1. Illustration of the contributions submitted and the results of the peer review process, broken down by the various types of publication.
We would like to thank all authors for their contributions, most of which were very good. We would especially like to thank all the reviewers: with their work, their sometimes very detailed reviews and their critical yet constructive comments, they have made a significant contribution to the quality of these proceedings. All reviewers are listed at the end of the prefaces.
This year, for the first time, the review procedure was carried out in accordance with the new publication regulations. This required, and still requires, various changes to the “Online Registry” submission system. We know what it means to integrate changes into running software systems and would like to thank webtek.at, and especially Thomas and Claus Schabetsberger, for their support. The changes in the review process also led to disruptions and confusion at various points. We apologize to the authors and reviewers for any inconvenience.
Last but not least, we would like to pick up on the President’s remark that full paper submissions for the GMDS annual meeting are now a tradition: we see this as a mission to preserve and expand on what we have achieved. In the last two years, 80 full papers were published in the GMDS proceedings by IOS Press; these were cited 44 times (0.55 citations per item) [source: Clarivate, as of Aug 1st, 2019].
However, most of the submissions and the accepted papers are still from the field of medical informatics (25 of the 38 accepted papers; 65%). Unfortunately, the new offer has not yet been taken up to the same extent by the other specialties. Not only are fewer contributions submitted in absolute terms, but the proportion of contributions accepted for Stud Health Technol Inform is also lower than in medical informatics (Table 1).
Table 1. Contributions accepted for publication in Stud Health Technol Inform broken down by specialty. The percentages indicate how many of the accepted contributions are published in Stud Health Technol Inform.
Specialty                                    Accepted in Stud Health Technol Inform   Accepted overall
Epidemiology                                  0  (0%)                                  32
Medical Bioinformatics and Systems Biology    2 (20%)                                  10
Medical Biometry                              1  (3%)                                  35
Medical Documentation                         1 (11%)                                   9
Medical Informatics                          25 (26%)                                  96
Interdisciplinary                             9 (25%)                                  36
Total                                        38 (17%)                                 218
We see it as our task not only to maintain what we have achieved, but also to make this form of publishing congress contributions attractive for the other specialist areas – so that we can keep the promise made in the title: to give an overview of current research in the German Medical Data Sciences.
Alfred Winter (MIBE)
Differential item functioning (DIF) indicates differential response probabilities of items for different subgroups. While there is a vast amount of research and literature on DIF in the field of educational screening and career assessment, DIF analysis has hardly been applied in the field of clinical assessment. This paper aims to analyze the presence of gender-related DIF in a cross-sectional survey of children assessed by a structured questionnaire containing items on attention deficit and hyperactivity. A total of 1449 children (mean age: 1.94 ± 0.14 years; 51.2% male) were included. Almost no significant variations in parameters were found between boys and girls. Results based on a Partial Credit Model indicate an absence of DIF in eight out of nine items. Consistent with other studies on attention deficit hyperactivity disorder (ADHD), our results imply that the same level of rating for a symptom has the same meaning for boys and girls.
Radiology has a reputation for a high affinity to innovation – particularly with regard to information technologies. Designed to support the peculiarities of radiological diagnostic workflows, Radiology Information Systems (RIS) and Picture Archiving and Communication Systems (PACS) have developed into widely used information systems in hospitals and form the basis for advancing the field towards automated image diagnostics. RIS and PACS can thus serve as meaningful indicators of how quickly IT innovations diffuse in secondary care settings – an issue that requires increased attention in research and health policy in the light of increasingly fast innovation cycles. We therefore conducted a retrospective longitudinal observational study to research the diffusion dynamics of RIS and PACS in German hospitals between 2005 and 2017. Based on data points collected within the “IT Report Healthcare” and building on Rogers’ Diffusion of Innovation (DOI) theory, we applied a novel methodological technique by fitting Bayesian Bass diffusion models to past adoption rates. The Bass models showed acceptable goodness of fit to the data; the results indicated similar growth rates for RIS and PACS implementations and suggested that market saturation is almost reached. Adoption rates of PACS showed a slightly higher coefficient of imitation (q = 0.25) compared to RIS (q = 0.11). However, the diffusion process extends over approximately two decades for both systems, which points to the need for further research into how innovation diffusion can be accelerated effectively. Furthermore, the Bayesian approach to Bass modelling proved to have several advantages over classical frequentist approaches and should encourage adoption and diffusion research to adopt similar techniques.
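The study fits Bayesian Bass models; as a minimal, non-Bayesian illustration of the curve underlying that approach, the classical deterministic Bass formula F(t) = (1 − e^−(p+q)t) / (1 + (q/p)·e^−(p+q)t) gives the cumulative fraction of adopters at time t. Only the q values below are taken from the abstract; the innovation coefficient p is an assumed placeholder, not a result of the study.

```python
import math

def bass_cumulative(t, p, q):
    """Classical Bass diffusion curve: cumulative fraction of adopters
    at time t, with innovation coefficient p and imitation coefficient q."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# Illustrative only: p = 0.01 is assumed; the q values come from the abstract.
for label, q in [("RIS", 0.11), ("PACS", 0.25)]:
    p = 0.01  # assumed innovation coefficient, not from the study
    share_10y = bass_cumulative(10, p, q)
    print(f"{label}: cumulative adoption after 10 years ~ {share_10y:.2f}")
```

A higher q produces the steeper, imitation-driven S-curve that the abstract reports for PACS relative to RIS.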
In the context of the German Medical Informatics Initiative (MII), where data reuse and data sharing are major goals, cross-site, long-term research on patient care data can only be conducted lawfully with informed patient consent. The MII consent working group therefore developed a template form for patient information and broad consent, based on work done for an earlier biobank project. The broad consent enables patients to consent to the use of a wide range of the documented data, including for research purposes. A user-friendly tool is therefore needed which not only supports the storage and maintenance of patients’ consents but also allows them to easily review or withdraw their consent. Furthermore, the tool should allow patients to review the use of their data in research projects and possible publications. This is why we developed a concept for how such a tool could be integrated into the clinical and research system landscape and implemented a prototype as a proof of concept.
The alpine space is challenging for mobile care organizations, as the rural homes of patients are often characterized by long travel distances and may sometimes even be isolated due to weather conditions. Real-time monitoring to support mobile care requires that patients can easily perform self-measurements of vital signs. Therefore, a vital sign telemonitoring system was conceptualized, utilizing the potential of Information and Communication Technology (ICT). The aim of this work was to gather technical and user-related requirements for a patient-centered telemonitoring system. To this end, a mixed approach was followed, comprising a comprehensive technical review, a literature review and interviews with stakeholders. Suitable use cases were derived from the gathered technical and user-related requirements. The results yielded a concept for a seamlessly integrated, unobtrusive home monitoring system for elderly people with real-time data synchronization and communication features to support the mobile nursing organization. The concept overcomes known usability barriers of telemonitoring systems, such as complex interaction, which might lead to more efficiency and effectiveness in mobile nursing. It was implemented as a prototype and validated in the field within a 3-month test period.
Registries are a widely accepted method in health services research. Registry owners face the challenge of documenting and assuring data quality, which is vital for answering research questions and conducting quality research. Therefore, a survey on indicators for data quality was conducted as part of a German funding initiative. A list of 51 pre-defined quality indicators was provided to 16 patient registry projects in a web-based survey. The assessment included three criteria derived from the RAND Appropriateness Method (RAM), the application area, and three criteria representing a project-specific perspective. Considering the criteria adapted from RAM, a core set of 17 indicators could be identified. This core set covered important dimensions such as case completeness, data completeness and validity. Adding importance as a criterion from a project-specific perspective led to a subset of six indicators. The indicators identified through this survey may be applied to different use cases, e.g. a) benchmarking between registries, b) benchmarking of study sites, and c) value-based remuneration of study sites. Thus, the presented core set of indicators can be used as a basis for systematically improving the quality of registry data.
The Clinical Quality Language (CQL) is a useful tool for defining search requests against data stores containing FHIR data. Unfortunately, there are only a few execution engines able to evaluate CQL queries. As FHIR data represents a graph structure, the authors pursue the approach of storing all data contained in a FHIR server in the graph database Neo4j and translating CQL queries into Neo4j’s query language Cypher. The query results returned by the graph database are translated back into their FHIR representation and returned to the querying user. The approach has been tested successfully on publicly available FHIR servers with a handcrafted set of example CQL queries.
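As a toy illustration of the translation idea (not the authors' implementation), a single CQL retrieve with one equality condition could be mapped to a Cypher query over node labels and properties. The function, the supported pattern and the example query are all invented for illustration; a real CQL-to-Cypher translator must handle far more of the language.

```python
def translate_retrieve(resource_type, field, value):
    """Hypothetical sketch: map one CQL retrieve with a single equality
    condition onto a Cypher MATCH over a labelled node. Real translation
    covers intervals, terminology filters, joins etc."""
    return (f"MATCH (r:{resource_type}) "
            f"WHERE r.{field} = '{value}' "
            f"RETURN r")

# CQL source (conceptually): [Patient] P where P.gender = 'female'
print(translate_retrieve("Patient", "gender", "female"))
# MATCH (r:Patient) WHERE r.gender = 'female' RETURN r
```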
Fast Healthcare Interoperability Resources (FHIR), an international standard for exchanging digital health data, is increasingly used in health information technology. FHIR promises to facilitate the use of electronic health records (EHRs), enable mobile technologies and make health data accessible to large-scale analytics. To date, there has been no comprehensive review of scientific articles about FHIR and its use in digital health. Here, we aim to address this gap and provide an overview of the main topics associated with FHIR in the scientific literature. To this end, we screened all articles about FHIR on Web of Science and PubMed and identified the main topics discussed in these articles. We also explored the temporal trend and geography of publications and performed some basic text mining on article abstracts. We found that the topics most commonly discussed were related to data models, mobile and web applications as well as medical devices. Since its introduction, the number of publications about FHIR has steadily increased until 2017, indicating a growing popularity of FHIR in healthcare (in 2018, publication numbers remained stable). In sum, our study provides an overview of the scientific literature about FHIR and its current use in digital health.
Logical Observation Identifiers Names and Codes (LOINC) is a common terminology used for standardizing laboratory terms. Within the consortium of the HiGHmed project, LOINC is one of the central terminologies used for health data sharing across all university sites. Linking LOINC codes to the site-specific tests and measures is therefore one crucial step towards this goal. In this work, we report our ongoing efforts to implement LOINC in our laboratory information system and research infrastructure, as well as our challenges and the lessons learned. 407 local terms could be mapped to 376 LOINC codes, of which 209 are already in use for routine laboratory data. In our experience, mapping local terms to LOINC is a largely manual and time-consuming process because of language issues and the expert knowledge of local laboratory procedures it requires.
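Conceptually, such a mapping is a local-term-to-code table in which some entries remain unmapped. The local term names below are invented; the two LOINC codes are real published codes, but their pairing with these terms is our illustration, not the HiGHmed mapping.

```python
# Illustrative sketch of a local-term-to-LOINC mapping table.
local_to_loinc = {
    "HB": "718-7",      # Hemoglobin [Mass/volume] in Blood
    "GLUC": "2345-7",   # Glucose [Mass/volume] in Serum or Plasma
    "XYZ-LOCAL": None,  # local in-house test, not yet mapped
}

mapped = {term: code for term, code in local_to_loinc.items() if code is not None}
print(f"{len(mapped)} of {len(local_to_loinc)} local terms mapped")
# 2 of 3 local terms mapped
```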
Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. In medical informatics, such a unified view enables retrospective analyses based on more facts, and the prospective recruitment of more patients, than any single data collection by itself. The technical part of data integration is based on rules interpreted by software. These rules define how to translate source database schemata into the target database schema. Translation rules are formulated by data managers, who usually lack knowledge about the meaning and acquisition methods of the data they handle. Conversely, the professionals collecting the source data (data providers), who have this knowledge, usually lack a sufficient technical background. Since data providers are neither able to formulate the transformation rules themselves nor to validate them, the whole process is fault-prone. Additionally, in the continuous development and maintenance of (meta-)data repositories, data structures are subject to change, which may render transformation rules outdated. We did not find any technical solution that enables data providers to formulate transformation rules themselves or that provides an understandable representation of given rules. Our approach is to enable data providers to understand the rules regarding their own data by presenting the rules and the available context visually. Context information is fetched from a metadata repository. In this paper, we propose a software tool that builds on existing data integration infrastructures. The tool provides a visually supported validation routine for data integration rules. As a first step towards its evaluation, we integrated the tool into the DZL data integration process and verified the correct presentation of transformation rules.
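To make the core idea concrete: a transformation rule can be held as structured data and rendered in a human-readable form that a data provider can check without technical background. The field names, table names and rule format below are hypothetical, invented purely to sketch the principle.

```python
# Hypothetical transformation rule: source schema field -> target schema field.
rule = {
    "source": {"table": "lab_results", "field": "na_mmol"},
    "target": {"table": "core_dataset", "field": "sodium"},
    "transform": "value * 1.0",  # identity here; units already match
}

def render_rule(rule):
    """Render a rule as a one-line, human-readable mapping statement."""
    s, t = rule["source"], rule["target"]
    return (f"{s['table']}.{s['field']}  -->  {t['table']}.{t['field']}"
            f"  (transform: {rule['transform']})")

print(render_rule(rule))
```

The proposed tool goes further by enriching such a rendering with context fetched from a metadata repository, but the principle of making the rule itself readable is the same.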
The utilisation of metadata repositories increasingly promotes the secondary use of routinely collected data. However, this has not yet solved the problem of data exchange across organisational boundaries: for flawless data exchange, the local description of a metadata set must also be exchangeable. In previous work, the metadata exchange language QL4MDR was developed. The present work aimed to examine the applicability of this exchange language. For this purpose, existing MDR implementations were identified, systematically inspected and roughly divided into two categories to distinguish between data integration and query integration. It was shown that all the implementations can be adapted to QL4MDR. The integration of metadata is an important first step; it enables the exchange of information that is urgently needed for the further processing of instance data, from metadata mappings to transformation rules.
Disease management programs coordinate and manage treatment between physicians and across sectors of the healthcare system. The aim is to reduce existing care deficits (overuse, underuse and misuse) and thus improve the quality and cost-effectiveness of care. To facilitate the treatment of chronic diseases such as asthma, it is important to continuously document a patient’s medical history. For this purpose, it is necessary to be able to integrate and exchange data from and between multiple different information systems. Aiming to ensure interoperability across electronic documentation systems, this paper proposes the standardization of the KBV’s (National Association of Statutory Health Insurance in Germany) specification for the electronic Disease Management Program (eDMP) for bronchial asthma. Therefore, international standards like SNOMED CT, LOINC and UCUM were chosen to encode clinical information, while evaluating their suitability with the scoring system ISO/PRF TR 21564. The resulting analysis showed that most of the terms had either a complete or partial equivalent term in one of the terminology systems. Therefore, future implementations of the eDMP for bronchial asthma that utilize standard terminologies could benefit from data integration from different sources like electronic health records and reduce redundancies in data capture and storage.
Interoperability is a growing demand in healthcare, where heterogeneous sources impede information transfer. Interoperability issues can be addressed by metadata repositories. These help to ensure syntactic interoperability, e.g. compatible data formats or value ranges; semantic interoperability, however, remains challenging. Semantic annotation with standardized terminologies and classifications helps to foster semantic interoperability. This work aims to interconnect Samply.MDR and the Portal of Medical Data Models (MDM-Portal) to facilitate semantic annotation with UMLS. To this end, Samply.MDR was extended to store semantic information. While creating a data element, a request is sent to the MDM-Portal, which returns possible UMLS codes. The user can then adopt the most suitable code and select a link type between the code and the element itself. The successful enrichment of data elements with UMLS codes was demonstrated by interconnecting Samply.MDR and the MDM-Portal.
Although HL7 v2.x has been in use for more than 25 years and is thus probably the most widely used data exchange standard in the healthcare domain, there are still ongoing discussions about technical terminology, i.e., how exactly individual specifications are to be interpreted. This has led to the idea of modernizing the specification/standard without requiring any change in implementations. In other words, formalize the standard using words that are easier to understand. In parallel, the coexistence with the upcoming FHIR standard triggers modifications within HL7 v2.x. This paper explores the interaction between the modernization of HL7 v2.x and the convergence with FHIR.
One of the major obstacles for research on German medical reports is the lack of de-identified medical corpora. Previous de-identification tasks focused on non-German medical texts, which raised the demand for an in-depth evaluation of de-identification methods on German medical texts. Motivated by remarkable advancements in natural language processing using supervised machine learning on limited training data, we evaluated such methods for the first time on German medical reports, using our annotated data set of 113 medical reports from the cardiology domain. We applied state-of-the-art deep learning methods, using pre-trained models as input to a bidirectional LSTM network, as well as well-established conditional random fields for the de-identification of German medical reports. We performed an extensive evaluation for de-identification and multiclass named entity recognition. Using rule-based and out-of-domain machine learning methods as a baseline, the conditional random field improved the F2-score from 70% to 93% for de-identification, while the neural approach reached an F2-score of 96% with balanced precision and recall. These results show that state-of-the-art machine learning methods can play a crucial role in the de-identification of German medical reports.
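The F2-score used here weights recall more heavily than precision (β = 2), which suits de-identification, where a missed identifier is costlier than a false alarm. A minimal computation from raw counts; the counts in the example are illustrative, not the paper's:

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from raw counts; beta=2 weights recall twice as
    heavily as precision, as used in de-identification evaluation."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# When precision and recall coincide, F-beta equals both:
print(round(f_beta(tp=93, fp=7, fn=7), 2))  # 0.93
```

Note how, for a fixed number of errors, trading false positives for false negatives lowers F2, reflecting the emphasis on recall.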
In the life science domain, experts are usually familiar with spreadsheet software and often use it in their daily work to collect and structure required domain knowledge. The processing and analysis of spreadsheet data is an important task that must be supported by efficient software solutions. A typical application scenario is, for example, the integration of spreadsheet data (specified or derived) into an ontology to provide reasoning or search. Various converter tools have been developed to support spreadsheet-to-ontology transformation. However, such tools often allow only a relatively simple structure of the spreadsheet template, or they require complex mapping processes to map spreadsheet and ontological entities. In many cases, existing converter tools cannot be used because the spreadsheet data must first be processed before the derived data can be integrated into the ontology. In such cases, the efficient and fast development of customized software solutions is of great importance. In this paper, we present a general spreadsheet processing framework for efficiently reading and writing spreadsheet data. The Spreadsheet Model Generator (SMOG) provides a simple mechanism to automatically generate the Java object model and the mappings between object code and spreadsheet entities. Our solution greatly simplifies the implementation of spreadsheet access methods and enables the efficient development of spreadsheet-based applications. SMOG has already been used successfully in several biomedical projects under real-world conditions.
Computerized guidelines have been in use for several decades now. Systems based on computerized guidelines often intertwine (1) medical knowledge representation, (2) guideline procedures and (3) hospital workflows. This induces several drawbacks, the most prominent being the non-shareability of the computerized guideline between hospitals, its limited accessibility for humans, and an unclear, often confusing combination of hospital-specific workflows and guideline-induced control flows. This article proposes a 3-layer modelling approach that strictly distinguishes the three aforementioned aspects in order to overcome these problems. We applied the 3-layer approach to the implementation of a guideline-interpreting software module in the context of the German Medical Informatics Initiative (here: the SMITH project) and comment on the resulting implications for the software design of that module.
Magnetic Resonance Fingerprinting (MRF) is an imaging technique that acquires unique time signals for different tissues. Although the acquisition is highly accelerated, reconstruction time remains a problem, as the state-of-the-art template matching compares every signal with a set of possible signals. To overcome this limitation, deep learning-based approaches, e.g. Convolutional Neural Networks (CNNs), have been proposed. In this work, we investigate the applicability of Recurrent Neural Networks (RNNs) to this reconstruction problem, as the signals are correlated in time. Compared to previous CNN-based methods, the RNN models yield significantly improved results on in-vivo data.
We developed a tool based on the KNIME analytics platform for the extraction and visualisation of medical time series stored in the Medical Information Mart for Intensive Care III (MIMIC-III) and the related MIMIC-III Waveform Database Matched Subset. The large number of data points and the free accessibility make these data sets an attractive source for data-driven projects in the medical domain. The problem we tackled with our tool was the lack of an easy and extensible way of selecting, reading, and visualising the stored time series. In particular, the fact that medical data science projects are often conducted by interdisciplinary teams called for a software solution that can be used by medical practitioners without programming experience while still offering enough flexibility for data scientists.
Translational research in the medical sector depends on clear communication between all participants. Visualization helps to represent data from different sources in a way that is comprehensible across disciplines. Existing tools for clinical data management are usually monolithic and technically challenging to set up; others require transformation into specific data models, provide mostly non-interactive visualizations, or are specialized to very particular use cases. Statistical programming languages (R, Julia), on the other hand, offer great flexibility in data analytics but are harder to access for clinicians with little or no programming expertise. Our software, the Medical Data Explorer (MedEx), aims to fill this gap as a light-weight, intuitive, web-based solution with simple data import routes. We couple a modern dynamic web interface with an in-memory database for near real-time responsiveness. MedEx provides multiple visualization options (scatterplot, correlation heatmap, bar chart, grouped boxplot, grouped histogram, coplot) to give an easy overview of the loaded data as well as to perform pattern discovery and elementary statistics. We demonstrate the utility of MedEx by example on data from the cardiology research warehouse of Heidelberg University Hospital. In summary, our tool empowers clinicians to conduct their own interactive exploratory data analysis.
Informal caregivers often complain about a lack of knowledge. A knowledge-based personalized educational system was developed which provides caregiving relatives with the information they need. However, evaluation against domain experts indicated that parts of the knowledge base are incorrect. To overcome these problems, the system can be extended with a learning capacity and then trained further using feedback from real informal caregivers. To extend the existing system, an artificial neural network was trained to represent a large part of the knowledge-based approach. This paper describes the structure of the resulting artificial neural network and the training process. The resulting network structure is not deep but very wide. The training terminated after 374,700 epochs with a mean squared error of 7.731 × 10⁻⁸ on the final validation set. The neural network represents the relevant parts of the knowledge-based approach and can now be retrained with user feedback, which will be collected during a system test in April and May 2019.
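The reported training criterion is the mean squared error; as a brief reminder of what that figure measures, a minimal computation on an invented toy validation set:

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared deviation between target
    and predicted values, the training criterion mentioned above."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy validation set; the values are invented for illustration.
print(mse([1.0, 0.0, 1.0], [0.9, 0.1, 1.0]))
```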