Ebook: German Medical Data Sciences 2021: Digital Medicine: Recognize – Understand – Heal
Digitization offers great potential – especially in medicine. Cross-domain and cross-institutional linkage, big data, artificial intelligence and robotics can all help to improve research and care, but they also pose new challenges to all those involved.
This book presents the joint proceedings of the GMDS (German Medical Data Sciences) and TMF (its Technology, Methodology and Infrastructure platform), held entirely online from 26 – 30 September 2021 as a result of restrictions due to the Coronavirus pandemic. This joint event addresses the opportunities and risks of using new information technologies in medicine, as well as the resulting requirements for data protection, data security and ethics. Methodological challenges associated with the preparation, evaluation and interpretation of data volumes which constantly increase in type and scope in the course of digitization are also examined in detail.
The 25 papers included here are divided into 5 sections: editorials; artificial intelligence and clinical decision support systems (CDSS); data integration and interoperability; human computer interaction; and software systems and frameworks, and the topics covered are very diverse, ranging from disease detection using retinal imaging, through data management and sharing, to interactive web applications.
Providing an overview of regional research and developments in the field, the book will be of interest to all those working in health technology and medical informatics; researchers and practitioners alike.
This year, for the 5th time, full papers could be submitted for the annual meeting of the GMDS; an occasion to draw an interim conclusion. This is done in two editorials: the first [1] reflects on the goals and what has been achieved; the second [2] looks at the different publication options and strategies from the perspective of young scientists.
Last year, various technical and organizational difficulties led to the proceedings not being published until this year, rather than before the conference. The position of Congress Secretary was therefore created, and Martin Sedlmayr has graciously accepted the appointment to this task. Through this permanent position, we hope to further improve and stabilize the review process in future.
Publications in Stud HTI are frequently used for cumulative, publication-based dissertations. Here, there have been increasing inquiries about the publication process, acceptance rates, and citations. The consensus publication process is described in [1], as are the citation statistics for the first three volumes. The statistics on submissions and acceptance rates are described in this preface:
This year we received 206 contributions: 61 (30%) full papers (54 full papers for this volume of Studies in Health Technology and Informatics (Stud HTI) and 7 for GMS Medical Informatics, Biometry and Epidemiology (MIBE)) and 145 (70%) abstracts (101 talks and 44 e-posters). The distribution by subject is shown in Table 1.
The contributions were assessed and commented on by 231 reviewers in 573 reviews. A total of 25 (acceptance rate (ar) = 39%) manuscripts were accepted for publication in Stud HTI and 3 (ar = 43%) for MIBE. 25 (46%) of the 61 full paper submissions were accepted as abstracts (publication in egms proceedings), of which 3 (5%) were subsequently withdrawn by the authors. 120 (ar = 82%) of the abstract submissions were accepted: 70 (48%) as talks and 50 (34%) as e-posters, of which one paper was withdrawn by the authors after acceptance. At the time of the editorial deadline for this preface, the review process for abstracts had not yet been finalized, so minor changes to these figures may still occur. An overview of the review process is shown in Fig. 1.
More than 50% of the submissions and more than 70% of the full papers this year related to medical informatics (Table 1). This should be taken as a mandate to promote the GMDS and the GMDS annual conference more strongly in the fields of biometry and epidemiology, as well as bioinformatics and systems medicine, and to thus strengthen the profile of the GMDS as an interdisciplinary medical-scientific society.
We would like to take this opportunity to thank all authors, reviewers, members of the technical committees and the joint GMDS-TMF organizing committee. Without your work and support, these conference proceedings could not have been produced.
We wish you an exciting conference and inspiring reading of the proceedings.
(Editor in Chief) Rainer Röhrig
(Bioinformatics and Systems Medicine) Tim Beißbarth
(Editor in Chief MIBE) Petra Knaup-Gregori
(Epidemiology) Jochem König
(Medical Documentation) Claudia Ose
(Biometry) Geraldine Rauch
(Medical Informatics) Ulrich Sax
(Chair of SPC) Björn Schreiweis
(Congress Secretary) Martin Sedlmayr
References
[1] Röhrig, R., Hübner, U., & Sedlmayr, M. German medical data sciences in studies in health technology and informatics – reflections on the 5th volume. In: Röhrig, R., Beißbarth, T., König, J., Ose, C., Rauch, G., Sax, U., Schreiweis, B., & Sedlmayr, M. (Eds.), German medical data sciences 2021: Digital medicine: Recognize – understand – heal: Proceedings of the Joint conference of the 66th Annual meeting of the German association of medical informatics, biometry, and epidemiology e.V. (gmds) and the 13th Annual meeting of the TMF – technology, methods, and infrastructure for networked medical research e.V. 2021 online in Kiel, Germany. Stud Health Technol Inform 283 (2021), 3–11. doi: 10.3233/SHTI210534. IOS Press, Amsterdam.
[2] Schreiweis, B., & Kock-Schopenhauer, A. K. One conference, three proceedings – which papers should I submit and how? A publication strategy for young scientists regarding the GMDS annual conference and beyond (Editorial). In: Röhrig, R., Beißbarth, T., König, J., Ose, C., Rauch, G., Sax, U., Schreiweis, B., & Sedlmayr, M. (Eds.), German medical data sciences 2021: Digital medicine: Recognize – understand – heal: Proceedings of the Joint conference of the 66th Annual meeting of the German association of medical informatics, biometry, and epidemiology e.V. (gmds) and the 13th Annual meeting of the TMF – technology, methods, and infrastructure for networked medical research e.V. 2021 online in Kiel, Germany. Stud Health Technol Inform 283 (2021), 12–19. doi: 10.3233/SHTI210535. IOS Press, Amsterdam.
Since 2017, the German Society for Medical Informatics, Biometry and Epidemiology e.V. (GMDS) has offered the submission of full papers to the annual meetings, optionally in Studies in Health Technology and Informatics (Stud HTI) or in GMS Medical Informatics, Biometry and Epidemiology (MIBE). The GMDS aims to increase the attractiveness of the conference and of the paper submission process, in particular for young scientists, and to increase the visibility of the conference. A standardized peer review process was established. Since 2017, 25–35% of the contributions have been submitted as full papers. A total of 177 papers were published in Stud HTI. With an unofficial journal impact factor of 1.088 (2019) and 0.540 (2020), the papers were cited with a frequency similar to that of national medical journals or full paper contributions to international medical informatics conferences.
The primary intention of any scientific work is to share the gained knowledge and to contribute to the knowledge and progress in the scientific domain. The wide range of journals and conferences, each with specific submission requirements, can be difficult to navigate, especially for young scientists without extensive experience. But a suitable publication strategy can be helpful, especially at the beginning of a scientific career. Using the annual conference of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS) e.V. as an example, this editorial highlights fundamental differences, advantages and disadvantages, and offers assistance in selecting the right form of submission.
Preventable or undiagnosed visual impairment and blindness affect billions of people worldwide. Automated multi-disease detection models offer great potential to address this problem via clinical decision support in diagnosis. In this work, we proposed an innovative multi-disease detection pipeline for retinal imaging which utilizes ensemble learning to combine the predictive capabilities of several heterogeneous deep convolutional neural network models. Our pipeline includes state-of-the-art strategies like transfer learning, class weighting, real-time image augmentation and Focal loss utilization. Furthermore, we integrated ensemble learning techniques like heterogeneous deep learning models, bagging via 5-fold cross-validation and stacked logistic regression models. Through internal and external evaluation, we were able to validate and demonstrate high accuracy and reliability of our pipeline, as well as the comparability with other state-of-the-art pipelines for retinal disease prediction.
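To illustrate the focal loss mentioned above, here is a minimal sketch in plain Python for the binary case. It is not the authors' implementation: the function names are hypothetical, and the defaults gamma=2.0 and alpha=0.25 are commonly used values from the focal loss literature, not values reported in this abstract. The point it shows is that well-classified examples are down-weighted relative to plain cross-entropy.

```python
import math

def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction p (0 < p < 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by a class weight and (1 - p_t)^gamma.

    gamma and alpha are assumed defaults, not taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting term
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

if __name__ == "__main__":
    # A confident correct prediction contributes far less than a confident wrong one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With gamma set to 0 the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which is a quick sanity check for any implementation.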
In this paper a machine learning model for automatic detection of abnormalities in electroencephalography (EEG) is dissected into parts, so that the influence of each part on the classification accuracy score can be examined. The most successful setup of several shallow artificial neural networks aggregated via voting results in an accuracy of 81%. Stepwise simplification of the model shows the expected decrease in accuracy, but a naive model with thresholding of a single extracted feature (relative wavelet energy) is still able to achieve 75%, which remains well above the random guess baseline of 54%. These results suggest the feasibility of building a simple classification model ensuring accuracy scores close to the state-of-the-art research but remaining fully interpretable.
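The naive model described above, thresholding a single relative wavelet energy value, can be sketched as follows. This is an illustrative sketch only: the Haar wavelet, the number of decomposition levels, the chosen band and the threshold are all assumptions, since the abstract does not specify them. The signal length should be divisible by 2 to the power of `levels`, and the signal must not be all zeros.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def relative_wavelet_energy(signal, levels=3):
    """Energy share of each detail band (finest first) plus the final approximation band."""
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in current))
    total = sum(energies)
    return [e / total for e in energies]

def classify(signal, band=0, threshold=0.5):
    """Flag a signal when the relative energy in one band exceeds a threshold (both assumed)."""
    return relative_wavelet_energy(signal)[band] > threshold

if __name__ == "__main__":
    fast = [1.0, -1.0] * 8   # energy concentrated in the finest detail band
    flat = [1.0] * 16        # energy concentrated in the approximation band
    print(classify(fast), classify(flat))
```

A model of this kind stays fully interpretable because the decision depends on one number with a direct physical meaning.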
Automatic electrocardiogram (ECG) analysis has been one of the very early use cases for computer assisted diagnosis (CAD). Most ECG devices provide some level of automatic ECG analysis. In recent years, Deep Learning (DL) has increasingly been used for this task, with the first models claiming to perform better than human physicians. In this manuscript, a pilot study is conducted to evaluate the added value of such a DL model to existing built-in analysis with respect to clinical relevance. Twenty-nine 12-lead ECGs have been analyzed with a published DL model and the results are compared to built-in analysis and clinical diagnosis. We could not reproduce the results of the test data exactly, presumably due to a different runtime environment. However, the errors were in the order of rounding errors and did not affect the final classification. The excellent performance in detection of left bundle branch block and atrial fibrillation that was reported in the publication could be reproduced. The DL method and the built-in method performed similarly well for the chosen cases regarding clinical relevance. While the benefit of the DL method for research can be attested and usage in training can be envisioned, evaluation of added value in clinical practice would require a more comprehensive study with further and more complex cases.
Expert systems have a long tradition in both medical informatics and artificial intelligence research. Traditionally, such systems are created by implementing knowledge provided by experts in a system that can be queried for answers. To automatically generate such knowledge directly from data, the lightweight InteKRator toolbox will be introduced here, which combines knowledge representation and machine learning approaches. The learned knowledge is represented in the form of rules with exceptions that can be inspected and that are easily comprehensible. An inference module allows for the efficient answering of queries, while at the same time offering the possibility of providing explanations for the inference results. The learned knowledge can be revised manually or automatically with new evidence after learning.
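The idea of rules with exceptions can be illustrated with a small sketch. This is not InteKRator's API or file format: the rule representation, the `infer` function and the toy medical facts are all hypothetical. The sketch assumes that a more specific rule (one with more premises) overrides a more general one, and that the applied rule is returned so the result can be explained.

```python
def infer(rules, facts):
    """Return (conclusion, applied_rule) for the most specific rule whose premises all hold.

    rules: list of (premises, conclusion) pairs, premises being a frozenset of facts.
    facts: set of facts known about the case.
    """
    # Try rules with more premises first, so that exceptions beat general rules.
    for premises, conclusion in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if premises <= facts:
            return conclusion, (premises, conclusion)
    return None, None

# Toy knowledge base (hypothetical): a default, a general rule, and an exception to it.
RULES = [
    (frozenset(), "no_treatment"),
    (frozenset({"fever"}), "treatment_a"),
    (frozenset({"fever", "allergy_a"}), "treatment_b"),
]

if __name__ == "__main__":
    conclusion, rule = infer(RULES, {"fever", "allergy_a"})
    print(conclusion, "because of rule", rule)
```

Returning the applied rule alongside the conclusion is what makes the inference result inspectable.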
Introduction:
Ensuring scientific reproducibility and compliance with documentation guidelines of funding bodies and journals is a topic of greatly increasing importance in biomedical research. Failure to comply with, or unawareness of, documentation standards can have adverse effects on the translation of research into patient treatments, as well as economic implications. In the context of the German Research Foundation-funded collaborative research center (CRC) 1002, an IT-infrastructure sub-project was designed. Its goal has been to establish standardized metadata documentation and information exchange benefitting the participating research groups with minimal additional documentation efforts.
Methods:
Implementation of the self-developed menoci-based research data platform (RDP) was driven by close communication and collaboration with researchers as early adopters and experts. Requirements analysis and concept development involved in-person observation of experimental procedures, interviews and collaboration with researchers and experts, as well as the investigation of available and applicable metadata standards and tools. The Drupal-based RDP features distinct modules for the different documented data and workflow types, and both the development and the types of collected metadata were continuously reviewed and evaluated with the early adopters.
Results:
The menoci-based RDP allows for standardized documentation, sharing and cross-referencing of different data types, workflows, and scientific publications. Different modules have been implemented for specific data types and workflows, allowing for the enrichment of entries with specific metadata and linking to further relevant entries in different modules.
Discussion:
Taking the workflows and datasets of the frequently involved experimental service projects as a starting point for (meta-)data types to overcome irreproducibility of research data results in increased benefits for researchers with minimized efforts. While the menoci-based RDP with its data models and metadata schema was originally developed in a cardiological context, it has been implemented and extended to other consortia at Göttingen Campus and beyond in different life science research areas.
Optimizing the utilization of radiology departments is one of the primary objectives for many hospitals. To support this, a solution has been developed which first transforms the export of different Radiological Information Systems (RIS) into the data format of a clinical data warehouse (CDW). Additional features, for example the time between the creation of a radiologic request and the finalization of the diagnosis for the created images, can then be defined using a simple interface and are calculated and saved in the CDW as well. Finally, the query language of the CDW can be used to create custom reports with all the RIS data including the calculated features and export them into the standard formats Excel and CSV. The solution has been successfully tested with data from two German hospitals.
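The derived feature named above, the time between request creation and finalization of the diagnosis, can be sketched with the Python standard library. The field names (`exam_id`, `requested_at`, `finalized_at`, `turnaround_h`) and the timestamp format are assumptions for illustration; the actual RIS export and CDW schema are not described in the abstract.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the exported timestamps

def turnaround_hours(requested_at, finalized_at):
    """Hours between creation of the request and finalization of the diagnosis."""
    start = datetime.strptime(requested_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(finalized_at, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 3600.0

def add_turnaround(rows):
    """Return copies of the rows with the calculated feature added."""
    return [
        dict(row, turnaround_h=turnaround_hours(row["requested_at"], row["finalized_at"]))
        for row in rows
    ]

def to_csv(rows):
    """Serialize the rows, including calculated features, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    exams = [{"exam_id": "E1", "requested_at": "2021-03-01 08:00", "finalized_at": "2021-03-01 14:30"}]
    print(to_csv(add_turnaround(exams)))
```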
About 30 million people suffer from a rare disease in Europe. Those affected face a variety of problems. These include the lack of information and difficult access to scientific knowledge for physicians. For a higher visibility of rare diseases and high-quality research, effective documentation and use of data are essential. The aim of this work is to optimize the processing, use and accessibility of data on rare diseases and thus increase the added value from existing information. While dashboards are already being used to visualize clinical data, it is unclear what requirements are prevalent for rare diseases and how these can be implemented with available development tools so that a highly accepted dashboard can be designed. For this purpose, based on an analysis of the current situation and a requirements analysis, a prototype dashboard for the visualization of up-to-date key figures on rare diseases was developed at the University Hospital Carl Gustav Carus in Dresden. The development was based on the user-centered design process in order to achieve a high level of user-friendliness. The requirements analysis identified parameters that stakeholders wanted to see, focusing primarily on statistical analyses. The dashboard handles the automated calculation of statistics as well as their preparation and provision. The evaluations showed that the prototypical dashboard would be considered valuable and used by potential users. This work demonstrates that stakeholders are interested in access to prepared information and exemplifies a way to implement it. The dashboard can increase the usage of existing information in terms of a higher accessibility and thus improve the knowledge about rare diseases.
High throughput sequencing technologies have facilitated an outburst in biological knowledge over the past decades and thus enable improvements in personalized medicine. In order to support (international) medical research with the combination of genomic and clinical patient data, a standardization and harmonization of these data sources is highly desirable. To support this increasing importance of genomic data, we have created a semantic mapping from raw genomic data to both FHIR (Fast Healthcare Interoperability Resources) and OMOP (Observational Medical Outcomes Partnership) CDM (Common Data Model) and analyzed the data coverage of both models. For this, we calculated the mapping score for different data categories and the relative data coverage in both FHIR and OMOP CDM. Our results show that patients' genomic data can be mapped to OMOP CDM directly from a VCF (Variant Call Format) file with a coverage of slightly over 50%. However, using FHIR as an intermediate representation does not lead to further information loss, as the data already stored in FHIR can be further transformed into OMOP CDM format with almost 100% success. Our findings are in favor of extending OMOP CDM with patient genomic data using ETL to enable researchers to apply different analysis methods, including machine learning algorithms, on genomic data.
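As a rough illustration of the starting point of such a mapping, the sketch below parses one VCF data line into a dictionary and computes a simple coverage figure. The eight fixed column names come from the VCF format itself; the `coverage` function (share of source fields that have a target in a mapping) and the example mapping are assumptions, since the abstract does not define its mapping score or list the target fields.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_vcf_line(line):
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, values[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several alternate alleles are comma-separated
    # INFO holds semicolon-separated key=value pairs or bare flags; "." means empty.
    record["INFO"] = dict(
        item.split("=", 1) if "=" in item else (item, True)
        for item in record["INFO"].split(";")
        if item != "."
    )
    return record

def coverage(source_fields, mapping):
    """Share of source fields with a target in the mapping (assumed definition)."""
    mapped = [field for field in source_fields if mapping.get(field)]
    return len(mapped) / len(source_fields)

if __name__ == "__main__":
    record = parse_vcf_line("1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=10;DB")
    # Hypothetical mapping: target names are placeholders, not real OMOP CDM or FHIR fields.
    example_mapping = {"CHROM": "target_a", "POS": "target_b", "REF": "target_c", "ALT": "target_d"}
    print(record, coverage(VCF_COLUMNS, example_mapping))
```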
OHDSI, a fast-growing open-science research community, seeks to enable researchers from around the globe to conduct network studies based on standardized data and vocabularies. No comprehensive review is available of publications about OHDSI’s standard, the OMOP Common Data Model, and its usage. In this work we aim to close this gap and provide a summary of existing publications, including an analysis of their meta information such as the choice of journals, journal types and countries, as well as an analysis by topics based on a title and abstract screening. Since 2016, the number of publications has been constantly growing and the relevance of the OMOP CDM is increasing in terms of multi-country studies based on observational patient data.
Harmonized and interoperable data management is a core requirement for federated infrastructures in clinical research. Institutions participating in such infrastructures often have to invest considerable time and resources in implementing the necessary data integration processes to convert their local data to the required target structure. If the data is already available in an alternative shared data structure, the transformation from source to the desired target structure can be implemented once and then be distributed to all participants to reduce effort and harmonize results. The HL7® FHIR® standard is used as a basis for the shared data model of several medical consortia like DKTK and GBA. It is based on so-called resources which can be represented in XML. Oncological data in German university hospitals is commonly available in the ADT/GEKID format. From this common basis we conceptualized and implemented a transformation which accepts ADT/GEKID XML files and returns FHIR resources. We identified several problems with using the general ADT/GEKID structure in federated research infrastructures, as well as some possible pitfalls relating to the FHIR requirement for resource ids and its focus on semantic coding, which differs from the approach in the ADT/GEKID standard. To facilitate participation in federated infrastructures, we propose the ADT2FHIR transformation tool for partners with oncological data in the ADT/GEKID format.
Medical routine data has the potential to benefit research. However, transferring this data into a research context is difficult. For this reason, Medical Data Integration Centers are being established in German university hospitals to consolidate data from primary information systems in a single location. However, small data sets from one organization can be insufficient to answer a research question adequately. In order to obtain larger data sets, attempts are made to merge and provide data sets across institutional boundaries. Therefore, this paper proposes a possible process that can extract, merge, pseudonymize and provide distributed data sets from several organizations conforming to privacy regulations. This process is executed according to the open standard BPMN 2.0; the underlying process data model is based on HL7 FHIR R4. The proposed solution is currently being deployed at eight university hospitals and one Trusted Third Party in the HiGHmed consortium.
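The merge and pseudonymize steps can be illustrated with a small sketch. It is an assumption-laden illustration, not the process described in the paper: the abstract does not say how pseudonyms are generated, so a keyed hash (HMAC-SHA256) stands in here, and the record layout with a `patient_id` field is hypothetical. In a real deployment the secret key would be held by the Trusted Third Party only.

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Derive a stable pseudonym from an identifier with a keyed hash (assumed technique)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def merge_and_pseudonymize(datasets, secret_key):
    """Merge records from several organizations and replace the identifying field.

    datasets: dict mapping an organization name to a list of record dicts,
              each with a hypothetical "patient_id" field.
    """
    merged = []
    for records in datasets.values():
        for record in records:
            out = {k: v for k, v in record.items() if k != "patient_id"}
            out["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
            merged.append(out)
    return merged

if __name__ == "__main__":
    key = b"demo-key-held-by-trusted-third-party"  # placeholder, never hard-code real keys
    data = {
        "hospital_a": [{"patient_id": "123", "lab_value": 4.2}],
        "hospital_b": [{"patient_id": "123", "lab_value": 5.0}],
    }
    print(merge_and_pseudonymize(data, key))
```

Because the same identifier and key always yield the same pseudonym, records of one person from different organizations can still be linked after merging without exposing the original identifier.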
With the steady increase in the connectivity of the healthcare system, new requirements and challenges are emerging. In addition to the seamless exchange of data between service providers on a national level, the local legacy data must also meet the new requirements. For this purpose, the applications used must be tested securely and sufficiently. However, the availability of suitable and realistic test data is not always given. Therefore, this study deals with the creation of test data based on real electronic health record data provided by the Medical Information Mart for Intensive Care (MIMIC-IV) database. In addition to converting the data to the current FHIR R4, conversion to the core data sets of the German Medical Informatics Initiative was also presented and made available. The test data was generated to simulate a legacy data transfer. Moreover, four different FHIR servers were tested for performance. This study is the first step toward comparable test scenarios around shared datasets and promotes comparability among providers on a national level.
To ensure semantic interoperability within healthcare systems, using common, curated terminological systems to identify relevant concepts is of fundamental importance. The HL7 FHIR standard specifies means of modelling terminological systems and appropriate ways of accessing and querying these artefacts within a terminology server. Hence, initiatives towards healthcare interoperability like IHE specify not only software interfaces, but also common codes in the form of value sets and code systems. The way in which these coding tables are provided is not necessarily compatible with the current version of the HL7 FHIR specification, and they therefore cannot be used with current HL7 FHIR-based terminology servers. This work demonstrates a conversion of terminological resources specified by the Integrating the Healthcare Enterprise (IHE) initiative in the ART-DECOR platform, partly available in HL7 FHIR, to ensure that they can be used within an HL7 FHIR-based terminology server. The approach itself can be used for other terminological resources specified within ART-DECOR but can also be used as the basis for other code-driven conversions of proprietary coding schemes.
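What a code-driven conversion of a coding table into a FHIR terminology resource can look like is sketched below. The element names (`resourceType`, `url`, `status`, `content`, `concept`, `code`, `display`) are those of the FHIR R4 CodeSystem resource; everything else is assumed for illustration, including the input format (a plain list of code and display pairs rather than an ART-DECOR export), the function name and the example URL.

```python
import json

def to_code_system(url, name, rows):
    """Build a minimal FHIR R4 CodeSystem resource from (code, display) pairs."""
    return {
        "resourceType": "CodeSystem",
        "url": url,
        "name": name,
        "status": "active",
        "content": "complete",  # the resource lists all concepts of the code system
        "concept": [{"code": code, "display": display} for code, display in rows],
    }

if __name__ == "__main__":
    # Hypothetical coding table and canonical URL, for illustration only.
    table = [("A", "Example concept A"), ("B", "Example concept B")]
    resource = to_code_system("http://example.org/fhir/CodeSystem/demo", "DemoCodes", table)
    print(json.dumps(resource, indent=2))
```

The resulting JSON could then be uploaded to a FHIR-based terminology server; validation against the full specification is deliberately left out of this sketch.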
Introduction:
While virtual reality (VR) is an emerging paradigm in a variety of research contexts, there is still insufficient evidence on whether VR-based embodiment effects on behavior and performance bias cognitive performance assessment.
Methods:
In this methodological observational study, we compare the VR measurement of cognitive performance with a conventional computer-based testing approach in real life (RL) in younger and older adults. The differences between VR and RL scenarios are investigated using the background of two theoretical models from cognitive psychology. Furthermore, data assessment reliability and validity are analyzed, concerning the feasibility of technological and ergonomic aspects.
Results:
A within-group comparison showed no change in information processing speed in either one of the two age groups, i.e., both groups perform equally well in RL and in a VR testing environment.
Conclusion:
The use of lifelike VR environments for cognitive performance tests does not seem to lead to any performance changes compared to RL computer-based assessments, making VR suitable for similar applications. On the technical side, we recommend careful use of reaction time paradigms with regard to input hardware and stimulus presentation.
Wearables are commercially available devices allowing continuous monitoring of users’ health parameters. Their easy availability, increasing accuracy and functionality render them relevant for medical practice, specifically for longitudinal monitoring. There are clear benefits for the health care system, such as the opportunity of timely interventions by monitoring patients during their daily life, resulting in a cost reduction in medical care and improved patient well-being. However, some tools are essential to enable the application of wearables in daily medical practice. For example, there is a need for software solutions that allow clinicians to quickly and easily analyze data from their patients' devices. The goal of this study was to develop a dashboard for physicians which allows rapid interpretation of longitudinal data from the Apple Watch. The prototype dashboard is an interactive web-based visualization platform utilizing Plotly. The dashboard displays the most important parameters collected by the Apple Watch, such as heart rate, steps per day, activity and exercise, in a user-friendly and accessible way. Clear visualization makes it easy to identify trends or deviations in the data and see how these changes in daily behaviour affect patients’ health. Our software is a key component to monitor patients with heart failure who participate in the HiGHmed use case cardiology project.
Background:
Assessing the uncertainty of diagnostic findings is essential for advising patients. Previous research has demonstrated the difficulty of computing the expected correctness of positive or negative results, although clinical decision support (CDS) tools promise to facilitate adequate interpretations.
Objectives:
To teach the potential utility of CDS tools to medical students, we designed an interactive software module that computes and visualizes relevant probabilities from typical inputs.
Methods:
We reviewed the literature on recommended graphical approaches and decided to support contingency tables, plain table formats, tree diagrams, and icon arrays.
Results:
We implemented these functions in a single-page web application, which was configured to complement our local learning management system where students also access interpretation tasks.
Conclusion:
Our technical choices promoted a rapid implementation. We intend to explore the utility of the tool during some upcoming courses. Future developments could also model a more complex clinical reality where the likelihood of alternative diagnoses is estimated from sets of clinical investigations.
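The probabilities such a module computes from typical inputs can be sketched with Bayes' theorem. This is a minimal illustration, not the authors' web application: the function names and the choice of sensitivity, specificity and prevalence as inputs are assumptions. The second function produces the counts that a contingency table or an icon array for a population of `n` people would display.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV): the expected correctness of a positive and a negative result."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def contingency_table(sensitivity, specificity, prevalence, n=1000):
    """Expected counts for n tested people, rounded to whole persons."""
    diseased = prevalence * n
    healthy = n - diseased
    return {
        "true_positive": round(sensitivity * diseased),
        "false_negative": round((1 - sensitivity) * diseased),
        "false_positive": round((1 - specificity) * healthy),
        "true_negative": round(specificity * healthy),
    }

if __name__ == "__main__":
    # Example inputs chosen for illustration only.
    print(predictive_values(0.9, 0.9, 0.1))
    print(contingency_table(0.9, 0.9, 0.1))
```

With a sensitivity and specificity of 90% and a prevalence of 10%, only about half of the positive results are correct, which is the kind of counterintuitive outcome that graphical formats such as icon arrays are meant to make visible.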
openMNGlab is an open-source software framework for data analysis, tailored for the specific needs of microneurography – a type of electrophysiological technique particularly important for research on coding in peripheral neural fibers. Currently, openMNGlab loads data from Spike2 and Dapsys, which are two major data acquisition solutions. By building on top of the Neo software, openMNGlab can be easily extended to handle the most common electrophysiological data formats. Furthermore, it provides methods for data visualization, fiber tracking, and a modular feature database to extract features for data analysis and machine learning.