Ebook: German Medical Data Sciences 2023 – Science. Close to People.
Semantic computing is an integral part of modern technology, an essential component of fields as diverse as artificial intelligence, data science, knowledge discovery and management, big data analytics, e-commerce, enterprise search, technical documentation, document management, business intelligence, and enterprise vocabulary management.
This book presents the proceedings of SEMANTICS 2023, the 19th International Conference on Semantic Systems, held in Leipzig, Germany, from 20 to 22 September 2023. The conference is a pivotal event for professionals and researchers actively engaged in harnessing the power of semantic computing, offering them an opportunity to deepen their understanding of the subject's transformative potential while confronting its practical limitations. Attendees include information managers, IT architects, software engineers, and researchers from a broad spectrum of organizations, including research facilities, non-profit entities, public administrations, and the world's largest corporations. This year's conference has the subtitle Towards Decentralized Knowledge Ecosystems, and a total of 54 submissions were received in response to the call for papers. These were subjected to a rigorous, double-blind review process, with at least three independent reviews conducted for each submission. The 16 papers included here were accepted for presentation, an acceptance rate of 29.6%. Topics include novel research challenges in data science, machine learning, logic programming, content engineering, social computing, and the Semantic Web.
The book provides an up-to-date overview, which will be of interest to all those wishing to stay abreast of emerging trends and themes within the vast field of semantic computing.
A total of 227 scientific contributions were submitted for the 68th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS). These comprised 43 full papers, 2 for the journal GMS Medical Informatics, Biometry and Epidemiology (GMS MIBE) and 41 for Studies in Health Technology and Informatics (Stud HTI), as well as 184 abstracts, 118 submitted as presentations and 66 as posters.
In a two-stage review process, 2 contributions were accepted for publication in the GMS MIBE and 30 for publication in Stud HTI. This corresponds to an acceptance rate of 74% (MIBE 100%, Stud HTI 73%). The results of the entire review process are shown in figure 1.
We would like to thank all authors for submitting their research results, without which neither the exciting conference program nor these proceedings would have been possible. Above all, however, we would like to thank the reviewers for their 632 reviews. With your reviews, some of them very meticulous and extensive, you have made an important contribution to quality assurance and, in many cases, to improving the manuscripts.
We look forward to an interesting conference, which will finally take place again face-to-face after three years of online meetings. Last but not least, we would like to congratulate Heilbronn University on the 50th anniversary of its Medical Informatics program!
(Editor in Chief GMDS in Stud HTI) Rainer Röhrig
(Chair of SPC) Martin Haag
(Bioinformatics and Systems Medicine) Tim Beißbarth
(Bioinformatics and Systems Medicine) Nils Grabe
(Medical Informatics) Ursula Hübner
(Editor in Chief GMS MIBE) Petra Knaup-Gregori
(Epidemiology) Jochem König
(Medical Documentation) Claudia Ose
(Medical Informatics) Ulrich Sax
(Epidemiology) Carsten Schmidt
(Congress Secretary) Martin Sedlmayr
(Biometrics) Antonia Zapf
Metadata is essential for handling medical data according to the FAIR principles. Standards are well established for many types of electrophysiological methods but are still lacking for microneurographic recordings of peripheral sensory nerve fibers in humans. Developing a new concept to enhance laboratory workflows is a complex process. We propose a standard for structuring and storing microneurography metadata based on odML and odML-tables. Further, we present an extension to the odML-tables GUI that provides user-friendly search of the database. With our open-source repository, we encourage other microneurography labs to incorporate odML-based metadata into their experimental routines.
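odML organizes metadata as a tree of sections, each holding named properties with values and optional units. The sketch below mimics that idea with plain Python dataclasses to show how a hierarchical record can be searched by property name. It does not use the odML library or its API, and the section and property names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Property:
    name: str
    values: List[Any]
    unit: Optional[str] = None


@dataclass
class Section:
    name: str
    type: str
    properties: List[Property] = field(default_factory=list)
    sections: List["Section"] = field(default_factory=list)


def find_properties(section: Section, prop_name: str, path: str = "") -> List[Tuple[str, Property]]:
    """Recursively collect (section path, property) pairs whose name matches prop_name."""
    here = f"{path}/{section.name}"
    hits = [(here, p) for p in section.properties if p.name == prop_name]
    for child in section.sections:
        hits.extend(find_properties(child, prop_name, here))
    return hits


# Hypothetical microneurography recording; names are illustrative only.
recording = Section("Recording_01", "recording", sections=[
    Section("Subject", "subject", [Property("Age", [34], "years")]),
    Section("Stimulation", "stimulus", [
        Property("Protocol", ["1/4 Hz"]),
        Property("Temperature", [32.0], "degC"),
    ]),
])

for path, prop in find_properties(recording, "Temperature"):
    print(path, prop.values, prop.unit)
```

Keeping units next to values, as odML does, lets a search across many recordings compare like with like.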
Introduction:
There is increasing interest in re-using outpatient healthcare data for research, as most medical diagnosis and treatment is provided in the ambulatory sector. One of the early projects to bring primary data from German ambulatory care into clinical research in a technically and organizationally sound, legally compliant way was the RADAR project, which is based on broad consent and used the interfaces of the practice information systems available at the time to extract data and transfer it to a research repository. In the course of the digital transformation of the German healthcare system, former standards are being abandoned and new interoperability standards, interfaces and regulations on the secondary use of patient data are being defined; however, health IT systems are adopting them slowly. Therefore, every initiative that aims to use ambulatory healthcare data for research must determine how to access these data efficiently and effectively.
Methods:
Currently defined healthcare standards are compared with regard to their coverage of the research-relevant data defined by the RADAR project. We compare four architectural options for accessing ambulatory health data through different components of healthcare and health research data infrastructures with respect to technical, organizational and regulatory conditions, the timeline of dissemination, and the researcher's perspective.
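A coverage comparison of this kind can be expressed with simple set operations: the share of required data items a standard can represent, and the overlap between two information models. The item names and coverage sets below are hypothetical placeholders, not the RADAR item list or the actual standards compared.

```python
# Hypothetical research-relevant data items (not the actual RADAR item list).
RESEARCH_ITEMS = {"diagnosis", "medication", "lab_result", "vital_signs", "vaccination", "referral"}

# Hypothetical coverage of two standards; a real mapping requires item-level review.
STANDARD_COVERAGE = {
    "standard_A": {"diagnosis", "medication", "lab_result", "vaccination"},
    "standard_B": {"diagnosis", "medication", "vital_signs"},
}


def coverage(items: set, required: set) -> float:
    """Share of the required items that a standard can represent."""
    return len(items & required) / len(required)


def jaccard(a: set, b: set) -> float:
    """Overlap between two information models (1.0 means identical item sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


for name, items in STANDARD_COVERAGE.items():
    print(name, round(coverage(items, RESEARCH_ITEMS), 2), sorted(RESEARCH_ITEMS - items))
print("overlap A/B:", jaccard(STANDARD_COVERAGE["standard_A"], STANDARD_COVERAGE["standard_B"]))
```

Listing the missing items per standard, not just the ratio, shows which gaps would need an extension or a second data source.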
Results:
A high-level comparison showed a high degree of semantic overlap in the information models used. Electronic patient records and practice information systems are alternative sources of ambulatory health data, but they differ strongly in data richness and accessibility.
Conclusion:
Considering the compared dimensions of architectural routes to access health data for secondary research use, we conclude that extracting data from practice information systems is currently the most promising route, owing to data availability in the medium term. Integration of routine data into the national research data infrastructures could be accelerated by the convergence of the currently differing information models.
Introduction:
The diagnosis and treatment of Parkinson’s disease depend on the assessment of motor symptoms. Wearables and machine learning algorithms have emerged to collect large amounts of data and potentially support clinicians in clinical and ambulant settings.
State of the art:
However, a systematic and reusable data architecture for storing, processing, and analyzing inertial sensor data is not available. Consequently, datasets vary significantly between studies, which prevents comparability.
Concept:
To simplify research on the neurodegenerative disorder, we propose an efficient, real-time-optimized architecture that is compatible with HL7 FHIR and backed by a relational database schema.
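One way FHIR can carry inertial sensor signals is the R4 `SampledData` datatype on an `Observation`: an origin quantity, a sampling period in milliseconds, the number of dimensions, and a space-separated data string in which multi-dimensional samples are interleaved per time point. The sketch below encodes and decodes a 3-axis accelerometer signal this way. It is an illustration under these assumptions, not the authors' architecture; the code text, subject reference and timestamp are hypothetical.

```python
import json
from typing import List, Tuple


def to_sampled_data(samples: List[Tuple[float, float, float]], period_ms: float) -> dict:
    """Encode 3-axis samples as FHIR R4 SampledData (axes interleaved per time point)."""
    # ':g' keeps at most six significant digits; real sensor data may need more precision.
    flat = [f"{value:g}" for sample in samples for value in sample]
    return {
        "origin": {"value": 0, "unit": "m/s2", "system": "http://unitsofmeasure.org", "code": "m/s2"},
        "period": period_ms,  # milliseconds between consecutive time points
        "dimensions": 3,      # x, y and z are recorded together at each time point
        "data": " ".join(flat),
    }


def from_sampled_data(sampled: dict) -> List[Tuple[float, ...]]:
    """Decode SampledData; assumes the data string contains no 'E', 'L' or 'U' markers."""
    origin = sampled["origin"]["value"]
    factor = sampled.get("factor", 1)
    dims = sampled["dimensions"]
    values = [origin + factor * float(token) for token in sampled["data"].split()]
    return [tuple(values[i:i + dims]) for i in range(0, len(values), dims)]


samples = [(0.1, -9.8, 0.3), (0.2, -9.7, 0.25)]
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Wrist accelerometer (hypothetical)"},
    "subject": {"reference": "Patient/example"},
    "effectiveDateTime": "2023-01-01T10:00:00Z",
    "valueSampledData": to_sampled_data(samples, period_ms=10),
}
print(json.dumps(observation["valueSampledData"]))
```

At high sampling rates the `data` string grows quickly, which is one reason a relational store for the raw samples can complement the FHIR representation.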
Lessons learned:
We verified the adequate performance of the system in an experimental benchmark and in a clinical experiment. However, existing standards need further optimization to be fully sufficient for data with high temporal resolution.
Introduction:
With the increasing availability of reusable biomedical data, from cohort studies to clinical routine data, data re-users face the problem of managing transferred data in accordance with heterogeneous data use agreements. While structured metadata is addressed in many contexts, including informed consent, contracts are still unstructured text documents. Especially within collaborative and active working groups, the actual regulations of the usage agreement are highly relevant for daily practice: Can I share the data with colleagues from the same university or the same research network? Can they be stored on a PhD student's laptop? Can I store the data for further approved data usage requests?
Methods:
In this article, we inspect and review seven different data usage agreements. We focus on digital data that is copied and transferred to the requester’s environment.
Results:
We identified 24 metadata items in four main categories: data usage, storage, sharing, and publication of results.
Discussion:
While the topics largely overlap across the data use agreements, the actual regulations within each topic are diverse. Although we do not explicitly investigate trusted research environments, where data is offered within an analytics platform, we consider them a subgroup in which most of the practical questions from the data scientist's perspective also arise.
Conclusion:
With a limited set of structured metadata items, data scientists could have the terms of the data use agreement at hand, easily accessible alongside the transferred data.
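To make the idea concrete, the sketch below models a handful of structured data use agreement items, at least one per category named in the abstract, and serializes them to JSON so they can travel with the transferred files. The field names, the scope hierarchy and the sharing rule are hypothetical illustrations, not the 24 items identified in the study.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Scope(Enum):
    """Hypothetical sharing scopes, ordered from narrowest to widest."""
    PROJECT_TEAM = 1
    INSTITUTION = 2
    RESEARCH_NETWORK = 3


@dataclass(frozen=True)
class DataUseAgreement:
    # Hypothetical items, at least one per category (usage, storage, sharing, publication).
    purpose: str                       # data usage
    usage_end: str                     # data usage, ISO 8601 date
    personal_devices_allowed: bool     # storage
    retain_for_follow_up: bool         # storage
    sharing_scope: Scope               # sharing
    publication_review_required: bool  # publication of results

    def may_share_with(self, recipient: Scope) -> bool:
        """Sharing is permitted only within the agreed scope or a narrower one."""
        return recipient.value <= self.sharing_scope.value

    def to_json(self) -> str:
        """Serialize for storage next to the transferred data files."""
        record = asdict(self)
        record["sharing_scope"] = self.sharing_scope.name
        return json.dumps(record, indent=2)


dua = DataUseAgreement(
    purpose="secondary analysis for an approved project",
    usage_end="2025-12-31",
    personal_devices_allowed=False,
    retain_for_follow_up=False,
    sharing_scope=Scope.INSTITUTION,
    publication_review_required=True,
)
print(dua.may_share_with(Scope.PROJECT_TEAM))      # True
print(dua.may_share_with(Scope.RESEARCH_NETWORK))  # False
```

A file such as `dua.json` stored beside the data answers the everyday questions (laptop storage, sharing with colleagues) without rereading the contract.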
Introduction:
The increasing need for secondary use of clinical study data requires FAIR infrastructures, i.e., infrastructures that provide findable, accessible, interoperable and reusable data. It is crucial for data scientists to assess the number and distribution of cohorts that meet complex combinations of criteria defined by the research question. This so-called feasibility test is increasingly offered as a self-service, where scientists can filter the available data according to specific parameters. Early feasibility tools were developed for biosamples or image collections. They are of high interest for clinical study platforms that federate multiple studies and data types, but such platforms pose specific requirements on data source integration and data protection.
Methods:
Mandatory and desired requirements for such tools were gathered from two user groups: primary users and the staff managing a platform's transfer office. Open-source feasibility tools were identified through different literature search strategies and evaluated for their adaptability to these requirements.
Results:
We identified seven feasibility tools that we evaluated based on six mandatory properties.
Discussion:
We determined five feasibility tools to be the most promising candidates for adaptation to a clinical study research data platform: the Clinical Communication Platform, the German Portal for Medical Research Data, the Feasibility Explorer, Medical Controlling, and the Sample Locator.
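At its core, a feasibility query combines criteria with Boolean operators and returns only an aggregate count, often with small counts suppressed for data protection. The sketch below shows that pattern on synthetic records. The record layout, the combinators and the suppression threshold of 5 are assumptions for illustration and do not reproduce any of the tools named above.

```python
from typing import Callable, Dict, List

Patient = Dict  # hypothetical flat record, e.g. {"age": 54, "sex": "f", "diagnoses": {"I10"}}
Criterion = Callable[[Patient], bool]


def all_of(*criteria: Criterion) -> Criterion:
    return lambda p: all(c(p) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    return lambda p: any(c(p) for c in criteria)


def not_(criterion: Criterion) -> Criterion:
    return lambda p: not criterion(p)


def feasibility_count(patients: List[Patient], criterion: Criterion, min_cell: int = 5) -> str:
    """Count matching patients; counts below min_cell are reported only as '<min_cell'."""
    n = sum(1 for p in patients if criterion(p))
    return str(n) if n >= min_cell else f"<{min_cell}"


# Synthetic example data, not taken from any study.
patients = [
    {"age": 40 + i, "sex": "f" if i % 2 else "m", "diagnoses": {"I10"} if i % 3 == 0 else set()}
    for i in range(30)
]
hypertensive_50_plus = all_of(lambda p: p["age"] >= 50, lambda p: "I10" in p["diagnoses"])
print(feasibility_count(patients, hypertensive_50_plus))                                       # 6
print(feasibility_count(patients, all_of(hypertensive_50_plus, lambda p: p["sex"] == "f")))  # <5
```

Returning a string for suppressed counts makes it impossible for a caller to accidentally treat a masked value as an exact number.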
Introduction:
The collection of examination data for large clinical studies is often done with proprietary systems, which come with several disadvantages such as high cost and low flexibility. Using open-source tools can overcome these disadvantages and thereby improve both data collection and data quality. Here we use the data collection process of the Hamburg City Health Study (HCHS), carried out at the University Medical Center Hamburg-Eppendorf (UKE), as an example. We evaluated how the recording of the examination data can be converted from an established, proprietary electronic healthcare record (EHR) system to the free-to-use Research Electronic Data Capture (REDCap) software.
Methods:
For this purpose, a technical conversion of the EHR system is described first. Metafiles derived from the EHR system were used to build REDCap electronic case report forms (eCRFs). The REDCap system was tested by HCHS study assistants, who completed self-developed tasks mimicking their everyday study routine. Usability was evaluated quantitatively with the IBM Computer System Usability Questionnaire (CSUQ) and qualitatively with a semi-structured interview.
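REDCap instruments can be built by uploading a data dictionary, a CSV file with one row per field. The sketch below turns a hypothetical metafile (a list of item descriptions) into such a file. It writes only the first ten dictionary columns as we recall them; a real import expects the full column set, so check the header against a dictionary downloaded from your REDCap instance. The metafile layout and the type mapping are assumptions, not the format used in the HCHS.

```python
import csv
import io

# First ten REDCap data dictionary columns (verify against your instance before importing).
HEADER = [
    "Variable / Field Name", "Form Name", "Section Header", "Field Type", "Field Label",
    "Choices, Calculations, OR Slider Labels", "Field Note",
    "Text Validation Type OR Show Slider Number", "Text Validation Min", "Text Validation Max",
]

# Hypothetical mapping from metafile types to (REDCap field type, validation type).
TYPE_MAP = {"int": ("text", "integer"), "float": ("text", "number"),
            "date": ("text", "date_ymd"), "str": ("text", "")}


def to_redcap_rows(meta):
    """Yield one dictionary row per metafile item."""
    for item in meta:
        if item["type"] == "enum":
            field_type, validation = "radio", ""
            choices = " | ".join(f"{i}, {label}" for i, label in enumerate(item["options"], start=1))
        else:
            field_type, validation = TYPE_MAP[item["type"]]
            choices = ""
        yield [item["id"], item["form"], "", field_type, item["label"], choices, "",
               validation, item.get("min", ""), item.get("max", "")]


def write_dictionary(meta) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(to_redcap_rows(meta))
    return buf.getvalue()


# Hypothetical metafile entries exported from the legacy system.
meta = [
    {"id": "sbp", "form": "blood_pressure", "label": "Systolic blood pressure (mmHg)",
     "type": "int", "min": 60, "max": 260},
    {"id": "bp_arm", "form": "blood_pressure", "label": "Measurement arm",
     "type": "enum", "options": ["left", "right"]},
]
print(write_dictionary(meta))
```

Generating the dictionary from metafiles keeps validation ranges identical to the legacy system instead of re-entering them by hand.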
Results:
With the IBM CSUQ, the study assistants rated the use of the basic REDCap system for HCHS examination data collection with an overall score of 4.39, which represents medium acceptance. The interview feedback was used to formulate user stories aimed at increasing administrative sovereignty and to conceptualize a REDCap HCHS information technology (IT) infrastructure.
Conclusion:
Our work aims to serve as a template for evaluating the feasibility of converting from a proprietary to a free-to-use data collection tool in large clinical studies such as the HCHS. REDCap has great potential, but extensions and integration into the current IT infrastructure are required.
Next-generation sequencing (NGS) is increasingly used in precision medicine, but an automated sequencing pipeline is needed that can detect different types of variants (single-nucleotide variants, SNVs; copy-number variants, CNVs; structural variants, SVs) and does not rely on normal samples for germline comparison. To address this, we developed Onkopipe, a Snakemake-based pipeline that integrates quality control, read alignment, BAM pre-processing, and variant calling tools to detect SNVs, CNVs, and SVs and report them in a unified VCF format without matched normal samples. Onkopipe is containerized and provides reproducibility, parallelization, and easy customization, enabling the analysis of genomic data in precision medicine. Our validation and evaluation demonstrate high accuracy and concordance, making Onkopipe a valuable open-source resource for molecular tumor boards. Onkopipe is shared as an open-source project and is available at https://gitlab.gwdg.de/MedBioinf/mtb/onkopipe.
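A unified VCF can hold point mutations alongside CNVs and SVs by using symbolic alleles such as `<DUP>` or `<INV>` with `SVTYPE` and `END` in the INFO column. The sketch below merges calls from different callers into one VCF text. It is not Onkopipe's implementation or output format: the `VCLASS` INFO key, the coordinates and the simplified sort order (by chromosome name, not contig order) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    chrom: str
    pos: int                      # 1-based position as in VCF
    ref: str
    alt: str                      # base(s) or a symbolic allele such as "<DUP>"
    variant_class: str            # "SNV", "CNV" or "SV"
    end: Optional[int] = None
    svtype: Optional[str] = None  # e.g. "DEL", "DUP", "INV"


HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">',
    '##INFO=<ID=VCLASS,Number=1,Type=String,Description="Variant class: SNV, CNV or SV">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def merge_to_vcf(calls: List[Call]) -> str:
    """Write calls from several callers as one VCF (sorted by chromosome name, then position)."""
    lines = list(HEADER)
    for c in sorted(calls, key=lambda call: (call.chrom, call.pos)):
        info = [f"VCLASS={c.variant_class}"]
        if c.svtype:
            info.append(f"SVTYPE={c.svtype}")
        if c.end is not None:
            info.append(f"END={c.end}")
        lines.append("\t".join([c.chrom, str(c.pos), ".", c.ref, c.alt, ".", "PASS", ";".join(info)]))
    return "\n".join(lines) + "\n"


# Hypothetical calls from an SNV caller, a CNV caller and an SV caller.
calls = [
    Call("chr7", 5000, "C", "T", "SNV"),
    Call("chr1", 10000, "N", "<DUP>", "CNV", end=20000, svtype="DUP"),
    Call("chr2", 3000, "N", "<INV>", "SV", end=8000, svtype="INV"),
]
print(merge_to_vcf(calls))
```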
The detection and prevention of medication-related health risks, such as medication-associated adverse events (AEs), is a major challenge in patient care. A systematic review on the incidence and nature of in-hospital AEs found that 9.2% of hospitalised patients suffer an AE, and approximately 43% of these AEs are considered to be preventable. Adverse events can be identified using algorithms that operate on electronic medical records (EMRs) and research databases. Such algorithms normally consist of structured filter criteria and rules to identify individuals with certain phenotypic traits and are therefore referred to as phenotype algorithms. Many attempts have been made to create tools that support the development of algorithms and their application to EMRs. However, such tools still have functional gaps, such as a standardised representation of algorithms and support for complex Boolean and temporal logic. In this work, we focus on the AE delirium, an acute brain disorder affecting mental status and attention, which is therefore not trivial to operationalise in EMR data. We use this AE as an example to demonstrate the modelling process in our ontology-based framework (TOP Framework) for modelling and executing phenotype algorithms. The resulting semantically modelled delirium phenotype algorithm is independent of data structure, query languages and other technical aspects, and can be run on a variety of source systems in different institutions.
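The Boolean and temporal logic that phenotype algorithms need can be illustrated with small composable rule functions over a patient's event list. The sketch below is a plain Python illustration, not the TOP Framework's ontology-based representation. The phenotype itself (an ICD-10 F05 delirium diagnosis within 7 days after a procedure, excluding ICD-10 F10.4) and the procedure code `SURGERY` are hypothetical and not clinically validated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str       # e.g. "diagnosis" or "procedure"
    code: str       # ICD-10 code or a hypothetical procedure code
    time: datetime


Rule = Callable[[List[Event]], bool]


def has(kind: str, prefix: str) -> Rule:
    """True if any event of the given kind has a code starting with prefix."""
    return lambda events: any(e.kind == kind and e.code.startswith(prefix) for e in events)


def and_(*rules: Rule) -> Rule:
    return lambda events: all(r(events) for r in rules)


def not_(rule: Rule) -> Rule:
    return lambda events: not rule(events)


def within_after(anchor: Tuple[str, str], target: Tuple[str, str], window: timedelta) -> Rule:
    """True if a target event occurs between 0 and `window` after an anchor event."""
    def rule(events: List[Event]) -> bool:
        anchors = [e.time for e in events if e.kind == anchor[0] and e.code.startswith(anchor[1])]
        targets = [e.time for e in events if e.kind == target[0] and e.code.startswith(target[1])]
        return any(timedelta(0) <= t - a <= window for a in anchors for t in targets)
    return rule


# Hypothetical, not clinically validated phenotype definition.
postop_delirium = and_(
    within_after(("procedure", "SURGERY"), ("diagnosis", "F05"), timedelta(days=7)),
    not_(has("diagnosis", "F10.4")),
)

t0 = datetime(2023, 3, 1, 8, 0)
case = [Event("procedure", "SURGERY", t0), Event("diagnosis", "F05.9", t0 + timedelta(days=2))]
before = [Event("diagnosis", "F05.9", t0), Event("procedure", "SURGERY", t0 + timedelta(days=1))]
print(postop_delirium(case), postop_delirium(before))  # True False
```

Because each rule only sees a list of events, the same definition can run against any source system once its data are mapped to that list, which mirrors the independence from data structure described above.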
Introduction:
In the last decade, numerous real-world data networks have been established to leverage the value of data from electronic health records for medical research. In Germany, a nation-wide network based on electronic health record data from all German university hospitals has been established within the Medical Informatics Initiative (MII) and was recently opened for researchers' access through the German Portal for Medical Research Data (FDPG). In Bavaria, the six university hospitals have joined forces within the Bavarian Cancer Research Center (BZKF). The oncology departments aim to establish a federated observational research network based on the framework of the MII/FDPG and to extend it with a clear focus on oncological clinical data, imaging data and molecular high-throughput analysis data.
Methods:
We describe the adaptations and extensions of existing MII components needed to establish a Bavarian oncology real-world data research platform, whose first use case is performing federated feasibility queries on clinical oncology data.
Results:
We share insights from developing a feasibility platform prototype and presenting it to end users. Our main finding was that oncological data is characterized by a higher degree of interdependence and complexity than the MII core dataset already integrated into the FDPG.
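The interdependence of oncological data has a direct effect on queries. Criteria that are evaluated independently per patient can match a therapy that belongs to a different tumour than the one queried. The sketch below contrasts such a flat query with a linked query on a synthetic patient with two primary tumours. The data model is a hypothetical simplification, not the MII core dataset or the oncology extension described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Diagnosis:
    id: str
    icd10: str


@dataclass
class Therapy:
    kind: str
    for_diagnosis: str  # id of the Diagnosis this therapy treats


@dataclass
class Patient:
    diagnoses: List[Diagnosis] = field(default_factory=list)
    therapies: List[Therapy] = field(default_factory=list)


def flat_match(p: Patient, icd_prefix: str, therapy: str) -> bool:
    """Both facts occur somewhere in the record, regardless of how they relate."""
    return (any(d.icd10.startswith(icd_prefix) for d in p.diagnoses)
            and any(t.kind == therapy for t in p.therapies))


def linked_match(p: Patient, icd_prefix: str, therapy: str) -> bool:
    """The therapy must be linked to a diagnosis that matches the prefix."""
    ids = {d.id for d in p.diagnoses if d.icd10.startswith(icd_prefix)}
    return any(t.kind == therapy and t.for_diagnosis in ids for t in p.therapies)


# Synthetic patient with two primary tumours; chemotherapy was given for the colon cancer only.
p = Patient(
    diagnoses=[Diagnosis("d1", "C50.4"), Diagnosis("d2", "C18.7")],
    therapies=[Therapy("chemotherapy", for_diagnosis="d2")],
)
print(flat_match(p, "C50", "chemotherapy"), linked_match(p, "C50", "chemotherapy"))  # True False
```

In a feasibility count, the flat query would overstate the cohort of breast cancer patients treated with chemotherapy, which is why oncology criteria need explicit links between entities.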
Discussion:
The significance of our work lies in the requirements we formulated for extending existing MII components to match oncology-specific data and meet oncology researchers' needs, while feeding our results and experiences back into further developments within the MII.
Gamification has many positive effects, such as increased motivation, engagement, and well-being of users. A wide range of game mechanics is already available for use in teaching. When developing gamified teaching methods, it is important to adapt the mechanics used to the students. Different models divide the target groups of games and gamification into player types to understand what motivates the respective users. This paper describes a study of player types among students of health-related disciplines and analyses the data with a K-Means clustering procedure. The player types Socializer, Player and Achiever are found, and game elements for these groups are suggested. Thus, health education can employ game mechanics that suit students of this domain.
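K-Means partitions respondents into k groups by alternately assigning each point to its nearest centroid and moving each centroid to the mean of its points (Lloyd's algorithm). The sketch below is a dependency-free implementation applied to hypothetical scores on two player-type subscales. It is not the study's analysis, which reports three player types; the data, k = 2 and the seed are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute the means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so the algorithm has converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical mean scores per student on a "socializer" and an "achiever" subscale (1 to 7).
responses = [(6.5, 2.0), (6.8, 2.2), (6.2, 1.8), (2.0, 6.5), (1.8, 6.9), (2.2, 6.1)]
centroids, clusters = kmeans(responses, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(v, 2) for v in centroid], members)
```

In practice, k is chosen with criteria such as the elbow method or silhouette scores, and the algorithm is restarted from several seeds because the result depends on initialization.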
The System Usability Scale (SUS) is a reliable tool for measuring and evaluating usability. Since its original language is English, it must be translated before a target group can answer it in their native language. The challenge of translating questionnaires lies in preserving their original properties. Different versions of a German SUS have been proposed and are currently in use. The objective of this work is to find and compare the available German translations. Four versions were found and compared in terms of the translation process and the exact wording of the translation. Only the version of Gao et al. has been systematically validated, but its wording is unnatural. Although not yet validated, the proposed version of Rummel et al. is a good compromise between wording and methodologically clean development. The version of Lohmann and Schäffer is a close runner-up, as it may improve the wording at the expense of methodological accuracy. Since the version of Rauer gives no information about its translation process, it is the least preferred of the four compared translations.
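Whatever the translation, the SUS is scored the same way: ten items answered on a 1 to 5 scale, where odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch of that standard scoring rule:

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0 to 100) from ten responses on a 1 to 5 scale, in item order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    # Index 0 is item 1 (odd, positively worded); index 1 is item 2 (even, negatively worded).
    contributions = [(r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses)]
    return sum(contributions) * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

Because the scoring depends on the alternating item polarity, a translation must preserve which items are positively and which are negatively worded.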
Background:
The number of emergency medical service (EMS) calls in Germany is continuously increasing. The EMS's initial assessment, pre-hospital care and choice of hospital for further treatment influence the patient's outcome and form the basis for further care in hospital. However, the EMS does not receive any official feedback on its decisions.
Objectives:
This study evaluates the demand for a feedback system from the emergency department (ED) to the EMS, what it should contain, and how it could be integrated in the electronic clinical systems.
Methods:
A semi-structured interview guideline for expert interviews with members of EMS staff (n = 6) and ED staff (n = 17) was developed. A mockup to visualise a possible implementation was designed and included in the interview.
Results:
There is substantial demand for feedback on pre-diagnosis, pre-hospital care and the handover of patients from the EMS to the ED. The EDs are very interested in improving collaboration with the paramedic services through feedback.
Conclusion:
A feedback system is strongly desired by various EMS stakeholders and, according to them, could improve both EMS and ED collaboration and overall patient care.
Background:
In Germany, patients are entitled to a medication plan. While the overview is useful, it does not contain explicit information on various potential adverse drug events (ADEs). Therefore, physicians must continue to seek information from various sources to ensure medication safety.
Objective:
In this project, a first functional prototype of a medication therapy tool was developed that focuses on visualizing and highlighting potential ADEs. A usability analysis of the tool's functionality, design and usability was conducted.
Methods:
A web application was developed using the MMI Pharmindex as its database. ADEs are color-coded and can be displayed in three different ways: as a list, a table, or a graph. To test the tool, an online survey was conducted among healthcare professionals (n = 9). The test included two real medication plans whose ADEs were checked with the tool.
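The list view of such a tool can be thought of as a pairwise check of every drug combination in the medication plan against an interaction table, with each finding color-coded by severity and the most severe findings listed first. In the sketch below, the drug names, severities and color scheme are hypothetical placeholders; it does not query the MMI Pharmindex or reflect its data model.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical interaction table; a real tool would look these up in a drug database.
INTERACTIONS = {
    frozenset({"drug_a", "drug_b"}): "severe",
    frozenset({"drug_a", "drug_c"}): "moderate",
}
COLORS = {"severe": "red", "moderate": "orange", "minor": "yellow"}
SEVERITY_ORDER = ["severe", "moderate", "minor"]


def check_plan(plan: List[str]) -> List[Tuple[str, str, str, str]]:
    """Return (drug, drug, severity, color) for every interacting pair, most severe first."""
    findings = []
    for a, b in combinations(sorted(set(plan)), 2):
        severity = INTERACTIONS.get(frozenset({a, b}))
        if severity:
            findings.append((a, b, severity, COLORS[severity]))
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f[2]))


for finding in check_plan(["drug_c", "drug_a", "drug_b", "drug_d"]):
    print(finding)
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the drugs appear in the plan.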
Results:
The survey results indicated that the web tool was clear and self-explanatory. It scored overall “good” (score: 76.5) on the System Usability Scale questionnaire. Due to the free-text information of the database used, there were some inconsistencies in the visualized ADEs.
Conclusion:
There is a demand for a visualization tool for medications. The high quality of the database is crucial in order to correctly visualize all necessary information, such as drug-drug interactions and inclusion of patient data. This is essential to provide a trustworthy tool for medical professionals.
Introduction:
One way to support veterinarians in times of a vet shortage is to provide animal owners with technical decision support for deciding whether their animal needs to be seen by a vet. As the first step in the user-centered development of such an mHealth application for equestrians, the context of use was analyzed.
Methods:
The analysis was carried out by reviewing existing literature and conducting an online survey with 100 participants.
Results:
Characteristics of the user group and the usage context are presented using an adaptation of the four layers of diversity. Many equestrians lack health-related knowledge and competencies as well as social networks that support them in decision making and in gaining further information. This may also apply broadly to owners of other animal species.
Conclusion:
The results of the analysis help software developers and researchers working on mHealth applications for pet owners in general, and equestrians in particular, to focus on the users' needs and thus develop efficient software.
Introduction:
Conducting research on human-computer interaction and information retrieval requires unobtrusive observation within existing network architectures.
State of the art:
Most of the available tools are not suitable to be applied within restricted clinical systems. The specific requirements hinder analysis of the human factors in health sciences.
Concept:
We identified extensions for popular web browsers as a suitable way to conduct studies in highly regulated environments.
Implementation:
Considering the specialized requirements and an adequate level of transparency for the recorded clinician, we developed an open-source Web Extension compatible with major web browsers.
Lessons learned:
We identified the challenges associated with the specific tool and are preparing its use to understand clinical reasoning in personalized oncology.
Introduction:
Prospective data collection in clinical trials is considered the gold standard of clinical research. Validating the data entered into the input fields of case report forms is indispensable for maintaining good data quality. Data quality checks include the conformance of individual inputs to the specification of the data element, the detection of missing values, and the plausibility of the values entered.
State-of-the-Art:
Besides Libre-/OpenClinica, there are many applications for capturing clinical data. While most of them are commercial, the free and open-source solutions lack intuitive operation.
Concept:
Our ocRuleTool is made for the specific use case to write validation rules for Open-/LibreClinica, a clinical study management software for designing case report forms and managing medical data in clinical trials. It addresses parts of all three categories of data quality checks mentioned above.
Implementation:
The required rules and error messages are entered in the normative Excel specification and then converted into an XML document that can be uploaded to Open-/LibreClinica. The advantage of this intermediate step is better readability, as the complex XML elements are broken down into easy-to-fill-out columns in Excel. The tool then generates the ready-to-use XML file automatically.
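The spreadsheet-to-XML step can be sketched with the standard library. Each row (standing in for one spreadsheet line; reading an actual Excel file would need a package such as openpyxl) becomes a rule assignment that points at a target item and a rule definition that holds the expression. The element and attribute names follow the OpenClinica 3.x rule file layout as we understand it (`RuleImport`, `RuleAssignment`, `RuleRef`, `DiscrepancyNoteAction`, `RuleDef`) and should be checked against the official rules documentation before upload. The OIDs and the expression are hypothetical, and this is not the ocRuleTool code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_rules_xml(rows: List[Dict[str, str]]) -> str:
    """Build a rule import document: one assignment and one definition per row."""
    root = ET.Element("RuleImport")
    for row in rows:
        assignment = ET.SubElement(root, "RuleAssignment")
        ET.SubElement(assignment, "Target").text = row["target"]
        ref = ET.SubElement(assignment, "RuleRef", OID=row["rule_oid"])
        # Raise a discrepancy note when the expression (the error condition) is true.
        action = ET.SubElement(ref, "DiscrepancyNoteAction", IfExpressionEvaluates="true")
        ET.SubElement(action, "Message").text = row["message"]
    for row in rows:
        rule_def = ET.SubElement(root, "RuleDef", OID=row["rule_oid"], Name=row["rule_oid"])
        ET.SubElement(rule_def, "Description").text = row["message"]
        ET.SubElement(rule_def, "Expression").text = row["expression"]
    return ET.tostring(root, encoding="unicode")


# One hypothetical spreadsheet row: a range check on systolic blood pressure.
rows = [{
    "rule_oid": "R_SBP_RANGE",
    "target": "SE_VISIT1.F_VITALS.IG_VITALS.I_SBP",
    "expression": "I_SBP gt 260 or I_SBP lt 60",
    "message": "Systolic blood pressure outside 60-260 mmHg, please check.",
}]
print(rows_to_rules_xml(rows))
```

Keeping one row per rule in the spreadsheet lets clinicians review expressions and messages without reading XML.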
Lessons Learned:
This approach saves time, is less error-prone and allows collaboration with clinicians on improving data quality.
Conclusion:
Our ocRuleTool has proven useful in over a dozen studies. We hope to increase the user base by releasing it as open source on GitHub.
The German Medical Informatics Initiative has agreed on an HL7 FHIR-based core data set as the common data model that all 37 university hospitals use for their patients' data. These data are stored locally at each site but are centrally queryable for researchers and accessible upon request. This infrastructure is currently under construction, and its functionality is being tested in so-called Projectathons. In the 6th Projectathon, a clinical hypothesis was formulated, executed in a multicenter scenario, and its results were analyzed. A number of oddities emerged in the analysis of data from different sites. Biometricians who had previously performed analyses in prospective data collection settings such as clinical trials or cohorts were not consistently aware of these idiosyncrasies. This field report describes the data quality problems that occurred, although not all of them are genuine errors. The aim is to point out circumstances of data generation that may affect statistical analysis.
Introduction:
Contradiction is a relevant data quality indicator for evaluating the plausibility of interdependent health data items. However, while contradiction assessment relies on domain-established contradictory dependencies, recent studies have shown that additional requirements must be met to reach conclusive contradiction findings. For example, the method used to measure body temperature (oral or rectal) influences the threshold for defining fever. The availability of this required information as explicit data items must be ensured during study design. In this work, we investigate the impact of activities related to study database implementation on contradiction assessment from two perspectives: 1) additionally required metadata and 2) implementation of checks within electronic case report forms to prevent contradictory data entries.
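The fever example shows why the measurement method must be captured as data: the same temperature can be consistent or contradictory depending on the method-specific threshold, and without the method the check cannot be decided. In the sketch below, the threshold values are hypothetical placeholders chosen only to illustrate the logic, not clinical recommendations or values from the study.

```python
from typing import Optional

# Hypothetical, illustrative fever thresholds in degrees Celsius per measurement method.
FEVER_THRESHOLD_C = {"oral": 37.8, "rectal": 38.0}


def check_fever_contradiction(fever_reported: bool, temperature_c: float, method: Optional[str]) -> str:
    """Classify one fever/temperature pair as 'consistent', 'contradiction' or 'inconclusive'."""
    threshold = FEVER_THRESHOLD_C.get(method or "")
    if threshold is None:
        # Without a known measurement method the rule cannot be evaluated conclusively.
        return "inconclusive"
    measured_fever = temperature_c >= threshold
    return "consistent" if measured_fever == fever_reported else "contradiction"


print(check_fever_contradiction(False, 37.9, "oral"))    # contradiction
print(check_fever_contradiction(False, 37.9, "rectal"))  # consistent
print(check_fever_contradiction(False, 37.9, None))      # inconclusive
```

The same function could run as an entry-time check in an eCRF, which corresponds to the second perspective investigated here.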
Methods:
The relevant information required for contradiction checks (timestamps, measurement methods, units, and interdependency rules) is identified. Scores are assigned to these parameters, and two studies are evaluated on how well two selected sets of interdependent data items fulfill these requirements.
Results:
Neither study fulfilled all requirements. While timestamps and measurement units are present, missing information about measurement methods may impede conclusive contradiction assessment. Implemented checks are found only where data are entered directly.
Discussion:
Conclusive contradiction assessment typically requires metadata in the context of captured data items. Consideration during study design and implementation of data capture systems may support better data quality in studies and could be further adopted in primary health information systems to enhance clinical anamnestic documentation.
Representing knowledge in a comprehensible and maintainable way and transparently providing inferences from it are important issues, especially in applications of artificial intelligence in medicine. This becomes even more obvious when the knowledge is dynamically growing and changing and when machine learning techniques are involved. In this paper, we present an approach for representing knowledge about cancer therapies collected over two decades at St.-Johannes-Hospital in Dortmund, Germany. The presented approach makes use of InteKRator, a toolbox that combines knowledge representation and machine learning techniques, including the possibility of explaining inferences. An extended use of InteKRator's reasoning system is introduced to provide the required inferences. The presented approach is general enough to be transferred to other data as well as to other domains. The approach will be evaluated, e.g., regarding comprehensibility, accuracy and reasoning efficiency.
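A common way to make rule-based inferences explainable is to let the most specific applicable rule decide and to return that rule's conditions as the explanation. The sketch below shows this generic "most specific rule wins" pattern. It is not InteKRator's knowledge representation or reasoning system, and the facts and therapy labels are hypothetical placeholders, not clinical recommendations.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Rule:
    conditions: FrozenSet[str]  # attribute=value facts that must all hold
    conclusion: str


def infer(rules: Iterable[Rule], facts: Set[str]) -> Tuple[Optional[str], str]:
    """Return the conclusion of the most specific matching rule and a textual explanation."""
    matching = [r for r in rules if r.conditions <= facts]
    if not matching:
        return None, "no rule applies"
    best = max(matching, key=lambda r: len(r.conditions))
    if best.conditions:
        return best.conclusion, "because " + " and ".join(sorted(best.conditions))
    return best.conclusion, "default rule (no more specific rule applies)"


# Hypothetical knowledge base with a default rule and two more specific exceptions.
rules = [
    Rule(frozenset(), "therapy_X"),
    Rule(frozenset({"stage=III"}), "therapy_Y"),
    Rule(frozenset({"stage=III", "her2=positive"}), "therapy_Z"),
]
print(infer(rules, {"stage=III", "her2=positive", "age=60"}))
print(infer(rules, {"stage=I"}))
```

Ordering rules by specificity keeps the knowledge base readable: general defaults are stated once, and exceptions are added as more specific rules when new data arrive.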