Ebook: German Medical Data Sciences 2024
The use of biomedical informatics and other advanced technologies has become increasingly important in the effort to promote health and wellbeing for all.
This book presents the proceedings of GMDS 2024, the 69th conference of the German Society of Medical Informatics, Biometry and Epidemiology, held from 8–11 September 2024 in Dresden, Germany. The theme of GMDS 2024 was “Health – thinking, researching, acting together”, and the conference provided a platform for leading professionals in the fields of biomedical informatics, biometrics, epidemiology, social medicine & prevention, medical sociology and public health to present the latest developments and research in the field. Of the 79 full papers submitted for the conference, 40 were ultimately selected for presentation and publication here following a rigorous review process, an acceptance rate of 51%. The 40 papers are grouped under 9 section headings: healthcare IT systems; research infrastructure for biomedical research – FAIRification of medical data; study and data management; interoperability – standards, classifications, terminologies; knowledge representation; clinical decision support systems (CDSS), machine learning, artificial intelligence and large language models; machine learning and privacy; medical sociotechnical systems, human factors, usability; and imaging.
Presenting some of the latest developments in preventive and curative healthcare and science, the book will be of interest to healthcare professionals everywhere.
This year, a total of 265 scientific contributions were submitted, including 79 full papers (5 for GMS MIBE and 74 for Studies in Health Technology and Informatics, Stud HTI) and 186 abstracts (115 for presentations and 71 for posters). Overall, the number of submissions has increased significantly compared to previous years. The number of full papers submitted has risen particularly sharply (see Table 1).
Compared to the last three years, submissions have risen slightly in all disciplines, but particularly strongly in medical informatics and for interdisciplinary contributions. However, cultural differences between the disciplines are also evident: in biometry and epidemiology, for example, traditionally only abstracts and few or no full papers are submitted, while more workshops are submitted, which are not included in these statistics (see Table 2).
Each contribution was assessed by at least two reviewers. For the full papers in particular, but often also for the abstracts, many reviewers provided very comprehensive and constructive reviews with helpful advice for the authors, which often contributed substantially to improving a manuscript or abstract. Despite the increase in the number of papers to be reviewed, in particular the increase in full papers and the additional work associated with it, this year’s review process was completed without a single day’s delay. The Editorial Board and Scientific Program Committee (SPC) would therefore like to thank the 228 reviewers (see below) for their 883 reviews. Neither their willingness nor the quality of their reviews can be taken for granted!
For full papers, the editors’ decision on acceptance was made primarily on the basis of the reviews. If a publication was rejected as a full paper, the SPC then decided whether to accept it as an abstract (talk or poster) or to reject it. Overall, the Editorial Board perceived the quality of the submitted manuscripts to be significantly better than in previous years. Nevertheless, many full papers had to be rejected because the submitted versions still had weaknesses that could not be rectified within two weeks. Overall, the acceptance rate for full papers was 51% (40 out of 79), including 0% (0 out of 5) for MIBE and 54% (40 out of 74) for Stud HTI. It is striking that of the 33 papers rejected as full papers but accepted as abstracts, 5 were withdrawn, and the abstracts of a further 5 were not revised and were thus rejected. Of the authors of rejected full papers, only about 70% accepted the offer to present their research at the conference. The complete course of the two-stage review process is shown in Figure 1.
We look forward to the conference in Dresden with an exciting program featuring a total of 86 poster presentations and 140 lectures (see also Table 3) – plus numerous workshops, keynotes and tutorials.
(Editor in Chief GMDS-Series in Stud HTI) Rainer Röhrig
(Editor in Chief GMS MIBE) Petra Knaup-Gregori
(Congress Secretary, Chair of SPC) Martin Sedlmayr
(Bioinformatics and System Medicine) Niels Grabe
(Medical Informatics) Ursula Hertha Hübner
(Bioinformatics and System Medicine) Klaus Jung
(Epidemiology) Jochem König
(Biometrics) Ingo Röder
(Medical Informatics) Ulrich Sax
(Epidemiology) Carsten Oliver Schmidt
(Epidemiology) Jochen Schmitt
(Biometrics) Antonia Zapf
(Biometrics) Daniela Zöller
Introduction:
The increase in health IT adoption is often driven by state financial support. In 2020, the German Hospital Future Law was passed by Parliament, with potential effects expected to become visible in 2023. The research question of the present study was therefore whether there were differences between 2017 and 2023 in selected application areas eligible for funding under the law.
Methods:
Availability and percentage of use in clinical units were measured in a panel of 172 hospitals for these areas. A linear mixed model with repeated measures yielded a significant increase in “medication management” and “discharge management”.
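To illustrate the kind of analysis described, the following is a minimal sketch of a linear mixed model with repeated measures in Python using statsmodels; the variable names and panel values are invented placeholders, not the study’s data.

```python
# Minimal sketch: linear mixed model with repeated measures per hospital.
# Column names and values are illustrative placeholders, not the study data.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.DataFrame({
    "hospital_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "year": [2017, 2023] * 4,
    "pct_units_using_it": [20.0, 45.0, 10.0, 60.0, 30.0, 35.0, 25.0, 50.0],
})

# A random intercept per hospital accounts for the repeated measurements.
model = smf.mixedlm("pct_units_using_it ~ C(year)",
                    data=panel, groups=panel["hospital_id"])
result = model.fit()
print(result.summary())  # C(year)[T.2023] tests the 2017-to-2023 change
```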
Results and Discussion:
In “medication management”, hospitals belonging to a group, compared with single hospitals, tripled the percentage of clinical units using IT systems for this purpose. Not-for-profit hospitals doubled the percentage of clinical units with IT systems for “discharge management” compared with for-profit hospitals.
Conclusion:
Whether these changes can be attributed to the Hospital Future Law is debatable due to severe delays in various fields, particularly in making funding available. There is room for speeding up particularly the administrative funding process and finally demonstrating results that are proportional to the government money invested.
Background:
In the context of the telematics infrastructure, new data usage regulations, and the growing potential of artificial intelligence, cloud computing plays a key role in driving digitalization in the German hospital sector.
Methods:
Against this background, the study aims to develop and validate a scale for assessing the cloud readiness of German hospitals. It uses the TPOM (Technology, People, Organization, Macro-Environment) framework to create a scoring system. A survey involving 110 Chief Information Officers (CIOs) from German hospitals was conducted, followed by an exploratory factor analysis and reliability testing to refine the items, resulting in a final set of 30 items.
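As a rough illustration of these two analysis steps, the following Python sketch runs an exploratory factor analysis followed by a reliability test; the response data, item names, and loading threshold are invented placeholders, not the survey data.

```python
# Sketch: exploratory factor analysis plus reliability testing.
# The responses below are random placeholders standing in for the CIO survey.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
import pingouin as pg

rng = np.random.default_rng(1)
survey_df = pd.DataFrame(rng.integers(1, 6, size=(110, 30)),
                         columns=[f"item_{i:02d}" for i in range(30)])

# Four factors mirroring the TPOM dimensions; varimax keeps them interpretable.
fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(survey_df)
loadings = pd.DataFrame(fa.loadings_, index=survey_df.columns)

# Reliability for the items loading on the first factor (e.g. "technology").
tech_items = loadings[loadings[0].abs() > 0.4].index
if len(tech_items) >= 2:
    alpha, _ = pg.cronbach_alpha(data=survey_df[tech_items])
    print(f"Cronbach's alpha: {alpha:.2f}")
```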
Results:
The analysis confirmed the statistical robustness of the scale and identified key factors contributing to cloud readiness. These include IT security in the dimension “technology”, collaborative research and acceptance of the need to make high-quality data available in the dimension “people”, scalability of IT resources in the dimension “organization”, and legal aspects in the dimension “macro-environment”. The macro-environment dimension emerged as particularly stable, highlighting the critical role of regulatory compliance in the healthcare sector.
Conclusion:
The findings suggest a certain degree of cloud readiness among German hospitals, with potential for improvement in all four dimensions. Systemically, legal requirements and a challenging political environment are top concerns for CIOs, impacting their cloud readiness.
Individual health data is crucial for scientific advancements, particularly in developing Artificial Intelligence (AI); however, sharing real patient information is often restricted due to privacy concerns. A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data while preserving confidential patient information. In this paper, we present the workflow and different services developed in the context of Germany’s National Research Data Infrastructure project NFDI4Health. First, two state-of-the-art AI tools (namely, VAMBN and MultiNODEs) for generating synthetic health data are outlined. Further, we introduce SYNDAT (a public web-based tool) which allows users to visualize and assess the quality and risk of synthetic data provided by desired generative models. Additionally, the utility of the proposed methods and the web-based tool is showcased using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Center for Cancer Registry Data of the Robert Koch Institute (RKI).
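As a simple illustration of the kind of fidelity check such a quality assessment involves (not SYNDAT’s actual implementation), one can compare the marginal distribution of a real and a synthetic variable:

```python
# Sketch: compare one real vs. one synthetic marginal distribution.
# The "real" and "synthetic" samples here are simulated placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real_age = rng.normal(72, 8, size=1000)        # placeholder "real" cohort ages
synthetic_age = rng.normal(71, 9, size=1000)   # placeholder generated ages

stat, p_value = ks_2samp(real_age, synthetic_age)
print(f"KS statistic: {stat:.3f}, p = {p_value:.3f}")
# A small KS statistic suggests the synthetic marginal tracks the real one;
# a full assessment would also cover joint structure and disclosure risk.
```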
Introduction:
Process Mining (PM) has emerged as a transformative tool in healthcare, facilitating the enhancement of process models and predicting potential anomalies. However, the widespread application of PM in healthcare is hindered by the lack of structured event logs and specific data privacy regulations.
Concept:
This paper introduces a pipeline that converts routine healthcare data into PM-compatible event logs, leveraging the newly available permissions under the Health Data Utilization Act to use healthcare data.
Implementation:
Our system exploits the Core Data Sets (CDS) provided by Data Integration Centers (DICs). It involves converting routine data into Fast Healthcare Interoperability Resources (FHIR), storing it locally, and subsequently transforming it into standardized PM event logs through FHIR queries applicable to any DIC. This facilitates the extraction of detailed, actionable insights across various healthcare settings without altering existing DIC infrastructures.
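A minimal sketch of the final transformation step, using the pm4py library and invented placeholder events (the actual pipeline derives these from MII Core Data Set FHIR resources), might look as follows:

```python
# Sketch: turning extracted events into a process-mining event log with pm4py.
# The rows below are placeholders for data queried from FHIR resources.
import pandas as pd
import pm4py

rows = [
    {"case_id": "pat-1", "activity": "Admission", "timestamp": "2024-01-02 08:00"},
    {"case_id": "pat-1", "activity": "Lab order", "timestamp": "2024-01-02 09:30"},
    {"case_id": "pat-1", "activity": "Discharge", "timestamp": "2024-01-05 11:00"},
]
df = pd.DataFrame(rows)
df["timestamp"] = pd.to_datetime(df["timestamp"])

log = pm4py.format_dataframe(df, case_id="case_id",
                             activity_key="activity", timestamp_key="timestamp")
dfg, starts, ends = pm4py.discover_dfg(log)  # directly-follows graph
print(dfg)
```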
Lessons Learned:
Challenges encountered include handling the variability and quality of the data and overcoming network and computational constraints. Our pipeline demonstrates how PM can be applied even in complex systems like healthcare, by providing a standardized yet flexible analysis pipeline that is widely applicable. The successful application emphasizes the critical role of tailored event log generation and data querying capabilities in enabling effective PM applications, thus enabling evidence-based improvements in healthcare processes.
Introduction:
The Local Data Hub (LDH) is a platform for FAIR sharing of medical research (meta-)data. In order to promote the usage of LDH in different research communities, it is important to understand domain-specific needs and the solutions currently used for data organization, and to provide support for seamless uploads to an LDH. In this work, we analyze the use case of microneurography, an electrophysiological technique for analyzing neural activity.
Methods:
After performing a requirements analysis in dialogue with microneurography researchers, we propose a concept mapping and a workflow for researchers to transform and upload their metadata. Further, we implemented a semi-automatic upload extension to odMLtables, a template-based tool for handling metadata in the electrophysiological community.
Results:
The open-source implementation enables the odML-to-LDH concept mapping, allows data anonymization from within the tool, and supports the creation of custom-made summaries of the underlying data sets.
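To give a flavour of such a concept mapping (the field names below are hypothetical, not the actual odML-to-LDH map), a minimal Python sketch:

```python
# Sketch: map odML-style metadata keys to LDH fields with anonymization.
# All key and field names are hypothetical placeholders.
ODML_TO_LDH = {
    "Recording/Date": "recording_date",
    "Subject/Species": "species",
    "Electrode/Type": "electrode_type",
}
DROP_FOR_ANONYMIZATION = {"Subject/Name", "Subject/BirthDate"}

def to_ldh_metadata(odml_properties: dict) -> dict:
    """Map odML properties to LDH fields, dropping identifying entries."""
    return {
        ODML_TO_LDH[key]: value
        for key, value in odml_properties.items()
        if key in ODML_TO_LDH and key not in DROP_FOR_ANONYMIZATION
    }

print(to_ldh_metadata({"Recording/Date": "2024-03-01",
                       "Subject/Name": "M. Mustermann",
                       "Subject/Species": "human"}))
```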
Discussion:
This constitutes a first step towards integrating improved FAIR processes into the research laboratory’s daily workflow. In future work, we will extend this approach to other use cases to disseminate the usage of LDHs in a larger research community.
Introduction:
Data-driven medical research (DDMR) needs multimodal data (MMD) to sufficiently capture the complexity of clinical cases. Methods for early multimodal data integration (MMDI), i.e. integration of the data before performing a data analysis, vary from basic concatenation to applying Deep Learning, each with distinct characteristics and challenges. Besides early MMDI, there is also late MMDI, which performs modality-specific data analyses and then combines the analysis results.
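As a minimal illustration of the simplest early-MMDI method mentioned, basic concatenation fuses two modalities before a single analysis (the shapes below are invented):

```python
# Sketch: early fusion by feature concatenation. Shapes are placeholders.
import numpy as np

imaging_features = np.random.rand(100, 32)   # e.g. derived from imaging
lab_features = np.random.rand(100, 12)       # e.g. routine laboratory values

fused = np.concatenate([imaging_features, lab_features], axis=1)  # (100, 44)
# "fused" would feed one downstream model; late MMDI would instead train
# one model per modality and combine their outputs.
print(fused.shape)
```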
Methods:
We conducted a scoping review, following PRISMA guidelines, to identify and analyze 21 reviews, published between 2019 and 2024, on methods for early MMDI.
Results:
Our analysis categorized these methods into four groups and summarized group-specific characteristics that are relevant for choosing the optimal method combination for MMDI pipelines in DDMR projects. Moreover, we found that early MMDI is often performed by executing several methods in succession in a pipeline. This early MMDI pipeline is usually subject to manual optimization.
Discussion:
Our focus was on structural integration in DDMR. The choice of MMDI method depends on the research setting, the complexity of the task, and the research team’s expertise. Future research could focus on comparing early and late MMDI approaches as well as automating the optimization of MMDI pipelines to integrate vast amounts of real-world medical data effectively, facilitating holistic DDMR.
Introduction:
Supporting research projects that require medical data from multiple sites is one of the goals of the German Medical Informatics Initiative (MII). The data integration centers (DIC) at university medical centers in Germany provide patient data via FHIR® in compliance with the MII core data set (CDS). Data protection requirements and other legal bases for processing favor decentralized processing of the relevant data in the DICs and the subsequent exchange of aggregated results for cross-site evaluation.
Methods:
Requirements from clinical experts were obtained in the context of the MII use case INTERPOLAR. A software architecture was then developed, modeled using 3LGM2, and finally implemented and published in a GitHub repository.
Results:
With the CDS tool chain, we have created software components for decentralized processing on the basis of the MII CDS. The CDS tool chain requires access to a local FHIR endpoint and then transfers the data to an SQL database. This is accessed by the DataProcessor component, which performs calculations with the help of rules (input repo) and writes the results back to the database. The CDS tool chain also has a frontend module (REDCap), which is used to display the output data and calculated results, and allows verification, evaluation, comments and other responses. This feedback is also persisted in the database and is available for further use, analysis or data sharing in the future.
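A minimal sketch of the first two steps, reading from a local FHIR endpoint and persisting to SQL, is shown below; the endpoint URL, table layout, and field selection are illustrative assumptions, not the CDS tool chain’s actual code.

```python
# Sketch: pull Observations from a local FHIR endpoint into an SQL table.
# Endpoint, schema, and field selection are placeholders.
import sqlite3
import requests

FHIR_BASE = "http://localhost:8080/fhir"  # placeholder local DIC endpoint

bundle = requests.get(f"{FHIR_BASE}/Observation",
                      params={"_count": 100}, timeout=30).json()

con = sqlite3.connect("cds.db")
con.execute("""CREATE TABLE IF NOT EXISTS observation
               (id TEXT PRIMARY KEY, patient TEXT, code TEXT, value REAL)""")
for entry in bundle.get("entry", []):
    res = entry["resource"]
    con.execute("INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?)",
                (res["id"],
                 res.get("subject", {}).get("reference"),
                 res.get("code", {}).get("coding", [{}])[0].get("code"),
                 res.get("valueQuantity", {}).get("value")))
con.commit()
# A rule-driven processor would now read this table, compute results,
# and write them back for display in the frontend.
```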
Discussion:
Other solutions are conceivable. Our solution utilizes the advantages of an SQL database, which enables flexible and direct processing of the stored data using established analysis methods. Due to the modularization, adjustments can be made so that the tool chain can be used in other projects. We are planning further developments to support pseudonymization and data sharing. Initial experience is being gathered; a formal evaluation is planned.
Introduction:
The Medical Informatics Initiative (MII) in Germany has pioneered platforms such as the National Portal for Medical Research Data (FDPG) to enhance the accessibility of data from clinical routine care for research across both university and non-university healthcare settings. This study explores the efficacy of the Medical Informatics Hub in Saxony (MiHUBx) services by integrating Klinikum Chemnitz gGmbH (KC) with the FDPG, leveraging the Fast Healthcare Interoperability Resources Core Data Set of the MII to standardize and harmonize data from disparate source systems.
Methods:
The employed procedures include deploying installation packages to convert data into FHIR format and utilizing the Research Data Repository for structured data storage and exchange within the clinical infrastructure of KC.
Results:
Our results demonstrate successful integration and include a comprehensive deployment diagram; additionally, it was demonstrated that the non-university site can report clinical data to the FDPG.
Discussion:
The discussion reflects on the practical application of this integration, highlighting its potential scalability to even smaller healthcare facilities and its potential to pave the way for access to more medical data for research. This exemplary demonstration of the interplay of different tools provides valuable insights into technical and operational challenges, setting a precedent for future expansions and contributing to the democratization of medical data access.
Introduction:
Medical research studies which involve electronic data capture of sensitive data about human subjects need to manage medical and identifying participant data in a secure manner. To protect the identity of data subjects, an independent trusted third party should be responsible for pseudonymization and management of the identifying data.
Methods:
We have developed a web-based integrated solution that combines REDCap as an electronic data capture system with the trusted third party software tools of the University Medicine Greifswald, which provides study personnel with a single user interface for both clinical data entry and management of identities, pseudonyms and informed consents.
Results:
Integration of the two platforms enables a seamless workflow of registering new participants, entering identifying and consent information, and generating pseudonyms in the trusted third party system, with subsequent capturing of medical data in the electronic data capture system, while maintaining strict separation of medical and identifying data in the two independently managed systems.
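The separation principle can be illustrated with a small sketch (generic placeholders, not the Greifswald trusted third party tools or the REDCap API):

```python
# Sketch: identifying data stays with the trusted third party (TTP);
# the EDC stores medical data keyed only by a pseudonym.
import secrets

ttp_identities = {}   # TTP-side store: pseudonym -> identifying data
edc_records = {}      # EDC-side store: pseudonym -> medical data only

def register_participant(name: str, birth_date: str) -> str:
    pseudonym = "PSN-" + secrets.token_hex(4)
    ttp_identities[pseudonym] = {"name": name, "birth_date": birth_date}
    return pseudonym

psn = register_participant("Erika Mustermann", "1958-04-12")
edc_records[psn] = {"systolic_bp": 132}  # medical data never sees the name

print(psn, edc_records[psn])
```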
Conclusion:
Our solution enables a time-efficient data entry workflow, provides a high level of data protection by minimizing visibility of identifying information and pseudonym lists, and avoids errors introduced by manual transfer of pseudonyms between separate systems.
Introduction:
With the establishment of the Data Sharing Framework (DSF) as a distributed business process engine in German research networks, it is becoming increasingly important to coordinate authentication, authorization, and role information between peer-to-peer network components. This information is provided in the form of an allowlist. This paper presents a concept and implementation of an Allowlist Management Application.
State of the Art:
In research networks using the DSF, allowlists were initially generated manually.
Concept:
The Allowlist Management Application provides comprehensive tool support for the participating organizations and for the administrators of the Allowlist Management Application. It automates the process of creating and distributing allowlists and reduces the errors associated with manual entries. In addition, security is improved through extensive validation of entries and by enforcing review of requested changes via a four-eyes principle.
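A minimal sketch of how a four-eyes check on an allowlist change might look (data model and validation simplified, not the actual implementation):

```python
# Sketch: a requested allowlist change only becomes active after a second
# administrator reviews it. Checks are simplified placeholders.
from dataclasses import dataclass

@dataclass
class AllowlistChange:
    organization: str
    certificate_thumbprint: str
    requested_by: str
    approved_by: str | None = None

def approve(change: AllowlistChange, reviewer: str) -> None:
    if reviewer == change.requested_by:
        raise PermissionError("four-eyes principle: requester may not approve")
    if len(change.certificate_thumbprint) != 64:  # e.g. expect SHA-256 hex
        raise ValueError("invalid certificate thumbprint")
    change.approved_by = reviewer

change = AllowlistChange("Hospital A", "ab" * 32, requested_by="alice")
approve(change, reviewer="bob")  # only now is the entry added to the allowlist
```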
Implementation:
Our implementation serves as a preliminary development for the complete automation of onboarding and allowlist management processes using established frontend and backend frameworks. The application has been deployed in the Medical Informatics Initiative and the Network University Medicine with over 40 participating organizations.
Lessons learned:
We learned about the need for user guidance, for accommodating unstructured communication within a structured tool, for generalizability, and for checks to ensure that the tool’s outputs have actually been applied.
Introduction:
The configuration of electronic data capture (EDC) systems has a considerable impact on data quality in studies and patient registries. The objective was to develop a method to visualise the configuration of an EDC system in order to check the completeness and correctness of the data definition and rules.
Methods:
Step 1: transformation of the EDC data model into a graphical model; step 2: checking the completeness and consistency of the data model; step 3: correction of identified findings. This process model was evaluated on the patient registry EpiReg.
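Steps 1 and 2 can be illustrated with a small sketch using networkx; the item names and edge semantics are invented for illustration:

```python
# Sketch: represent an EDC data definition as a graph so that items without
# attached rules become visible. Names are placeholders.
import networkx as nx

g = nx.DiGraph()
g.add_edge("form:baseline", "item:diagnosis_date")
g.add_edge("form:baseline", "item:diagnosis_code")
g.add_edge("item:diagnosis_date", "rule:date_not_in_future")
# "item:diagnosis_code" has no outgoing edge -> no plausibility rule attached.

items_without_rules = [n for n in g.nodes
                       if n.startswith("item:") and g.out_degree(n) == 0]
print(items_without_rules)  # candidates to review with the team
```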
Results:
Using the graphical visualisation as a basis, 21 problems in the EDC configuration were identified, discussed with an interdisciplinary team, and corrected.
Conclusion:
The tested methodological approach enables an improvement in data quality by optimising the underlying EDC configuration.
Introduction:
Trial recruitment is a crucial factor for precision oncology, potentially improving patient outcomes and generating new scientific evidence. To identify suitable, biomarker-based trials for patients, clinicians need to screen multiple clinical trial registries, which lack support for modern trial designs and offer only limited options to filter for inclusion and exclusion criteria. Several registries provide trial information but are limited regarding factors like timeliness, quality of information, and the capability for semantic, terminology-enhanced searching for aspects like specific inclusion criteria.
Methods:
We specified a Fast Healthcare Interoperability Resources (FHIR) Implementation Guide (IG) to represent clinical trials and their metadata. We embedded it into a community-driven approach to maintaining clinical trial data, which is fed by openly available data sources and later annotated by platform users. A governance model was developed to manage community contributions and responsibilities.
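For illustration, a trial could be represented as a FHIR R4 ResearchStudy resource along the following lines; the concrete CATS IG profiles and extensions are not reproduced here, and all values are placeholders:

```python
# Sketch: a minimal FHIR R4 ResearchStudy resource. Values are placeholders;
# the actual IG profiles additional fields and extensions.
research_study = {
    "resourceType": "ResearchStudy",
    "status": "active",
    "title": "Phase II trial of drug X in NSCLC with KRAS G12C mutation",
    "phase": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/research-study-phase",
        "code": "phase-2"}]},
    "condition": [{"coding": [{
        "system": "http://snomed.info/sct",
        "code": "254637007",
        "display": "Non-small cell lung cancer"}]}],
}
# Community annotations (e.g. biomarker inclusion criteria) could be attached
# via profiled extensions and then queried with standard FHIR search.
```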
Results:
We implemented the Community Annotated Trial Search (CATS), an interactive platform for clinical trials for the scientific community with an open and interoperable information model. It provides a base for collaboratively annotating clinical trials and serves as a comprehensive information source for community members. Its terminology-driven annotations are geared towards precision oncology, but its principles can be transferred to other contexts.
Conclusion:
It is possible to use the FHIR standard and an open-source information model represented in our IG to build an open, interoperable clinical trial register. Advanced features like user suggestions and audit trails of individual resource fields could be represented by extending the FHIR standard. CATS is the first implementation of an open-for-collaboration clinical trial registry with modern oncological trial designs and machine-to-machine communication in mind and its methodology could be extended to other medical fields besides precision oncology. Due to its well-defined interfaces, it has the potential to provide automated patient recruitment decision support for precision oncology trials in digital applications.
Introduction:
NFDI4Health is a consortium funded by the German Research Foundation to make structured health data findable and accessible internationally according to the FAIR principles. Its goal is to bring data users and Data Holding Organizations (DHOs) together. It mainly considers DHOs conducting epidemiological and public health studies or clinical trials.
Methods:
Local data hubs (LDH) are provided for such DHOs to connect decentralized local research data management within their organizations with the option of publishing shareable metadata via centralized NFDI4Health services such as the German central Health Study Hub. The LDH platform is based on FAIRDOM SEEK and provides a complete and flexible, locally controlled data and information management platform for health research data. A tailored NFDI4Health metadata schema for studies and their corresponding resources has been developed which is fully supported by the LDH software, e.g. for metadata transfer to other NFDI4Health services.
Results:
The SEEK platform has been technically enhanced to support extended metadata structures tailored to the needs of the user communities in addition to the existing metadata structuring of SEEK.
Conclusion:
With the LDH and the NFDI4Health metadata schema (MDS), NFDI4Health provides all DHOs with a standardized, free and open-source research data management platform for the FAIR exchange of structured health data.
Introduction:
Conducting clinical studies is an integral part of the clinical research repertoire of university hospitals. For appropriate administration, a wealth of organizational competences must always be available in a central location and in an up-to-date form. Information such as the number of ongoing studies and the number of enrolled participants is required for tasks related to, e.g., sponsor quality management and KPIs. A registry for clinical trials can answer these questions and enhance the exchange of information.
Methods:
Requirements for an in-house registry for clinical trials were defined in a multidisciplinary task force. The requirements included interfaces and key abilities to create customized reports to fulfill the obligation to provide evidence.
Results:
The study registry has been in productive use since May 2020, and internal interfaces have been implemented to ensure consistency between systems and the documented studies. Manually recorded data is enhanced by interfaces to primary registers. The comprehensive data set in the study registry enables individual queries for a variety of questions to be created at any time.
Discussion:
The UKSH study registry has already demonstrated its usefulness in various applications and several projects. The extensive data set and the modular realization allow many current and future requirements to be met.
Introduction:
The German Central Health Study Hub is a service that was initially developed at short notice during the COVID-19 pandemic. Since then, it has been expanded in scope, content, active users and functionality. The service is aimed at two main audiences: data providers and data consumers. The former want to share research data from clinical, public health and epidemiological studies and related documents according to the FAIR criteria for research data; the latter want to find and ultimately reuse relevant research data in the above areas.
Methods:
The service connects both groups via graphical and programmatic interfaces. A sophisticated information model is employed to describe and publish various research data objects while complying with data protection rules and fulfilling FAIR requirements. The service is being developed in a demand-driven manner with extensive user interaction.
Results:
The result is a free-to-use service, built on open-source software (Dataverse, MICA, Keycloak) and accessible via a web browser. In close collaboration with users, several features were created, ranging from collections for grouping items to combined data capture via API and UI. Adoption of the service is increasing continuously, with over 1,970 research data objects as of June 2024.
Conclusion:
The service fills a marked gap and connects both user groups, yet it still needs to be improved in various dimensions (features, content, usage). The impact on the community needs to be further assessed. Despite recent legislative changes (GDNG, EHDS), the system improves the findability of sensitive data, provides a blueprint for similar systems and shows how to create a useful and user-friendly service together with users.
Introduction:
Seamless interoperability of ophthalmic clinical data is beneficial for improving patient care and advancing research through the integration of data from various sources. Such consolidation increases the amount of data available, leading to more robust statistical analyses, and improving the accuracy and reliability of artificial intelligence models. However, the lack of consistent, harmonized data formats and meanings (syntactic and semantic interoperability) poses a significant challenge in sharing ophthalmic data.
Methods:
The Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR), a standard for the exchange of healthcare data, emerges as a promising solution. To facilitate cross-site data exchange in research, the German Medical Informatics Initiative (MII) has developed a core data set (CDS) based on FHIR.
Results:
This work investigates the suitability of the MII CDS specifications for exchanging ophthalmic clinical data necessary to train and validate a specific machine learning model designed for predicting visual acuity. In interdisciplinary collaborations, we identified and categorized the required ophthalmic clinical data and explored the possibility of its mapping to FHIR using the MII CDS specifications.
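For illustration, a single ophthalmic data point might be expressed as a FHIR Observation as sketched below; the resource content is a simplified placeholder, and a vetted terminology binding (one of the identified gaps) is deliberately left open:

```python
# Sketch: decimal visual acuity as a minimal FHIR Observation.
# All content is a placeholder; a real mapping needs vetted LOINC/SNOMED codes
# and MII CDS profile conformance.
visual_acuity_observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Visual acuity, right eye"},  # terminology binding open
    "subject": {"reference": "Patient/123"},
    "valueQuantity": {"value": 0.8, "unit": "decimal VA"},
}
# Missing details such as refraction or measurement device are exactly where
# an extension module to the MII CDS would come in.
```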
Discussion:
We found that the current FHIR MII CDS specifications do not completely accommodate the ophthalmic clinical data we investigated, indicating that the creation of an extension module is essential.
Introduction:
The reuse of clinical data from clinical routine is a topic of research within the field of medical informatics under the term secondary use. In order to ensure the correct use and interpretation of the data, context information about data collection and a general understanding of the data are needed. The use of metadata as an effective method of defining and maintaining context is well established, particularly in the field of clinical trials. The objective of this paper is to examine a method for integrating routine clinical data using metadata.
Methods:
To this end, clinical forms extracted from a hospital information system are converted into the FHIR format. A particular focus is placed on the consistent use of a metadata repository (MDR).
Results:
A metadata-based approach using an MDR system was developed to simplify data integration and mapping of structured forms into FHIR resources, while offering many advantages in terms of flexibility and data quality. This facilitated the management and configuration of logic and definitions in one place, enabling the reusability and secondary use of data.
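The following sketch illustrates the idea of MDR-driven conversion, with the mapping logic looked up in the repository rather than hard-coded; the MDR entry structure and form field are hypothetical:

```python
# Sketch: the ETL step consults an MDR entry instead of hard-coding the
# mapping. MDR structure and form field names are placeholders.
MDR = {
    "his_form.heart_rate": {
        "fhir_resource": "Observation",
        "loinc": "8867-4",          # LOINC: Heart rate
        "unit": "/min",
    },
}

def form_field_to_fhir(field_id: str, value: float) -> dict:
    meta = MDR[field_id]   # one place to maintain definitions and logic
    return {
        "resourceType": meta["fhir_resource"],
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": meta["loinc"]}]},
        "valueQuantity": {"value": value, "unit": meta["unit"]},
    }

print(form_field_to_fhir("his_form.heart_rate", 72.0))
```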
Discussion:
This work allows the transfer of data elements without loss of detail and simplifies integration with target formats. The approach is adaptable for other ETL processes and eliminates the need for formatting concerns in the target profile.
Introduction:
For an interoperable Intelligent Tutoring System (ITS), we used resources from Fast Healthcare Interoperability Resources (FHIR) and mapped learning content with Unified Medical Language System (UMLS) codes to enhance healthcare education. This study addresses the need to enhance the interoperability and effectiveness of ITS in healthcare education.
State of the art:
The current state of the art in ITS involves advanced personalized learning and adaptability techniques, integrating technologies such as machine learning to personalize the learning experience and to create systems that dynamically respond to individual learner needs. However, existing ITS architectures face challenges related to interoperability and integration with healthcare systems.
Concept:
Our system maps learning content with UMLS codes, each scored for similarity, ensuring consistency and extensibility. FHIR is used to standardize the exchange of medical information and learning content.
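A minimal sketch of such a scored mapping (the CUIs are real UMLS concepts, but the scores and item identifiers are invented):

```python
# Sketch: learning content mapped to UMLS CUIs, each with a similarity score.
# Scores and item IDs are placeholders.
learning_item_mappings = {
    "lesson:hypertension-basics": [
        ("C0020538", 0.95),   # Hypertensive disease
        ("C0005823", 0.62),   # Blood pressure
    ],
}

def best_concept(item_id: str, threshold: float = 0.7) -> str | None:
    """Return the highest-scoring CUI above the threshold, if any."""
    candidates = sorted(learning_item_mappings.get(item_id, []),
                        key=lambda pair: pair[1], reverse=True)
    if candidates and candidates[0][1] >= threshold:
        return candidates[0][0]
    return None

print(best_concept("lesson:hypertension-basics"))  # -> C0020538
```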
Implementation:
Implemented as a microservice architecture, the system uses a recommender to request FHIR resources, provide questions, and measure learner progress.
Lessons learned:
Using international standards, our ITS ensures reproducibility and extensibility, enhancing interoperability and integration with existing platforms.
Introduction:
Sixteen million German-language free-text laboratory test results form the basis of the daily diagnostic routine of 17 laboratories within the University Hospital Erlangen. As part of the Medical Informatics Initiative, the local data integration centre is responsible for making routine care data accessible for medical research. Following the core data set, international interoperability standards such as FHIR and the English-language medical terminology SNOMED CT are used to create harmonised data. To represent each non-numeric laboratory test result within the base module profile ObservationLab, the need arose for a map and supporting tooling.
State of the Art:
Due to the requirement of an n:n map and a data-safety-compliant local instance, publicly available tools (e.g., SNAP2SNOMED) were insufficient.
Concept and Implementation:
Therefore, we developed (1) an incremental mapping-validation process with different iteration cycles and (2) a customised mapping tool via Microsoft Access. Time, labour, and cost efficiency played a decisive role. First iterations were used to define requirements (e.g., multiple user access).
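The core of such an n:n map can be illustrated with a small relational sketch (the source texts are placeholders, and the second SNOMED code is a hypothetical stand-in):

```python
# Sketch: an n:n mapping table between free-text results and SNOMED CT codes.
# One source text may map to several codes and vice versa.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE result_map (
                 source_text TEXT,
                 snomed_code TEXT,
                 PRIMARY KEY (source_text, snomed_code))""")
con.executemany("INSERT INTO result_map VALUES (?, ?)", [
    ("negativ",           "260385009"),  # Negative (qualifier value)
    ("nicht nachweisbar", "260385009"),  # two source texts -> one code
    ("grenzwertig",       "000000000"),  # hypothetical "borderline" code
])
for row in con.execute("SELECT * FROM result_map"):
    print(row)
```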
Lessons Learned:
The successful process and tool implementation and the described lessons learned (e.g., cheat sheet) will assist other German hospitals in creating local maps for inter-consortia data exchange and research. In the future, qualitative and quantitative analysis results will be published.
Introduction:
The German Medical Text Project (GeMTeX) is one of the largest infrastructure efforts targeting German-language clinical documents. We here introduce the architecture of the de-identification pipeline of GeMTeX.
Methods:
This pipeline comprises the export of raw clinical documents from the local hospital information system, the import into the annotation platform INCEpTION, fully automatic pre-tagging with protected health information (PHI) items by the Averbis Health Discovery pipeline, a manual curation step of these pre-annotated data, and, finally, the automatic replacement of PHI items with type-conformant substitutes. This design was implemented in a pilot study involving six annotators and two curators each at the Data Integration Centers of the University Hospitals Leipzig and Erlangen.
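The final replacement step can be illustrated with a small sketch; the span format and substitute pool are simplified assumptions, not the GeMTeX implementation:

```python
# Sketch: replace curated PHI spans with type-conformant substitutes.
# Span format and substitutes are simplified placeholders.
PHI_SUBSTITUTES = {"PATIENT_NAME": "Max Mustermann", "DATE": "01.01.2000",
                   "HOSPITAL": "Klinikum Musterstadt"}

def replace_phi(text: str, spans: list[tuple[int, int, str]]) -> str:
    """spans: (start, end, phi_type); applied right-to-left so offsets stay valid."""
    for start, end, phi_type in sorted(spans, reverse=True):
        text = text[:start] + PHI_SUBSTITUTES[phi_type] + text[end:]
    return text

doc = "Aufnahme von Erika Musterfrau am 03.05.2024."
print(replace_phi(doc, [(13, 29, "PATIENT_NAME"), (33, 43, "DATE")]))
```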
Results:
As a proof of concept, the publicly available Graz Synthetic Text Clinical Corpus (GRASSCO) was enhanced with PHI annotations in an annotation campaign for which reasonable inter-annotator agreement values of Krippendorff’s α ≈ 0.97 can be reported.
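For reference, the reported agreement measure can be computed with the Python krippendorff package, as in this toy sketch (two annotators, five tokens, invented labels):

```python
# Sketch: Krippendorff's alpha for nominal annotations; nan marks a token an
# annotator did not label. Data is a toy placeholder, not the campaign data.
import numpy as np
import krippendorff

reliability_data = [  # one row per annotator, one column per token
    [1, 2, 2, np.nan, 1],
    [1, 2, 2, 3,      1],
]
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```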
Conclusion:
These approximately 1,400 curated PHI annotations are released as open-source data, constituting the first publicly available German-language clinical text corpus with PHI metadata.