
Ebook: Applying the FAIR Principles to Accelerate Health Research in Europe in the Post COVID-19 Era

Medical Informatics has increasingly come into focus in the last couple of years, as the importance of managing and interpreting health data in dealing with a global pandemic has become dramatically apparent.
This book presents the proceedings of the 2021 European Federation for Medical Informatics (EFMI) Special Topic Conference (STC), originally planned as a live event in Seville, Spain, but ultimately held as a virtual event from 22–24 November 2021. This conference focused on applying the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to accelerate health research in Europe in the post COVID-19 era. The 38 papers included here are divided into 5 sections, and topics covered include: methods for the adoption of FAIR principles; FAIR-based precision medicine; AI in FAIR data-driven health; privacy and security aspects of applying FAIR in health research; FAIR and infectious-disease research data (including COVID-19); FAIR in infrastructures and software; metadata, ontologies, and terminologies to support the sharing of health research data; and paradigms for sharing health research data.
Offering a state-of-the-art overview of medical informatics in the post-COVID era, the book will be of interest to all those working in the field.
This volume presents the proceedings of the 2021 EFMI Special Topic Conference (STC) organized in November 2021 as a virtual conference. This conference focuses on applying the FAIR principles to accelerate health research in Europe in the post COVID-19 era. The conference invited paper submissions, in particular those related to the following topics:
- Methods for the adoption of FAIR principles
- FAIR-based precision medicine
- Artificial Intelligence in FAIR data-driven health
- Privacy and security aspects of applying FAIR in health research
- FAIR and COVID-19 (and other infectious diseases) research data
- FAIR for infrastructures and software
- Metadata, ontologies and terminologies to support the sharing of health research data
- Paradigms for sharing health research data.
All the papers in this book of proceedings received the highest marks in the peer review process, and the volume is organized into several sections. The most popular tracks among the authors were those on Metadata, Methods and Artificial Intelligence, which cover a wide range of applications. The remaining papers fall into the categories of Data and Experiences. As expected, many papers focus on FAIR and COVID-19.
STC 2021 was initially planned as a face-to-face event, to be held in Seville, Spain, and organized by the IBiS (Institute of Biomedicine of Seville) and the SEIS (“Sociedad Española de Informática de la Salud”), the Spanish representative in EFMI. However, due to the COVID-19 pandemic and the associated travel restrictions, the conference was conducted online.
The Scientific Program Committee (SPC) included representatives from a number of EFMI Working Groups and the EFMI Board, as well as independent experts. The SPC consisted of the following: Jaime Delgado (Chair), Arriel Benis, Paula de Toledo, Parisis Gallos, Mauro Giacomini, Alicia Martínez-García and Dario Salvi.
On behalf of the Scientific Program Committee, I would first like to warmly thank all the authors who submitted their papers to the conference. Many thanks are also due to the reviewers, whose voluntary work contributed to the quality of the conference, not forgetting the scientific program committee itself for putting the whole conference together through its meetings and individual work.
Jaime Delgado
Chair of Scientific Programme Committee
October 2021
Federated learning has great potential to create solutions that work over different sources without data transfer. However, current federated methods are neither explainable nor auditable. In this paper we propose a federated data mining method to discover association rules. More precisely, we define what we consider to be interesting itemsets and propose an algorithm to obtain them. This approach facilitates interoperability and reusability, and it is based on the accessibility of data. These properties are well aligned with the FAIR principles.
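The abstract does not give the authors' definition of interesting itemsets or their algorithm, so the following is only a generic sketch of the underlying idea: each site counts itemsets locally and shares nothing but those counts, which a coordinator sums to obtain global support. The function names, the `max_size` and `min_support` parameters, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Run at each site: count itemsets of up to max_size items.

    Only the counts and the number of transactions leave the site,
    never the records themselves (an assumption of this sketch).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts, len(transactions)


def federated_frequent_itemsets(site_summaries, min_support):
    """Run at the coordinator: sum per-site counts, keep frequent itemsets."""
    total = Counter()
    n_transactions = 0
    for counts, size in site_summaries:
        total.update(counts)
        n_transactions += size
    return {
        itemset: count / n_transactions
        for itemset, count in total.items()
        if count / n_transactions >= min_support
    }


# Hypothetical toy records held at two separate sites.
site_a = [["diabetes", "hypertension"], ["diabetes"], ["hypertension", "obesity"]]
site_b = [["diabetes", "hypertension"], ["diabetes", "hypertension", "obesity"]]

summaries = [local_counts(site_a), local_counts(site_b)]
frequent = federated_frequent_itemsets(summaries, min_support=0.5)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

Because the coordinator only ever sees aggregated counts, each step of the computation can be inspected, which is the kind of auditability the abstract says current federated methods lack.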
There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network (CNN). Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on the autoencoders’ performance. The source code for generating pathways from health records is provided as open source.
Medical image classification and diagnosis based on machine learning have made significant achievements and gradually penetrated the healthcare industry. However, medical data characteristics, such as relatively small datasets for rare diseases or imbalance in class distribution for rare conditions, significantly restrain their adoption and reuse. Imbalanced datasets lead to difficulties in learning and in obtaining accurate predictive models. This paper follows the FAIR paradigm and proposes a technique for the alignment of class distribution, which improves image classification performance on imbalanced data and supports data reuse. Experiments on the acne disease dataset show that the proposed framework outperforms the baselines and achieves up to a 5% improvement in image classification.
We present a user acceptance study of a clinical decision support system (CDSS) for Type 2 Diabetes Mellitus (T2DM) risk prediction. We focus on how a combination of data-driven and rule-based models influences efficiency and acceptance by doctors. To evaluate perceived usefulness, we randomly generated CDSS output in three different settings: data-driven (DD) model output; DD model output in the presence of a known risk scale (FINDRISK); and DD model output in the presence of the risk scale together with an explanation of the DD model. For each case, a physician was asked to answer three questions: whether they agree with the result, whether they understand it, and whether the result is useful in practice. We employed Lankton’s model to evaluate user acceptance of the CDSS. Our analysis indicates that, without the presence of scales, physicians trust the CDSS blindly. From the answers, we conclude that interpretability plays an important role in the acceptance of a CDSS.
Recombinant human growth hormone (r-hGH) is an established therapy for growth hormone deficiency (GHD); yet, some patients fail to achieve their full height potential, with poor adherence and persistence with the prescribed regimen often a contributing factor. A data-driven clinical decision support system based on “traffic light” visualizations for adherence risk management of patients receiving r-hGH treatment was developed. This research was feasible thanks to data-sharing agreements that allowed the creation of these models using real-world data of r-hGH adherence from easypod™ connect; data was retrieved for 11,015 children receiving r-hGH therapy for ≥180 days. Patients’ adherence to therapy was represented using four values (mean and standard deviation [SD] of daily adherence and hours to next injection). Cluster analysis was used to categorize adherence patterns using a Gaussian mixture model. Following a traffic lights-inspired visualization approach, the algorithm was set to generate three clusters: green, yellow, or red status, corresponding to high, medium, and low adherence, respectively. The area under the receiver operating characteristic curve (AUC-ROC) was used to find optimum thresholds for independent traffic lights according to each metric. The most appropriate traffic light used the SD of the hours to the next injection, with an AUC-ROC value of 0.85 when compared to the complex clustering algorithm. For the daily adherence-based traffic lights, optimum thresholds were >0.82 (SD, <0.37), 0.53–0.82 (SD, 0.37–0.61), and <0.53 (SD, >0.61) for high, medium, and low adherence, respectively. For hours to next injection, the corresponding optimum thresholds were <27.18 (SD, <10.06), 27.18–34.01 (SD, 10.06–29.63), and >34.01 (SD, >29.63). 
Our research indicates that implementing a practical data-driven alert system based on recognised traffic-light coding would enable healthcare practitioners to monitor patients who are sub-optimally adherent to r-hGH treatment and to intervene early to improve treatment outcomes.
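The thresholds reported above can be read as four independent traffic lights, one per metric. The sketch below encodes them as a lookup table; the metric names, the function name, and the handling of values that fall exactly on a boundary (treated here as yellow) are assumptions, not part of the published system.

```python
# Thresholds taken from the abstract: (green bound, red bound, higher is better).
THRESHOLDS = {
    "daily_adherence_mean": (0.82, 0.53, True),
    "daily_adherence_sd": (0.37, 0.61, False),
    "hours_to_next_injection_mean": (27.18, 34.01, False),
    "hours_to_next_injection_sd": (10.06, 29.63, False),
}


def traffic_light(metric, value):
    """Return 'green', 'yellow' or 'red' (high, medium or low adherence)."""
    green_bound, red_bound, higher_is_better = THRESHOLDS[metric]
    if higher_is_better:
        if value > green_bound:
            return "green"
        if value < red_bound:
            return "red"
    else:
        if value < green_bound:
            return "green"
        if value > red_bound:
            return "red"
    # Everything between the two bounds, boundaries included, is medium.
    return "yellow"


# Hypothetical patient summary values, for illustration only.
print(traffic_light("daily_adherence_mean", 0.90))
print(traffic_light("hours_to_next_injection_sd", 20.0))
print(traffic_light("hours_to_next_injection_mean", 40.0))
```

A table-driven function keeps the clinical thresholds in one place, so they can be revised without touching the classification logic.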
The FAIR Principles are supported by various initiatives in the biomedical community. However, little is known about the knowledge and efforts of individual clinical researchers regarding data FAIRification. We distributed an online questionnaire to researchers from six Dutch University Medical Centers, as well as researchers using an Electronic Data Capture platform, to gain insight into their understanding of and experience with data FAIRification. 164 researchers completed the questionnaire. 64.0% of them had heard of the FAIR Principles. 62.8% of the researchers spent some or a lot of effort to achieve any aspect of FAIR and 11.0% addressed all aspects. Most researchers were unaware of the Principles’ emphasis on both human- and machine-readability, as their FAIRification efforts were primarily focused on achieving human-readability (93.9%), rather than machine-readability (31.2%). In order to make machine-readable, FAIR data a reality, researchers require proper training, support, and tools to help them understand the importance of data FAIRification and guide them through the FAIRification process.
Implementing the best research principles initiates an important shift in clinical research culture, improving efficiency and the level of evidence obtained. In this article, we share our view of best research practice and our experience of introducing it into the scientific activities of the N.N. Burdenko National Medical Research Center of Neurosurgery (Moscow, Russian Federation). Since we began adhering to the principles described in the article, the percentage of our Center's publications appearing in international scientific journals has increased from 7% to 27%, and the overall number of articles has doubled since 2014. We believe it is important that medical informatics professionals, like the medical experts involved in clinical research, are familiar with the best research principles.
Hip arthroplasty represents a large and constantly increasing proportion of orthopaedic activity. Automating monitoring from clinical data warehouses is an opportunity to dynamically monitor devices and patient outcomes, allowing clinical practices to be improved. Our objective was to assess quantitative and qualitative concordance between claim data and device supply data in order to create an e-cohort of patients undergoing a hip replacement.
We performed a single-centre cohort pilot study, using one clinical data warehouse of a French University Hospital, from January 1, 2010 to December 31, 2019. We included all adult patients who underwent a hip arthroplasty and were provided with at least one hip medical device. Patients younger than 18 years or who objected to the reuse of their data were excluded from the analysis. Our primary outcome was the percentage of hospital stays with both a hip arthroplasty and a hip device provided. The patient and stay characteristics assessed in this study were: age, sex, length of stay, surgery procedure (replacement, repositioning, change, or reconstruction), medical indication for surgery (osteoarthritis, fracture, cancer, infection, or other) and device provided (head, stem, shell, or other).
We found 3,380 stays and 2,934 patients; 96.4% of them had both a hip surgery procedure and a hip device provided. These data from different sources are close enough to be integrated in a common clinical data warehouse.
To handle genomic information while supporting FAIR principles, we present GIPAMS, a modular architecture. GIPAMS provides security and privacy for the management of genomic information by means of several independent services and modules that interact with one another in an orchestrated way. The paper analyzes how some security and privacy aspects of the FAIRification process are covered by the GIPAMS platform.
Important information about a patient is often stored as free-form text describing the events in the patient’s medical history. In this work, we propose and evaluate a hybrid approach based on rules and syntactic analysis to normalise temporal expressions and assess uncertainty depending on the remoteness of the event. A dataset of 500 sentences was manually labelled to measure the accuracy. On this dataset, the accuracy of extracting temporal expressions is 95.5%, and the accuracy of normalisation is 94%. The event extraction accuracy is 74.8%. The essential advantage of this work is that the approach is implemented for a non-English language, for which NLP tools are limited.
The One Digital Health (ODH) framework aims at transforming future health ecosystems and guiding the implementation of a digital technologies-based systemic approach to caring for humans’ and animals’ health in a managed surrounding environment. To integrate and use the data generated by the ODH data sources, “FAIRness” stands as a prerequisite for proper data management and stewardship.
Generating evidence based on real-world data has been gaining importance in research, not least since the COVID-19 pandemic. The Common Data Model of the Observational Medical Outcomes Partnership (OMOP) is a research infrastructure that implements FAIR principles. Although the transfer of German claim data to OMOP is already implemented, drug data remains an open issue. This paper provides a concept for preparing electronic health record (EHR) drug data for the transfer to OMOP. The concept is based on a requirements analysis and on descriptive statistics for profiling EHR data, was developed by an interdisciplinary team, and also covers data quality issues. It not only ensures FAIR principles for research, but also provides the foundation for transferring German drug data to OMOP.
Different datasets were deployed at national level to share data on COVID-19 as early as the beginning of the epidemic spread in early 2020. They distribute daily updated information aggregated at local, gender and age levels. To facilitate the reuse of such data, FAIR principles should be applied so that the data can be optimally found, accessed, understood and exchanged, and so that intra- and inter-country analyses can be defined for different purposes, such as statistical analysis. However, another aspect to be considered when analyzing these datasets is data quality. In this paper we link these two perspectives to analyze to what extent datasets published by national institutions to monitor the diffusion of COVID-19 are reusable for scientific purposes, such as tracing the spread of the virus.
Adopting international standards within health research communities can elevate data FAIRness and widen analysis possibilities. The purpose of this study was to evaluate the feasibility of mapping a generic metadata schema (MDS) to HL7® Fast Healthcare Interoperability Resources® (FHIR)®. The MDS was created for a central search hub gathering COVID-19 health research (studies, questionnaires, documents = MDS resource types). Mapping results were rated by calculating the percentage of FHIR coverage. Among 86 items to map, total mapping coverage was 94%: 50 (58%) of the items were available as standard resources in FHIR and 31 (36%) could be mapped using extensions. Five items (6%) could not be mapped to FHIR. Analyzing each MDS resource type, there was a total mapping coverage of 93% for studies and 95% for questionnaires and documents, with 61% of the MDS items available as standard resources in FHIR for studies, 57% for questionnaires and 52% for documents. Extensions in studies, questionnaires and documents were used in 32%, 38% and 43% of items, respectively. This work shows that FHIR can be used as a standardized format in registries for clinical, epidemiological and public health research. However, further adjustments to the initial MDS are recommended, and two additional items are needed when implementing FHIR. Developing an MDS based on the FHIR standard could be a future approach to reduce data ambiguity and foster interoperability.
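The overall coverage figures follow directly from the item counts reported above (50 standard, 31 via extensions, 5 unmapped, 86 in total). The short sketch below reproduces that arithmetic; the function and key names are illustrative, and rounding to whole percentages is assumed to match how the abstract reports its figures.

```python
def mapping_coverage(standard, extension, unmapped):
    """Summarise FHIR mapping coverage as whole-number percentages."""
    total = standard + extension + unmapped

    def pct(count):
        return round(100 * count / total)

    return {
        "total_items": total,
        "standard_pct": pct(standard),
        "extension_pct": pct(extension),
        "unmapped_pct": pct(unmapped),
        "mapped_pct": pct(standard + extension),
    }


# Item counts as reported in the abstract.
result = mapping_coverage(standard=50, extension=31, unmapped=5)
print(result)
```

With these counts, the standard and extension shares come to 58% and 36%, which sum to the 94% total coverage stated in the abstract.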
The German Central Health Study Hub COVID-19 is an online service that offers bundled access to COVID-19 related studies conducted in Germany. It combines metadata and other information from epidemiological, public health and clinical studies into a single data repository for FAIR data access. In addition to study characteristics, the system also allows easy access to study documents, as well as instruments for data collection. Study metadata and survey instruments are decomposed into individual data items and semantically enriched to improve findability. Data from existing clinical trial registries (DRKS, ClinicalTrials.gov and WHO ICTRP) are merged with epidemiological and public health studies that were collected and entered manually. More than 850 studies are listed as of September 2021.