Ebook: MEDINFO 2021: One World, One Health – Global Partnership for Digital Innovation
The World Health Organization defines health as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity”, and its constitution also asserts that health for all people is “dependent on the fullest co-operation of individuals and States”. The ongoing pandemic has highlighted the power of both healthy and unhealthy information, so while healthcare and public health services have depended upon timely and accurate data and continually updated knowledge, social media has shown how unhealthy misinformation can be spread and amplified, reinforcing existing prejudices, conspiracy theories and political biases.
This book presents the proceedings of MedInfo 2021, the 18th World Congress of Medical and Health Informatics, held as a virtual event from 2-4 October 2021, with pre-recorded presentations for all accepted submissions. The theme of the conference was One World, One Health – Global Partnership for Digital Innovation and submissions were requested under 5 themes: information and knowledge management; quality, safety and outcomes; health data science; human, organizational and social aspects; and global health informatics. The Programme Committee received 352 submissions from 41 countries across all IMIA regions, and 147 full papers, 60 student papers and 79 posters were accepted for presentation after review and are included in these proceedings.
Providing an overview of current work in the field over a wide range of disciplines, the book will be of interest to all those whose work involves some aspect of medical or health informatics.
Why “one world, one health”?
The Constitution of the World Health Organization defines health as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity”. The Constitution also asserts that health for all peoples is “dependent on the fullest co-operation of individuals and States” – hence our “one world” theme.
But human health is part of a bigger picture. The “one health” concept recognizes the interconnected global ecosystem of our planet. Three examples highlight this key principle. Firstly, recent events have reminded us of the threats of zoonotic diseases, where pathogens pass from animal to human populations. Secondly, human interaction with microbes can be seen both in a positive way, such as the vital symbiosis of the human microbiome, and in a negative way, with risks like antimicrobial resistance due to overuse of antibiotic drugs. Thirdly, and perhaps most obviously, the continuing devastation of the environment by human stupidity and greed threatens the health of the one fragile world we share.
The pandemic has dramatically emphasised the power of healthy and unhealthy information. Healthcare and public health services have depended upon both timely and accurate data and continually updated knowledge to organize and deliver treatment, prevention and policy advice. Unparalleled global scientific cooperation has demonstrated what can be done when information and methods are rapidly shared and scrutinized. Unfortunately, social media has shown how unhealthy misinformation can be spread and amplified. This has reinforced existing prejudices, conspiracy theories and political biases to sustain and justify spurious beliefs and selfish behaviour like pandemic denial, vaccine rejection and mask refusal.
The worldwide community of the International Medical Informatics Association continues to work as a global partnership for healthy information and digital innovation. The 18th World Congress of Medical and Health Informatics, MedInfo 2021, was held as a virtual event from 2–4 October, with pre-recorded presentations for all accepted submissions and six live online sessions for the invited panels, keynotes and awards.
The MedInfo 2021 Scientific Programme Committee (SPC) called for submissions under five themes:
1. Information and Knowledge Management
2. Quality, Safety and Outcomes
3. Health Data Science
4. Human, Organizational and Social Aspects
5. Global Health Informatics
We received 352 submissions from 41 countries across all IMIA regions. Peer review was organized by the SPC co-chairs and eight track chairs and co-chairs, involving over 100 reviewers. Finally, 147 full papers, 60 student papers and 79 posters were accepted and are included in these proceedings.
The live online sessions of MedInfo 2021 included six invited panels and awards for best paper, best student paper and the François Grémy Award of Excellence.
The SPC would like to thank the track chairs and co-chairs, the reviewers, the editorial assistants and the Chair and CEO of IMIA for their invaluable contribution to the success of this first virtual MedInfo conference, prepared and held during a time of unprecedented global disruption.
Philip Scott and Paula Otero, MedInfo 2021 SPC co-chairs
Clinical researchers hold high expectations for the utility of health data sourced from hospital information systems. In Japan, the standardized structured medical information eXchange version 2 (SS-MIX2) storage is a common resource for obtaining clinical data from different medical databases. However, little is known about the coverage of the data types derived from the SS-MIX2 storage. In this regard, we calculated the proportions of a dataset that could be extracted via SS-MIX2 for various clinical study categories listed in various articles published in the New England Journal of Medicine. In the 95 articles reviewed, the proportions varied from 13.3% ± 13.3% (mean ± SD) for dementia to 61.8% ± 13.7% for diabetes. For cardiology, the proportion of data accessed in a unique format (SEAMAT) increased significantly. We further noted that there was room for improvement in the coverage of SS-MIX2 data.
Clinical Pathways (CP) provide healthcare personnel with an easy-to-understand high level model of medical steps in specific patient conditions, thereby improving overall process quality in clinical practice. The emergence of new clinical-oriented standards such as openEHR Task Planning (TP) could pose a major step towards clinical process improvement, particularly in complex domains such as infection diagnosis and treatment, where time plays a critical role. In this work, we analyze the suitability of TP to successfully represent time constraints of common process patterns in infections, modelling some of the Catheter-Related Blood Stream Infection (CR-BSI) process patterns as a case study. Our research shows that TP is useful to represent time constraints of infection CPs, although minor improvements could increase its suitability not only for infection processes but for other time-related complex clinical scenarios.
Measurement concepts are essential to observational healthcare research; however, a lack of concept harmonization limits the quality of research that can be done on multisite research networks. We developed five methods that used a combination of automated, semi-automated and manual approaches for generating measurement concept sets. We validated our concept sets by calculating their frequencies in cohorts from the Columbia University Irving Medical Center (CUIMC) database. For heart transplant patients, the preoperative frequencies of basic metabolic panel concept sets, which we generated by a semi-automated approach, were greater than 99%. We also made concept sets for lumbar puncture and coagulation panels, by automated and manual methods respectively.
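The validation step described above can be illustrated with a minimal sketch. It assumes that the "frequency" of a concept set is the fraction of cohort patients who have at least one record with a concept from the set; the abstract does not spell out the definition, and the patient identifiers and concept IDs below are hypothetical placeholders rather than real vocabulary codes.

```python
def concept_set_frequency(cohort_records, concept_set):
    """Fraction of patients with at least one recorded concept in the set.

    cohort_records maps a patient id to the set of concept ids recorded
    for that patient (assumed structure, for illustration only).
    """
    if not cohort_records:
        return 0.0
    hits = sum(1 for concepts in cohort_records.values() if concepts & concept_set)
    return hits / len(cohort_records)


# Hypothetical toy cohort: four patients, placeholder concept ids.
toy_cohort = {
    "p1": {101, 102},
    "p2": {103},
    "p3": {999},
    "p4": set(),
}
toy_concept_set = {101, 103}

print(concept_set_frequency(toy_cohort, toy_concept_set))
```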
In Norway there is an overall goal to establish a national digitalization platform for primary healthcare, named Akson, to improve information exchange. We participated in the work with Akson and, through qualitative research including interviews, found that the project could benefit from two other similar infrastructuring processes: first, a national process of defining clinical standards and establishing a governance organization to handle them; second, an effort to improve data exchange between an EHR system and a national quality registry. The aim of the paper is to outline some lessons learned from these previous processes for Akson and similar large-scale projects, focusing on how to govern the digitalization platform at different healthcare levels and how to reuse healthcare information within and across healthcare institutions. Hence, we ask the following research question: Which experiences from previous large-scale infrastructuring processes should be considered when establishing a national digitalization platform for sharing data?
Medical data science aims to facilitate knowledge discovery by assisting in the analysis of data, algorithms, and results. The FAIR principles aim to guide scientific data management and stewardship, and are relevant to all digital health ecosystem stakeholders. The FAIR4Health project aims to facilitate and encourage the health research community to reuse datasets derived from publicly funded research initiatives using the FAIR principles. The ‘FAIRness for FHIR’ project aims to provide guidance on how HL7 FHIR could be utilized as a common data model to support the health datasets FAIRification process. The first expected result is an HL7 FHIR Implementation Guide (IG) called FHIR4FAIR, describing how FHIR can be used to support FAIRification in different scenarios. This IG aims to provide practical underpinnings for the FAIR4Health FAIRification workflow as a domain-specific extension of the GoFAIR process, while simplifying curation, advancing interoperability, and providing insights into a roadmap for health datasets FAIR certification.
Clinical image data analysis is an active area of research. Integrating such data in a Clinical Data Warehouse (CDW) requires unlocking the PACS and RIS and addressing interoperability and semantics issues. Based on specific functional and technical requirements, our goal was to propose a web service (I4DW) that allows users to query and access pixel data from a CDW by fully integrating and indexing imaging metadata. Here, we present the technical implementation of this workflow as well as the evaluation we carried out using a prostate cancer cohort use case. The query mechanism relies on a DICOM metadata hierarchy dynamically generated during the ETL process. We evaluated the DICOM data transfer performance of I4DW, and found mean retrieval times of 5.94 seconds to retrieve a complete DICOM series from the PACS and 0.9 seconds to retrieve all metadata of a series. We could retrieve all patients and imaging tests of the prostate cancer cohort with a precision of 0.95 and a recall of 1. By leveraging the C-MOVE method, our approach based on the DICOM protocol is scalable and domain-neutral. Future improvement will focus on performance optimization and de-identification.
A significant portion of data in Electronic Health Records is only available as unstructured text, such as surgical or finding reports, clinical notes and discharge summaries. To use this data for secondary purposes, natural language processing (NLP) tools are required to extract structured information. Furthermore, for interoperable use, harmonization of the data is necessary. HL7 Fast Healthcare Interoperability Resources (FHIR), an emerging standard for exchanging healthcare data, defines such a structured format. For German-language medical NLP, the tool Averbis Health Discovery (AHD) represents a comprehensive solution. AHD offers a proprietary REST interface for text analysis pipelines. To build a bridge between FHIR and this interface, we created a service that translates the communication around AHD from and to FHIR. The application is available under an open source license.
Although FHIR has been designed to be easy to implement, it requires knowledge that is still hard to find. We aim to evaluate the use of FHIR in Portuguese projects for the integration of medical devices. Two projects were selected: easyHealth4Covid (EH4C) and Chronic Diseases Management Platform (CDMP). The evolution of each project and the FHIR resources used were analyzed. Across both projects, 11 different sensors from 5 companies were used. None of them had previously used FHIR for integration, and the teams had little to no experience in doing so. The FHIR Observation resource was used for all of them. There is a general lack of knowledge of the FHIR standard and terminologies among most of the device companies involved in the projects.
The objective of this study was to develop a hybrid method and perform an initial evaluation of mappings from the International Statistical Classification of Diseases, 10th revision, Chinese version (ICD-10-CN) to the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED-CT). The methods used to perform mapping include reusing existing mappings, term similarity modeling for automatic mapping and manual review. We evaluated the results of automatic mapping and the coverage of the maps between the two terminologies. Experimental results demonstrated that fine-tuning the pre-trained biomedical language model PubMedBERT obtained the best performance, with a precision of 0.859, a recall of 0.773, and an F1 of 0.814. All (100%) of the 4-digit code ICD-10-CN terms were mapped to SNOMED-CT terms through existing code mappings. Around 42.41% of randomly selected 6-digit code ICD-10-CN terms had exact matches to corresponding SNOMED-CT terms, and we did not find appropriate SNOMED-CT terms for ICD grouping terms.
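As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the function name is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
recomputed_f1 = f1_score(0.859, 0.773)
print(round(recomputed_f1, 3))  # agrees with the reported 0.814
```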
Data sharing and interoperability between jail systems and community health providers are critical for successful re-entry of incarcerated individuals into the mainstream community. Using a case study approach, we present an account of interoperability efforts between jail and community health systems in the County of Orange (California, USA), including the overall infrastructure comprising the jail management system, jail health system, and the community health system. We also describe outcomes and lessons from the Jail to Community Re-entry Program implemented in the County of Orange, along with recommendations and common data elements required for effective care transitions from custody to community.
Several open source components have been made available in recent years to help develop full openEHR systems. Still, doubts exist as to whether these are sufficient. This paper presents a case study of implementing a low-code openEHR system, investigating the feasibility and challenges of developing a system using these components for each step. The method consisted of selecting successful examples of implementation case studies, identifying key development steps, and, for each step, searching for possible open source options. As a result, we obtained a working low-code openEHR-powered EHR, successfully demonstrating the feasibility of the proposed implementation guide. The main available free or open source components used were ArchetypeDesigner and EHRbase, developed by Better and Vita/HighMed respectively. In our opinion, it is possible to build EHR systems using the available open source components, but support is still missing in the front end, specifically for form generation and screen representation.
Primary Immunodeficiencies (PIDs) are associated with more than 400 rare monogenic diseases affecting various biological functions (e.g., development, regulation of the immune response) with a heterogeneous clinical expression (from no symptoms to severe manifestations). To better understand PIDs, the ATRACTion project aims to perform a multi-omics analysis of PID cases versus a control group of patients, including single-cell transcriptomics, epigenetics, proteomics, metabolomics, metagenomics and lipidomics. In this study, our goal is to develop a common data model integrating clinical and omics data, which can be used to obtain the standardized information necessary for characterization of PID patients and for further systematic analysis. For that purpose, we extend the OMOP Common Data Model (CDM) and propose a multi-omics ATRACTion OMOP-CDM to integrate multi-omics data. This model, available for the community, is customizable for other types of rare diseases (https://framagit.org/imagine-plateforme-bdd/pub-rhu4-atraction).
Research data management requires stable, trustworthy repositories to safeguard scientific research results. In this context, rich markup with metadata is crucial for the discoverability and interpretability of the relevant resources. SEEK is a web-based software to manage all important artifacts of a research project, including project structures, involved actors, documents and datasets. SEEK is organized along the ISA model (Investigation – Study – Assay). It offers several machine-readable serializations, including JSON and RDF. In this paper, we extend the power of RDF serialization by leveraging the W3C Data Catalog Vocabulary (DCAT). DCAT was specifically designed to improve interoperability between digital assets on the Web and enables cross-domain markup. By using community-consented gold standard vocabularies and a formal knowledge description language, findability and interoperability according to the FAIR principles are significantly improved.
Health research increasingly requires effective ways to identify existing datasets and assess their suitability for research. We sought to test whether researchers could use an existing metadata catalogue to assess the suitability of datasets for addressing specified research questions. Five datasets were described in the National Institute for Health Research Health Informatics Collaborative metadata catalogue, and for each dataset five associated research questions were formulated, some of which were answerable with the dataset while others were not. Thirteen researchers each assessed whether the ten questions associated with two randomly selected datasets were answerable with the described datasets. After removing instances where participants misunderstood the question or lacked subject matter knowledge to make the assessment, we found that 87 out of 109 assessments (80%) were correct. Participants particularly struggled with one dataset which consisted of EHR data. The most common reason for incorrect assessments was the inability to find the relevant information in the metadata catalogue.
The large variability of data models, specifications, and interpretations of data elements is particular to the healthcare domain. Achieving semantic interoperability is the first step to enable reuse of healthcare data. To ensure interoperability, metadata repositories (MDR) are increasingly used to manage data elements on a structural level, while terminology servers (TS) manage the ontologies, terminologies, coding systems and value sets on a semantic level. In practice, however, this strict separation is not always followed; instead, semantical information is stored and maintained directly in the MDR, as a link between both systems is missing. This may be reasonable up to a certain level of complexity, but it quickly reaches its limitations with increasing complexity. The goal of this approach is to combine both components in a compatible manner. We present TermiCron, a synchronization engine that provides synchronized value sets from TS in MDRs, including versioning and annotations. Prototypical results were shown for the terminology server Ontoserver and two established MDR systems. Bridging the semantic and structural gap between the two infrastructure components, this approach enables shared use of metadata and reuse of corresponding health information by establishing a clear separation of the two systems and thus serves to strengthen reuse as well as to increase quality.
The heterogeneity of electronic health record models is a major problem: it is necessary to gather data from various models for clinical research, but also for clinical decision support. The Observational Medical Outcomes Partnership – Common Data Model (OMOP-CDM) has emerged as a standard model for structuring health records populated from various other sources. This model is proposed as a relational database schema. However, in the field of decision support, formal ontologies are commonly used. In this paper, we propose a translation of OMOP-CDM into an ontology, and we explore the utility of the semantic web for structuring EHRs from a clinical decision support perspective, and the use of the SPARQL language for querying health records. The resulting ontology is available online.
Waiting time for a consultation for chronic pain is a widespread health problem. This paper presents the design of an ontology used to assess patients referred to a consultation for chronic pain.
We designed OntoDol, an ontology of pain domain for patient triage based on priority degrees. Terms were extracted from clinical practice guidelines and mapped to SNOMED-CT concepts through the Python module Owlready2. Selected SNOMED-CT concepts, relationships, and the TIME ontology, were implemented in the ontology using Protégé. Decision rules were implemented with SWRL. We evaluated OntoDol on 5 virtual cases.
OntoDol contains 762 classes, 92 object properties and 18 SWRL rules to assign patients to 4 categories of priority. OntoDol was able to assert every case and classify them in the right category of priority.
Further work will extend OntoDol to other diseases and assess OntoDol with real-world data from the hospital.
Clinical pathways (CP) enable standardized and efficient management of patients with common pathologies. As operational tools, they take into account knowledge from guidelines and from the context (e.g. availability of resources) in which different interventions are to be carried out. Mastering the coherence of interactions between all these knowledge domains is a major challenge for the implementation of CP. This scientific work led to the development of an ontology called Shareable and Reusable Clinical Pathway Ontology (ShaRE-CP), which integrates four knowledge domains (CP, guidelines, health resources and context), and to the establishment of the existing semantic links between them. The consistency of this semantic model has been validated by using reasoners. This ontology can serve as a basis for the development of a decision support system for planning and managing patient care.
Chemotherapies against cancers are often interrupted due to severe drug toxicities, reducing treatment opportunities. For this reason, the detection of toxicities and their severity from EHRs is of importance for many downstream applications. However, toxicity information is dispersed in various sources in the EHRs, making its extraction challenging.
We introduce OntoTox, an ontology designed to represent chemotherapy toxicities, their attributes and provenance. We illustrated the interest of OntoTox by integrating toxicities and grading information extracted from three heterogeneous sources: EHR questionnaires, semi-structured tables, and free-text.
We instantiated 53,510, 2,366 and 54,420 toxicities from questionnaires, tables and free-text respectively, and compared the complementarity and redundancy of the three sources.
We illustrated with this preliminary study the potential of OntoTox to guide the integration of multiple sources, and identified that the three sources are only moderately overlapping, stressing the need for a common representation.
ICD-11 will be used to report mortality statistics by WHO member countries starting in 2022. In the US, ICD-10-CM will likely continue to be used for morbidity coding for a long period of time. A map between ICD-10-CM and ICD-11 will therefore be useful for interoperability purposes between datasets coded with ICD-10-CM and ICD-11.
The objective of this study is to explore novel approaches to automatically derive a map between ICD-10-CM and ICD-11 through the sequential use of existing maps.
Methods and results:
Sequential mapping through ICD-10 yielded better coverage and accuracy compared to mapping through SNOMED CT.
Sequential mapping is useful in automatically creating a draft map from ICD-10-CM to ICD-11 and would reduce manual curation efforts in creating the final map. The various approaches offer different trade-offs among coverage, recall and precision.
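The idea of sequential mapping can be sketched as composing two existing maps through an intermediate terminology and then measuring how many source codes receive at least one target. This is only an illustration under simple assumptions (maps held as dictionaries of code sets, coverage defined as the share of source codes with any target); it is not the authors' implementation, and all codes below are hypothetical placeholders rather than real ICD codes.

```python
from collections import defaultdict


def compose_maps(first, second):
    """Chain a source->intermediate map with an intermediate->target map."""
    composed = defaultdict(set)
    for source, intermediates in first.items():
        for intermediate in intermediates:
            composed[source].update(second.get(intermediate, ()))
    # Keep only source codes that reached at least one target code.
    return {source: sorted(targets) for source, targets in composed.items() if targets}


def coverage(composed, source_codes):
    """Share of source codes that have at least one mapped target."""
    return sum(1 for code in source_codes if code in composed) / len(source_codes)


# Placeholder codes standing in for ICD-10-CM, ICD-10 and ICD-11 entries.
cm_to_icd10 = {"CM_A": {"I10_A"}, "CM_B": {"I10_B", "I10_C"}, "CM_C": {"I10_D"}}
icd10_to_icd11 = {"I10_A": {"I11_X"}, "I10_B": {"I11_Y"}, "I10_C": {"I11_Z"}}

draft_map = compose_maps(cm_to_icd10, icd10_to_icd11)
print(draft_map)
print(coverage(draft_map, list(cm_to_icd10)))
```

In this toy example "CM_C" has no target because its intermediate code is missing from the second map, which is the kind of gap that would be left for manual curation.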
South Korea has a public and single-payer system for healthcare services based on fee-for-service payments. The National Health Insurance (NHI) reimbursement claim codes are used by all healthcare providers for reimbursement. This study mapped NHI reimbursement claim codes for therapeutic and surgical procedures to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) to facilitate semantic interoperability and data reuse for research. The source codes for mapping were 2,500 reimbursement claim codes for therapeutic and surgical procedures such as surgery, endoscopic procedures, and interventional radiology. The target terminology for mapping was the ‘Procedure’ hierarchy of the international edition of SNOMED-CT released in July 2019. We translated Korean terms into English, clarified their meaning, extracted characteristics of the source codes, and mapped them to pre-coordinated concepts. If a source concept was not mapped to a pre-coordinated concept, we mapped it to a post-coordinated expression. The mapping results were validated internally using dual independent mapping and group discussion by trained terminologists, and by two physicians with experience of SNOMED-CT mapping. Out of 2,500 source codes, 1,298 (51.9%) codes were mapped to pre-coordinated concepts, and 1,202 (48.1%) codes were mapped to post-coordinated expressions. The mapping of the NHI reimbursement claim codes for therapeutic and surgical procedures to SNOMED-CT is expected to support clinical research by facilitating the utilization of health insurance claim data.
Clinical data often have limited usefulness because of the diversity of expression. Standardizing Chinese clinical data can improve its usability. The complexity of data cleaning and coding for Chinese clinical data prompted a shift from inefficient manual coding to a computer-aided tool. This study established a universal data cleaning and coding process and tool for Chinese clinical data standardization, which can greatly improve the efficiency of human coders. The process included preprocessing, a text similarity algorithm, and manual review. The standardization process proved effective for diagnosis, drug, and examination data standardization tasks and can be extended gradually to other clinical domains. The semi-automatic data cleaning and coding can reduce the time needed for standardization by half, and it was used in hospitals in Beijing.
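The three-stage process named above (preprocessing, text similarity, manual review) might look roughly like the following sketch. The abstract does not say which similarity algorithm or threshold was used, so this example substitutes Python's standard `difflib.SequenceMatcher` and an arbitrary threshold of 0.8; the dictionary codes and terms are hypothetical placeholders.

```python
import re
from difflib import SequenceMatcher


def preprocess(term):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    term = term.lower()
    term = re.sub(r"[^\w\s]", " ", term)
    return re.sub(r"\s+", " ", term).strip()


def suggest_code(raw_term, dictionary, threshold=0.8):
    """Return (best_code, best_score, needs_manual_review) for a raw term."""
    cleaned = preprocess(raw_term)
    best_code, best_score = None, 0.0
    for code, standard_term in dictionary.items():
        score = SequenceMatcher(None, cleaned, preprocess(standard_term)).ratio()
        if score > best_score:
            best_code, best_score = code, score
    # Low-scoring suggestions are routed to a human reviewer.
    return best_code, best_score, best_score < threshold


# Hypothetical standard dictionary with placeholder codes.
standard_terms = {
    "D001": "type 2 diabetes mellitus",
    "D002": "essential hypertension",
}

print(suggest_code("Type 2 Diabetes  Mellitus.", standard_terms))
print(suggest_code("xyz", standard_terms))
```

Because `\w` matches Unicode word characters in Python 3, the same preprocessing applies to Chinese terms, although a production tool would likely need segmentation and synonym handling beyond this sketch.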
The CDISC Controlled Terminology (CT) defines the terms that may be used to represent clinical trial data in the CDISC standards. Despite its unique importance, there has been limited systematic examination of the coverage of this terminology. In this work, we performed an assessment of the completeness of CDISC CT’s coverage by comparing clinical outcomes for multiple sclerosis (MS) available in CDISC CT with two independent high-fidelity benchmarks: (1) 71 expert-selected outcomes catalogued by the National Institute of Neurological Disorders and Stroke (NINDS) and (2) 66 common outcomes used in MS trials registered on ClinicalTrials.gov (CTG). We employed a semi-automated search and term-mapping process to identify possible CDISC equivalents to the benchmarks’ measures. We found that 55% of the NINDS outcomes and 52% of the CTG outcomes are absent from the CDISC Terminology, indicating a need for expanding the terminology to take into account other established standards and real-world practice.