Ebook: Informatics and Technology in Clinical Care and Public Health
Data, informatics, and technology are now among the most important forces inspiring health professionals and informaticians to improve healthcare for the benefit of patients.
This book presents the proceedings of the 19th annual International Conference on Informatics, Management, and Technology in Healthcare (ICIMTH 2021), held as a virtual event due to COVID-19 pandemic restrictions on 16 and 17 October 2021 in Athens, Greece. The ICIMTH conferences are a series of scientific events which bring together scientists working in the field of biomedical and health informatics from around the world. The 2021 conference examined the field of biomedical and health informatics in a very broad framework, presenting the research and application outcomes of informatics from cell to populations, and including a number of technologies such as imaging, sensors and biomedical equipment, as well as management and organizational aspects, including legal and social issues and the setting of research priorities in health informatics. A significant number of the papers included here relate to the COVID-19 pandemic.
Providing an insight into the latest developments in biomedical and health informatics, the book will be of interest to all those working in the field.
This volume contains accepted papers from the ICIMTH (International Conference on Informatics, Management, and Technology in Healthcare), the scientific outcomes of which the Scientific Programme Committee is pleased to present to the academic and professional community of Biomedical and Health Informatics. The conference was held virtually on 16 and 17 October 2021 in Athens, Greece.
The ICIMTH 2021 Conference is the 19th Annual Conference in this series of scientific events, which brings together scientists working in the field of Biomedical and Health Informatics from all continents.
As was also the case last year, this year’s conference was held as a virtual event by means of interactive teleconferencing platforms due to the COVID-19 pandemic and the consequent restrictions on gatherings and travel in many parts of the world.
The conference examines the field of Biomedical and Health Informatics in a very broad framework, presenting the research and application outcomes of informatics from cell to populations, and including a number of technologies, such as imaging, sensors and biomedical equipment, as well as management and organisational aspects, including legal and social issues and setting research priorities in health informatics. Essentially, data, informatics and technology inspire health professionals and informaticians to improve healthcare for the benefit of patients. As was expected this year, a significant number of papers relate to the COVID-19 pandemic.
It should be noted that these proceedings are published with open access in the Studies in Health Technology and Informatics (SHTI) series of IOS Press, with e-access for ease of use and browsing, without the loss of any of the advantages of indexing and citation in the largest scientific literature databases, such as Medline and Scopus.
By the deadline for papers we had received more than 170 submissions, of which 120 were accepted after review for inclusion in this volume of proceedings. Because pandemic-related issues shifted the conference a few months later than its traditional dates, the proceedings were not available at the time of the conference, but will be published by the end of the year.
The Editors would like to thank the Members of the Scientific Programme Committee, the Organising Committee, and all those reviewers who performed a very professional, thorough and objective refereeing of the scientific work in order to achieve this high-quality publication for a successful scientific event.
John Mantas, Arie Hasman, Mowafa S. Househ, Parisis Gallos, Emmanouil Zoulias, and Joseph Liaskos
We extracted 3,291,101 Tweets using hashtags associated with African American-related discourse (#BlackTwitter, #BlackLivesMatter, #StayWoke) and 1,382,441 Tweets from a control set (general or no hashtags) from September 1, 2019 to December 31, 2019 using the Twitter API. We also extracted a literary historical corpus of 14,692 poems and prose writings by African American authors and 66,083 items authored by others as a control, including poems, plays, short stories, novels and essays, using a cloud-based machine learning platform (Amazon SageMaker) via ProQuest TDM Studio. Lastly, we combined statistics from log likelihood and Fisher’s exact tests as well as feature analysis of a batch-trained Naive Bayes classifier to select lexicons of terms most strongly associated with the target or control texts. The resulting Tweet-derived African American lexicon contains 1,734 unigrams, while the control contains 2,266 unigrams. This initial version of a lexicon-based African American Tweet detection algorithm developed using Tweet texts will be useful to inform culturally sensitive Twitter-based social support interventions for African American dementia caregivers.
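The lexicon-selection step combines corpus statistics such as log likelihood and Fisher's exact tests. As an illustration only (the function name and counts below are hypothetical, not from the study), a minimal stdlib-only sketch of Dunning's log-likelihood (G2) keyness score, a standard way to rank terms by how strongly they are associated with a target corpus over a control corpus:

```python
import math

def log_likelihood_keyness(a, b, target_total, control_total):
    """Dunning's log-likelihood (G2) keyness for one term.

    a -- frequency of the term in the target corpus
    b -- frequency of the term in the control corpus
    target_total / control_total -- corpus sizes in tokens
    """
    # expected frequencies under the null hypothesis of equal usage
    expected_target = target_total * (a + b) / (target_total + control_total)
    expected_control = control_total * (a + b) / (target_total + control_total)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_control)
    return 2.0 * g2
```

Terms scoring high on such statistics in the target Tweets (and low in the control) would be candidates for the African American lexicon; the study additionally uses Fisher's exact tests and Naive Bayes feature analysis before finalising the unigram lists.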
Our study aimed to compare the capability of different word embeddings to capture the semantic similarity of clinical concepts related to complications in neurosurgery at the level of medical experts. Eighty-four sets of word embeddings (based on the Word2vec, GloVe, FastText, PMI, and BERT algorithms) were benchmarked in a clustering task. The FastText model came closest to the medical experts' capability to group medical terms by their meaning (adjusted Rand index = 0.682). Word embedding models can accurately reflect the semantic and linguistic similarities of clinical concepts, promising their robust usage in medical domain-specific NLP tasks.
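The adjusted Rand index used to score the clustering benchmark compares a model's grouping of terms against the expert grouping, corrected for chance agreement. A minimal pure-Python sketch of the metric (assuming flat integer or string cluster labels; this is the standard formula, not the study's code):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    n = len(labels_true)
    # contingency table: how many items share each (true, predicted) pair
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    sum_comb = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)   # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2.0
    if max_index == expected:  # degenerate case, e.g. everything in one cluster
        return 1.0
    return (sum_comb - expected) / (max_index - expected)
```

A value of 1.0 means the embedding-based clusters match the expert grouping exactly; values near 0 mean chance-level agreement, which puts the reported 0.682 for FastText in context.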
Tremendous changes have been witnessed in the post-COVID-19 world. Global efforts were initiated to reach a successful treatment for this emerging disease. These efforts have focused on developing vaccinations and/or finding therapeutic agents that can be used to combat the virus or reduce its accompanying symptoms. Gulf Cooperation Council (GCC) countries have initiated efforts on many clinical trials to address the efficacy and the safety of several therapeutic agents used for COVID-19 treatment. In this article, we provide an overview of the GCC’s clinical trials and associated drugs’ discovery process in the pursuit of an effective medication for COVID-19.
Artificial Intelligence (AI) has seen increased application within digital healthcare interventions (DHIs). The use of DHIs entails challenges concerning their safety assurance. In the UK, regulatory requirements place the onus of safety assurance not only on the manufacturer but also on the operator of a DHI. Claiming and evidencing the clinically safe implementation and use of AI-based DHIs requires expertise in order to understand, control and mitigate risk. Current health software standards, regulation, and guidance do not provide the insight necessary for safer implementation.
Our objective was to interpret published guidance and policy related to AI and to justify the clinical safety assurance of DHIs.
We assessed UK health regulation policy, standards, and insights from AI institutions, utilizing a published Hazard Assessment framework to structure safety justifications and to articulate hazards relating to AI-based DHIs.
The assessment enabled identification of hazards of AI-enabled DHIs relating to their implementation and use within healthcare delivery organizations.
By applying the method, we postulate that UK research on AI DHIs has highlighted safety-relevant issues that need consideration in order to justify the safety of a DHI.
Processing unstructured clinical texts is often necessary to support certain tasks in biomedicine, such as matching patients to clinical trials. Among other methods, domain-specific language models have been built to utilize free-text information. This study evaluated the performance of Bidirectional Encoder Representations from Transformers (BERT) models in assessing the similarity between clinical trial texts. We compared an unstructured aggregated summary of clinical trials reviewed at the Johns Hopkins Molecular Tumor Board with the ClinicalTrials.gov records, focusing on the titles and eligibility criteria. Seven pretrained BERT-based models were used in our analysis. Of the six biomedical-domain-specific models, only SciBERT outperformed the original BERT model, accurately assigning higher similarity scores to matched than to mismatched trials. This finding is promising and shows that BERT and, likely, other language models may support patient-trial matching.
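Once a BERT model has produced an embedding vector for a summary and for each trial record, the similarity assessment reduces to comparing vectors. A sketch of that final step, assuming the embeddings are already available as numeric lists (the helper names are illustrative, not from the study):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_matching_trial(summary_vec, trial_vecs):
    """Index of the trial whose embedding is most similar to the summary."""
    return max(range(len(trial_vecs)),
               key=lambda i: cosine_similarity(summary_vec, trial_vecs[i]))
```

A model "assigning higher similarity scores to matched than mismatched trials", as reported for SciBERT, corresponds to the matched record winning this ranking.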
The development of person-centred care (PCC) services requires adjustment to the specific domain of application and integration with the existing processes implemented in a healthcare institution. This poster presents PCC services for monitoring stroke outpatient rehabilitation, enhanced by modern ICT technologies and thus adjustable to different kinds of patients, which is especially relevant given the potential consequences of stroke and the resulting degree of disability.
The FAIR Guiding Principles do not address the quality of data and metadata; data collections could therefore be FAIR but useless. In a funding initiative of registries for health services research, trueness of data received special attention. Completeness, in the sense of recall, was selected to represent this dimension in a cross-registry benchmarking. The first analyses of completeness revealed a diversity of implementations. No registry was able to present results exactly as requested in a guideline on data quality. Two registries switched to source data verification as an alternative; the other three downsized to the dimension of integrity. These experiences underline that achieving appropriate data quality is a matter of costs and resources, whereas the current Guiding Principles call for a transparent culture regarding data and metadata. We propose the extension to FAIR-Q: data collections should not only be findable, accessible, interoperable, and reusable, but also quality assured.
Population Health Management typically relies on subjective decisions to segment and stratify populations. This study combines unsupervised clustering for segmentation and supervised classification, personalised to clusters, for stratification. An increase in cluster homogeneity, sensitivity and positive predictive value was observed compared to an unlinked approach. This analysis demonstrates the potential for a cluster-then-predict methodology to improve and personalise decisions in healthcare systems.
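A cluster-then-predict pipeline can be sketched in miniature: assign each patient to the nearest segment centroid, then fit a separate predictor per segment so that stratification decisions are personalised to the cluster. The nearest-centroid assignment and the majority-class stand-in below are deliberate simplifications for illustration, not the study's actual segmentation or classification models:

```python
import math
from collections import Counter, defaultdict

def nearest_centroid(x, centroids):
    """Index of the closest segment centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def fit_cluster_then_predict(X, y, centroids):
    """Segment patients by nearest centroid, then fit one predictor per
    segment (here a majority-class rule standing in for a classifier)."""
    per_cluster = defaultdict(list)
    for x, label in zip(X, y):
        per_cluster[nearest_centroid(x, centroids)].append(label)
    return {c: Counter(labels).most_common(1)[0][0]
            for c, labels in per_cluster.items()}

def predict(x, centroids, cluster_models, default=0):
    """Route a new patient to its segment's model."""
    return cluster_models.get(nearest_centroid(x, centroids), default)
```

The design point is that each segment's model only sees patients similar to those it will score, which is what can raise sensitivity and positive predictive value over a single unlinked model.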
The possibility of postoperative speech dysfunction prediction in neurosurgery based on intraoperative cortico-cortical evoked potentials (CCEP) might provide a new basis to refine the criteria for the extent of intracerebral tumor resection and preserve patients’ quality of life. In this study, we aimed to test the quality of predicting postoperative speech dysfunction with machine learning based on the initial intraoperative CCEP before tumor removal. CCEP data were reported for 26 patients. We used several machine learning models to predict speech deterioration following neurosurgery: a random forest of decision trees, logistic regression, support vector machine with different types of the kernel (linear, radial, and polynomial). The best result with F1-score = 0.638 was obtained by a support vector machine with a polynomial kernel. Most models showed low specificity and high sensitivity (reached 0.993 for the best model). Our pilot study demonstrated the insufficient quality of speech dysfunction prediction by solely intraoperative CCEP recorded before glial tumor resection, grounding our further research of CCEP postresectional dynamics.
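The reported F1-score, sensitivity and specificity all derive from the binary confusion matrix of predicted versus actual speech outcomes. A stdlib-only sketch of those metrics (labels assumed coded as 1 = dysfunction, 0 = no dysfunction; the function name is illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall of positives
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall of negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

The pattern the study reports, high sensitivity with low specificity, corresponds to models that catch nearly every true dysfunction case at the cost of many false alarms.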
The early detection and treatment of neoplasms, in particular the malignant, can save lives. However, identifying those most at risk of developing neoplasms remains challenging. Electronic Health Records (EHR) provide a rich source of "big" data on large numbers of patients. We hypothesised that in the period preceding a definitive diagnosis, there exists a series of ordered healthcare events captured within EHR data that characterise the onset and progression of neoplasms, and that these can be exploited to predict future neoplasm occurrence. Using data from the EHR of the Ministry of National Guard Health Affairs (MNG-HA), a large healthcare provider in Saudi Arabia, we aimed to discover health event patterns present in EHR data that predict the development of neoplasms in the year prior to diagnosis. After data cleaning, pre-processing, and applying the inclusion and exclusion criteria, 5,466 patients were available for model construction: 1,715 cases and 3,751 controls. Two predictive models were developed, using a Decision Tree (DT) and Random Forests (RF), with age, gender, ethnicity, and ICD-10 chapter (broad disease classification) codes as predictor variables and the presence or absence of neoplasms as the output variable. The common factors associated with a diagnosis of neoplasms within one or more years across all the models were: (1) age at neoplasm/event diagnosis; (2) gender; and a patient medical history of (3) diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism, and (4) diseases of the genitourinary system. Model performance assessment showed that RF had the higher Area Under the Curve (AUC = 0.76), whereas the DT was less complex. This study demonstrates that EHR data can be used to predict future neoplasm occurrence.
For medical informaticians, it has become increasingly crucial to assess the benefits and disadvantages of AI-based solutions as promising alternatives to many traditional tools. Besides quantitative criteria such as accuracy and processing time, healthcare providers are often interested in qualitative explanations of the solutions. Explainable AI provides methods and tools that are interpretable enough to afford different stakeholders a qualitative understanding of its solutions. Its main purpose is to provide insight into the black-box mechanisms of machine learning programs. Our goal here is to address the problem of qualitatively assessing AI from the perspective of medical informaticians by clarifying the central notions: explainability, interpretability, understanding, trust, and confidence.
Considering the growing interest in next generation sequencing (NGS) and data analysis, and the substantial challenges associated with fully exploiting these technologies and data without the proper experience, an expert-knowledge-based, user-friendly analytical tool was developed to allow non-bioinformaticians to process NGS genomic data, automatically prioritise genomic variants and make their own annotations. The tool was developed using a user-centred methodology, in which an iterative process was followed until a useful product emerged. It allows users to set up the pre-processing pipeline, filter the obtained data, annotate it using external and local databases (DBs), and helps decide which variants are most relevant for each study, taking advantage of its customised expert-based scoring system. The end users involved in the project concluded that CRIBOMICS was easy to learn, use and interact with, reducing the analysis time and the possible errors of variant prioritisation for genetic diagnosis.
For guiding decisions on medical diagnoses, it is crucial to receive valid laboratory test results. However, such results can appear implausible to the physician even when the measurements lie within the range of known reference values. There are technical sources of implausible results, related to the laboratory environment, that are frequently not detected by the usual measures for ensuring technical validity. Here, we describe the development of a quality assurance tool that tackles this problem and replaces the current manual statistical analyses at the Center for Laboratory Medicine in St Gallen (ZLM). Further analysis of the factors responsible for shifts in laboratory test results requires collecting and analyzing data on reagents as well as calibration and reference probes. Owing to a lack of standard operating procedures for these processes in many laboratories, this remains one of the big challenges.
Diabetic nephropathy (DN) is one of the long-term complications in patients with type 2 diabetes. The leading causes of DN are high blood glucose and high systolic blood pressure.
A retrospective cohort study was performed to explore the effects of high blood sugar and high systolic blood pressure on DN among 660 patients with non-insulin-dependent diabetes mellitus. Data were collected from the HosXP program and medical records from 2016 to 2020. A forest plot was used to examine the association of hypertriglyceridemia and high systolic blood pressure with DN.
The results confirmed that the factors associated with DN were male sex, age ≥ 60 years, diabetic duration ≥ 10 years, systolic BP ≥ 130 mmHg, and HbA1c ≥ 6%.
Health promotion programs should comprise the control of blood glucose and systolic blood pressure, especially for male patients aged ≥ 60 years with a diabetic duration ≥ 10 years.
Public perception of vaccines is imperative for successful vaccination programs. This study aims to measure the shift of sentiment towards vaccines after the COVID-19 outbreak in the Arabic-speaking population. The study used vaccine-related Arabic Tweets and analyzed the sentiment of users in two different time frames, before 2020 (T1) and after 2020 (T2). The analysis showed that in T1, 48.05% of tweets were positive and 16.47% were negative. In T2, 43.03% of tweets were positive and 20.56% were negative. Among the Twitter users, the sentiment of 15.92% of users shifted towards positive, and the sentiment of 17.90% shifted towards negative. Sentiment that shifted towards positive may reflect hope in vaccine efficacy, whereas sentiment that shifted towards negative may reflect concerns about vaccine side effects and misinformation. This study can support policymakers in the Arab world in combating the COVID-19 pandemic by utilizing tools to understand public opinion and sentiment.
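Measuring per-user shift between the two time frames can be reduced to comparing each user's mean sentiment score in T1 and T2, assuming a sentiment scorer has already produced those per-user means (the scoring step itself is out of scope in this sketch, and the function name is illustrative):

```python
def sentiment_shift(scores_t1, scores_t2):
    """Fraction of users present in both periods whose mean sentiment
    rose (shifted positive) or fell (shifted negative) from T1 to T2.

    scores_t1 / scores_t2 -- dicts mapping user id to mean sentiment.
    """
    shared = scores_t1.keys() & scores_t2.keys()
    if not shared:
        return 0.0, 0.0
    pos = sum(scores_t2[u] > scores_t1[u] for u in shared)
    neg = sum(scores_t2[u] < scores_t1[u] for u in shared)
    return pos / len(shared), neg / len(shared)
```

Restricting the comparison to users active in both periods is what allows statements like "15.92% of users shifted towards positive" rather than comparing two unrelated populations.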
Polypharmacy in the elderly is a public health problem with both clinical (an increase in adverse drug events) and economic consequences. One solution is medication review, a structured assessment of a patient's drug orders by the pharmacist aimed at optimizing therapy. However, this task is tedious, cognitively complex and error-prone, and only a few clinical decision support systems have been proposed to support it. Existing systems are either rule-based systems implementing guidelines or documentary systems presenting drug knowledge. In this paper, we present the ABiMed research project; through literature reviews and brainstorming, we identified five candidate innovations for a decision support system for medication review: patient data transfer from GP to pharmacist, use of semantic technologies, combination of rule-based and documentary approaches, use of machine learning, and a two-way discussion between pharmacist and GP after the medication review.
The goal of this study was to build a machine learning model for early prostate cancer prediction based on healthcare utilization patterns. We examined the frequency and pattern changes of healthcare utilization in 2916 prostate cancer patients 3 years prior to their prostate cancer diagnoses and explored several supervised machine learning techniques to predict possible prostate cancer diagnosis. Analysis of patients’ medical activities between 1 year and 2 years prior to their prostate cancer diagnoses using XGBoost model provided the best prediction accuracy with high F1 score (0.9) and AUC score (0.73). These pilot results indicated that application of machine learning to healthcare utilization patterns may result in early identification of prostate cancer diagnosis.
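Feature construction from utilization patterns might look like per-category visit counts over a look-back window before diagnosis, here the 1-to-2-year window the study found most predictive. The sketch below is a hypothetical simplification (field names and categories are illustrative, not the study's actual feature set):

```python
from collections import Counter

def utilisation_features(visits, categories, window=(12, 24)):
    """Fixed-length feature vector of visit counts per category.

    visits     -- iterable of (months_before_diagnosis, category) pairs
    categories -- ordered list of visit categories defining the vector
    window     -- half-open look-back window in months before diagnosis
    """
    lo, hi = window
    counts = Counter(cat for months, cat in visits if lo <= months < hi)
    return [counts.get(c, 0) for c in categories]
```

Vectors of this shape, one per patient, are the kind of tabular input a gradient-boosting model such as XGBoost would then be trained on.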
In this study, we tested the quality of the information extraction algorithm proposed by our group to detect pulmonary embolism (PE) in medical cases through sentence labeling. Having shown a comparable result (F1 = 0.921) to the best machine learning method (random forest, F1 = 0.937), our approach proved not to miss the information of interest. Scoping the number of texts under review down to distinct sentences and introducing labeling rules contributes to the efficiency and quality of information extraction by experts and makes the challenging tasks of labeling large textual datasets solvable.
Big data reanalysis enables comparative analyses that can generate novel hypotheses and knowledge. However, this approach is underutilized in cancer research, particularly for cancer stem cells (CSCs). CSCs are a rare subset of tumor cells which dedifferentiate from healthy adult cells and have the potential for self-renewal and treatment resistance. This analysis uses two publicly available single-cell RNA-seq datasets of liver cancer and adult liver cell types to demonstrate how reanalysis of big data can lead to valuable new discoveries. We identify 519 differentially expressed genes between liver CSCs and healthy liver cell types. Here we report a potential novel liver CSC dedifferentiation factor, Msh Homeobox 2, which was significantly upregulated in liver CSCs by 1.36-fold (p-value < 1E-10). These findings have the potential to further advance our knowledge of the genes governing the formation of CSCs.
Acute Lymphoblastic Leukemia (ALL) is a life-threatening type of cancer with an unquestionably high mortality rate. Early detection of ALL can reduce fatality rates and improve the diagnosis plan for patients. In this study, we developed the ALL Detector (ALLD), a deep-learning-based network that distinguishes ALL patients from healthy individuals on the basis of blast-cell microscopic images. We evaluated multiple DL-based models; a ResNet-based model performed best, with 98% accuracy in the classification task. We also compared the performance of ALLD against state-of-the-art tools used for the same purpose, and ALLD outperformed them all. We believe that ALLD will help pathologists diagnose ALL at an early stage and reduce the burden on clinical practice overall.
We interviewed six clinicians to learn about their lived experience of using an electronic health record (EHR; all were Allscripts users), following a semi-structured interview guide, at an academic medical center in New York City from October to November 2016. Each interview lasted approximately one to two hours. We applied a clustering algorithm, using natural language processing (NLP), to the interview transcripts to detect topics, and visualized eight themes using network diagrams (Louvain modularity 0.70). Novel findings include the need for a concise and organized display and data-entry page; user-controlled functions for orders, medications, and radiology reports; and missing indentation or filtering functions on the order page and in lab results. Applying topic modeling to qualitative interview data provides far-reaching research insights into clinicians' lived experience of EHRs and into future optimal EHR design to address human-computer interaction issues in an acute care setting.
Views on end-of-life care decisions, such as do not attempt resuscitation (DNAR), vary between institutions and individual health care professionals. Even in the era of electronic patient records (EPR), information on a DNAR order may be recorded in multiple locations, making it difficult to find and interpret. A link to a structured web-based questionnaire surveying perceptions of DNAR orders and their documentation was sent to all physicians and nurses working in the Tampere University Hospital special responsibility area, which covers a catchment area of 900 000 Finns. In total, 934 subjects responded, of whom 727 (77%) were nurses and 219 (23%) physicians, covering all specialties. We found substantial variation in DNAR order interpretation and documentation among all health care professionals, possibly causing information breakdown and compromising end-of-life care.
Electronic Patient Records (EPRs) are valuable data resources for clinical and operational research. The heterogeneity of medical software, coupled with changing data formats and the long lifespan of the patient datasets stored in EPRs, results in data inconsistencies that hinder operational activities and increase personnel effort for data lookup and cleaning. This study presents an approach to automated data quality reporting that was developed and tested in a real-world hospital setting at Royal Surrey County Hospital NHS Foundation Trust in 2020. Eighty-one data quality tests, configurable via spreadsheets, were defined and executed to yield standardised human-readable reports in comma-separated-value format. The data evaluation and reporting routines provided a manyfold improvement over the existing data quality reporting mechanisms.
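The pattern of spreadsheet-configurable tests yielding a CSV report can be sketched generically: each test is a named predicate over one field, and the report counts passes and failures per test. This is an illustrative reconstruction of the pattern, not the Trust's actual implementation, and the test and field names below are hypothetical:

```python
import csv
import io

def run_quality_tests(rows, tests):
    """Run configurable per-field data quality tests over record rows.

    rows  -- list of dicts (one per patient record)
    tests -- list of (test_name, field, predicate) tuples, e.g. loaded
             from a spreadsheet configuration
    Returns a CSV report string with pass/fail counts per test.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["test", "passed", "failed"])
    for name, field, predicate in tests:
        passed = sum(1 for row in rows if predicate(row.get(field)))
        writer.writerow([name, passed, len(rows) - passed])
    return out.getvalue()
```

Keeping the test definitions in data rather than code is what lets non-programmers add or adjust checks without touching the reporting routine.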
Screening for cancer and improved treatments have not only improved treatment outcomes and patient survival but have also led to an increase in the number of second primary cancers (SPCs). Hepatocellular carcinoma (HCC) has been a common occurrence in Taiwan over the past decade; its mortality rate is second only to that of lung cancer, and it also accounts for the fourth-highest cancer-related medical expenditure. This study aimed to use machine learning to identify the risk factors for SPCs among hepatocellular carcinoma survivors. In total, 378,445 records, including 15,251 from patients with SPCs, were collected; 18 predictive variables were considered risk factors for SPCs based on a physician panel discussion. The machine learning techniques employed included support vector machine, C5 decision tree, and random forest. The SMOTE (Synthetic Minority Oversampling Technique) sampling method was used to resolve the class-imbalance problem. The results showed that the top five risk factors for SPCs were tumor size, clinical stage, surgery, total bilirubin, and BCLC stage. The support vector machine method had the highest prediction accuracy (0.7673). The risk factors extracted from the classification models and association rules will provide valuable information for HCC therapy.
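SMOTE handles class imbalance by synthesising new minority-class samples on the line segments between a minority sample and one of its nearest minority neighbours, rather than merely duplicating records. A minimal SMOTE-style sketch under that idea (brute-force neighbour search; illustrative only, not the study's implementation):

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from minority-class points.

    minority -- list of equal-length numeric tuples
    k        -- number of nearest neighbours to interpolate towards
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point along the segment base -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic
```

Because each synthetic point lies between two real minority samples, the oversampled class spreads through its own feature-space region instead of collapsing onto duplicates, which is what helps classifiers such as the SVM used here.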