Ebook: Healthgrid Applications and Core Technologies
This book presents the proceedings of HealthGrid 2010, the latest in the annual open forum for the integration of grid technologies, e-science and e-health methods and their application in biomedicine and healthcare. Previous conferences have highlighted the need to involve all actors, such as physicians, scientists and technologists, and have served to demonstrate the usefulness of grids to potential application domains, at least at the prototype level. More recently, cloud computing seems set to make an impact as a paradigm more readily acceptable in the practice of healthcare informatics, whilst grids may remain the infrastructure of choice for researchers. Included in this volume are the 19 papers selected after review from 42 original submissions for full presentation at the 2010 conference. Additional papers, presented as posters at the conference, are reproduced here in shorter form. The book has four sections: section one contains four papers under the broad heading of ‘Socio-Economic Aspects and Accessibility’; section two, ‘Future of Grids, Core Technologies & Data Integration’, consists of nine papers; section three comprises a further six papers covering ‘Applications’; and section four collects the ‘Poster Extended Abstracts’. Of interest to grid middleware and healthgrid application developers, ethicists, security experts and policy makers, as well as all users of biomedical and health informatics, this book provides an overview of current trends and developments in this increasingly important field of healthcare.
Conferences and Themes
HealthGrid 2010 (http://paris2010.healthgrid.org) is the eighth meeting of this open forum for the integration of grid technologies, e-science and e-health methods and their applications in the biomedical and healthcare domains. The principal objective of the HealthGrid conference and of the HealthGrid Association is the exchange and debate of ideas, technologies, solutions and requirements that interest the grid and life-science communities and are likely to promote the integration of grids into biomedical research and health in the broadest sense. Grid middleware and healthgrid application developers, biomedical and health informatics users, ethicists and security experts, and policy makers are all encouraged to participate in a set of multidisciplinary sessions with a common focus on bringing healthgrids closer to real application in the health domain.
HealthGrid conferences have been organized on an annual basis. The first conference, held in 2003 in Lyon (http://lyon2003.healthgrid.org), reflected the need to involve all actors – physicians, scientists and technologists – who might play a role in the application of grid technology to health, whether healthcare or biomedical research. The second conference, held in Clermont-Ferrand in January 2004 (http://clermont2004.healthgrid.org), reported on the earliest efforts in research, mainly work in progress, from a large number of projects. The third conference in Oxford (http://oxford2005.healthgrid.org) had a major focus on first results and the exploration of deployment strategies in healthcare. The fourth conference in Valencia (http://valencia2006.healthgrid.org) aimed at consolidating the collaboration among biologists, healthcare professionals and grid technology experts. The fifth conference in Geneva (http://geneva2007.healthgrid.org) focused on the five domains defined by the European Commission as application areas for ‘vertical integration’ through grids in the biomedical field: molecules, cells, organs, individuals and populations. For each of these five domains, an invited speaker gave a state-of-the-art address followed by presentations of concrete projects. This sent a clear signal to the community that the usefulness of grids to potential application domains could be demonstrated, at least at the prototype level. This theme was also evident at the sixth conference in Chicago (http://chicago2008.healthgrid.org), which proclaimed its focus as ‘e-Science Meets Biomedical Informatics’. The sixth conference was also a landmark in the history of the organisation HealthGrid – and its newly established affiliate HealthGrid.US – as the first conference to be organised outside Europe. As we put it at the time, this was a celebration of similarities and differences, a moment to validate models and principles beyond one's familiar shores. The seventh conference returned to Europe, taking place for the first time in Germany, in Berlin (http://berlin2009.healthgrid.org). While most themes touched on in earlier conferences continued to be present, certain others came to the fore perhaps more clearly than ever before: accessibility, the fraught challenge of usability, and the question of a business case for healthgrids. Ethical, legal, social and economic issues are sometimes encapsulated in the acronym ELSE, with the wry joke that we would like “someone else” to take care of them; perhaps more clearly than before, the Berlin conference embraced and debated these issues directly. Along with these, a hint of ‘cloud’ also hovered over the discussion of core technologies — and a less metaphorical cloud burst over the open-air conference dinner to refresh participants.
While the desire to promote healthgrid applications to real healthcare settings remains central to the community's ambitions, it is also true that the majority of adopters work in academic environments where research is their principal preoccupation. Judging from the full range of papers submitted this year, cloud computing appears set to make an impact, just as healthcare informatics seems readier to adopt that paradigm, perhaps in preference to grids. So it may turn out that grids will remain the infrastructure of choice for research, and clouds for (the business of) healthcare.
Conference at Orsay, Paris, 2010
The call for papers issued in January resulted in 42 submissions; after a review process in which every paper was read by two independent reviewers as well as at least one, and sometimes two, members of the Steering Committee, 19 papers were accepted for full presentation. In addition, several papers described work that was still at an early stage and not yet ready for presentation as a full paper, or presented work of considerable scientific merit that had not yet been fully implemented on a grid. These were considered worthy of inclusion in the conference as posters and, in shorter form, in the proceedings.
Section One — Socio-Economic Aspects and Accessibility
Xin Zhou et al. conducted an extensive survey of opinion about grids in the public sector. Among the significant findings were discrepancies between expectations of users and administrators, hinting at an inherent tension in the concept of grids as a means of full exploitation of resources.
Hanene Rahmouni's report of semi-automated compliance checking, extending her earlier work on the SHARE project, leads to an optimistic picture, suggesting that technology and the dreaded ‘ELSE’ issues are not entirely at odds.
Yassene Mohammed and colleagues consider the means of technology and knowledge transfer in the grid world. They discuss the players, problems and media at work in the process of technology adoption among life scientists. Their analysis examines in particular the medium of exchange (including, e.g., newly trained user experts) and explores various explanatory paradigms of technology transfer.
Femida Gwadry-Sridhar and Silvia Olabarriaga, with co-authors, compare developments in Canada and the Netherlands in the light of the vision of the HealthGrid ‘white paper’ and the broad road maps of the SHARE project. They find grounds for optimism but also identify similar obstacles and resistances on both sides.
Section Two — Future of Grids, Core Technologies & Data Integration
Adam Kraut and colleagues compare clouds with clusters for an ‘embarrassingly parallel’ biomedical application. The choice of application is justified on the grounds of its compute intensity, while the comparison is judged on usability and performance. This should certainly be a good challenge for the cloud paradigm; the results are somewhat unfavourable.
TRENCADIS, which will be familiar to readers of this series, was originally built on a Service-Oriented Architecture. It has now been ported by Ignacio Blanquer et al. to gLite, using features of DICOM Structured Reports and ontologies to integrate diverse imaging and associated data.
Reporting from ACGT, Garcia, Karlsson and Trelles describe a catalogue of service metadata to support the discovery and integration of services. Their approach nicely highlights the trade-offs between expressiveness and ease of discovery and composition.
Ashiq Anjum et al. explore the proposed reuse of a highly sophisticated database designed to track the construction of a CERN experiment as a provenance engine in the neuGRID project.
Benbelgacem, Niinimaki and Abdennadher extend the work previously reported in 2008 on healthgrid adoption and adaptation of XtremWeb-CH, leading through these developments to the release of a new infrastructure, XWCH2.
A large collaboration from the NeuroLOG project reports on its data management layer and its successful federation of neuroimaging data. This has allowed the project to achieve semantically tight integration of data from diverse, loosely coupled sources.
The Décrypthon project has established a grid infrastructure for biomedical applications. Although the authors' focus is on the technology (middleware, query language), they illustrate it with a valuable application to neuromuscular disorders.
Paul de Vlieger et al. report on an application that was foreseen even in the HealthGrid ‘white paper’ as one of the founding challenges for healthgrids. The idea is to integrate diverse, geographically distributed databases so as to be able to derive meaningful epidemiological information from the base data, in this case from various cancer screening programmes.
Finally in this section, David Ritchie et al. report on the use of special processors for certain biomedical tasks. In particular, the already efficient Hex protein docking algorithm turns out to run significantly faster on graphics processing units. This has interesting implications for technology choices in specialist grids for biomedical research.
Section Three — Applications
A team from Charité – Universitätsmedizin Berlin report on the increasing need for imaging in liver disease. Segmentation of MRI images is sufficiently complex to justify the use of a grid. The results are compared with an expert's segmentation, and lessons are drawn for further innovation in healthgrids.
Hausmann, Knoch et al. describe the GLOBE 3D Genome Platform, an interactive collaborative grid platform for the analysis and visualization of results from their novel COMBO-FISH method of labelling genomic sequences.
Jason Haga and team from California and Japan begin from the observation of inconsistencies in virtual screening using DOCK on a heterogeneous grid. They experiment with virtualization to overcome heterogeneity and show that variability in results is much reduced. This work again raises the issue of clouds.
Arun Datta and colleagues describe a grid-based epidemiological application integrating information for childhood obesity surveillance, an increasingly pressing problem in better-off nations. Besides compliance with ethical and legal constraints, they demonstrate improved accuracy of vital data and a reduction in the workload of service providers.
A team from CNRS INSERM CREATIS, Lyon, report on the highly technical problem of parameter optimization in so-called ‘mean-shift’ kernel methods for multi-dimensional data analysis. They demonstrate the feasibility of using EGEE for this project, although they also indicate some gaps in usability.
Trung Tung Doan and colleagues from France and Vietnam describe an approach that combines phylogenetic data on a virus with the surveillance network concept already illustrated in other applications. This leads to a highly effective method of monitoring H1N1 influenza.
Section Four — Poster Extended Abstracts
Among many applications, Graham Billiau and colleagues discuss grid-based optimization of radiotherapy patient scheduling, a problem that has wider implications. Daniela Skrowny and team describe a novel information platform for new grid users in biomedicine. Mark Olive et al. consider the use of care pathway records for research, with assisted reproduction as a case study. Anthony Stell and colleagues discuss the compliance pressures in a project to integrate clinical and genomic data relating to patients with adrenal tumours. Eberhard Schmitt et al. apply some very sophisticated geometry to cell nucleus architecture to derive biologically meaningful information.
More traditional applications of healthgrids (is it not remarkable that we can already speak of ‘traditional’ applications?) are closer to biomedicine than healthcare, though the distance between the two is perceptibly less than it was. Raul Isea and colleagues discuss Bayesian models to characterize antigenic serotypes of the Dengue virus. Marta Loureiro and team present a grid-based selection process for models of nucleotide substitution. Aristotelis Chatziioannou and team describe GRISSOM, a microarray analysis application.
In the knowledge management dimension, we have two contributions. The first from Carl Taswell is an interesting ontology-based attempt to enhance data integration and interoperability for cross-domain retrieval and sharing of biomedical knowledge. In the second, Alfredo Tirado-Ramos and colleagues outline HIV-K which they describe as a ‘semantically integrative’ knowledge base for AIDS-related cancer data.
Tobias Knoch and his colleagues display a characteristic enthusiasm for their proposed resource-efficient grid architecture. I suspect it would be claimed by some to be ‘a cloud’, pending a precise definition of that term.
Tony Solomonides, 17th May 2010
Grids have been developed to enable computationally or data-intensive scientific applications and to share resources. Existing infrastructures could also benefit other user groups. This qualitative survey focuses on the public sector, comprising universities, governments and hospitals. Interviews were performed to collect information on the perception of Grids by decision makers, system administrators and high-level users in the sector. Application scenarios and the main barriers to Grid adoption are discussed in the light of users' and institutional interests.
To be processed within a healthgrid environment, medical data goes through a complete lifecycle and several stages until it is finally used for the primary purpose for which it was collected. This stage is not always the last occasion on which the data is manipulated; the data may continue to be needed for secondary purposes, whether of a legitimate or an illegitimate nature. Although other privacy issues arise while patient data resides in a healthgrid environment, the control of data disclosure is our primary interest. When sharing medical data between different healthcare and biomedical research organizations in Europe, it is important that the parties involved handle the data in the way indicated by the legislation of the member state where the data was originally collected, as the requirements may differ from one state to another. Privacy requirements, such as patient consent, may be subject to conflicting conditions between different national frameworks as well as between different legal and ethical frameworks within a single member state. These circumstances have made the compliance management process in European healthgrids very challenging. In this paper we present an approach to tackle these issues by relying on several technologies from the semantic web stack. Our work suggests a direct mapping from high-level legislation on privacy and data protection to operational-level privacy-aware controls. Additionally, we suggest an architecture for the enforcement of these controls on the access control models adopted by healthgrid security infrastructures.
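To make the notion of ‘operational-level privacy-aware controls’ concrete, the following is a minimal sketch, in Python, of a disclosure check driven by per-member-state rules. The rule table, field names and logic are illustrative assumptions only, not the ontology-based mapping the paper actually proposes.

```python
# Illustrative only: a toy rule table standing in for legislation-derived
# privacy requirements. The paper derives such controls via semantic web
# technologies; this hard-coded version merely shows the enforcement idea.
RULES = {
    "FR": {"requires_consent": True, "allows_secondary_use": False},
    "UK": {"requires_consent": True, "allows_secondary_use": True},
}  # invented rules, not actual member-state law

def may_disclose(record, purpose):
    """Apply the rules of the member state where the data was collected."""
    rule = RULES[record["collected_in"]]
    if rule["requires_consent"] and not record["consent"]:
        return False
    if purpose == "secondary" and not rule["allows_secondary_use"]:
        return False
    return True

record = {"collected_in": "FR", "consent": True}
print(may_disclose(record, "primary"))    # True
print(may_disclose(record, "secondary"))  # False under the toy FR rule
```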
Natural scientists such as physicists pioneered the sharing of computing resources, which resulted in the Grid. The inter-domain transfer of this technology has been an intuitive process. Some of the difficulties facing the life science community can be understood using Bozeman's “Effectiveness Model of Technology Transfer”. Bozeman's and classical technology transfer approaches deal with technologies that have achieved a certain stability; Grid and Cloud solutions are technologies still in flux. We illustrate how Grid computing creates new difficulties for the technology transfer process that are not considered in Bozeman's model. We show why the success of health Grids should be measured by the qualified scientific human capital and opportunities created, and not primarily by market impact. With two examples we show how Grid technology transfer theory corresponds to reality. We conclude with recommendations that can help improve the adoption of Grid solutions in the biomedical community. These results give a more concise explanation of the difficulties most life science IT projects face in their late funding periods, and show some leveraging steps which can help to overcome the “vale of tears”.
We consider the issues of healthgrid development, deployment and adoption in health care and research environments. While healthgrid technology could be deployed to support advanced medical research, we are not seeing its wide adoption. Understanding why this technology is not being exploited is one purpose of this paper. We do so in light of the seminal Healthgrid White Paper and the SHARE roadmap. We also address barriers to adoption and successes by presenting experiences in North America and Europe. By critically appraising where we are, we hope that we can hit the ground running in the near future.
Cloud computing has recently become very popular, and several bioinformatics applications already exist in that domain. The aim of this article is to analyse a current cloud system with respect to usability, to benchmark its performance and to compare its user-friendliness with a conventional cluster job submission system. Given the current hype around the theme, user expectations are rather high, but our results show that neither the price/performance ratio nor the usage model is very satisfactory for large-scale embarrassingly parallel applications. However, for small- to medium-scale applications that require CPU time at certain peak times, the cloud is a suitable alternative.
The problem of sharing medical information among different centres has been tackled by many projects. Several of them target the specific problem of sharing DICOM images and structured reports (DICOM-SR), such as the TRENCADIS project. In this paper we propose sharing and organizing DICOM data and DICOM-SR metadata using the existing deployed gLite-compliant Grid infrastructures, such as EGEE or the Spanish NGI. These infrastructures contribute a large amount of storage resources for creating knowledge databases and also provide metadata storage resources (such as AMGA) to organize reports semantically in a tree structure. First, we present the extension of the TRENCADIS architecture to use gLite components (LFC, AMGA, SE) for the sake of increased interoperability. Using the metadata from DICOM-SR, while maintaining its tree structure, enables the federation of different but compatible diagnostic structures and simplifies the definition of complex queries. The article describes how to do this in AMGA and shows an approach to coding radiology reports efficiently so as to enable the multi-centre federation of data resources.
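The key idea, organizing DICOM-SR content as a tree of metadata so that compatible report structures can be federated and queried, can be sketched without the actual gLite/AMGA calls. The report fields and paths below are invented for illustration only.

```python
# Invented miniature of the idea: flatten a DICOM-SR report tree into
# path-style metadata entries, the form in which a hierarchical catalogue
# such as AMGA can store and query them (the gLite/AMGA calls are omitted).
report = {
    "finding": {
        "lesion": {"location": "liver", "size_mm": 14},
        "modality": "MR",
    }
}

def flatten(node, prefix=""):
    """Yield ('/finding/lesion/size_mm', 14)-style pairs from a nested report."""
    for key, value in node.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

entries = dict(flatten(report))
# Federating compatible report structures then reduces to querying shared
# sub-paths across centres:
print({p: v for p, v in entries.items() if p.startswith("/finding/lesion")})
```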
A great variety of services have been developed to address problems in the field of biomedicine. The EU project Advancing Clinico-Genomic Trials on Cancer (ACGT – http://www.eu-acgt.org) provides a Grid-based platform for improved medical knowledge discovery and the integration of biomedical data in clinical trials on cancer. Metadata describing biomedical services needs to be shared to enable discovery and service composition (as workflows). This paper reports on a catalogue for knowledge-based discovery of service metadata and a software module to wrap existing command-line programs as secure Grid services able to handle sensitive information.
We outline the approach being developed in the neuGRID project to use provenance management techniques to capture and preserve the provenance data that emerges in the specification and execution of workflows in biomedical analyses. In the neuGRID project a provenance service has been designed and implemented to capture, store, retrieve and reconstruct the workflow information needed to facilitate users in conducting their analyses. We describe the architecture of the neuGRID provenance service, discuss how the CRISTAL system from CERN is being adapted to address the requirements of the project, and consider how a generalised approach to provenance management could emerge for more generic application in the (Health)Grid community.
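As a rough illustration of what a workflow provenance service must capture (one record per executed step, sufficient to reconstruct or audit the analysis later), consider the following sketch. The schema and field names are our own invention, not CRISTAL's description-driven model.

```python
import json
import time
import uuid

# Invented schema for illustration: one provenance record per executed
# workflow step. The neuGRID service builds on CERN's CRISTAL rather than
# this ad-hoc structure.
def record_step(store, workflow_id, step, inputs, outputs, params):
    store.append({
        "id": str(uuid.uuid4()),
        "workflow": workflow_id,
        "step": step,
        "inputs": inputs,      # logical file names consumed
        "outputs": outputs,    # logical file names produced
        "params": params,      # parameters needed to reproduce the step
        "timestamp": time.time(),
    })

trace = []
record_step(trace, "wf-42", "segment", ["scan.nii"], ["mask.nii"], {"atlas": "AAL"})
print(json.dumps(trace, indent=2))
```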
Many medical applications utilise distributed/parallel computing in order to cope with large data volumes or demanding computing power requirements. In this paper, we present a new version of the XtremWeb-CH (XWCH) platform and demonstrate two medical applications that run on it. The platform is versatile in that it supports direct communication between tasks. When tasks cannot communicate directly, warehouses are used as intermediary nodes between “producer” and “consumer” tasks. New features have been developed to provide improved support for writing powerful distributed applications using a simple API.
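The ‘warehouse’ idea, an intermediary node that buffers data between producer and consumer tasks unable to communicate directly, can be mimicked locally in a few lines. This is a thread-and-queue stand-in for the pattern, not the XWCH API; all names are illustrative.

```python
import queue
import threading

# A local stand-in for the XWCH "warehouse": a thread-safe buffer through
# which a producer task hands results to a consumer task it cannot reach
# directly.
warehouse = queue.Queue()

def producer():
    for chunk in ("slice-1", "slice-2"):
        warehouse.put(chunk)            # deposit intermediate results

def consumer():
    for _ in range(2):
        print("consumed", warehouse.get())  # blocks until a chunk arrives

workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```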
Grid technologies are appealing as a means of dealing with the challenges raised by computational neurosciences and of supporting multi-centric brain studies. However, core grid middleware hardly copes with the complexity of neuroimaging data representation and multi-layer data federation needs. Moreover, legacy neuroscience environments need to be preserved and cannot simply be superseded by grid services. This paper describes the NeuroLOG platform design and implementation, shedding light on its Data Management Layer. It addresses the integration of brain image files, associated relational metadata and neuroscience semantic data in a heterogeneous distributed environment, integrating legacy data managers through a mediation layer.
Thanks to the availability of computational grids and their middleware, seamless access to computation and storage resources can be provided to application developers and scientists. The Décrypthon project is one example of such a high-performance platform. In this paper, we present the architecture of the platform, the middleware developed to facilitate access to several servers deployed in France, and the data center for integrating large biological datasets over multiple sites, supported by a new query language and the integration of various tools. The SM2PH project represents an example of a biological application that exploits the capacities of the Décrypthon grid. The goal of SM2PH is a better understanding of the mutations involved in human monogenic diseases, their impact on the 3D structure of the protein and the subsequent consequences for the pathological phenotypes.
Grid technologies have proven their capacity to settle challenging problems of medical data access. The grid's ability to access distributed databases in a secure and reliable way while preserving data ownership has opened new perspectives in medical data sharing and disease surveillance. This paper focuses on the implementation challenges of grid-powered sentinel networks within the e-sentinelle project. This initiative aims to create a lightweight grid dedicated to cancer data exchange and to enable automatic disease surveillance according to definitions of epidemiological alarms. In particular, issues related to security, patient identification, database integration, data representation and medical record linkage are discussed.
Protein docking is the computationally intensive task of calculating the three-dimensional structure of a protein complex starting from the individual structures of the constituent proteins. In order to make the calculation tractable, most docking algorithms begin by assuming that the structures to be docked are rigid. This article describes some recent developments we have made to adapt our FFT-based “Hex” rigid-body docking algorithm to exploit the computational power of modern graphics processors (GPUs). The Hex algorithm is very efficient on conventional central processor units (CPUs), yet significant further speed-ups have been obtained by using GPUs. Thus, FFT-based docking calculations which formerly took many hours to complete using CPUs may now be carried out in a matter of seconds using GPUs. The Hex docking program and access to a server version of Hex on a GPU-based compute cluster are both available for public use.
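For readers unfamiliar with the FFT trick underlying such docking codes: the shape-complementarity score over all relative translations of two density grids is a correlation, which the convolution theorem reduces to three FFTs. The Cartesian-grid sketch below illustrates that principle only; Hex itself works with spherical polar Fourier representations, and all names and grid sizes here are illustrative assumptions.

```python
import numpy as np

def translational_scan(receptor, ligand):
    """Score every relative translation of two 3-D occupancy grids at once.

    The complementarity score for a shift t is the cross-correlation
    sum_r receptor[r] * ligand[r + t]; by the convolution theorem this is
    computable for all t with three FFTs instead of a brute-force scan.
    """
    R = np.fft.fftn(receptor)
    L = np.fft.fftn(ligand)
    return np.fft.ifftn(np.conj(R) * L).real  # real part is the score map

# Toy usage: random 32^3 grids standing in for digitised protein shapes.
rec = np.random.rand(32, 32, 32)
lig = np.random.rand(32, 32, 32)
scores = translational_scan(rec, lig)
print("best translation (grid units):",
      np.unravel_index(np.argmax(scores), scores.shape))
```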
Recent developments in MRI contrast agents open new perspectives in radiological diagnosis and therapy planning, but require specific image analysis methods. Employing an academic research grid, we are currently validating and optimizing a recently developed, fully automatic method for liver segmentation in Gd-EOB-enhanced MRI. The grid enables extensive parameter scans and evaluation against an expert's reference segmentation. The implementation layout and the results reached so far are presented. Furthermore, experiences gained in the production phase, and the resulting consequences for the exploitation of publicly funded research grids for healthgrid applications, are reported.
The genome architecture in cell nuclei plays an important role in modern microscopy for the monitoring of medical diagnosis and therapy, since changes in the function and dynamics of genes are interlinked with changing geometrical parameters. The planning of corresponding diagnostic experiments and their imaging is a complex, often interactive and IT-intensive challenge, and thus makes high-performance grids a necessity. To detect genetic changes we recently developed a new form of fluorescence in situ hybridization (FISH) – COMBinatorial Oligonucleotide FISH (COMBO-FISH) – which labels small nucleotide sequences clustering at a desired genomic location. To achieve a unique hybridization spot, other side clusters have to be excluded. We have therefore designed an interactive pipeline using the grid-based GLOBE 3D Genome Viewer and Platform to design and display different labelling variants of candidate probe sets. We have thus created a grid-based virtual “paper” tool for easy interactive calculation, analysis, management and representation in COMBO-FISH probe design, with many advantages: since all the calculations and analyses run in a grid, one can instantly, and with great visual ease, locate duplications of gene subsequences to guide the elimination of side-clustering sequences during the probe design process, and also gain at least an impression of the 3D architectural embedding of the respective chromosome region, which is of major importance in estimating the hybridization probe dynamics. Moreover, several people at different locations can work on the same process as a team. We thus present how a complex interactive process can profit from grid infrastructure technology, using our unique GLOBE 3D Genome Platform gateway, on the way towards truly interactive curative diagnosis planning and therapy monitoring.
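The core combinatorial constraint, that a candidate probe set should co-cluster only at the target locus, can be pictured with a naive window scan. The sequence, probes, window size and threshold below are toy assumptions for illustration, not the GLOBE 3D pipeline.

```python
# Toy illustration of the side-cluster check in COMBO-FISH probe design:
# count how many probe oligos co-occur within a sliding genomic window; a
# high-count window away from the target locus is an unwanted side cluster.
genome = "ATGCGTACGTTAGCATGCGTACGTTAGCGGGCCCATGCGTACG"
probes = ["ATGCG", "ACGTT", "TAGC"]

def cluster_windows(genome, probes, window=15, min_hits=3):
    """Return start positions of windows containing >= min_hits probe matches."""
    clusters = []
    for start in range(len(genome) - window + 1):
        segment = genome[start:start + window]
        hits = sum(segment.count(p) for p in probes)
        if hits >= min_hits:
            clusters.append(start)
    return clusters

# More than one distinct cluster region flags probes needing elimination.
print(cluster_windows(genome, probes))
```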
Large-scale in-silico screening is a necessary part of drug discovery, and Grid computing is one answer to this demand. A disadvantage of using Grid computing is the heterogeneous computational environment characteristic of a Grid. In our study, we have found that for the molecular docking simulation program DOCK, different clusters within a Grid organization can yield inconsistent results. Because DOCK in-silico virtual screening (VS) is currently used to help select chemical compounds to test in in-vitro experiments, such differences have little effect on the validity of using virtual screening before subsequent steps in the drug discovery process. However, it is difficult to predict whether the accumulation of these discrepancies over sequentially repeated VS experiments will significantly alter the results if VS is used as the primary means of identifying potential drugs. Moreover, such discrepancies may be unacceptable for other applications requiring more stringent thresholds. This highlights the need for a more complete solution to provide the best scientific accuracy when executing an application across Grids. One possible solution to platform heterogeneity in DOCK performance explored in our study involved the use of virtual machines as a layer of abstraction. This study investigated the feasibility and practicality of using virtual machine and recent cloud computing technologies in a biological research application. We examined the differences and variations of DOCK VS variables across a Grid environment composed of different clusters, with and without virtualization. The uniform computing environment provided by virtual machines eliminated the inconsistent DOCK VS results caused by heterogeneous clusters; however, the execution time for the DOCK VS increased. In our particular experiments, overhead costs averaged 41% and 2% of execution time on two different clusters, while the actual magnitudes of the execution time costs were minimal. Despite the increase in overhead, virtual clusters are an attractive solution to Grid heterogeneity. With further development of virtual cluster technology in Grid environments, the problem of platform heterogeneity may be eliminated through virtualization, allowing greater usage of VS and benefiting Grid applications in general.
CHOIS, the Child Health and Obesity Informatics System
CHOIS has been developed with the financial support and sponsorship of the Illinois Department of Human Services (DHS), the University of Illinois at Urbana-Champaign (UIUC), the National Center for Supercomputing Applications (NCSA) and the National University Community Research Institute (NUCRI). Part of this development was presented at PRAGMA 18 [5].
This paper studies the optimization of Mean-Shift (MS) image filtering scale parameters. A parameter sweep experiment representing 164 days of CPU time was performed on the EGEE grid. The mathematical foundations of Mean-Shift and the grid environment used for the deployment are described in detail. The experiments and results are then discussed, highlighting the efficiency of a gradient ascent algorithm for MS parameter optimization and a number of grid observations related to data transfers, reliability, task scheduling, CPU time and usability.
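For orientation, the mean-shift iteration whose scale (bandwidth) parameter the authors optimize can be written in a few lines of Python. This Gaussian-kernel, mode-seeking version is a generic textbook sketch, not the paper's multi-dimensional image-filtering implementation; all data and parameter values are toy assumptions.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=100, tol=1e-5):
    """Move every point to the kernel-weighted mean of its neighbourhood
    until it converges on a density mode; `bandwidth` is the scale parameter
    whose tuning the paper addresses."""
    modes = points.copy()
    for _ in range(n_iter):
        shifted = np.empty_like(modes)
        for i, x in enumerate(modes):
            w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * bandwidth ** 2))
            shifted[i] = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.max(np.abs(shifted - modes)) < tol:
            return shifted
        modes = shifted
    return modes

# Toy usage: two Gaussian blobs collapse onto two modes.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(np.round(mean_shift(data, bandwidth=0.5)[::50], 2))
```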
The 2009 H1N1 outbreak demonstrated that continuing vigilance, planning, and a strong public health research capability are essential defences against emerging health threats. The molecular epidemiology of influenza virus strains provides scientists with clues about the temporal and geographic evolution of the virus. In the present paper, researchers from France and Vietnam propose a global surveillance network based on grid technology: the goal is to federate influenza data servers and automatically deploy molecular epidemiology studies. A first prototype based on AMGA and the WISDOM Production Environment extracts influenza H1N1 sequence data daily from NCBI; the data are processed through a phylogenetic analysis pipeline deployed on the EGEE and AuverGrid e-infrastructures. The analysis results are displayed on a web portal (http://g-info.healthgrid.org) for epidemiologists monitoring the H1N1 pandemic.
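The daily extraction step can be pictured with a hedged client-side sketch using Biopython's Entrez interface. The query string, contact address and batch size are illustrative only, and the actual prototype drives its extraction through AMGA and the WISDOM Production Environment rather than this code.

```python
from Bio import Entrez, SeqIO

# Illustrative stand-in for the pipeline's daily NCBI extraction step.
Entrez.email = "surveillance@example.org"  # NCBI requires a contact address; placeholder

# Hypothetical query: H1N1 hemagglutinin nucleotide records.
handle = Entrez.esearch(db="nucleotide",
                        term="Influenza A virus (H1N1) hemagglutinin",
                        retmax=20)
ids = Entrez.read(handle)["IdList"]

handle = Entrez.efetch(db="nucleotide", id=ids, rettype="fasta", retmode="text")
records = list(SeqIO.parse(handle, "fasta"))
print(f"fetched {len(records)} sequences for the phylogenetic pipeline")
```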
In the health system, inefficiency leads to poor use of scarce, expensive resources. Lengthy patient waiting times can result from inefficiency in scheduling. The use of state-of-the-art multi-agent and distributed computing technologies can provide a solution to this problem. However, distributed optimisation in such a multi-agent setting poses an important challenge: it requires protocols that enable agents to optimise shared objectives without necessarily revealing all of their private constraints. In this study we show that if the problem is expressed as a Dynamic Distributed Constraint Optimisation Problem, a powerful algorithm such as SBDO can be deployed to solve it. As SBDO can be deployed on a grid, all of the advantages of grid computing are also gained.
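To fix ideas about what is being optimized, here is a deliberately centralized, brute-force toy: three patients, three treatment slots, a hard availability constraint per patient, and a waiting-time objective. A real deployment distributes this across SBDO agents precisely so that private constraints need not be revealed; all names and numbers below are invented.

```python
from itertools import permutations

# Invented toy instance of the scheduling problem behind the paper's DCOP
# formulation. This centralized brute force only illustrates the objective
# and constraints; it is not SBDO, which solves the problem distributively.
patients = ["p1", "p2", "p3"]
ready_at = {"p1": 0, "p2": 1, "p3": 1}  # earliest feasible slot per patient

def cost(assignment):
    total = 0
    for patient, slot in assignment.items():
        if slot < ready_at[patient]:
            return float("inf")            # hard constraint violated
        total += slot - ready_at[patient]  # waiting beyond readiness
    return total

best = min((dict(zip(patients, perm)) for perm in permutations(range(3))),
           key=cost)
print(best, "total wait:", cost(best))
```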