Ebook: Challenges and Opportunities of HealthGrids
HealthGrid 2006 (http://valencia2006.healthgrid.org) is the fourth edition of this open forum for the integration of Grid technologies and their applications in the biomedical, medical and biological domains, paving the way towards an international research area in HealthGrid. The main objective of the HealthGrid conference and the HealthGrid Association is the exchange and discussion of ideas, technologies, solutions and requirements of interest to the Grid and Life Sciences communities, in order to foster the integration of Grids into Health. Grid middleware and Grid application developers, Biomedical and Health Informatics users, and security and policy makers are encouraged to participate in a set of multidisciplinary sessions with a common focus on applications to Health.
HealthGrid conferences have been organized on an annual basis. The first conference, held in 2003 in Lyon (http://lyon2003.healthgrid.org), reflected the need to involve all actors (physicians, scientists and technologists) who might play a part in the application of Grid technology to Health, whether in health care or in biomedical research. The second conference, held in Clermont-Ferrand in January 2004 (http://clermont2004.healthgrid.org), reported research and work in progress from a large number of projects. The third conference, held in Oxford (http://oxford2005.healthgrid.org), focused on results and deployment strategies in Healthcare. Finally, this edition aims at consolidating the collaboration among biologists, healthcare professionals and Grid technology experts.
The conference includes a number of high-profile keynote presentations complemented by a set of high-quality refereed papers. The number of contributions has increased with respect to previous editions, reaching 44 submissions of papers and demos from principal authors in 14 countries (ordered by number of contributions: France, United Kingdom, Spain, Italy, Germany, Greece, The Netherlands, Belgium, Czech Republic, Cuba, Japan, Romania, Russia and Taiwan). Considering the affiliations of all the authors of the papers, the number of contributing countries rises to 18, including Switzerland, Austria, Turkey and the USA. The contributions of this edition fall into five main topics: Medical Imaging on the Grid; Ethical, Legal and Privacy Issues on HealthGrids; Bioinformatics on the Grid; Knowledge Discovery on HealthGrids; and Medical Assessment and HealthGrid Applications. The maturity of the HealthGrid discipline is clearly reflected in these subjects. The largest share of contributions relates to two main application areas (Medical Imaging and Bioinformatics), confirming the analysis of the HealthGrid White Paper published last year, which identified them as the two most promising areas for HealthGrids. Along with these two areas, the assessment of the results of HealthGrid applications, addressed by several contributions, is a further sign of the maturity of HealthGrids. Finally, the other two areas (Knowledge Discovery and Ethical, Legal and Privacy Issues) focus on basic technologies that are highly relevant for HealthGrids.
In Medical Imaging, the contributions cover the problems of medical image processing and virtual distributed storage. Within this topic, some contributions focus on the structuring of medical information through semantic classifications, as in the NeuroBase project presented by Barillot et al. and the TRENCADIS software architecture presented by Blanquer et al. Encryption and data sharing form a very important topic, addressed in contributions such as the Medical Data Manager (Montagnat et al.) and in other contributions related to privacy. In the area of medical image processing, several papers describe experiences in providing services for neuroimaging. The work of Olabarriaga et al. covers image processing services for fMRI (Functional Magnetic Resonance Imaging), Bagnasco et al. describe the application of the Grid to the early diagnosis of Alzheimer's disease, assisting diagnosis on PET/SPECT images through Statistical Parametric Mapping, and Bucur et al. address the computationally intensive problem of Fibre Tracking. In the area of modelling processes related to medical images, Bellet et al. propose a web interface for the simulation of MRI devices, and Blanquer et al. propose a Grid implementation of processing services for the co-registration of medical images, aimed at a quantitative diagnosis of liver cancer. On this precise topic of image co-registration, Montagnat et al. propose a mechanism to evaluate the quality of co-registration methods using the Grid, in a methodology called “Bronze Standard”. Finally, the problem of interactive use of Grids for medical image processing is tackled in the work of Germain-Renaud et al.
In the area of Ethical, Legal and Privacy Issues on HealthGrids, on one side, contributions focus on ethical and legal issues, such as the problem of medical consent (Herveg et al.) and the organisation of Virtual Organisations for clinical trials in epidemiology (Sinnott et al.). On the other side, different technical solutions for privacy enhancement are presented. Torres et al. present a solution for sharing an encrypted and distributed storage of medical images. A similar approach is used by Blanchet et al. to propose a mechanism for encrypting genetic information. Other approaches for sharing and linking distributed repositories of epidemiological data are presented by Ainsworth et al. and Tashiro et al.
The area of Bioinformatics is a very active one in HealthGrids. The increase in the size and complexity of genomic databases and protein modelling is opening the door to new Grid applications. Results in large-scale in silico docking for malaria are presented by Jacq et al., and a grid-enabled protein structure prediction system, Rokky-G, is presented in the work of Masuda et al. Another important activity in this topic is the integration of bioinformatics information, where the complexity of browsing data is considered by Schroeder et al. in the frame of the Sealife project. Another approach, based on data mediation, is presented by Colonna et al. for the discovery of predisposition genes. The integration of OGSA-DAI technologies for distributed biochemical data is proposed by Tverdokhlebov et al. The development of genomic processing services and their interfacing to the Grid is presented in the porting of the GPS@ portal (Blanchet et al.) and in the work of Segrelles et al., in which an MPIBlast processing Grid service is developed and integrated in a gene annotation tool (Blast2GO). The early results of the BIOINFOGRID project are presented in the work of Milanesi et al. More consolidated results on bioprofiling are presented in the work of Sun et al. in the frame of the BIOPATTERN project. Finally, an application of HealthGrid to SARS is described in the work of Hung et al.
In the area of Knowledge Discovery on HealthGrids, contributions focus on the semantic integration of medical information. The work of Boniface et al., in the frame of the ARTEMIS project, focuses on Healthcare data, whereas the work of Koutkias et al. focuses on the semantic integration of bioinformatics data. Semantic integration is the key to knowledge discovery in large databases, in which techniques such as Data Mining are applied. Tsiknakis et al. propose the use of these techniques for cancer studies in the ACGT IP, and McClatchey et al. apply them to the integration of paediatric information in the frame of the Health-e-child project.
The area of Medical Assessment and HealthGrid Applications covers, on one side, medical results of the application of Grid technologies to Health and, on the other, applications related to biomedical simulation and clinical environments. The application of Grids to radiotherapy is also a classic topic due to the maturity of High Energy Physics, revealing new applications of Monte Carlo simulation to Intensity-Modulated Radiation Therapy (Gómez et al.) and interfacing to well-known environments such as GATE (Thiam et al.). Other applications of P2P and Grid technologies show their potential for emergency management (Harrison et al.) and collaborative environments (Kuba et al.). Finally, contributions also focus on the needs of hospital management systems for Grids (Graschew et al.), the success stories of the e-DiaMoND and NeuroGrid projects (Ure et al.) and the exploitation of successful projects on Medical Imaging and Grids, such as the MAMMOGRID project (del Frate et al.).
The NeuroBase project aims at studying the requirements for federating, through the Internet, information sources in neuroimaging. These sources are distributed across different experimental sites, hospitals and research centers in cognitive neurosciences, and contain heterogeneous data and image processing programs. More precisely, this project consists in creating a shared ontology, suitable for supporting various neuroimaging applications, and a computer architecture for accessing and sharing relevant distributed information. We briefly describe the semantic model and report in more detail on the architecture we chose, based on a mediator/wrapper approach. To give a flavor of the future deployment of our architecture, we describe a demonstrator that implements the comparison of distributed image processing tools applied to distributed neuroimaging data.
This paper describes the effort to deploy a Medical Data Management service on top of the EGEE grid infrastructure. The most widely accepted medical image standard, DICOM, was developed to meet the needs of clinical practice and is implemented in most medical image acquisition and analysis devices. The EGEE middleware uses the SRM standard for handling grid files. Our prototype exposes an SRM-compliant interface to the grid middleware, transforming SRM requests into DICOM transactions on the fly. The prototype ensures user identification, strict file access control and data protection through the use of relevant grid services. This Medical Data Manager eases access to the medical databases needed by many medical data analysis applications deployed today. It offers a high-level data management service, compatible with clinical practice, which encourages the migration of medical applications towards grid infrastructures. A limited-scale testbed has been deployed as a proof of concept of this new service. The service is expected to be put into production with the next EGEE middleware generation.
Grids are facing the challenge of moving from batch systems to interactive computing. In the 1970s, standalone computer systems met this challenge, and this was the starting point of pervasive computing. Meeting this challenge will allow grids to become the infrastructure for ambient intelligence and ubiquitous computing. This paper shows that EGEE, the world's largest grid, does not yet provide the services required for interactive computing, but that it is amenable to this evolution through relatively modest changes to the middleware. A case study on medical image analysis exemplifies the particular needs of ultra-short jobs.
In this paper, we present a web portal that enables the simulation of MRI images on the grid. Such simulations are done using the SIMRI MRI simulator, which is implemented on the grid using MPI and the LCG2 middleware. MRI simulations are mainly used to study MRI sequences and to validate image processing algorithms. As MRI simulation is computationally very expensive, grid technologies bring real added value to the MRI simulation task. Nevertheless, grid access should be simplified to enable end users to run MRI simulations. That is why we have developed this specific web portal, which offers a user-friendly interface for MRI simulation on the grid.
The web portal is designed using a three-layer client/server architecture. Its main component is the process layer, which manages the simulation jobs. This layer is mainly based on a Java thread that scans a database of simulation jobs. The thread submits new jobs to the grid and updates the status of running jobs. When a job terminates, the thread sends the simulated image to the user. Through a client web interface, the user can submit new simulation jobs, get a detailed status of the running jobs, and view the history of all terminated jobs together with their status and corresponding simulated images.
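The job-management loop described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Java, and it is not the portal's actual code: the `SimulationJob` class, the `NEW`/`RUNNING`/`DONE` states, and the `submit`, `query_status` and `deliver` callbacks are all hypothetical stand-ins for the database table, the grid middleware calls and the image delivery step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SimulationJob:
    job_id: int
    status: str = "NEW"  # assumed lifecycle: NEW -> RUNNING -> DONE
    grid_handle: Optional[str] = None


def poll_once(
    jobs: Dict[int, SimulationJob],
    submit: Callable[[SimulationJob], str],
    query_status: Callable[[str], str],
    deliver: Callable[[SimulationJob], None],
) -> None:
    """One pass over the job table: submit new jobs, update running ones."""
    for job in jobs.values():
        if job.status == "NEW":
            # Hand the job to the grid and remember the returned handle.
            job.grid_handle = submit(job)
            job.status = "RUNNING"
        elif job.status == "RUNNING" and query_status(job.grid_handle) == "DONE":
            # The job has terminated: record it and send the image to the user.
            job.status = "DONE"
            deliver(job)


# Stand-in backend (hypothetical): every submitted job is finished
# by the time it is next polled.
delivered: List[int] = []
jobs = {1: SimulationJob(1), 2: SimulationJob(2)}
submit = lambda job: f"grid-{job.job_id}"
query = lambda handle: "DONE"

poll_once(jobs, submit, query, lambda job: delivered.append(job.job_id))  # submits both
poll_once(jobs, submit, query, lambda job: delivered.append(job.job_id))  # collects both
```

In a real deployment the pass would run periodically in a background thread and the job table would live in a database, so that the web interface can read job status independently of the loop.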
Functional Magnetic Resonance Imaging (fMRI) is a popular tool used in neuroscience research to study brain activation due to motor or cognitive stimulation. In fMRI studies, large amounts of data are acquired, processed, compared, annotated, shared by many users and archived for future reference. As such, fMRI studies have characteristics of applications that can benefit from grid computation approaches, in which users associated with virtual organizations can share high performance and large capacity computational resources. In the Virtual Laboratory for e-Science (VL-e) Project, initial steps have been taken to build a grid-enabled infrastructure to facilitate data management and analysis for fMRI. This article presents our current efforts for the construction of this infrastructure. We start with a brief overview of fMRI, and proceed with an analysis of the existing problems from a data management perspective. A description of the proposed infrastructure is presented, and the current status of the implementation is described with a few preliminary conclusions.
Grid technologies have the potential to enable healthcare organizations to efficiently use powerful tools, applications and resources, many of which were so far inaccessible to them. This paper introduces a service-oriented architecture meant to Grid-enable several classes of computationally intensive medical applications for improved performance and cost-effective access to resources. We apply this architecture to fiber tracking [1,2], a computationally intensive medical application suited for parallelization through decomposition, and carry out experiments with various sets of parameters, in realistic environments and with standard network solutions. Furthermore, we deploy and assess our solution in a hospital environment, at the Amsterdam Medical Center, as part of our cooperation in the Dutch VL-e project. Our results show that parallelization and Grid execution may bring significant performance improvements and that the overhead introduced by making use of remote, distributed resources is relatively small.
A quantitative statistical analysis of perfusional medical images may provide powerful support to the early diagnosis of Alzheimer's Disease (AD). A Statistical Parametric Mapping (SPM) algorithm, based on the comparison of the candidate with normal cases, has been validated by the neurological research community to quantify hypometabolic patterns in brain PET/SPECT studies. Since suitable “normal patient” PET/SPECT images are rare and usually sparse and scattered across hospitals and research institutions, the Data Grid distributed analysis paradigm (“move code rather than input data”) is well suited for implementing a remote statistical analysis use case, described in the present paper. Different Grid environments (LCG, AliEn) and their services have been used to implement this use case and to tackle the challenging problems related to SPM-based early AD diagnosis.
The analysis of angiogenesis in hepatic lesions provides an important marker of tumour aggressiveness and response to therapy. However, the quantitative analysis of angiogenesis requires a deep knowledge of hepatic perfusion. Pharmacokinetic models constitute a very valuable tool, but they are computationally intensive. Moreover, abdominal image processing increases the computational requirements, since patient movement makes the images in a time series incomparable unless a pre-processing step is applied first. This work presents a Grid environment developed to deal with the computational demand of pharmacokinetic modelling. It proposes and implements a four-layer software architecture that provides a simple interface to the user and deals transparently with the complexity of the Grid environment. The four layers are: the Grid Layer (the closest to the Grid infrastructure), the Gate-to-Grid layer (which transforms user requests into Grid operations), the Web Services layer (which provides a simple, standard and ubiquitous interface to the user) and the Application Layer. An application has been developed on top of this architecture to manage the execution of multi-parametric groups of co-registration actions on a large set of medical images. The execution has been performed on the EGEE Grid infrastructure. The application is platform-independent and can be used from any computer without special requirements.
Medical image registration is a pre-processing step needed for many medical image analysis procedures. A very large number of registration algorithms are available today, but their performance is often not known and is very difficult to assess due to the lack of a gold standard. The Bronze Standard algorithm is a very data- and compute-intensive statistical approach for quantifying the accuracy of registration algorithms.
In this paper, we describe the Bronze Standard application and we discuss the need for grids to tackle such computations on medical image databases. We demonstrate MOTEUR, a service-based workflow engine optimized for dealing with data intensive applications. MOTEUR eases the enactment of the Bronze Standard and similar applications on the EGEE production grid infrastructure. It is a generic workflow engine, based on current standards and freely available, that can be used to instrument legacy application code at low cost.
Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data bans the processing of medical data owing to their highly sensitive nature. Fortunately, the Directive provides that this ban does not apply in seven cases. The paper aims first to explain the reasons for this ban. It then describes the conditions under which medical data may be processed under European law. In particular, the paper investigates the strengths and weaknesses of the data subject's consent as a basis of legitimacy for the processing of medical data. It also considers the six other alternatives for legitimating the processing of medical data.
E-Health initiatives such as electronic clinical trials and epidemiological studies require access to and usage of a range of both clinical and other data sets. Such data sets are typically only available across many heterogeneous domains where a plethora of often legacy-based or in-house/bespoke IT solutions exist. Considerable efforts and investments are being made across the UK to upgrade the IT infrastructures across the National Health Service (NHS), such as the National Program for IT in the NHS (NPFIT) [1]. Currently, however, independent and largely non-interoperable IT solutions exist across hospitals, trusts, disease registries and GP practices; this includes security as well as more general compute and data infrastructures. Grid technology allows issues of distribution and heterogeneity to be overcome; however, the clinical trials domain places special demands on security and data which the Grid community has hitherto not satisfactorily addressed. These challenges are often common across many studies and trials, hence the development of a re-usable framework for the creation and subsequent management of such infrastructures is highly desirable. In this paper we present the challenges in developing such a framework and outline initial scenarios and prototypes developed within the MRC-funded Virtual Organisations for Trials and Epidemiological Studies (VOTES) project [2].
Grid technologies have proven to be very successful in tackling challenging problems in which data access and processing is a bottleneck. Notwithstanding the benefits that Grid technologies could have in Health applications, privacy leakages in current DataGrid technologies, due to the sharing of data in VOs and the use of remote resources, hinder their widespread adoption. Privacy control for Grid technology has become a key requirement for the adoption of Grids in the Healthcare sector. Encrypted storage of confidential data effectively reduces the risk of disclosure. A self-enforcing scheme for encrypted data storage can be achieved by combining Grid security systems with distributed key management and classical cryptography techniques. Virtual Organizations, as the main unit of user management in the Grid, can provide a way to organize key sharing, access control lists and secure encryption management. This paper provides programming models and discusses the value, costs and behavior of such a system implemented on top of one of the latest Grid middlewares.
This work is partially funded by the Spanish Ministry of Science and Technology in the frame of the project Investigación y Desarrollo de Servicios GRID: Aplicación a Modelos Cliente-Servidor, Colaborativos y de Alta Productividad, with reference TIC2003-01318.
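One way to picture distributed key management for encrypted storage is sketched below. The abstract above does not say how keys are distributed, so this is purely an assumption for illustration: an n-of-n XOR secret split, in which every share is needed to rebuild the key and any smaller set of shares reveals nothing about it. The function names `split_key` and `join_key` are hypothetical, and the sketch uses only the Python standard library.

```python
import secrets
from functools import reduce
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n_shares: int) -> List[bytes]:
    """Split `key` into n shares; all n are required to recover it."""
    if n_shares < 2:
        raise ValueError("need at least two shares")
    # n-1 shares are uniformly random; the last one is chosen so that
    # the XOR of all shares equals the key.
    shares = [secrets.token_bytes(len(key)) for _ in range(n_shares - 1)]
    last = reduce(_xor, shares, key)
    return shares + [last]


def join_key(shares: List[bytes]) -> bytes:
    """Recombine all shares into the original key."""
    return reduce(_xor, shares)


# Example: a 256-bit data-encryption key split across three
# (hypothetical) key servers.
key = secrets.token_bytes(32)
shares = split_key(key, 3)
recovered = join_key(shares)
```

A threshold scheme, in which only k of the n shares are needed, would tolerate the loss of a key server; the n-of-n variant is used here only because it is the shortest correct example.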
Biological data are usually published and thereby become public, so they do not need to be isolated or encrypted. In some cases, however, these data stem from patients or are analyzed with, for instance, pharmaceutical or agronomic goals. More simply, before becoming public, these data have to be kept confidential until researchers have been able to publish their work or to register them. There are therefore many cases in which the integrity and confidentiality of biological data have to be protected against unauthorized access. As these private data are also large datasets, such as those produced by complete genome projects, they need high-throughput computing and huge data storage to be processed. These requirements are heightened in the context of a Grid such as EGEE, where the computing and storage resources are distributed across a large-scale platform. We have developed a secure distributed service to manage biological data on the grid: the EncFile encrypted file management system. We have deployed it on the production platform of the EGEE grid project, thus providing grid users with a user-friendly component that does not require any user privileges. We have also integrated it into a bioinformatics grid portal, associated with encrypted representative biological resources: world-famous databases and programs.
WISDOM stands for World-wide In Silico Docking On Malaria. A first step toward enabling the in silico drug discovery pipeline on a grid infrastructure, this CPU-consuming application, which generates large data flows, was deployed successfully on EGEE, the largest grid infrastructure in the world, during the summer of 2005. 46 million docking scores were computed in 6 weeks. The proposed demonstration presents the submission of in silico docking jobs at a large scale on the grid. The demonstration will use the new middleware stack gLite developed within the EGEE project.
In recent years, simulation using computer systems has been of increasing importance in the life sciences. We have developed a system called “Rokky-G” that facilitates a protein structure prediction strategy called “Rokky” on Grid systems. Rokky-G provides the framework of protein structure prediction on the Grid. In this paper we discuss the architecture of Rokky-G and implementation issues identified in order to obtain highly reliable results.
The objective of Sealife is the conception and realisation of a semantic Grid browser for the life sciences, which will link the existing Web to the currently emerging eScience infrastructure. The SeaLife Browser will allow users to automatically link a host of Web servers and Web/Grid services to the Web content they are visiting. This will be accomplished using eScience's growing number of Web/Grid Services and its XML-based standards and ontologies. The browser will identify terms in the pages being browsed through the background knowledge held in ontologies. Through the use of Semantic Hyperlinks, which link identified ontology terms to servers and services, the SeaLife Browser will offer a new dimension of context-based information integration.
In this paper, we give an overview of the different components of the browser and their interplay. The SeaLife Browser will be demonstrated within three application scenarios in evidence-based medicine, literature and patent mining, and molecular biology, all relating to the study of infectious diseases. The three applications vertically integrate the molecule/cell, the tissue/organ and the patient/population levels by covering the analysis of high-throughput screening data for endocytosis (the molecular entry pathway into the cell), the expression of proteins in the spatial context of tissue and organs, and a high-level library on infectious diseases designed for clinicians and their patients.
For more information see http://www.biotec.tu-dresden.de/sealife.
Virtual organizations of researchers need effective tools to work collaboratively with huge sets of heterogeneous data distributed over a HealthGrid. This paper describes a mechanism for supporting Digital Libraries in a High-Performance Computing environment based on Grid technology. The proposed approach makes it possible to assemble heterogeneous data from distributed sources into integrated virtual collections by using OGSA-DAI. The core of the concept is a Repository of Meta-Descriptions, which are sets of metadata that define personal and collaborative virtual collections on the basis of virtualized information resources. The Repository is kept in the native XML database Sedna and is maintained by Grid Data Services.
Bioinformatics analysis of data produced by complete genome sequencing projects is one of the major challenges of the current years. Integrating up-to-date databanks and relevant algorithms is a clear requirement of such analysis. Grid computing would be a viable solution to distribute data, algorithms, computing and storage resources for Genomics. Providing bioinformaticians with a good interface to grid infrastructure, such as the one provided by the EGEE European project, is also a challenge to take up. The GPS@ web portal, “Grid Protein Sequence Analysis”, aims to provide such a user-friendly interface for these grid genomic resources on the EGEE grid.
The vast amount and complexity of data generated in genomic research imply that new dedicated and powerful computational tools need to be developed to meet their analysis requirements. Blast2GO (B2G) is a bioinformatics tool for Gene Ontology-based DNA or protein sequence annotation and function-based data mining. The application has been developed with the aim of offering an easy-to-use tool for functional genomics research. Typical B2G users are medium-sized genomics labs carrying out sequencing, EST and microarray projects, handling datasets of up to several thousand sequences. In the current version of B2G, the power and analytical potential of both annotation and function data mining are restricted by the computational power behind each particular installation. In order to offer enhanced computational capacity within this bioinformatics application, a Grid component is being developed. A prototype has been conceived for the particular problem of speeding up Blast searches to obtain fast results for large datasets. Many efforts have been reported in the literature on speeding up Blast searches, but few of them deal with the use of large heterogeneous production Grid infrastructures. These are the infrastructures that could reach the largest number of resources and the best load balancing for data access. The Grid Service under development will analyse requests based on the number of sequences, splitting them according to the available resources. Lower-level computation will be performed through MPIBLAST. The software architecture is based on the WSRF standard.
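As a rough illustration of splitting a request according to the available resources, the sketch below divides a list of query sequences among sites in proportion to their free job slots. It is only an assumed policy, not the Grid Service's actual algorithm: the function `split_by_capacity`, the site names and the notion of "free slots" are hypothetical.

```python
from typing import Dict, List


def split_by_capacity(
    sequences: List[str], free_slots: Dict[str, int]
) -> Dict[str, List[str]]:
    """Assign contiguous chunks of `sequences` to sites, proportionally
    to each site's number of free slots."""
    total = sum(free_slots.values())
    if total <= 0:
        raise ValueError("no free slots available")
    chunks: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0
    for site, slots in free_slots.items():
        cumulative += slots
        # Cumulative boundaries guarantee that every sequence is
        # assigned exactly once, whatever the rounding.
        end = len(sequences) * cumulative // total
        chunks[site] = sequences[start:end]
        start = end
    return chunks


# Ten query sequences shared among three sites with 3, 1 and 1 free slots.
queries = [f"seq{i}" for i in range(10)]
plan = split_by_capacity(queries, {"siteA": 3, "siteB": 1, "siteC": 1})
```

With these inputs, siteA receives six sequences and siteB and siteC two each; each chunk would then be handed to a separate lower-level Blast run.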
A trend in modern medicine is towards individualization of healthcare and, potentially, grid computing can play an important role in this by allowing the sharing of resources and expertise to improve the quality of care. In this paper, we present a new test bed, the BIOPATTERN Grid, which aims to fulfil this role in the long term. The main objectives of this paper are: 1) to report the development of the BIOPATTERN Grid for biopattern analysis and bioprofiling in support of individualization of healthcare. The BIOPATTERN Grid is designed to facilitate secure and seamless sharing of geographically distributed bioprofile databases and to support the analysis of bioprofiles to combat major diseases such as brain diseases and cancer within a major EU project, BIOPATTERN (www.biopattern.org); 2) to illustrate how the BIOPATTERN Grid could be used for biopattern analysis and bioprofiling for early detection of dementia and for brain injury assessment on an individual basis. We highlight important issues that would arise from the mobility of citizens in the EU, such as those associated with access to medical data, ethics and security; and 3) to describe two grid services that aim to integrate the BIOPATTERN Grid with existing grid projects on crawling services and remote data acquisition, which is necessary to underpin the use of the test bed for biopattern analysis and bioprofiling.
This paper describes the development of the NCHC's Severe Acute Respiratory Syndrome (SARS) Grid project, an Access Grid (AG)-based disease management and collaborative platform that allowed SARS patients' medical data to be dynamically shared and discussed between hospitals and doctors using AG's video teleconferencing (VTC) capabilities. During the height of the SARS epidemic in Asia, SARS Grid and the SARShope website significantly curbed the spread of SARS by helping doctors manage the in-hospital and in-home care of quarantined SARS patients through medical data exchange and the monitoring of patients' symptoms. Now that the SARS epidemic has ended, the primary function of the SARS Grid project is that of a web-based informatics tool to increase public awareness of SARS and other epidemic diseases. Additionally, the SARS Grid project can be viewed and further studied as an outstanding model of epidemic disease prevention and/or containment.