HealthGrid 2008 (http://chicago2008.healthgrid.org) is the sixth conference in this series of open forums for the integration of grid technologies and their applications in the biomedical, medical and biological domains, paving the path to an international research area in healthgrids. The main objective of the HealthGrid conference and the HealthGrid Association is the exchange and discussion of ideas, technologies, solutions and requirements that interest the grid and the life-sciences communities, in order to foster the integration of grids into health. Grid middleware and grid application developers, biomedical and health informatics users, and security and policy makers are encouraged to participate in a set of multidisciplinary sessions with a common concern for applications to health. The 2008 conference marks a new level of maturity for the event, which moves for the first time outside Europe, to the city of Chicago. HealthGrid's sister organization, HealthGrid.US, has gathered together an impressive array of grid and biomedical informatics experts from both sides of the Atlantic – and beyond – for its conference in Chicago. This is indeed an auspicious occasion: there are similarities and differences between the European and American approaches, from nomenclature – ‘cyberinfrastructure’ in the US and ‘grids’ in Europe – through the variety of healthcare economies to the very style of biomedical research. Each has the potential to benefit the other, and each has the potential to benefit from the other. The conference is an occasion to celebrate differences and to explore points of contact, just as much as it is an occasion to celebrate similarities and to exploit the contrasts.
HealthGrid conferences have been organized on an annual basis. The first conference, held in 2003 in Lyon (http://lyon2003.healthgrid.org), reflected the need to involve all actors – physicians, scientists and technologists – who might play a role in the application of grid technology to health, whether healthcare or biomedical research. The second conference, held in Clermont-Ferrand in January 2004 (http://clermont2004.healthgrid.org), reported research and work in progress from a large number of projects. The third conference in Oxford (http://oxford2005.healthgrid.org) had a major focus on the results and deployment strategies in healthcare. The fourth conference in Valencia (http://valencia2006.healthgrid.org) aimed at consolidating the collaboration among biologists, healthcare professionals and grid technology experts. The fifth conference focused on five pre-eminent domains viewed as application areas for grids in the biomedical field (molecules, cells, organs, individuals and populations) and aimed to show potential users that grids had already gone beyond hype, with concrete applications that demonstrate the success of the technology. As befits a diverse community and a maturing technology, the themes in 2008 reflect the diversity of mature practice: Advancing Virtual Communities, offering a glimpse of the kind of communities that are brought together by means of collaboration grids; Public Health Informatics, exploring the diffusion of grid concepts and technologies in health informatics; Translational Bioinformatics, the contact point between medicine, healthcare and genomics; and Knowledge Management and Decision Support, one direction that is confidently expected to grow as the synergy of grids and ‘evidence-based practice’ in healthcare is exploited.
Thus, the nineteen papers selected from some 40 submissions of papers, demonstrations and posters have been organized in four sections, complemented by a fifth section of research road maps which relate, mostly indirectly, to keynotes and other events at the conference.
In the first section, on Advancing Virtual Communities, Andrew Simpson et al. report on the development of a service-oriented interoperability framework, sif, in the context of a broader project, Generic Infrastructure for Medical Informatics (GIMI), which facilitates secure access to data in a variety of forms throughout a collaboration. Andrew Branson et al. address the information integration problem through the provision of an integrated data model with links to and from ontologies to homogenize biomedical data at different levels in the context of the EC Framework 6 Health-e-Child project; they identify clinical requirements and provide a detailed design approach. Nabil Abdennadher et al. present XWCH, an easy-to-use middleware which they demonstrate can be exploited to gridify applications in domains as diverse as phylogeny inference on the one hand and neuron connectivity on the other, thus providing evidence of the adaptability of their approach. M. Diarena et al. describe HOPE, a secure data integration platform that takes its inspiration from a number of existing projects and builds on EGEE gLite, the AMGA metadata catalogue and the GridSphere portal. Johan Montagnat et al. detail their work in the NeuroLOG project, in which they have assembled an ambitious middleware by analysing past experience and using existing components; they adopt a user-centred perspective in the domain of neuroscience and report on the project's design study and data integration strategy based on local schemas in the various sites.
The section on Public Health Informatics begins with a paper by Mikko Pitkanen et al. on the reasons for limited grid adoption in healthcare settings. Although security considerations and issues of IT management were significant barriers, the authors were pleasantly surprised by the high level of knowledge of grid technologies among those surveyed and also by the number of tentative experiments currently being undertaken. Silvia Olabarriaga et al., on the other hand, focus on usability, exploring the gap between developers' perceptions and those of users. By taking a retrospective look at various efforts to make results of functional MRI available to scientific users, the authors identify three stages in maturity, from ‘low-hanging fruit’ through ‘trying out’ to ‘end-user ready’. Presenting results from an Ecos-Colciencias project, Alexandra Pomares et al. report from Colombia on an independent approach to data integration in health virtual organizations through a high conceptual level of virtual data objects that can be queried independently of the logical or physical data sources behind them; the approach introduces a number of innovations in ‘query cartography’ and a semantic caching strategy to optimize network performance. Finally in this section, Professor Richard Sinnott et al. address what they describe as ‘arguably the greatest challenge facing the rollout and adoption of grid technologies to meet the changing face of postgenomic clinical research, especially with regard to information governance, ethics and hence security solutions’, namely the data requirements of the clinical domain. They describe how solutions from the Virtual Organizations for Trials and Epidemiological Studies (VOTES) project are being refactored to meet the needs of the clinical domain.
The papers on Translational Bioinformatics share a common backdrop: the cost of certain bioinformatic problems is growing faster than the resources to address them, either because of the cost of harvesting enough target DNA or because computation is so expensive. The paper by Aparicio et al. introduces the idea of environmental genomic, or metagenomic, studies, in which fragments extracted from a mix of target and environmental DNA are compared to sequences whose function is known as an aid to their classification. The paper describes various steps in the process and suggests optimizations. Mario Cannataro et al., on the other hand, explore the means to optimize protein-to-protein interaction (PPI) prediction. For example, predicting the configuration of a PPI network is one way to study the generation of protein complexes. However, because of the enormous number of possible configurations of protein interactions, automatic computation tools are essential. The approach taken here is to integrate the outputs of a number of existing predictors; these are individually highly sensitive to input configuration, but it is suggested that integrated results are stable. Mohammad Shahid et al. report on a sequel to the successful WISDOM malaria challenge (see HealthGrid 2006 and 2007 for reports), in which the challenge (not the solution) is ported to the VIOLA optical grid environment using Unicore; apart from lessons learned on the use of grids in docking problems, the authors also identify an approach to reduce the size of the compound database in order to improve efficiency. Maria Mirto et al. consider the problem of protein folding or, as it is formally known, the tertiary structure problem: primary structure is the sequence and secondary the alpha-helix or beta-sheet form. Folding, the precise 3-dimensional shape beyond secondary structure, is a compromise between electrical forces, geometry and other constraints, and determines the function of the protein.
The ProGenGrid (Proteomics and Genomics Grid) project has implemented a protein tertiary structure prediction service in a grid environment. The service has been used to predict the structure of the dicarboxylate carrier of Saccharomyces cerevisiae by means of the homology modelling approach. A virtual reality environment is then used for 3D visualization. The section concludes with Xuan Liu's paper on BioNessieG, a grid version of an existing biochemical network simulator which was developed at the University of Glasgow. The paper describes the simulator and focuses in particular on how it has been extended, through grid technologies, to benefit from a wide variety of high performance computational resources across the UK in support of larger scale simulations.
The fourth section, on Knowledge Management and Decision Support, opens with three papers from the @neurIST consortium, an evidently fruitful multidisciplinary project on the risk of aneurysm rupture. This risk in a given patient is determined by a multiplicity of variables, from the molecular level through cellular processes and tissue remodelling to the pathophysiology of the disease, not ignoring population level epidemiological aspects of the disease. In the first paper, Jimison Iavindrasana et al. discuss the management of patient data, with a focus on considerations of patient privacy and confidentiality, as well as the security features of the clinical information system. Christoph Friedrich et al., in the second paper, present @neuLink, a service-oriented environment used to extract relevant information from structured and unstructured information sources and to link genetics (molecular genetics as well as epidemiological factors) with the disease, thus allowing the interpretation of molecular data within a clinical research environment. Through the integration of multiple, complex data sources – clinical data, individual risk factors provided by @neuLink – the @neuRisk clinical decision support (CDS) system classifies patients with high aneurysm rupture risk and proposes suitable, individualized preventive therapy; this is introduced in the third paper by Dunlop et al. Jesus Luna et al., reporting from Cyprus, take grid computing precepts as normally given and analyse what they would mean in a real healthgrid situation. This includes issues of patient data communication through public networks, storage in nodes outside the hospital's control, and so on. Analysis of the Intensive Care Grid system reveals potential sources of attack and provides a solution that would be in line with legal requirements and security mechanisms. Peter Sloot et al. take the problem of decision support in the treatment of HIV drug resistance and explore, through the ViroLab project, the complexity of multilayered, non-linear, multiply-connected networks of influences in the decision process. The data cascade from -omics to health record, ascending or descending physical and temporal scales, disciplines and orders of magnitude. The result is an improved drug ranking system which has been successfully used in its prototype form.
The fifth and final section of this volume presents three research road maps. These are in different stages of development, and were commissioned for different reasons and in different contexts. However, collectively, they represent a remarkable set of proposals with many overlaps and contrasts.
The first road map is the outcome of an ‘Integrated Research Team’ event on Healthgrid: Grid technology and biomedicine, convened by the US Army's Telemedicine & Advanced Technology Research Center (TATRC). It “serves to communicate expectations and requirements to all parties – end users, policymakers, scientists, as well as technology leaders. The roadmap provides a foundation upon which TATRC may organize its strategic priorities and commit to resource allocations.”
The second road map concerns caGrid, a middleware system which combines grid computing, service-oriented architecture and the model-driven design paradigm to support the development of interoperable data and analytical resources and the federation of such resources in a grid environment. The functionality provided by caGrid is an essential and integral component of the cancer Biomedical Informatics Grid (caBIG™) program, established by the National Cancer Institute as a nationwide effort to develop enabling informatics technologies for collaborative, multi-institutional biomedical research with the overarching goal of accelerating translational cancer research.
Starting with the HealthGrid White Paper (2005), the EU-funded SHARE road map project has aimed at identifying the most important steps and significant milestones towards wide deployment and adoption of healthgrids in Europe. The project has sought to reconcile likely conflicts between technological developments and regulatory frameworks by bringing together the project's technical road map and its conceptual map of ethical and legal issues and socioeconomic prospects. A key tool in this process was a collection of case studies of healthgrid applications.