
Ebook: HealthGrid Applications and Technologies Meet Science Gateways for Life Sciences

The integration of grid, cloud and other e-infrastructures into the fields of biology, bioinformatics, biomedicine, and healthcare is crucial if optimum use is to be made of the latest high-performance and distributed computing technology in these areas. Science gateways offer intuitive graphical user interfaces to applications, data, and tools on distributed computing infrastructures. This book presents the joint proceedings of the Tenth HealthGrid Conference and the Fourth International Workshop on Science Gateways for Life Sciences (IWSG-Life), held in Amsterdam, the Netherlands, in May 2012. The HealthGrid conference promotes the exchange and debate of ideas, technologies and solutions likely to promote the integration of grids into biomedical research and health in the broadest sense. The IWSG-Life workshop series is a forum that brings together scientists from the fields of life sciences, bioinformatics, and computer science to advance computational biology and chemistry in the context of science gateways. These events were jointly organized to maximize the benefit from synergies and to stimulate the forging of further links in joint research areas. The book is divided into three parts. Part I includes the contributions accepted to the HealthGrid conference; Part II contains the papers on various aspects of the development and usage of science gateways for life sciences. The joint session is recorded in Part III, which addresses the topic of science gateways for biomedical research. The book will provide insights and new perspectives for all those involved in the research and use of infrastructures and technology for healthcare and life sciences.
The Tenth HealthGrid Conference and the Fourth International Workshop on Science Gateways for Life Sciences (IWSG-Life) offer a forum to discuss the integration of grid, cloud, and other e-infrastructures into the fields of biology, bioinformatics, biomedicine, and healthcare. The program includes presentations, demos, and tutorials on a wide range of topics from technologies to biomedical research, and from portals to workflow and computational modeling.
The principal objective of the HealthGrid conference is the exchange and debate of ideas, technologies, solutions, and requirements that interest the grid and life science communities and are likely to promote the integration of grids into biomedical research and health in the broadest sense. In 2012 the HealthGrid conference celebrates its tenth edition.
IWSG-Life is a workshop series that focuses on research contributions for science gateways and tools in the field of life sciences. It brings together scientists from the fields of life sciences, bioinformatics, and computer science. It therefore forms an international forum to exchange experience, formulate ideas, and catch up on technological advances in computational biology and chemistry in the context of science gateways.
The communities of the two events overlap, and in 2012 the events were jointly organized so that attendees could benefit from synergies and be stimulated to forge further links in their research areas. These proceedings record the events, their topics, and the peer-reviewed papers and abstracts. Part I includes the contributions accepted to the HealthGrid conference in the form of oral paper presentations, tutorials, and demonstrations. Part II contains the papers about various aspects of the development and usage of science gateways for life sciences. The joint session is represented by Part III, which addresses the topic of science gateways for biomedical research.
We would like to thank the authors for their contributions and the committees for their efforts in organizing the events and reviewing the submissions. We would also like to express our gratitude to the local organizers; without their hard work it would not have been possible to make the events a success.
Not least we would like to acknowledge all our sponsors for their financial and/or media contributions. These are, in alphabetical order: AMC, BiG Grid, DANS, EGI.eu, IOS Press, iSGTW, NBIC, NICE, and SCI-BUS.
The Editors
This paper presents a study of the performance of federated queries implemented in a system that simulates the architecture proposed for the Scalable Architecture for Federated Translational Inquiries Network (SAFTINet). Performance tests were conducted using both physical hardware and virtual machines within the test laboratory of the Center for High Performance Computing at the University of Utah. Tests were performed on SAFTINet networks ranging from 4 to 32 nodes, with databases containing synthetic data for several million patients. The results show that the caGrid Federated Query Engine (FQE) is suitable for comparative effectiveness research (CER) federated queries, given its nearly linear scalability as partner nodes grow in number. The results presented here are also important for specifying the hardware required to run a CER grid.
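The scalability property described above comes from the fan-out structure of a federated query: each partner node evaluates the query locally and only the partial results are merged. A minimal sketch of that pattern is shown below; the node representation, `query_node`, and `federated_query` are illustrative names, not the caGrid FQE API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node query function. A real federated query engine
# would send the query over a secure grid connection and translate it
# to each partner's local schema.
def query_node(node, predicate):
    return [row for row in node["patients"] if predicate(row)]

def federated_query(nodes, predicate):
    # Fan the query out to all partner nodes in parallel, then merge the
    # partial results. Wall-clock time is bounded by the slowest node,
    # which is why scaling with node count can stay close to linear.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda n: query_node(n, predicate), nodes)
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

# Toy federation of three nodes holding synthetic patient rows.
nodes = [
    {"patients": [{"id": 1, "age": 67}, {"id": 2, "age": 34}]},
    {"patients": [{"id": 3, "age": 71}]},
    {"patients": [{"id": 4, "age": 55}]},
]
over_60 = federated_query(nodes, lambda p: p["age"] > 60)
```

Under this model, adding partner nodes adds parallel work rather than serial work, matching the near-linear behavior the paper reports.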
Progress in our understanding of brain disorders increasingly relies on the costly collection of large standardized brain magnetic resonance imaging (MRI) data sets. Moreover, the clinical interpretation of brain scans benefits from compare and contrast analyses of scans from patients with similar, and sometimes rare, demographic, diagnostic, and treatment status. A solution to both needs is to acquire standardized, research-ready clinical brain scans and to build the information technology infrastructure to share such scans, along with other pertinent information, across hospitals. This paper describes the design, deployment, and operation of a federated imaging system that captures and shares standardized, de-identified clinical brain images in a federation across multiple institutions. In addition to describing innovative aspects of the system architecture and our initial testing of the deployed infrastructure, we also describe the Standardized Imaging Protocol (SIP) developed for the project and our interactions with the Institutional Review Board (IRB) regarding handling patient data in the federated environment.
High-resolution digital imaging is enabling digital archiving and sharing of digitized microscopy slides and new methods for digital pathology. Collaborative research centers, outsourced medical services, and multi-site organizations stand to benefit from sharing pathology data in a digital pathology network. Yet significant technological challenges remain due to the large size and volume of digitized whole slide images. While information systems do exist for managing local pathology laboratories, they tend to be oriented toward narrow clinical use cases or offer closed ecosystems around proprietary formats. Few solutions exist for networking digital pathology operations. Here we present a system architecture and implementation of a digital pathology network and share results from a production system that federates major research centers.
The discovery of knowledge from raw data is a multistage process that typically requires collaboration between experts from disparate disciplines and the application of a range of methods tailored to the research question. The aim of the eLab is to provide a web-based environment for health professionals and researchers to access health datasets, share knowledge and expertise, and apply methods for analysis and visualization of the results. The eLab is built around the core concept of the Research Object as the mechanism for preserving, reusing, and disseminating the knowledge discovery process. The possible range of applications of the eLab is vast, so the trade-off between specificity and generality is an important consideration that is reflected in the requirements. The architecture and implementation of the eLab are described, and we report on the deployment of eLabs for applications in primary care, long-term conditions management, bariatric surgery, and public health.
European laws on privacy and data security are not explicit about the storage and processing of genetic data. Whole-genome data in particular is identifying and contains a great deal of personal information. Is the processing of such data allowed in computing grids? To find out, we examined legal precedents in related fields and the current literature, and interviewed legal experts. We found that processing of genetic data is only allowed on distributed systems with specific security measures, both technical and organizational. Informed consent, although important, is no substitute for such requirements.
One of the important questions in biological evolution is whether certain changes along protein-coding genes have contributed to the adaptation of species. This problem is known to be biologically complex and computationally very expensive. It therefore requires efficient Grid or cluster solutions to overcome the computational challenge. We have developed a Grid-enabled tool (gcodeml) that relies on the PAML (codeml) package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report on results for gcodeml, our approach is applicable and customisable to related problems in biology and other scientific domains.
Notwithstanding the benefits of distributed-computing infrastructures for empowering bioinformatics analysis tools with the needed computing and storage capacity, the actual use of these infrastructures is still low. Steep learning curves and deployment difficulties have reduced their impact on the wider research community. This article presents a porting strategy for BLAST based on a multiplatform client and a service that provides the same interface as sequential BLAST, thus flattening the learning curve and minimizing the impact of integration into existing workflows. The porting was done using the execution and data access components of the EC project Venus-C and the Windows Azure infrastructure provided within that project. The results obtained demonstrate a low overhead on the global execution framework and reasonable speed-up and cost-efficiency with respect to a sequential version.
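A common way to obtain speed-up when porting BLAST to a distributed infrastructure is to split the input FASTA file into chunks that independent remote tasks can search in parallel, concatenating the per-chunk reports afterwards. The sketch below illustrates only that splitting step; `split_fasta` is an illustrative helper, not part of the paper's actual client, and it assumes well-formed FASTA input (no `>` characters inside sequence lines).

```python
def split_fasta(text, n_chunks):
    # Split a multi-sequence FASTA file into up to n_chunks groups so
    # that each group can be searched by an independent remote BLAST
    # task; the per-chunk result files are later merged in order.
    records = [">" + r for r in text.split(">") if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        # Round-robin assignment keeps chunk sizes roughly balanced.
        chunks[i % n_chunks].append(rec)
    return ["".join(c) for c in chunks if c]

# Three toy records split across two hypothetical worker tasks.
chunks = split_fasta(">a\nACGT\n>b\nGGTT\n>c\nTTAA\n", 2)
```

Because each chunk is a valid FASTA file, the remote service can expose exactly the sequential BLAST interface per task, which is what keeps the integration impact on existing workflows low.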
Production operation of large distributed computing infrastructures (DCIs) still requires a great deal of human intervention to reach an acceptable quality of service. This may be achievable for scientific communities with solid IT support, but it remains a show-stopper for others. Some application execution environments are used to hide runtime technical issues from end users, but they mostly aim at fault tolerance rather than incident resolution, and their operation still requires substantial manpower. A longer-term support activity is thus needed to ensure sustained quality of service for Virtual Organisations (VOs). This paper describes how the biomed VO has addressed this challenge by setting up a technical support team. Its organisation, tooling, daily tasks, and procedures are described. Results are shown in terms of resource usage by end users, the number of reported incidents, and the software tools developed. Based on our experience, we suggest ways to measure the impact of the technical support, as well as perspectives for decreasing its human cost and making it more community-specific.
Scientific research has become very data- and compute-intensive because of progress in data acquisition and measurement devices, which is particularly true in the Life Sciences. To cope with this deluge of data, scientists use distributed computing and storage infrastructures. The use of such infrastructures itself introduces new challenges for scientists in terms of proper and efficient use. Scientific workflow management systems play an important role in facilitating the use of the infrastructure by hiding some of its complexity. Although most scientific workflow management systems are provenance-aware, not all of them come with provenance functionality out of the box. In this paper we describe the improvement and integration of a provenance system into an e-infrastructure for biomedical research based on the MOTEUR workflow management system. The main contributions of the paper are: presenting an OPM implementation that uses a relational database backend for the provenance store; providing an e-infrastructure with a comprehensive provenance system; defining a generic approach to provenance implementation, potentially suitable for other workflow systems and application domains; and demonstrating the value of this system through use cases that present the provenance data in a user-friendly web interface.
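The Open Provenance Model (OPM) represents provenance as a graph of artifacts and processes connected by dependency edges such as "used" and "wasGeneratedBy", which maps naturally onto relational tables. The following is a minimal sketch of such a store, under assumed table and column names; it is not the schema of the paper's MOTEUR-based implementation.

```python
import sqlite3

# Assumed OPM-style schema: artifacts and processes as nodes,
# "used" and "wasGeneratedBy" as edge tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE process  (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE used     (process_id TEXT, artifact_id TEXT, role TEXT);
CREATE TABLE was_generated_by (artifact_id TEXT, process_id TEXT, role TEXT);
""")

# Record one hypothetical workflow step: 'segment.nii' was generated by
# a brain-segmentation process that used 'scan.nii' as input.
conn.execute("INSERT INTO artifact VALUES ('a1', 'scan.nii')")
conn.execute("INSERT INTO artifact VALUES ('a2', 'segment.nii')")
conn.execute("INSERT INTO process VALUES ('p1', 'brain_segmentation')")
conn.execute("INSERT INTO used VALUES ('p1', 'a1', 'input')")
conn.execute("INSERT INTO was_generated_by VALUES ('a2', 'p1', 'output')")

# Lineage query: which artifacts fed the process that produced 'a2'?
rows = conn.execute("""
SELECT a.value FROM was_generated_by g
JOIN used u ON u.process_id = g.process_id
JOIN artifact a ON a.id = u.artifact_id
WHERE g.artifact_id = 'a2'
""").fetchall()
```

A relational backend like this is what allows a web interface to answer lineage questions ("where did this result come from?") with ordinary SQL joins over the edge tables.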
This document briefly describes the background and structure of the academic high-performance cloud computing tutorial at the HealthGrid conference.
Neuroimaging is a field that benefits from distributed computing infrastructures (DCIs) to perform data- and compute-intensive processing and analysis. Using grid workflow systems not only automates the processing pipelines, but also enables domain researchers to implement their expertise on how to best process neuroimaging data. To share this expertise and to promote collaborative research in neurosciences, ways to facilitate the exchange, re-use, and interoperability of workflow applications between different groups are required. The SHIWA project (SHaring Interoperable Workflows for large-scale scientific simulations on Available DCIs) is specifically addressing such use-cases, building a generic platform to facilitate workflow exchange and execution environments interoperability. The goal is to facilitate the dissemination and execution of workflows by diverse workflow management systems on multiple DCIs. This platform enables researchers to gain access to a variety of ready-to-use workflows, to reuse workflows developed by collaborators, to publish their own workflows to be used by others, and to use additional resources from external DCIs to run workflows.
In this demonstration we present how the SHIWA platform is used to implement various usage scenarios in which workflow exchange supports collaboration in neuroscience. The SHIWA platform and the implemented solutions are presented from the user perspective, in this case the workflow developers and the neuroscientists. These workflow interoperability solutions aim to facilitate and enable more advanced and large-scale research in neuroscience.
The demonstration will focus on usage scenarios currently employed to exchange, combine, and interoperate neuroimaging workflows between the Academic Medical Center, Amsterdam, the Charité Universitätsmedizin, Berlin, and the outGrid infrastructure. These workflows are developed for the analysis of neuroimaging data, in particular Magnetic Resonance Imaging (MRI) and Diffusion Tensor Imaging (DTI). Each group has ported workflows with complementary and overlapping functions to its Grid infrastructure using different workflow systems, so the goal is to combine and share them across the boundaries of the original DCIs.
The following usage scenarios will be addressed in the demonstration:
• Preparing the workflow for sharing with others (VO, workflow management system dependencies);
• Using the SHIWA repository for publishing workflows to share a new workflow with other potential users;
• Finding a workflow in the SHIWA repository and testing it with sample data;
• Running the workflow found in the repository with own data using the SHIWA simulation platform;
• Combining complementary workflows into a meta-workflow to implement additional functionality, or combining different implementations of the same workflow to compute on different DCIs simultaneously.
The demonstrated solution includes neuroimaging workflows using the GWES, MOTEUR, LONI Pipeline, and P-GRADE workflow engines, submitting jobs to the German MediGRID, the Dutch BiG Grid, the European EGI, and the international outGrid infrastructures. The data to be accessed may be stored locally, in an iRODS data management system, in the LFC file catalog, or on GridFTP-enabled sites. The user interfaces are web-based and include a Liferay-based Grid portal and a P-GRADE workflow editor implemented as a Java Web Start application. The neuroimaging applications include self-developed tools for preprocessing DTI data and widely used methods from the ITK and FSL toolboxes.
In the proposed demonstration we will present DCV (Desktop Cloud Visualization): a unique technology that allows users to remotely access 2D and 3D interactive applications over a standard network. This allows geographically dispersed doctors to work collaboratively and to acquire anatomical or pathological images and visualize them for further investigation.
In this paper we present the architecture of a framework for building Science Gateways supporting official standards both for user authentication and authorization and for middleware-independent job and data management. Two use cases of the customization of the Science Gateway framework for Semantic-Web-based life science applications are also described.