Ebook: Future Trends of HPC in a Disruptive Scenario
The realization that the use of components off the shelf (COTS) could reduce costs sparked the evolution of the massively parallel computing systems available today. The main problem with such systems is the development of suitable operating systems, algorithms and application software that can utilise the potential processing power of large numbers of processors. As a result, systems comprising millions of processors are still limited in the applications they can efficiently solve. Two alternative paradigms that may offer a solution to this problem are Quantum Computers (QC) and Brain Inspired Computers (BIC).
This book presents papers from the 14th edition of the biennial international conference on High Performance Computing - From Clouds and Big Data to Exascale and Beyond, held in Cetraro, Italy, from 2 to 6 July 2018. It is divided into four sections covering data science, quantum computing, high-performance computing, and applications. The papers presented during the workshop covered a wide spectrum of topics on new developments in the rapidly evolving supercomputing field, including QC and BIC, and a selection of the contributions presented at the workshop is included in this volume. In addition, two papers presented at a workshop on Brain Inspired Computing in 2017 and an overview of work related to data science carried out by a number of universities in the USA, parts of which were presented at the 2018 and previous workshops, are also included.
The book will be of interest to all those whose work involves high-performance computing.
The biennial research workshop on High Performance Computing was held in Cetraro in July 2018. The papers presented during the workshop covered a wide spectrum of topics on new developments in the rapidly evolving supercomputing field. A selection of the contributions presented at the workshop is included in this volume. In addition, two papers presented at a workshop on Brain Inspired Computing in 2017 are also included, as is an overview of work related to Data Science carried out by a number of universities in the USA, parts of which were presented at the 2018 as well as previous workshops.
It was known from the outset that the applicability of Moore's conjecture must be limited in its time span by physical limitations. This led, already in the 1970s, to the development of parallel computers, based on the idea that many processors working on the same problem may, at least in some cases, significantly reduce the time to solution. Initially parallel processors were special designs, such as vector processors, catering for particular algorithm paradigms. These specialised machines were expensive due to their limited production numbers. It was the realisation that the use of COTS (Components Off The Shelf) could reduce costs that started the evolution of the massively parallel systems available today. The main problem with such systems was the development of suitable operating systems, algorithms and application software.
The position today is that systems comprising millions of processors are limited in the applications they can efficiently solve, due to the difficulty of developing suitable parallel application software. This has triggered efforts to develop alternative computing paradigms that may offer new approaches to solving complex and compute-intensive problems. Two of these are Quantum Computers (QC) and Brain Inspired Computers (BIC).
This book includes a number of papers discussing the possibilities of such new approaches to computing.
Lucio Grandinetti, Italy
Gerhard Joubert, Germany
Kristel Michielsen, Germany
Seyedeh Leili Mirtaheri, Iran
Michela Taufer, USA
Rio Yokota, Japan
Date: 2019-06-15
The convergence between HPC and Big Data processing can also be pursued by providing high-level parallel programming tools for developing Big Data analysis applications. Software systems for social data mining provide algorithms and tools for extracting useful knowledge from user-generated social media data. ParSoDA (Parallel Social Data Analytics) is a high-level library for developing data mining applications on HPC systems, based on the extraction of useful knowledge from large datasets gathered from social media. The library aims at reducing the programming skills needed for implementing scalable social data analysis applications. To reach this goal, ParSoDA defines a general structure for a social data analysis application that includes a number of configurable steps, and provides a predefined (but extensible) set of functions that can be used for each step. User applications based on the ParSoDA library can be run on both Apache Hadoop and Spark clusters. The goal of this paper is to assess the flexibility and usability of the ParSoDA library. Through some code snippets, we demonstrate how programmers can easily extend ParSoDA functions on their own if they need custom behavior. Concerning usability, we compare the programming effort required for coding a social media application with and without the ParSoDA library. The comparison shows that ParSoDA leads to a drastic reduction (about 65%) in lines of code, since the programmer only has to implement the application logic without worrying about configuring the environment and related classes.
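To make the configurable-step structure concrete, the following is a minimal, hypothetical Python sketch of the kind of pluggable pipeline described above; the actual ParSoDA library is Java-based and its class and method names differ, so all identifiers here are illustrative assumptions.

    # Hypothetical sketch of a step-based social data analysis pipeline (not the ParSoDA API).
    from typing import Callable, Iterable, List

    class SocialDataApp:
        """Toy pipeline with configurable, pluggable steps."""

        def __init__(self):
            self.filters: List[Callable[[dict], bool]] = []
            self.mappers: List[Callable[[dict], dict]] = []
            self.analysis: Callable[[List[dict]], object] = lambda items: items

        def add_filter(self, fn):
            self.filters.append(fn)
            return self

        def add_mapper(self, fn):
            self.mappers.append(fn)
            return self

        def set_analysis(self, fn):
            self.analysis = fn
            return self

        def run(self, items: Iterable[dict]):
            selected = [x for x in items if all(f(x) for f in self.filters)]
            for m in self.mappers:
                selected = [m(x) for x in selected]
            return self.analysis(selected)

    # Example: keep geotagged posts and count them per city (illustrative data layout).
    app = (SocialDataApp()
           .add_filter(lambda post: post.get("geo") is not None)
           .add_mapper(lambda post: {"city": post["geo"]["city"]})
           .set_analysis(lambda posts: {c: sum(1 for p in posts if p["city"] == c)
                                        for c in {p["city"] for p in posts}}))

    posts = [{"geo": {"city": "Rome"}}, {"geo": None}, {"geo": {"city": "Rome"}}]
    print(app.run(posts))  # {'Rome': 2}

The application code only supplies the filtering, mapping, and analysis logic; the surrounding execution machinery (on Hadoop or Spark in the real library) stays out of the programmer's way, which is the source of the code reduction reported above.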
The Internet, media, mobile devices, and sensors continuously collect massive amounts of data. Learning from this data yields improvements in science and quality of life. Big Data is a big blessing, but it also presents big challenges arising from its inherent characteristics, namely volume, variety and velocity. Big Data is impossible to analyze using traditional centralized methods, and therefore distributed processing with parallelization is needed. Data analytics often must be performed in real time or near real time, and an approximate answer obtained in a timely manner is frequently preferred to a precise answer that arrives too late. Optimization algorithms for Big Data aim to reduce the computational, storage, and communication challenges. The data and parameter sizes of Big Data optimization problems are often too large to process locally, and since Big Data models are inexact, optimization algorithms no longer need to find high-accuracy solutions. In this paper, we provide an overview of this emerging field and describe optimization methods used for Big Data Analytics (BDA), such as first-order, randomized, heuristic, evolutionary, and convex algorithms.
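As an illustration of the first-order methods mentioned above, the following is a minimal mini-batch stochastic gradient descent sketch for a least-squares problem; the problem size, batch size, and step size are illustrative assumptions, not values from the paper.

    # Mini-batch stochastic gradient descent for least-squares regression.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100_000, 20
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    step, batch = 0.01, 256
    for it in range(2000):
        idx = rng.integers(0, n, size=batch)             # sample a mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch  # stochastic gradient estimate
        w -= step * grad                                 # first-order update

    print("relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))

Each update touches only a small random subset of the data, which is exactly the property that makes first-order and randomized methods attractive when the full dataset is too large to process in one place.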
Our project is at the interface of Big Data and HPC, High-Performance Big Data computing, and this paper describes a collaboration between seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of high-performance and Big Data computing, with several different application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including HPC-ABDS, the High-Performance Computing enhanced Apache Big Data Stack, and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware, including the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and pilot jobs. Then we present SPIDAL, the Scalable Parallel Interoperable Data Analytics Library, and our work for it in core machine learning, image processing, and the application communities: network science, polar science, biomolecular simulations, pathology, and spatial systems. We describe basic algorithms and their integration in end-to-end use cases.
End-to-end scientific workflows running on leadership-class systems present significant data management challenges due to the increasing volume of data being produced. Furthermore, the impact of emerging storage architectures (e.g., deep memory hierarchies and burst buffers) and the extreme heterogeneity of these systems bring new data management challenges. Together, these data-related challenges significantly impact the effective execution of coupled simulations and in-situ workflows on such systems. Increasing system scales are also expected to result in more node failures and silent data corruptions, which adds to these challenges. Data staging techniques are being used to address these data-related challenges and support extreme-scale in-situ workflows.
In this paper, we investigate how data staging solutions can leverage deep memory hierarchies via intelligent prefetching, data movement, and efficient data placement techniques. Specifically, we present an autonomic data-management framework that leverages system information, data locality, machine-learning-based approaches, and user hints to capture the data access and movement patterns between components of staging-based in-situ application workflows. It then uses this knowledge to build a more robust data staging platform, which can provide high-performance and resilient, error-free data exchange for in-situ workflows. We also present an overview of various data management techniques used by the DataSpaces data staging service that leverage autonomic data management to deliver the right data at the right time to the right application.
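The following toy sketch illustrates the general idea of pattern-driven prefetching in a staging area; it is not the DataSpaces API, and all names in it are illustrative assumptions. The staging layer observes the chunk ids a consumer requests, detects a constant stride, and stages the predicted next chunk before it is requested.

    # Toy staging area with stride-detecting prefetch and LRU eviction.
    from collections import OrderedDict

    class PrefetchingStagingArea:
        def __init__(self, fetch, capacity=8):
            self.fetch = fetch             # loads a chunk (e.g. from disk or a remote producer)
            self.cache = OrderedDict()     # staged chunks, LRU order
            self.capacity = capacity
            self.history = []              # recently requested chunk ids

        def _stage(self, chunk_id):
            if chunk_id not in self.cache:
                self.cache[chunk_id] = self.fetch(chunk_id)
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # evict least recently used chunk

        def get(self, chunk_id):
            self.history = (self.history + [chunk_id])[-3:]
            hit = chunk_id in self.cache
            self._stage(chunk_id)
            self.cache.move_to_end(chunk_id)
            # If the last accesses follow a constant stride, prefetch the next chunk.
            if len(self.history) == 3:
                s1 = self.history[1] - self.history[0]
                s2 = self.history[2] - self.history[1]
                if s1 == s2 and s1 != 0:
                    self._stage(chunk_id + s1)
            return self.cache[chunk_id], hit

    staging = PrefetchingStagingArea(fetch=lambda cid: f"data-{cid}")
    for cid in [0, 2, 4, 6, 8]:
        _, hit = staging.get(cid)
        print(cid, "hit" if hit else "miss")   # chunks 6 and 8 arrive as prefetch hits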
Results of benchmarking tests on gate-based quantum computers and quantum annealers are compared to results obtained with simulators of these new computing technologies. Simulating the behavior of (physical models of) quantum computing devices on supercomputers not only sheds light on the physical processes involved in the computation but is also useful for testing some applications.
Commercial quantum optimizers have so far been unable to demonstrate scalable performance advantages over standard state-of-the-art optimization algorithms. Multiple independent efforts continue to develop the technology, in the belief that larger optimizers with more coherent and densely connected qubits will be necessary to observe a quantum speedup. We point to two fundamental limitations of physical quantum annealing optimizers that, we argue, prevent them from functioning as scalable optimizers: their finite temperature and their analog nature. We numerically demonstrate that, for quantum annealers to find the minimizing configurations of optimization problems of increasingly larger sizes, their temperature and noise levels must be appropriately scaled down with problem size. We discuss the implications of our results for practical quantum annealers.
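For readers unfamiliar with the target problem class, the following classical analogue shows the kind of Ising minimization annealers address and the role temperature plays; it uses plain simulated annealing on a small random instance, with purely illustrative parameters, and is not a simulation of a quantum annealer.

    # Classical simulated annealing on a random Ising instance (illustrative analogue only).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 16
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T                 # symmetric couplings, zero diagonal

    def energy(s):
        return -0.5 * s @ J @ s                    # E(s) = -sum_{i<j} J_ij s_i s_j

    s = rng.choice([-1, 1], size=n).astype(float)
    for T in np.geomspace(5.0, 0.05, 20000):       # slowly decreasing temperature schedule
        i = rng.integers(n)
        dE = 2.0 * s[i] * (J[i] @ s)               # energy change of flipping spin i
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]

    print("final energy:", energy(s))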
For the last few years, the NASA Quantum Artificial Intelligence Laboratory (QuAIL) has been performing research to assess the potential impact of quantum computers on challenging computational problems relevant to future NASA missions. A key aspect of this research is devising methods to most effectively utilize emerging quantum computing hardware. Research questions include what experiments on early quantum hardware would give the most insight into the potential impact of quantum computing, the design of algorithms to explore on such hardware, and the development of tools to minimize the quantum resource requirements. We survey work relevant to these questions, with a particular emphasis on our recent work in quantum algorithms and applications, in elucidating mechanisms of quantum mechanics and their uses for quantum computational purposes, and in simulation, compilation, and physics-inspired classical algorithms. To our early application thrusts in planning and scheduling, fault diagnosis, and machine learning, we add thrusts related to robustness of communication networks and the simulation of many-body systems for material science and chemistry. We provide a brief update on quantum annealing work, but concentrate on gate-model quantum computing research advances within the last couple of years.
As clock speeds have stagnated, the number of cores in a node has been drastically increased to improve processor throughput. Most scalable system software was designed and developed for single-threaded environments. Multithreaded environments are becoming increasingly prominent as application developers optimize their codes to leverage the full performance of the processor; however, these environments are incompatible with a number of assumptions that have driven scalable system software development.
This paper highlights a case study of this mismatch, focusing on MPI message matching. MPI message matching has been designed and optimized for traditional serial execution. The reduced determinism in the order of MPI calls can significantly reduce the performance of MPI message matching, potentially exceeding the time-per-iteration targets of many applications. Several proposed techniques attempt to address these issues and enable multithreaded MPI usage. These approaches highlight a number of tradeoffs that make adapting MPI message matching complex. This case study and its proposed solutions highlight a number of general concepts that need to be leveraged in the design of next-generation scalable system software.
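A minimal mpi4py sketch of the matching semantics in question is given below: receives are matched by (source, tag) rather than by posting or arrival order, which is exactly the ordering assumption that becomes nondeterministic once many threads issue MPI calls concurrently. The script name in the run command is an assumption; the example is single-threaded per rank for clarity.

    # Run with: mpirun -n 2 python match.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Send two messages with different tags, in this order.
        comm.send("payload for tag 1", dest=1, tag=1)
        comm.send("payload for tag 2", dest=1, tag=2)
    elif rank == 1:
        # Post the receives in the opposite order; matching still pairs them by tag.
        req2 = comm.irecv(source=0, tag=2)
        req1 = comm.irecv(source=0, tag=1)
        print("tag 2:", req2.wait())
        print("tag 1:", req1.wait())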
HPC is approaching a point of singularity where previous technology trends (Moore's Law, etc.) are terminating, and dramatic performance progress may depend on advances in computer architecture outside the scope of conventional practice. This may extend to the opportunities potentially available in the context of non-von Neumann architectures. Curiously, this is not a new field, but one that has suffered from the relatively easy growth potential powered by decades of Moore's Law, including the resulting improvements in device density and clock rates. Cellular automata, static and dynamic dataflow, systolic arrays, and neural nets have demonstrated alternative approaches to von Neumann derivative architectures throughout past decades, each exhibiting unique advantages but also imposing open challenges and time to delivery. A new class of non-von Neumann architecture is being pursued, and recent scaling studies suggest that its genus or structure, called here “Continuum Computer Architecture (CCA)”, has the possibility to scale many orders of magnitude beyond present-day HPC systems. Further, by incorporating select mechanisms for the purpose, it may greatly enhance dynamic graph processing even further. This presentation describes elements of this study on the scaling of CCA and suggests that, with a change in enabling technology towards the latter half of the next decade, it may yield peak capabilities of at least Zettaflops at practical power, size, and cost.
With the increasing number of smart and connected devices in the Internet of Things (IoT), many devices send processing requests to cloud computing systems. Cloud computing introduces latency in communication and data transfer, especially for real-time and time-sensitive processing. Fog computing is a promising solution that improves efficiency and reduces the data volume transported to the cloud by exploiting the computing and storage capabilities of the edge in an efficient manner. The distributed nature of fog computing requires the implementation of efficient distributed resource management systems. In this paper we address resource management in fog computing and try to find the shortest route to resources in a distributed manner by applying ant colony optimization (ACO). In detail, we apply the swarm intelligence of the ant colony, combined with a travelling salesman formulation, to find the shortest route. We further propose parameter optimization techniques to improve the performance of ACO. We evaluate the performance of the proposed algorithm using computer simulations. The simulation results confirm the effectiveness of the proposed method.
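The following compact Python sketch shows ant colony optimization applied to a small symmetric travelling-salesman instance, the building block that the abstract combines with fog resource discovery; the colony size, evaporation rate, and exponents are illustrative assumptions, not the tuned parameters studied in the paper.

    # Ant colony optimization for a small random TSP instance.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 10
    pts = rng.random((n, 2))
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) + np.eye(n)

    tau = np.ones((n, n))                  # pheromone levels on edges
    alpha, beta, rho, n_ants = 1.0, 3.0, 0.5, 20
    best_len, best_tour = np.inf, None

    for it in range(100):
        tours = []
        for _ in range(n_ants):
            tour = [0]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, bool); mask[tour] = False
                w = (tau[i] ** alpha) * ((1.0 / dist[i]) ** beta) * mask
                tour.append(rng.choice(n, p=w / w.sum()))      # probabilistic next hop
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
            if length < best_len:
                best_len, best_tour = length, tour
        tau *= (1.0 - rho)                 # pheromone evaporation
        for length, tour in tours:         # deposit pheromone inversely to tour length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a, b] += 1.0 / length
                tau[b, a] += 1.0 / length

    print("best tour length:", round(best_len, 3))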
Mapping the microscopical organization of the human cerebral cortex provides a basis for multimodal brain atlases, and is indispensable for allocating functional imaging, physiological, connectivity, molecular, or genetic data to anatomically well-specified structural entities of human brain organization at micrometer resolution. The analysis of histological sections is still considered a “gold standard” in brain mapping, and is compared with other maps, e.g. from neuroimaging studies [1]. But while the spatial patterns of neuronal cells are inherently three-dimensional, such microscopic analysis is usually performed on individual 2D sections. Here we propose an HPC-based workflow that aims to recover the three-dimensional context from a stack of histological sections stained for neuronal cell bodies and imaged under a light microscope. Our aim is to align image data in consecutive sections at micrometer resolution, where the texture is dominated by small objects such as cell bodies that often do not extend across sections. Therefore we cannot apply classical intensity-based image registration, where the similarity of neighboring images is optimized at the pixel level. Our main contribution is a procedure to explicitly detect and match vessel-like structures in the brain tissue, guiding a feature-based image registration algorithm to reconstruct regions of interest of the brain in 3D and recover the distribution of neuronal cells. To replace erroneous information in corrupted tissue areas, we further propose a simple predictive algorithm which generates realistic cell detections by learning from intact tissue parts in the local surroundings.
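The core of such a feature-based alignment can be illustrated with a least-squares (Procrustes/Kabsch) fit of a rigid transform to matched landmark coordinates, e.g. centroids of the same vessel cross-sections in neighbouring sections; this is only a sketch of the registration step, not the authors' full workflow, and the synthetic data below is an assumption made for the example.

    # Least-squares rigid registration of matched 2D landmarks.
    import numpy as np

    def fit_rigid_2d(src, dst):
        """Return rotation R and translation t minimizing ||R @ src_i + t - dst_i||^2."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)             # cross-covariance of centred points
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflections
        R = Vt.T @ D @ U.T
        t = dst_c - R @ src_c
        return R, t

    # Synthetic check: rotate and shift some "vessel centroids", then recover the motion.
    rng = np.random.default_rng(3)
    src = rng.random((12, 2)) * 1000.0                  # landmark positions in section k (um)
    angle = np.deg2rad(7.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle)],
                       [np.sin(angle),  np.cos(angle)]])
    dst = src @ R_true.T + np.array([30.0, -12.0])      # same landmarks in section k+1

    R, t = fit_rigid_2d(src, dst)
    print("recovered translation:", np.round(t, 2))     # close to [30, -12]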
3D-Polarized Light Imaging (3D-PLI) is a neuroimaging technique used to study the structural connectivity of the human brain at the meso- and microscale. In 3D-PLI, the complex nerve fiber architecture of the brain is characterized by 3D orientation vector fields that are derived from birefringence measurements of unstained histological brain sections by means of an effective physical model.
Numerical simulations are essential tools to optimize the physical model and to understand the underlying microstructure in detail. The simulations rely on predefined configurations of nerve fiber models (e.g. crossing, kissing, or complex intermingling), their physical properties, as well as the physical properties of the employed optical system, in order to model the entire 3D-PLI measurement. By comparing simulation and experimental results, possible misinterpretations in the fiber reconstruction process of 3D-PLI can be identified. Here, we focus on fiber modeling with a specific emphasis on the generation of dense fiber distributions as found in the white matter of the human brain. A new algorithm is introduced that allows controlling possible intersections of computationally grown fiber structures.
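A very simple form of the collision test needed when growing dense fiber models is sketched below: fibers are represented as chains of spheres, and two fibers are said to intersect if any pair of spheres from different fibers overlaps. This is an illustration of the basic geometric check only, not the algorithm introduced in the paper, and the test geometry is an assumption.

    # Sphere-chain overlap test between two fiber models.
    import numpy as np

    def fibers_collide(fiber_a, fiber_b, radius_a, radius_b):
        """fiber_* are (n, 3) arrays of sphere centres along the fiber axis."""
        d = np.linalg.norm(fiber_a[:, None, :] - fiber_b[None, :, :], axis=-1)
        return bool((d < radius_a + radius_b).any())

    # Two straight test fibers of 1 um radius each.
    z = np.linspace(0.0, 50.0, 200)
    fiber_a = np.stack([np.zeros_like(z), np.zeros_like(z), z], axis=1)
    fiber_b = fiber_a + np.array([2.5, 0.0, 0.0])      # shifted 2.5 um in x

    print(fibers_collide(fiber_a, fiber_b, 1.0, 1.0))                                 # False: 0.5 um gap
    print(fibers_collide(fiber_a, fiber_a + np.array([1.5, 0.0, 0.0]), 1.0, 1.0))     # True: overlap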
In recent years, extensive use of social networking platforms has been registered, coupled with the increasing popularity of wearable devices, whose adoption is expected to double within the next four years (source: Gartner, August 2017). Tracking physical activities and publishing one's own images, emoticons, audio files and texts on social platforms have become daily practices, increasing the availability of data, and therefore of potential information, for each user. To extract knowledge from this data, new computational technologies such as Sentiment Analysis (SA) and Affective Computing (AC) have found applications in fields such as marketing, politics, social sciences, cognitive sciences, and medical sciences. Such technologies aim to automatically extract emotions from heterogeneous data sources such as text, images, audio, video, and a plethora of biosignals such as voice, facial expression, electroencephalographic (EEG) signals, and gestures. The paper introduces the main concepts of Sentiment Analysis and Affective Computing and presents an overview of the primary methodologies and techniques used to recognize emotions from the analysis of data sources such as text, images, voice signals, and EEG. Finally, the paper discusses applications of those techniques to neurosciences and underlines the high-performance computing issues of SA and AC, as well as challenges and future trends.
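As a toy illustration of text-based sentiment analysis (a tiny supervised example, unrelated to the specific systems surveyed in the paper), the following trains a TF-IDF plus logistic-regression classifier with scikit-learn on four hand-written sentences; the training data and labels are invented for the example.

    # Minimal supervised sentiment classifier: TF-IDF features + logistic regression.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["I love this place, wonderful day",
             "great news, really happy about it",
             "this is terrible, I am so angry",
             "awful experience, very disappointing"]
    labels = ["positive", "positive", "negative", "negative"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["what a wonderful and happy surprise"]))   # likely 'positive'
    print(model.predict(["angry about this terrible service"]))     # likely 'negative'

Real SA and AC systems replace this toy pipeline with large multimodal models over text, audio, images, and biosignals, which is where the high-performance computing issues discussed in the paper arise.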