The modern HPC centre is a complex entity. Even the smallest centres host tens of thousands of processors, and the largest facilities host several million. Before 2020, this number is expected to pass 100 million processors in a single centre. Massively parallel processing is thus everyday business, and the major challenges of running such facilities are well established, with solutions existing for most of them.
In the following, we make a strong distinction between parallelism (e.g. Single Program Multiple Data and Single Instruction Multiple Data) and concurrency (multiple processes, identical or different, that may run in parallel or interleaved, depending on timing and available hardware).
While concurrency mechanisms were commonly used to express applications for parallel computers two or three decades ago, this use of concurrency has completely died out. There are several explanations for this, but the most important is that an HPC installation is now so costly that users must document that they use the facility efficiently. All applications must stay above a specified CPU-utilisation threshold, typically 70%. Even with SPMD-type programming, asynchrony amongst processors is a common challenge when trying to stay above this threshold.
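To illustrate why asynchrony threatens the utilisation threshold, the following is a minimal sketch, not a description of how any real centre measures utilisation. It assumes a single SPMD step that ends in a barrier, so every processor waits for the slowest one. The function name `spmd_utilisation`, the per-processor timings and the constant `THRESHOLD` are hypothetical and chosen for illustration only; the 70% figure is the typical value mentioned above.

```python
THRESHOLD = 0.70  # typical utilisation threshold quoted in the text


def spmd_utilisation(step_times):
    """Utilisation of one barrier-synchronised SPMD step.

    step_times holds the compute time (in seconds) each processor
    needs for the step. All processors idle at the barrier until
    the slowest one finishes, so the wall time is the maximum.
    """
    wall_time = max(step_times)
    busy_time = sum(step_times)
    return busy_time / (wall_time * len(step_times))


balanced = [1.0] * 8          # all eight processors finish together
skewed = [1.0] * 7 + [2.0]    # one processor takes twice as long

for name, times in (("balanced", balanced), ("skewed", skewed)):
    u = spmd_utilisation(times)
    status = "ok" if u >= THRESHOLD else "below threshold"
    print(f"{name}: {u:.4f} ({status})")
```

Under these assumptions, a single straggler among eight processors is enough to pull utilisation from 100% down to 56.25%, which is below the 70% threshold.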
This does not mean that concurrency is irrelevant in the modern HPC centre. While the compute nodes themselves may not use concurrency, the underlying fabric that allows those nodes to operate has a large potential for concurrency. In many cases, the elements of this fabric could benefit from a formal approach to concurrency control.
In this workshop, we will present challenges in the HPC centre that do depend on concurrency: storage systems, schedulers, backup systems, archiving and bulk network transfers, to name a few. The interesting challenge is that while all these elements require concurrency control to operate correctly and efficiently, they are also highly interdependent: the concurrency aspects must cover the full set of infrastructure components for optimal efficiency. We will seek to describe scenarios for real HPC centres and sketch solutions that are built on a structured concurrency approach. The slides used to introduce this workshop can be downloaded from [1].