Ebook: Communicating Process Architectures 2017 & 2018
Concurrent and parallel systems are intrinsic to the technology which underpins almost every aspect of our lives today. This book presents the combined post-proceedings for two important conferences on concurrent and parallel systems: Communicating Process Architectures 2017, held in Sliema, Malta, in August 2017, and Communicating Process Architectures 2018, held in Dresden, Germany, in August 2018.
CPA 2017: Fifteen papers were accepted for presentation and publication; they cover topics including mathematical theory, programming languages, design and support tools, verification, and multicore infrastructure and applications ranging from supercomputing to embedded. A workshop on domain-specific concurrency skeletons and the abstracts of eight fringe presentations reporting on new ideas, work in progress or interesting thoughts associated with concurrency are also included in these proceedings.
CPA 2018: Eighteen papers were accepted for presentation and publication; they cover topics including mathematical theory, design and programming language and support tools, verification, multicore run-time infrastructure, and applications at all levels from supercomputing to embedded. Also included in these proceedings are a workshop on translating CSP-based languages to common programming languages and the abstracts of four fringe presentations on work in progress, new ideas, demonstrations, and concerns that certain common practices in concurrency are harmful.
The book will be of interest to all those whose work involves concurrent and parallel systems.
Preface CPA 2017
Communicating Process Architectures (CPA) 2017 is the thirty-ninth in the WoTUG series of conferences and took place from Sunday August 20th to Wednesday August 23rd 2017, hosted by Dr. Kevin Vella of the University of Malta at the Victoria Hotel in Sliema, Malta.
The keynote talk was given by Dr. Peter Welch from the University of Kent at Canterbury, and it was titled “A Workflow Methodology for Realising Concurrent and Verified Systems”. The keynote was co-authored by Dr. Jan B. Pedersen from the University of Nevada Las Vegas. Dr. Welch was a professor at the University of Kent for decades and is now retired. During his tenure he was a driving force behind the continued development of the occam-π language as well as the JCSP process-oriented programming library for Java. Furthermore, he has been, and continues to be, the chairman of WoTUG, the organization behind this conference. Dr. Pedersen is an associate professor at the University of Nevada specialising in languages, compilers and parallel and process-oriented programming. He is the developer of the ProcessJ process-oriented language.
Fifteen papers were accepted for presentation at the conference, following the strong editorial process developed and refined by CPA over many years. They cover a spectrum of concurrency concerns: mathematical theory, programming languages, design and support tools, verification, multicore infrastructure and applications ranging from supercomputing to embedded. One workshop on domain-specific concurrency skeletons was also given. Furthermore, eight fringe presentations were given over two evening sessions; these sessions are used for reporting on new ideas, work in progress or simply interesting thoughts associated with concurrency.
The workshop position papers and fringe abstracts are included in these Proceedings. Altogether, contributions came from nine different countries: Denmark, United Kingdom, Germany, United States, Spain, Slovakia, the Netherlands, Malta, and Norway.
This year saw a new version of the conference webpage, namely, http://cpaconf.org, where papers and resources will be available.
We would like to thank everyone who submitted papers, the reviewers for all their hard work, the editors for working with the authors to make all papers as good as possible, and all the delegates for attending. Finally, a thank you to Kevin Vella from the University of Malta for putting together and hosting the conference in sunny Malta. Thank you all very much.
Jan Bækgaard Pedersen (University of Nevada Las Vegas),
Kevin Chalmers (Edinburgh Napier University),
Jan F. Broenink (University of Twente),
Brian Vinter (University of Copenhagen),
Kevin Vella (University of Malta),
Peter H. Welch (University of Kent).
Preface CPA 2018
Communicating Process Architectures (CPA) 2018 is the fortieth in the WoTUG series of conferences and took place from Sunday August 19th to Wednesday August 22nd 2018 at the Technische Universität Dresden, Dresden, Germany. This was the conference's first visit to Germany. The conference was hosted by Professor Dr. Rainer Spallek, chair of VLSI Design, Diagnostics and Architecture at the Faculty of Computer Science, Technische Universität Dresden.
Prof. Dr.-Ing. habil. Christian Mayr from the VLSI-Systems and Neuromorphic Circuits group, also at the Technische Universität Dresden, gave this year's keynote. The talk was titled “SpiNNaker-2: a 10 Million Core Processor System for Machine Learning and Brain Simulation” and described the development of the SpiNNaker-2 processor platform. The SpiNNaker-2 system combines highly efficient machine learning, bio-inspired signal processing at millisecond latency and ultra-low energy operation. Professor Mayr has been at TUD since August 2015; before that, he completed a postdoc at ETH Zürich, Switzerland. His Ph.D. was granted by Technische Universität Dresden in 2008.
Eighteen papers were accepted for presentation at the conference and publication in these proceedings, following the strong editorial process developed and refined by CPA over many years. They cover a full spectrum of concurrency concerns from mathematical theory, design and programming language and support tools, verification, multicore run-time infrastructure through to applications at all levels from supercomputing to embedded. In addition, a workshop on translating CSP-based languages to common programming languages was given. Papers were received from Norway, Denmark, United Kingdom, United States, the Netherlands, South Africa, France, and Germany. As usual, the conference also hosted fringe sessions presenting work in progress, new ideas, demonstrations and concerns that certain common practices in concurrency are harmful. This year, we had four such fringe presentations and their abstracts can be found in these proceedings.
Finally, we thank everyone who submitted papers, the reviewers for their conscientious and very hard work, the delegates attending for their contributions that make it all worthwhile, and the ever-present faculty secretaries and conference officer Oliver Knodel who made it all work. A special thank you goes to Uwe Mielke, our local contact in Dresden, who not only proposed the conference but also did all the hard work of arranging it with TUD, along with many other preparations. Thank you all very much.
Jan Bækgaard Pedersen (University of Nevada Las Vegas),
Kevin Chalmers (Edinburgh Napier University),
Marc L. Smith (Vassar College),
Kenneth Skovhede (University of Copenhagen),
Brian Vinter (University of Copenhagen),
Peter H. Welch (University of Kent),
Rainer Spallek (Technische Universität Dresden).
Concurrency is beginning to be accepted as a core knowledge area in the undergraduate CS curriculum – no longer isolated, for example, as a support mechanism in a module on operating systems or reserved as an advanced discipline for later study. Formal verification of system properties is still considered an advanced and difficult subject area, requiring significant mathematical knowledge and generally restricted to smaller systems employing sequential logic only. Our experience, from over 30 years of teaching concurrency and formal methods, is that when both are presented together, there is a happy symbiosis providing strong mutual support that transforms both disciplines from advanced and hard to basic and simple. Of course, the model of concurrency and the formal methods presented must be simple, powerful and consistent with each other … and, of course, we are talking about process orientation and process algebra.
As with many other disciplines, it is important to use a well-defined methodology that shows how aspects of the disciplines relate to each other and how work flows between them. We introduce a workflow methodology for the mutual development and verification of concurrent systems that serves as a good foundation for students and novices – as well as mature software engineers – learning to integrate formal verification into their development cycle. Such a workflow methodology should be a fundamental element of any future software engineer's tool kit – to be used, together with established software engineering techniques, every day as a matter of course. Concurrency and verification can and should live in symbiosis. By following a well-defined workflow that makes this explicit, major benefits (such as faster development, right-first-time and easier maintenance) can be realised.
This talk will present an approach developed over many years at Kent and UNLV for teaching concurrency with verification and share some of our materials and experiences.
This paper explores how a skeleton-based approach can be used to perform big data analysis. We introduce a restricted storage system based on blocks with a fixed maximum size. The storage design removes the residual data problem commonly found in storage systems, and enables processing on individual blocks. We then introduce a stream-oriented query system that can be used on top of the distributed storage system. The query system is built on a limited number of core operations. Each of these performs a specified function, such as filtering elements, but is a skeleton operation where the programmer fills in how the operation is performed. The operations are designed to allow splitting across the blocks in the storage system, giving concurrent execution while maintaining a completely sequential program description. To assist in understanding the data flow, we also introduce a graphical representation for each of the methods, enabling a visual expression of an algorithm. To evaluate the query system, we implement a number of classic Big-Data queries and show how to implement them with code, and how the queries can be visualized with the graphical representation.
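The skeleton idea described above can be illustrated in Go: the framework fixes the data flow (one goroutine per storage block, results merged in block order so the program still reads sequentially), while the programmer supplies only the element-level behaviour. The names and API here are illustrative sketches, not the paper's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Filter is a sketch of one skeleton operation: the framework provides
// the concurrent, block-wise execution pattern; the caller fills in only
// the predicate that decides which elements to keep.
func Filter(blocks [][]int, keep func(int) bool) []int {
	out := make([][]int, len(blocks))
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		go func(i int, b []int) { // each storage block is processed independently
			defer wg.Done()
			for _, v := range b {
				if keep(v) {
					out[i] = append(out[i], v)
				}
			}
		}(i, b)
	}
	wg.Wait()
	// Merging in block order preserves a sequential program description.
	var result []int
	for _, o := range out {
		result = append(result, o...)
	}
	return result
}

func main() {
	blocks := [][]int{{1, 2, 3}, {4, 5, 6}}
	fmt.Println(Filter(blocks, func(v int) bool { return v%2 == 0 }))
}
```

The same pattern generalises to the other core operations: only the per-element function changes, never the concurrency structure.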
Mutex-based implementations of synchronous channels are slow. This work investigates atomic operations as a technique to improve communication efficiency between two threads via a busy channel. Such a channel provides faster communication than a mutex-based one when the number of threads is identical to the hardware concurrency. To evaluate communication performance, a communication time benchmark is used alongside a selection time benchmark. The communication time benchmark is scaled to evaluate the impact of thread counts greater than the available hardware. Results show that a more than ten-fold improvement in communication time is possible when the hardware fully supports the threads. The improvement drops as further threads are added to the system, due to the operating system scheduler taking over the determination of which threads are active. Selection time similarly shows improvement when the thread count matches the hardware, but likewise reduces as the thread count increases. We can conclude that a busy channel is useful in an environment where the thread count matches the available hardware, which is of interest to parallel application developers or control systems with similar properties.
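The core idea of a busy channel can be sketched in a few lines of Go: both ends spin on an atomic flag instead of blocking on a mutex and condition variable. This is a minimal single-producer, single-consumer illustration of the technique, not the paper's implementation, and `BusyChan` is a name invented for the sketch.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// BusyChan is a one-slot synchronous channel where both sides busy-wait
// on an atomic flag rather than sleeping on a mutex.
type BusyChan struct {
	full atomic.Bool // true when val holds an undelivered value
	val  int
}

// Send spins until the slot is empty, then deposits the value.
func (c *BusyChan) Send(v int) {
	for c.full.Load() {
		runtime.Gosched() // yield, to behave when threads exceed cores
	}
	c.val = v
	c.full.Store(true)
}

// Recv spins until a value is present, then removes it.
func (c *BusyChan) Recv() int {
	for !c.full.Load() {
		runtime.Gosched()
	}
	v := c.val
	c.full.Store(false)
	return v
}

func main() {
	c := &BusyChan{}
	go func() {
		for i := 1; i <= 3; i++ {
			c.Send(i)
		}
	}()
	sum := 0
	for i := 0; i < 3; i++ {
		sum += c.Recv()
	}
	fmt.Println(sum)
}
```

The spin loops are exactly why such a channel wins only when each thread has a core of its own: once the OS scheduler must time-slice, the spinning burns cycles the other party needs.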
Performance is important when creating large experiments or simulations. However, it would be preferable not to lose programmer productivity. A lot of effort has already been put into creating fast libraries for, for example, linear algebra computations (BLAS and LAPACK). In this paper, we show that utilizing these libraries in a DSL made for productivity solves both problems. This is done via automatic code generation and can be extended to other languages, libraries, and features.
Cloud computing is the infrastructure of choice for compute-intensive and data-intensive systems, providing flexible resources for software applications; that is, the processing capacity assigned to an application can be adapted to its needs. Nevertheless, in a cloud pay-per-use model, the number of demanded resources must be taken into account in order to minimise the costs. Our main goal is to reason about a cloud-aware application's resource usage by means of our timed process algebra called BTC (Bounded True Concurrency); that is, to study the trade-offs between an application's response time and resource usage.
Video encoders are software applications that need a lot of resources and work on files of considerable size. Therefore, it seems reasonable to try to take advantage of the capacity offered by cloud computing to accelerate the encoding process. The H.264 standard is a widespread encoding solution, although other standards, such as H.265 (also known as HEVC), are being developed and tested as its successors.
In this paper, the H.265 video encoder is adapted to the Map/Reduce paradigm so that it can be executed on Hadoop. Its algebraic formalization is then developed in BTC and validated on a real private cloud environment. The formal model is validated by analysing the results of an H.265 encoding Hadoop application, and the results show a good fit between the predicted and observed execution times. Finally, we carry out a performance evaluation using a tool called BAL (no acronym) that we have developed for that purpose.
Exponential growth in scientific data set sizes and corresponding computation needs forces scientists and engineers to structure and automate experiments in workflows running on distributed architectures. In eScience the flows typically evolve gradually from intensive experimentation and often involve multiple participants from separate organisations. Thus, there is a need for infrastructures supporting such highly dynamic and collaborative workflows. Despite much attention to scientific workflows in recent years, most existing systems tend to be single-user, top-down approaches, which are inherently best suited for static and fixed flows with all steps known up front. In this work we introduce a simple, general, rule-based model for event-driven workflows based on data change triggers: a bottom-up workflow approach that enables a high level of automation and allows dynamically changing flows – with or without manual user interaction. It is realised with an implementation on top of the Minimum intrusion Grid (MiG), which helps de-couple workflow design from underlying execution concerns and provides built-in collaboration and sharing across organisation boundaries. However, the model itself applies to a much wider range of scenarios, and other possible implementations are briefly outlined.
Reading and writing are modelled in CSP using actions containing the symbols ? and !. These reading actions and writing actions are synchronous, and there is a one-to-one relationship between occurrences of pairs of these actions. At CPA 2016, we introduced the half-synchronous alphabetised parallel operator X
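The one-to-one synchronous pairing of ? and ! described above has a direct counterpart in Go's unbuffered channels, where a send and a receive rendezvous pairwise. The sketch below is an illustration of that correspondence, not of the half-synchronous operator itself.

```go
package main

import "fmt"

// rendezvous pairs one CSP-style write (c!42) with one read (c?x):
// an unbuffered Go channel blocks each side until the other arrives,
// giving the same one-to-one synchronisation as CSP's ? and ! actions.
func rendezvous() int {
	c := make(chan int) // unbuffered: send and receive must meet

	go func() {
		c <- 42 // corresponds to the CSP writing action c!42
	}()

	return <-c // corresponds to the CSP reading action c?x
}

func main() {
	fmt.Println(rendezvous())
}
```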
This paper explains concurrency issues present when modeling software systems using ordinary place transition nets. It is normally assumed that ordinary (elementary) Petri net structures are either concurrent or concurrency-free. However, it can be shown that concurrency depends not only on the structure of a Petri net but also on the distribution of resources in the net. A basic algebraic notation is used for representing concurrency. Examples are given showing different types of concurrency that can be detected in ordinary Petri net structures. These are classified into i) dependent and ii) temporal (weak) concurrency.
This work investigates a range of techniques for automatically mapping CSP process networks onto an MPI cluster. A CSP library was developed to provide the necessary functionality for implementing CSP-based concurrent applications on top of MPI. The library enables seamless communication between processes on the same node and processes across different nodes. A new configuration language was implemented to provide a straightforward way to map processes onto cluster nodes. This was designed in such a way to allow for mapping the same application using different mapping algorithms without having to recompile the application. The resulting proof-of-concept system was then used to evaluate the suitability of well-known graph partitioning algorithms for distributing a suite of CSP-based applications across a compute cluster, with the aim of reducing application execution time in each case. The experimental results are presented in summary form and briefly analyzed.
While CSP is traditionally taught as an algebra, with a focus on definitions and proofs, it may also be presented as a style of programming, that is, process-oriented programming. For the last decade, the University of Copenhagen (UCPH) has been teaching CSP as a mix of the two, including both the formal aspects and process-oriented programming. This paper summarises the work that has been done to make process-oriented programming relevant to students, through programming assignments where process orientation is clearly simpler than an equivalent solution in an imperative programming style.
The Go programming language defines simple I/O interfaces that any data type may implement. In this paper we introduce a Go package that allows arbitrary implementations of these interfaces to be composed into RAID-like redundant and/or high-performance striped arrays. The package also allows spares to be added for fail-over functionality. The package is focused on providing a highly available write setting that tolerates multiple failures but can always receive data as long as a single redundant path exists. This is achieved by allowing reads to become unavailable in the presence of failures that cannot be resolved while the array is operating.
The package is highly concurrent and parallelized and exploits the Go programming language's built-in light-weight concurrency features.
Designing software controllers for multi-task automated service robotics is becoming increasingly complex. The combination of discrete-time (cyber) and continuous-time (physical) domains and multiple engineering fields makes it quite challenging to couple different subsystems as a whole for further verification and validation. Co-simulation is nowadays used to evaluate connected subsystems in the very early design phase and in an iterative development manner.
Building on our previous efforts towards a Model-Driven Development and simulation approach, which mainly focused on the software architecture, we propose a co-simulation approach adopting the Functional Mock-up Interface (FMI) standard to co-simulate the software controller with modelled physical plant dynamics. A model coupling approach is defined that involves the model transformation from a physical plant model implementing the FMI interface (denoted as a Functional Mock-up Unit, FMU) to a Communicating Sequential Processes (CSP) model. The Master Algorithm is (semi-)automatically generated from a co-simulation model that is formalised with CSP syntax to orchestrate the communication between different FMUs. Additionally, an optimized algorithm is defined to compensate for the artificial delay existing in a feedback loop. Finally, an example is used to illustrate the co-simulation approach, verify its working (at least, for this example) and to analyse the timing compensation algorithm.
WACOMSAS (WAsteful COMmunication SAtellite System) uses the Transterpreter to demonstrate a finite, static occam-based design for a ground station that communicates with at most four passing satellites at a time. There may be an indefinite number of satellites, each of which may never return. The satellites are not coded in occam, and external channels with mobile channel ends are used to communicate with them according to a switchboard analogy applicable both to hardware and software interfaces. The occam solution is compared to similar implementations in Go and Python/PyCSP, and an environment that simulates satellites with Arduinos is set up and used for experiments.
The Synchronous Message Exchange (SME) model is a programming model which closely resembles the CSP model and is suitable for describing hardware. This paper aims to combine the theory taught in a machine architecture class with the SME model by implementing a MIPS processor using SME. I show how to construct the components of a MIPS processor as SME processes and how to connect them using SME busses. Furthermore, I show how to extend the processor by introducing additional instructions and by pipelining it.
What does it mean to be a message-passing concurrent language? This work attempts to build a framework for classifying such languages by judging four in regards to features and performance. Features of process calculi are used to evaluate Go, Rust, Erlang, and occam-π. Furthermore, standard communication time, selection time, and multicore utilisation are examined. Although each of these languages uses message-passing concurrency, their approaches and characteristics are different. We can start to build an initial classification based on message-passing type, language support, and feature support. Such a classification allows an initial discussion of the suitability of the evaluation framework, whether it is useful, and how it can be expanded. Approximately 50 further languages have been identified as potentially supporting message-passing concurrency, from which the classification can be built up further.
Existing approaches to concurrent programming, albeit essential, are easily used incorrectly. Testing is difficult due to the inherent non-determinism introduced by concurrency, especially in embedded systems (e.g., the Mars Rover catastrophe). DOMASCOS' goal is to produce (a) experts in concurrency programming and (b) libraries of concurrency program skeletons freeing applications from concurrency bugs – or, at least, significantly reducing their occurrence. Hence, programmers will no longer be required to implement their own concurrency mechanisms, but inherit guaranteed correct concurrency schemas directly from the DOMASCOS library. DOMASCOS addresses the problem of Reusable Concurrency for Modern System Design, supporting Europe to be at the forefront as the number of computationally enabled devices increases. DOMASCOS consists of concurrent groups of people sharing expertise in formally expressing correct concurrency, but who differ in application domains: namely high-performance computing, GPU programming, embedded systems and robotics. The variety in applications ensures versatility of the skeletons and the shared concurrency knowledge ensures coherence in the team.
The application topics of DOMASCOS span high-performance computing, embedded systems, the creative industries, and robotics. These are all cross-sectional areas, key to the development of future innovation. Areas on which DOMASCOS will have a direct impact include finance, big data, the Internet of Things, games, autonomous systems, and working in extreme environments.
The aims of this workshop are to generate discussion concerning the ambitions of the DOMASCOS project, to obtain feedback for the authors, and to discover whether any participants and/or organisations present might consider being associated with the project in some way.
Anyone who knows process-oriented programming immediately acknowledges the claim that process orientation is well suited for games programming. As it turns out, this is easier acknowledged than done, since there are no game programming frameworks that support an external threading library; rather, all insist on managing threads within the game framework. This fringe introduces an idea, as yet without any implementation, to integrate visual effects directly within PyCSP.
Our IMS-T425 compatible Transputer design in FPGA has so far taken over 300 design days. Up to last year, minimal effort was spent on verification. Now a regression test bench has been put in place, targeted at verifying design conformance after any change. This T42 Transputer Verification Suite is based on TVS-1 work by Michael Bruestle and compares the register output of 54 selected instructions against a true T425 golden reference for up to thousands of data samples. It has already helped in T42 micro-code debugging and hardware refinement.
Classic CSP is a “model without time” and yet contains time sequence and even a “time-out” or “sliding choice” operator (Roscoe, The Theory and Practice of Concurrency, 2005, p 80). The static occam language and the Transputer processor, both based on finite CSP, have time and other structure that make them a deliberate refinement of classic CSP; but occam is imperative and capable of doing completely general computing tasks. Programs written in occam and run on the Transputer can be proven correct and their behavior characterised down to cycle count. Classic CSP process descriptions of these same programs can also be investigated and proven using CSP techniques, including hiding.
This Fringe presentation, the first of a pair (see “Mathematical Ground Truth”), will introduce these two approaches, using calculated assembly-code response of two-priority T2, T4, or T8 Transputers. We will begin with the n-process rawFIFO and the two-process store/shelf FIFO detailed in author Dickson's book Crawl-Space Computing, p 111–120, using variable timing patterns for external communication. The true timing will be used to make sense of the abstract model of CSP in the real world, with and without hiding of internal channels.
The purpose of the investigation is to illuminate how rigorous conclusions can be drawn with a combination of CSP, occam-to-occam mappings, timing and sequence requirements. As time permits, we will expand the conclusions reached to IoT designs capable of serving multiple external communications and doing content calculations, treated as low-priority, time-consuming and nondeterministic.
Given the prior knowledge of implementing SME (Synchronous Message Exchange) on hardware, describing small hardware becomes very simple. This is shown by constructing a Pong-inspired game in SME, in approximately a week, which runs on an FPGA (Field-Programmable Gate Array). The game uses the VGA port as output and the buttons on the board as input, and as such runs purely on the FPGA. The game consists of two larger parts: the VGA controller and the game logic, with the VGA controller being the most problematic part to
Everyone is busy. No matter how much work you do, there always seems to be more to do. Sometimes, we even seem to do nothing, spinning while we wait for someone else, or simply making no progress on our tasks. This presentation will show how we can take inspiration from computer science in general, and concurrency specifically, to help us organise our work. Concepts such as scheduling, context-switching, prioritisation, dependencies, logging, and time-slicing shall all be examined through the lens of personal effectiveness. The talk is a light-hearted one meant to inspire people into analysing their work practices, and is inspired by the book “Algorithms to Live By: The Computer Science of Human Decisions” by Brian Christian, but with a particular take from concurrency concepts.
Cluster computers are seen as expensive, space-consuming, high-energy platforms hidden away in universities and tech companies. Having your own cluster computer to explore parallelism ideas was not realistic for the majority of people. With the release of the Raspberry Pi, things changed, and it is now possible to build a five-node machine powered by USB that you can hold in your hand for as little as £120. This talk will describe how such a machine can be built with very little technical knowledge, and then provide an example of setting up MPI on the machine to allow simple distributed parallel applications to be explored. The system is not fast, and communicates via a shared USB 2.0 connection, but the principles of distributed parallelism can be explored. An example system will be available for demonstration purposes.
This Fringe presentation will continue the comparison of the two approaches from our first talk (“Formal Verification for IoT”) in rigorous mathematical detail. We introduce the mathematical concept of a Covering-with-Boundary (CwB), prove the equivalence of cycle-counted occam on a Transputer with this path diagram, and proceed to prove cases that avoid both divergence and effective divergence by finite restrictions. Due to static characteristics of occam-and-Transputer-based “Ground Truth,” we are also able to prove Hardware-Software Equivalence, and reach IoT conclusions that extend to non-digital payloads.