Ebook: Communicating Process Architectures 2011
This book is a collection of the papers presented at the 33rd Communicating Process Architectures (CPA) conference, held at the University of Limerick, Ireland, 19-22 June, 2011. It was hosted by Lero, the Irish Software Engineering Research Centre, and co-located with FM 2011 (the 17th International Symposium on Formal Methods), SEW-34 (the 34th Annual IEEE Software Engineering Workshop) and several specialist workshops and tutorials. These CPA proceedings contain the results from rich seams of research covering many of the key issues in modern computer science, which all seem to concern concurrency in one form or another these days. Inside, you will find papers on concurrency models and their theory, concurrency pragmatics (the effective use of multicores), language ideas and implementation (for mobile processes, generalised forms of choice), tools to assist verification and performance, applications (large scale simulation, robotics, web servers), benchmarks (for scientific and distributed computing) and, perhaps most importantly, education. They reflect the increasing relevance of concurrency both to express and manage complex problems as well as to exploit readily available parallel hardware.
This thirty-third Communicating Process Architectures conference, CPA 2011, takes place at the University of Limerick, 19-22 June, 2011. It is hosted by Lero, the Irish Software Engineering Research Centre, and (as for CPA 2009) co-located with FM 2011 (the 17th International Symposium on Formal Methods). Also co-located this year are SEW-34 (the 34th Annual IEEE Software Engineering Workshop) and several specialist Workshops.
We are very pleased this year to have Gavin Lowe, Professor of Computer Science at the Oxford University Computing Laboratory, as our keynote speaker. His research over the past two decades has made significant contributions to the field of concurrency, with special emphasis on CSP and the formal modelling of computer security. His paper addresses a long-standing and crucial issue for this community: the verified implementation of CSP external choice, with no restrictions.
We have also received a good set of papers covering many of the key issues in modern computer science, which all seem to concern concurrency in one form or another these days. Inside, you will find papers on concurrency models and their theory, pragmatics (the effective use of multicores), language ideas and implementation (for mobile processes, generalised forms of choice), tools to assist verification and performance, applications (large scale simulation, robotics, web servers), benchmarks (for scientific and distributed computing) and, perhaps most importantly, education. They reflect the increasing relevance of concurrency both to express and manage complex problems as well as to exploit readily available parallel hardware.
Authors from all around the world, old hands and new faces, PhD students and professors will be gathered here this week. We hope everyone will have a good time and engage in many stimulating discussions and much learning – both in the formal sessions of the conference and in the many opportunities afforded by the evening receptions and dinners, which are happening every night, and into the early hours beyond.
We thank the authors for their submissions and the Programme Committee for their hard work in reviewing the papers. We also thank Mike Hinchey at Lero for inviting CPA 2011 to be part of the week of events surrounding FM 2011 and for being so helpful during the long months of planning. Finally, we thank Patsy Finn and Susan Mitchell, also at Lero, for all the detailed – and extra – work they put in researching and making all the special arrangements we requested for CPA.
Peter Welch (University of Kent), Adam Sampson (University of Abertay Dundee), Frederick Barnes (University of Kent), Jan B. Pedersen (University of Nevada, Las Vegas), Jan Broenink (University of Twente), Jon Kerridge (Edinburgh Napier University).
In this paper we describe the design and implementation of a generalised alt operator for the Communicating Scala Objects library. The alt operator provides a choice between communications on different channels. Our generalisation removes previous restrictions on the use of alts that prevented both ends of a channel from being used in an alt. The cost of the generalisation is a much more difficult implementation, but one that still gives very acceptable performance. In order to support the design, and greatly increase our confidence in its correctness, we build CSP models corresponding to our design, and use the FDR model checker to analyse them.
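To give a flavour of the kind of choice involved (purely as an illustration: this is Go's built-in select, not the CSO library, its alt syntax, or the algorithm described in the paper), the sketch below shows a choice over channel operations in which both ends of each channel take part in a choice held by some process:

    package main

    import "fmt"

    func main() {
    	in := make(chan int)  // unbuffered channel; one side will send, the other receive
    	out := make(chan int) // a second channel, used in the opposite direction

    	// A peer process that itself chooses between sending on 'in'
    	// and receiving from 'out'.
    	go func() {
    		select {
    		case in <- 42:
    			fmt.Println("peer: sent 42 on in")
    		case v := <-out:
    			fmt.Println("peer: received", v, "from out")
    		}
    	}()

    	// The main process makes the mirror-image choice, so both ends of
    	// each channel appear in a choice somewhere in the system.
    	select {
    	case v := <-in:
    		fmt.Println("main: received", v, "from in")
    	case out <- 7:
    		fmt.Println("main: sent 7 on out")
    	}
    }

The runtime pairs exactly one send with one receive; resolving such choices correctly when both ends may be involved in a choice is the difficulty that the paper's design and FDR analysis address for CSO.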
This paper presents the central elements of a new dynamic channel leading towards a flexible CSP design suited for high-level languages. This channel is separated into three models: a shared-memory channel, a distributed channel and a dynamic synchronisation layer. The models are described such that they may function as a basis for implementing a CSP library, though many of the common features known in available CSP libraries have been excluded from the models. The SPIN model checker has been used to check for the presence of deadlocks, livelocks, starvation, race conditions and correct channel communication behaviour. The three models are separately verified for a variety of different process configurations. This verification is performed automatically by doing an exhaustive verification of all possible transitions using SPIN. The joint result of the models is a single dynamic channel type which supports both local and distributed any-to-any communication. This model has not been verified and the large state-space may make it unsuited for exhaustive verification using a model checker. An implementation of the dynamic channel will be able to change the internal synchronisation mechanisms on-the-fly, depending on the number of channel-ends connected or their location.
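For readers unfamiliar with the terminology, the following sketch illustrates what any-to-any communication over a single shared channel looks like; it uses plain Go channels within one address space and is not the dynamic channel model, its distributed layer, or the SPIN verification described above:

    package main

    import "fmt"

    func main() {
    	ch := make(chan string) // one shared channel: any sender may meet any receiver
    	done := make(chan struct{})

    	// Several senders share the writing end.
    	for i := 0; i < 3; i++ {
    		go func(id int) {
    			ch <- fmt.Sprintf("message from sender %d", id)
    		}(i)
    	}

    	// Several receivers share the reading end; each takes one message.
    	for i := 0; i < 3; i++ {
    		go func(id int) {
    			fmt.Printf("receiver %d got: %s\n", id, <-ch)
    			done <- struct{}{}
    		}(i)
    	}

    	for i := 0; i < 3; i++ {
    		<-done // wait until every receiver has reported
    	}
    }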
The current trend in processor design seems to focus on using multiple cores, similar to a cluster-on-a-chip model. These processors are generally fast and power efficient, but due to their highly parallel nature, they are notoriously difficult to program for most scientists. One such processor is the CELL broadband engine (CELL-BE) which is known for its high performance, but also for a complex programming model which makes it difficult to exploit the architecture to its full potential. To address this difficulty, this paper proposes to change the programming model to use the principles of CSP design, thus making it simpler to program the CELL-BE and avoid livelocks, deadlocks and race conditions. The CSP model described here comprises a thread library for the synergistic processing elements (SPEs) and a simple channel based communication interface. To examine the scalability of the implementation, experiments are performed with both scientific computational cores and synthetic workloads. The implemented CSP model has a simple API and is shown to scale well for problems with significant computational requirements.
In this paper we consider a refinement of the concept of mobile processes in a process oriented language. More specifically, we investigate the possibility of allowing resumption of suspended mobile processes with different interfaces. This is a refinement of the approach taken currently in languages like occam-π. The goal of this research is to implement varying resumption interfaces in ProcessJ, a process oriented language being developed at UNLV.
Previous algorithms for resolving choice over multiway synchronisations have been incompatible with the notion of priority. This paper discusses some of the problems resulting from this limitation and offers a subtle expansion of the definition of priority to make choice meaningful when multiway events are involved. Presented in this paper is a prototype extension to the JCSP library that enables prioritised choice over multiway synchronisations and which is compatible with existing JCSP Guards. Also discussed are some of the practical applications for this algorithm as well as its comparative performance.
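As a rough illustration of prioritised choice on its own (ordinary point-to-point channels only; Go has no multiway synchronisation, and this is a common Go idiom rather than the JCSP algorithm of the paper), one pattern is to poll the higher-priority alternative before falling back to a general choice:

    package main

    import "fmt"

    // prioritisedReceive prefers 'high' over 'low' whenever both are ready.
    // This only approximates priority: it favours 'high' at the moment of
    // the initial check, which is weaker than prioritised alternation over
    // multiway events as discussed in the paper.
    func prioritisedReceive(high, low <-chan string) string {
    	select {
    	case m := <-high: // take a high-priority message if one is already waiting
    		return m
    	default:
    	}
    	select { // otherwise accept whichever alternative becomes ready first
    	case m := <-high:
    		return m
    	case m := <-low:
    		return m
    	}
    }

    func main() {
    	high := make(chan string, 1)
    	low := make(chan string, 1)
    	low <- "low-priority work"
    	high <- "high-priority work"
    	fmt.Println(prioritisedReceive(high, low)) // the high-priority message wins
    }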
Data parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low level programming techniques based on locks, threads and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data parallel computing available to a wider class of programmers. GPUs can be used not only for accelerating the processing of computer graphics but also for general purpose data-parallel programming. Low level data-parallel programming languages based on the Compute Unified Device Architecture (CUDA) provide an approach for developing programs for GPUs, but these languages require explicit creation and coordination of threads and careful data layout and movement. This has created a demand for higher level programming languages and libraries which raise the abstraction level of data-parallel programming and increase programmer productivity. The Accelerator system was developed by Microsoft for writing data parallel code in a high level manner which can execute on GPUs, multicore processors using SSE3 vector instructions and FPGA chips. This paper compares the performance and development effort of the high level Accelerator system against lower level systems which are more difficult to use but may yield better results. Specifically, we compare against the NVIDIA CUDA compiler and sequential C++ code, considering both the level of abstraction in the implementation code and the execution models. We compare the performance of these systems using several case studies. For some classes of problems, Accelerator has performance comparable to CUDA, but for others its performance is significantly reduced; however, in all cases it provides a model which is easier to use and enables greater programmer productivity.
It is currently very difficult to purchase any form of computer system, be it notebook, laptop, desktop, server or high performance computing system, that does not contain a multicore processor. Yet the designers of applications, in general, have very little experience and knowledge of how to exploit this capability. Recently, the Scottish Informatics and Computer Science Alliance (SICSA) issued a challenge to investigate the ability of developers to parallelise a simple Concordance algorithm. Ongoing work has also shown that the use of multicore processors for applications that have internal parallelism is not as straightforward as might be imagined. Two applications are considered: calculating π using Monte Carlo methods and the SICSA Concordance application. The ease with which parallelism can be extracted from a single application using both single multicore processors and distributed networks of such multicore processors is investigated. It is shown that naïve application of parallel programming techniques does not produce the desired results and that considerable care has to be taken if multicore systems are to result in improved performance. Meanwhile, the use of distributed systems tends to produce more predictable and reasonable benefits resulting from parallelisation of applications.
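To make the first of these applications concrete, here is a minimal parallel Monte Carlo estimate of π, written in Go for brevity; it is a sketch only and does not reflect the paper's implementations or their single-multicore and distributed configurations:

    package main

    import (
    	"fmt"
    	"math/rand"
    	"runtime"
    )

    // estimatePi distributes roughly 'samples' random points over 'workers'
    // goroutines, counts how many land inside the quarter circle, and scales
    // the ratio by 4 to estimate pi. Partial counts return over a channel.
    func estimatePi(samples, workers int) float64 {
    	counts := make(chan int, workers)
    	per := samples / workers
    	for w := 0; w < workers; w++ {
    		go func(seed int64) {
    			r := rand.New(rand.NewSource(seed)) // per-worker generator: no shared state
    			inside := 0
    			for i := 0; i < per; i++ {
    				x, y := r.Float64(), r.Float64()
    				if x*x+y*y <= 1.0 {
    					inside++
    				}
    			}
    			counts <- inside // report the partial count
    		}(int64(w) + 1)
    	}
    	total := 0
    	for w := 0; w < workers; w++ {
    		total += <-counts
    	}
    	return 4.0 * float64(total) / float64(per*workers)
    }

    func main() {
    	fmt.Printf("pi ≈ %.4f\n", estimatePi(4_000_000, runtime.NumCPU()))
    }

Even in this tiny example the care the paper argues for is visible: each worker needs its own random number generator, and the partial results must be combined without shared mutable state.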
Since the invention of the light bulb, artificial light has accompanied people all around the world, every day and night. While the light bulb itself has evolved greatly over the years, light control systems are still switch-based and require users to make most of the decisions about their behaviour. This paper presents an algorithm for emergent behaviour in a lighting system that achieves a stable, user-defined light level while saving energy. The algorithm employs a parallel design and was tested using JCSP.
Modern embedded systems have multiple cores available. The CTC++ library is not able to make use of these cores, so a new framework is required to control the robotic setups in our lab. This paper first looks into the available frameworks and compares them to the requirements for controlling the setups. It concludes that none of the available frameworks meet the requirements, so a new framework is developed, called LUNA. The LUNA architecture is component based, resulting in a modular structure. The core components take care of the platform related issues. For each supported platform, these components have a different implementation, effectively providing a platform abstraction layer. High-level components take care of platform-independent tasks, using the core components. Execution engine components implement the algorithms taking care of the execution flow, like a CSP implementation. The paper describes some interesting architectural challenges encountered during the LUNA development and their solutions. It concludes with a comparison between LUNA, C++CSP2 and CTC++. LUNA is shown to be more efficient than CTC++ and C++CSP2 with respect to switching between threads. Also, running a benchmark using CSP constructs shows that LUNA is more efficient than the other two. Furthermore, LUNA is also capable of controlling actual robotic setups with good timing properties.
The success of the Arduino platform has made embedded programming widely accessible. The Arduino has seen many uses, for example in rapid prototyping, hobby projects, and in art installations. Arduino users are often not experienced embedded programmers, however, and writing correct software for embedded devices can be challenging. This is especially true if the software needs to use interrupts in order to interface with attached devices. Insight and careful discipline are required to avoid introducing race hazards when using interrupt routines. Instead of programming the Arduino in C or C++, as is the custom, we propose using occam-π, a language that can help the user manage the concurrency introduced when using interrupts and help in the creation of modular, well-designed programs. This paper will introduce the Arduino, the software that enables us to run occam-π on it, and a case study of an environmental sensor used in an Environmental Science course.
The provision of mechanisms for processor allocation in current distributed parallel programming models is very limited. This makes it difficult, or even impossible, to express a large class of programs which require a run-time assessment of their required resources. This includes programs whose structure is irregular, composite or unbounded. Efficient allocation of processors requires a process creation mechanism able to initiate and terminate remote computations quickly. This paper presents the design, demonstration and analysis of an explicit mechanism to do this, implemented on the XMOS XS1 architecture, as a foundation for a more dynamic scheme. It shows that process creation can be made efficient so that it incurs only a fractional overhead of the total runtime and that it can be combined naturally with recursion to enable rapid distribution of computations over a system.
This paper introduces webpipes, a compositional web server toolkit written using the Go programming language as part of an investigation of concurrent software architectures. This toolkit utilizes an architecture where multiple functional components respond to requests, rather than the traditional monolithic web server model. We provide a classification of web server components and a set of type definitions based on these insights that make it easier for programmers to create new purpose-built components for their systems. The abstractions provided by our toolkit allow servers to be deployed using several concurrency strategies. We examine the overhead of such a framework, and discuss possible enhancements that may help to reduce this overhead.
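Purely to illustrate the idea of composing a server from small purpose-built components (the type and function names below are invented for this sketch and are not the webpipes API or its type definitions), Go's standard net/http package already allows a pipeline of wrapping handlers, with each request served on its own goroutine:

    package main

    import (
    	"fmt"
    	"net/http"
    	"strings"
    )

    // component wraps one handler with another, so small pieces can be chained.
    type component func(http.Handler) http.Handler

    // logger is a pass-through component that records each request path.
    func logger(next http.Handler) http.Handler {
    	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		fmt.Println("request:", r.URL.Path)
    		next.ServeHTTP(w, r)
    	})
    }

    // getOnly is a filtering component that rejects non-GET requests.
    func getOnly(next http.Handler) http.Handler {
    	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		if r.Method != http.MethodGet {
    			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
    			return
    		}
    		next.ServeHTTP(w, r)
    	})
    }

    // chain wraps h with the given components; the first listed runs outermost.
    func chain(h http.Handler, cs ...component) http.Handler {
    	for i := len(cs) - 1; i >= 0; i-- {
    		h = cs[i](h)
    	}
    	return h
    }

    func main() {
    	producer := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		fmt.Fprintln(w, "hello,", strings.TrimPrefix(r.URL.Path, "/"))
    	})
    	http.ListenAndServe(":8080", chain(producer, logger, getOnly))
    }

The toolkit described in the paper goes further, classifying such components and allowing the same composition to be deployed under several concurrency strategies.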
Performance of communication mechanisms is very important in distributed systems frameworks, especially when the aim is to provide a particular type of behaviour across a network. In this paper, performance measurements of the distributed Communicating Process Architectures networking protocol and stack are presented. The results presented show that for general communication, the distributed CPA architecture is close to the baseline network performance, although when dealing with parallel speedup for the Mandelbrot set, little performance is gained. A discussion of the future direction of the distributed CPA architecture and protocol in relation to occam-π and other runtimes is also presented.
The CoSMoS project is building generic modelling tools and simulation techniques for complex systems. As part of this project a number of simulations have been developed in many programming languages. This paper describes a framework for interconnecting simulation components written in different programming languages. These simulation components are synchronised and coupled using a shared object space. This approach allows us to combine highly concurrent agent-based simulations written in occam-π, with visualisation and analysis components written in flexible scripting languages such as Python and domain specific languages such as MATLAB.
This paper describes a model for concurrent computation based on single-writer single-assignment variables. The description is primarily graphical, resembling the interaction nets formalism. The model embodies rules in a process which may require two or more communications from other processes to respond. However, these are managed by a partial evaluation response on receiving a single communication.
In traditional real-time multiprocessor schedulability analysis it is required that all tasks are entirely serial. This implies that if a task is written in a parallel language such as occam, all parallelism in the task must be suppressed to enable schedulability analysis. Part of the reason for this restriction is the difficulty in analysing execution times of programs with a complex parallel structure. In this paper we introduce an abstract model for reasoning about the temporal properties of such programs. Within this model, we define what it means for a process to be easier to schedule than another, and the notion of upper bounds on execution times. Counterintuitive temporal behaviour is demonstrated to be inherent in all systems where processes are allowed an arbitrary parallel structure. For example, there exist processes that are guaranteed to complete on some schedule but that may not complete if executing less than the expected amount of computation. Not all processes exhibit such counterintuitive behaviour, and we identify a subset of processes that are well-behaved in this respect. The results from this paper are a necessary prerequisite for a complete schedulability analysis of systems with an arbitrary parallel structure.
This paper describes how to model channel-based digital asynchronous circuits using SystemVerilog interfaces that implement CSP-like communication events. The interfaces enable explicit handshaking of channel wires as well as abstract CSP events. This enables abstract connections between modules that are described at different levels of abstraction, facilitating both verification and design. We explain how to model one-to-one, one-to-many, one-to-any, any-to-one and synchronised channels. Moreover, we describe how to split communication actions into multiple parts to model more accurately the less concurrent handshaking protocols that are commonly found in many asynchronous pipelines.
Previous work has demonstrated the feasibility of using process-oriented programming to implement simple subsumption architectures for robot control. However, the utility and scalability of process-based subsumption architectures for more complex tasks and those involving multiple robots has not been proven. We report our experience of applying these techniques to the implementation of a standard foraging problem in swarm robotics, using occam-π to implement a subsumption control system. Through building a system with a realistic level of complexity, we have discovered both advantages and disadvantages to the process-oriented subsumption approach for larger robot control systems.
This paper introduces a case study exploring some of the legacy issues that may be faced when redeveloping a system. The case study is a robotics system programmed in occam and Handel-C, allowing us to draw comparisons between software and hardware implementations in terms of program architecture, ease of program code verification, and differences in the behaviour of the robot. The two languages used have been selected because of their model of concurrency and their relation to CSP. The case study contributes evidence that re-implementing a system from an abstract model may present implementation specific issues despite maintaining the same underlying program control structure. The paper identifies these problems and suggests a number of steps that could be taken to help mitigate some of the issues.
The Flying Gator is an unmanned aerial vehicle developed to support investigations regarding concurrent and parallel control for robotic and embedded systems. During ten weeks in the summer of 2010, we designed, built, and tested an airframe, control electronics, and a concurrent firmware capable of sustaining autonomous level flight. Ultimately, we hope to have a robust, open source control system capable of supporting interesting research questions exploring concurrency in real time systems as well as current issues in sustainable agriculture.
This paper presents an analysis-method of concurrent processes with value-passing which may cause infinite-state systems. The method consists of two steps: sequentialisation and state-reduction. In the sequentialisation, the symbolic transition graph of a given concurrent process is derived by symbolic operational semantics. In the state-reduction, the number of states in the symbolic transition graph is reduced by removing needless internal transitions. Furthermore, this paper introduces an analysis-tool called CONPASU, which implements the analysis-method, and demonstrates how CONPASU can be used for automatically analyzing concurrent processes. For example, it can extract abstract behaviors, which are useful for understanding complex behaviors, by focusing on some interesting events.
We report the development of a verification tool for Timed CSP processes. The tool has been built on the functional programming language ML. The tool interprets processes described in both timed and untimed CSP, converting them to ML functions, and executing those functions for the verification of refinement in the timed traces and timewise traces models. Using the programmability of higher order functionality, the description of CSP processes with ML has been synthesised naturally. The effectiveness of the tool is demonstrated with an example analysing implementations of Fischer's algorithm for the exclusive control of a shared resource in a multi-processor environment.
The current model of mobile processes in occam-π implements a single interface for host processes to use. However, different hosts holding different kinds of resource will naturally require different interfaces to interact with their visitors. So, current occam-π mobiles have to offer a single union of all the interfaces needed and hosts must provide dummy arguments for those irrelevant to their particular calls. This opens the possibility of programming errors in both hosts and mobiles should those dummies mistakenly be used. This paper considers a revised model for mobile processes that allows many interfaces. It also proposes the concept of variant call channels, which expands on a mechanism proposed for the occam3 language, and shows a simple duality between the revised mobile processes and mobile variant call channels. An implementation of mobile variant call channels, via source-code transformation to standard occam-π mobile channel bundles, is presented. This gives a demonstration implementation for the revised mobile process model and an operational semantics. The paper is illustrated with a case study based on the Santa Claus problem, where the elves and reindeer are mobile processes.
This is a proposal for the formal verification of occam-π programs to be managed entirely within occam-π. The language is extended with qualifiers on types and processes (to indicate relevance for verification and/or execution) and assertions about refinement (including deadlock, livelock and determinism). The compiler abstracts a set of CSPm equations and assertions, delegates their analysis to the FDR2 model checker and reports back in terms related to the occam-π source. The rules for mapping the extended occam-π to CSPm are given. The full range of CSPm assertions is accessible, with no knowledge of CSP formalism required by the occam-π programmer. Programs are proved just by writing and compiling programs. A case study analysing a new (and elegant) solution to the Dining Philosophers problem is presented. Deadlock-freedom for colleges with any number of philosophers is established by verifying an induction argument (the base and induction steps). Finally, following guidelines laid down by Roscoe, the careful use of model compression is demonstrated to verify directly the deadlock-freedom of an occam-π college with 10^2000 philosophers (in around 30 seconds). All we need is a universe large enough to contain the computer on which to run it.