This fringe session will present the current status of our still ongoing IMS-T425 compatible Transputer design in FPGA. Data path and control path are in a stable working state. Fetch unit and a basic system control unit are almost functional. Small instruction sequences can be executed from 8 Kbyte memory already. Some design details around the scheduler micro-code will be discussed. Slides used in the presentation can be downloaded from .
OLL is the latest iteration of a series of compiler projects I've written over the past 20+ years. During that time I've targeted C, VHDL (hardware), ARM and Java's JVM. Each has concentrated on a different aspect of the challenge.
I've also designed, by hand, a lot of high-performance test equipment hardware/firmware and control/interface software; also a little bit of a flight system for a satellite. All have required care to (try to) ensure reliable operation, eliminating run-time failures. CSP has been a critical framework that has worked very well.
I am more convinced than ever that there is a need for a design language that describes algorithms equally well regardless of whether they are to be run as hardware or software: Hardware/Software Same-Design (not Hardware/Software Co-Design).
Support for eliminating run-time errors is essential. However, it is theoretically impossible to analyse arbitrary programs adequately and relatively easy to create pathological cases that would break the compiler. The interesting question is whether it is possible to handle a large-enough set of real (not contrived) programs to be useful. This project aims to explore the bounds of practicality. I will give a brief overview of the aims of this ongoing project and report its current status. Slides used in the presentation can be downloaded from .
Now that we can formally verify software models, why do we still need protection from programming errors? For a similar reason, why do we still need protection from hardware errors? The key reason is that formal models are abstractions and programmers are humans with illogical brains, using illogical and error-prone dynamic programming languages. In addition, software runs on a shared resource, called a processor, and that processor exists in the real physical world, in which external influences like cosmic rays can change its state.
Hence, protection has to be seen in the context of increasing the trustworthiness (as defined by the Assured Reliability and Resilience Level criterion) of the system. The key is to do it in such a way that we do not jeopardise the properties we expect from a system in the absence of the errors mentioned above. This was the rationale for developing VirtuosoNext, offering fine-grain and space partitioning with some help from the hardware. Slides used in the presentation can be downloaded from .
In some quarters, received wisdom about hard real-time systems (where lateness in response means system failure) is that a currently running process must be pre-empted by a higher priority process that becomes runnable (e.g. by the assertion of an external event pin or timeout) – otherwise worst-case response times cannot be guaranteed. Further, if a higher priority process needs to synchronise with one of lower priority, the latter must automatically inherit the priority of the former. Otherwise, the higher process effectively inherits the priority of the lower one as it waits for the latter to be scheduled (priority inversion) and, again, worst-case response times fail.
The CCSP multicore scheduler for occam-π, part of the KRoC package, is possibly the fastest and most scalable (with respect to processor cores) such beast on the planet. However, its scheduling is cooperative (not pre-emptive) and it does not implement priority inheritance (and cannot do this given the nature of CSP synchronisation, where processes are unaware of the other processes involved). Therefore, despite its performance, received wisdom would seem to rule it out for hard real-time applications.
This talk reviews a paper from the OUG-7 proceedings (1987) that discusses these ideas with respect to Transputers. One minor improvement to the reported techniques will be described. Otherwise, no change is needed for modern multicore architectures. Conclusions are that (a) pre-emptive scheduling is not required to meet deadlines, (b) priority inversion is a design error (dealt with by correct design, not the run-time system) and (c) the occam-π/CCSP scheduler can be made to work even more efficiently for hard real-time systems than it presently does for soft real-time (such as complex system modelling). Slides used in the presentation can be downloaded from .
Numerical simulation of Earth's climate is one branch of computational fluid dynamics. This community faces two key challenges. Firstly, the turbulence closure problem has still not been solved. Therefore more accurate simulations require higher spatial resolution. Secondly, testing the fidelity of our climate models for a future warmer world can only be done by reproducing past warm periods. This requires much faster integrations, which also need to include Earth's full carbon biochemistry. The former requires massively parallel computing architectures and a data infrastructure that can manage rates of 1 Tb/day; the latter requires codes or chips that are an order of magnitude faster than what is currently available. I will briefly describe the evolution of climate models, show recent results with global geostrophic turbulence resolving models, and outline some ideas of how to structure the community's resources in the future. Slides used in the presentation can be downloaded from .
In this paper we present the JVMCSP - a runtime system for the JVM and a code generator in the ProcessJ compiler. ProcessJ is a new process-oriented language with a Java-like syntax and CSP semantics. ProcessJ compiles to a number of different runtimes; this paper focuses on the JVM runtime. The approach followed in the implementation is inspired by our previous prototype work, but in this paper we look closely at the actual implementation and how it differed from our previous assumptions. We also present a number of results that highlight the capabilities of our code generator and runtime. We show that the runtime has a low overhead and that we managed to run a program on a single core with 480,900,001 processes and a total of over 1.4 billion runtime objects on the JVM heap.
Two current trends in modern robotics and other cyber-physical systems seem to conflict: the desire for better interaction with the environment of the robot increases the computational power needed to extract useful data from advanced sensors. This conflicts with the need for energy efficiency and mobility of the setups. A solution for this conflict is to use a distribution over two parallel systems: offloading a part of the complex and computationally expensive task to a base station, while timing-sensitive parts remain close to the robotic setup on an embedded processor. In this paper, a way to connect two such systems is presented: a bridge is made between the Robot Operating System (ROS), a widely used open source environment with many algorithms, and the CSP-execution engine LUNA. The bridge uses a (wireless) network connection, and provides a generic and reconfigurable way of connecting these two environments. The design, implementation in both environments, and tests characterizing the bridge are described in this paper.
Modern embedded systems are designed for multiple and increasingly demanding tasks. Complex concurrent software is required by multi-task automated service robotics for implementing their challenging (control) algorithms. TERRA is a Communicating Sequential Processes (CSP) algebra-based Eclipse graphical modelling tool suite which is capable of C++ code generation. It is designed to ease tedious and error-prone concurrent software development for robotics. However, sufficient simulation and visualization support is not yet provided in TERRA. A hybrid simulation approach is proposed in this paper to provide simulation capabilities for the TERRA tool suite with respect to Cyber-Physical Systems (CPS) co-design. Moreover, a visualization for the simulation is designed as well, to provide animation facilities which enable users to visually trace simulated execution flows. Finally, we use an example to test the hybrid simulation approach as well as the visualization facilities. The simulation approach is shown to be sufficient and the visualization works as intended.
In this paper we describe the design and implementation of the Tapr high performance tape streaming system. Tapr consists of a number of basic processes interacting through message passing on CSP-style communications channels. The system is highly concurrent and uses synchronous as well as asynchronous coordination without the need for complex use of traditional locks. The system scales to and beyond contemporary enterprise so-called automated tape libraries by representing each and every part of the tape library as a communicating process. This includes the robot changer, each tape drive, all clients and even the loaded tape media.
We show how such an implementation can be done in the Go programming language with relative ease by utilizing the concurrency primitives included in the base language. We also describe how complex cancellation and timeout handling can be expressed directly in the language by using the concept of a surrounding context.
Finally, we present a number of benchmarks designed to show that the communicating process architecture does not impose any measurable overhead, but rather allows the system to scale to a high number of clients and devices using a simple and intuitive process-based design.
This paper presents updates and measurements for the Concurrent Communications Library, CoCoL, which is a CSP-inspired library targeting C# and other languages running on the Common Language Runtime, also known as .Net. We describe the new library interface methods that simplify writing correct, encapsulated and compositional networks. We also describe an extension to the library, which enables communication over network connections, and measure its performance.
For a number of years, the Communicating Process Architecture (CPA) community have developed languages and runtimes supporting message passing concurrency. For these we always provide a set of reusable processes called plug and play. These components provide a rich set of functions to the new CPA programmer, enabling them to develop applications. In this paper, we describe recent work in taking the plug and play ideology and applying it to the area of algorithmic skeletons. We have based our work on the RISC-pb2l specifications of Danelutto et al. to provide a base set of skeletal components, focusing on the communication behaviours they exhibit.
The Climate-ecological Observatory for Arctic Tundra (COAT) is a long-term research initiative for real-time detection, documentation and understanding of climate impacts on terrestrial arctic ecosystems. COAT is a collaboration of several Norwegian research institutions under the umbrella of FRAM - High North Centre for Climate and Environment. The study areas include the bioclimatic extremes of the terrestrial Arctic, low arctic coast of Norway and high arctic Svalbard. An important part of the observatory is sensors placed in the environment to observe wildlife and plants. Current sensor packages are fairly robust and work well for small to medium scale deployment. For larger scales, however, there is a clear demand for better management and control. This paper summarises some current experiences with deploying cameras and some of the challenges that we intend to address in an upcoming project where we aim to increase the capability of scientists to handle a larger number and diversity of sensor types and variation in deployment while minimising human traffic and impact in the monitored environments. To build this type of observatory at increasing scales, we expect to use robust programming architectures, open modular sensor packages, on-line processing, monitoring and configuration management and a range of communication technologies to cope with variations in connectivity.
While CSP only models process-to-process rendezvous-style message passing, all newer CSP-type programming libraries offer more powerful mechanisms, such as buffered channels, multiple receivers and even multiple senders on a single channel. This work investigates the possible variations of a one-to-all, broadcasting, channel. We discuss the different semantic meanings of broadcasting and show three different possible solutions for adding broadcasting to CSP-style programming.
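To make the idea of a one-to-all channel concrete, here is a minimal sketch in Go of one possible broadcast semantics: the sender's next value is accepted only after every receiver has taken the current one. It is an assumption for illustration and not necessarily one of the three solutions the paper presents; the names broadcast and collect are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast copies every value from in to each channel in outs, and accepts
// the next value only once all receivers have taken the current one.
// When in is closed, it closes all outputs.
func broadcast(in <-chan int, outs []chan int) {
	for v := range in {
		var wg sync.WaitGroup
		for _, out := range outs {
			wg.Add(1)
			go func(c chan<- int) {
				defer wg.Done()
				c <- v // v is stable here: we wait before the next iteration
			}(out)
		}
		wg.Wait()
	}
	for _, out := range outs {
		close(out)
	}
}

// collect broadcasts values to n receivers and returns the sum each one saw.
func collect(n int, values []int) []int {
	in := make(chan int)
	outs := make([]chan int, n)
	for i := range outs {
		outs[i] = make(chan int)
	}
	go broadcast(in, outs)

	sums := make([]int, n)
	var wg sync.WaitGroup
	for i, out := range outs {
		wg.Add(1)
		go func(i int, c <-chan int) {
			defer wg.Done()
			for v := range c {
				sums[i] += v
			}
		}(i, out)
	}
	for _, v := range values {
		in <- v
	}
	close(in)
	wg.Wait()
	return sums
}

func main() {
	fmt.Println(collect(3, []int{1, 2, 3})) // every receiver sums to 6
}
```

Other semantics are possible, for example dropping values for receivers that are not ready; the choice determines whether a slow receiver can stall the sender.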
Current robotic systems are becoming more and more complex. This is due to an increase in the number of subsystems that have to be controlled from a central processing unit as well as more stringent requirements on stability, reliability and timing. A possible solution is to offload computationally demanding parts to an FPGA connected to the main processor. The parallel nature of FPGAs makes achieving hard real-time guarantees easier. Additionally, due to its parallel and sequential constructs, CSP matches structurally with an FPGA. In this paper, a CSP to hardware mapping is proposed where key CSP structures are translated to hardware using the functional language CλaSH. The CSP structures can be designed using the TERRA tool chain while CλaSH code is generated for implementing hardware. The functionality of the CSP mapping is illustrated using some producer-consumer examples. In this paper, the design, implementation and tests are presented. Future work is to implement the ALT construct and to generate token diagrams for user understanding.
The Synchronous Message Exchange, SME, is a programming model that both resembles communication in hardware and can be implemented as a CSP network. This paper extends previous work on modeling hardware-like programs using SME in Python, with the addition of a source-to-source compiler that converts an SME network implemented in Python to an equivalent implementation in VHDL. We describe the challenges, constraints, and solutions involved in translating a highly dynamic language like Python into the hardware-like VHDL language. We also show how the approach can assist in further VHDL refinement by generating tedious test bench code, such that VHDL designs can be simulated and verified with vendor supplied simulation and synthesis tools.
Reading and writing is modelled in CSP using actions containing the symbols ? and !. These reading and writing actions are synchronous and there is a one-to-one relationship between occurrences of pairs of these actions. It is cumbersome to ease the restriction of synchronous execution of the read and write actions. For this reason we introduce the half-asynchronous parallel operator that acts on actions containing the symbols ?' and !' and study the impact on a Vertex Removing Synchronised Product.
Although many CSP-inspired libraries exist, none have yet targeted modern C++ (C++11 onwards). The work presented has a main objective of providing a new C++CSP library which adheres to modern C++ design principles and standards. A secondary objective is to develop a library that provides simple message passing concurrency in C++ using only the standard library. The library is evaluated in comparison to JCSP using microbenchmarks. CommsTime and StressedAlt are used to determine the properties of coordination time, selection time, and maximum process count. Further macrobenchmarks, Monte Carlo π and Mandelbrot, are used to measure potential speedup with C++CSP. From the microbenchmarks, it is shown that C++CSP performs better than JCSP in communication and selection operations, and, due to using the same threading model as JCSP, can create an equal number of processes. From the macrobenchmarks, it is shown that C++CSP can provide an almost six times speedup for computation based workloads, and a four times speedup for memory based workloads. The implementation of move semantics in channels has provided suitable enhancements to overcome data copy costs in channels. Therefore, C++CSP is considered a useful addition to the range of CSP libraries available. Future work will investigate other benchmarks within C++CSP as well as development of networking and skeleton based frameworks.
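For readers unfamiliar with CommsTime, the following is a minimal sketch of the benchmark's usual shape: a ring of Prefix, Delta and Successor processes feeding a consumer, timed per delivered value. It is written in Go rather than C++CSP or JCSP, uses a sequential Delta, and assumes n is greater than zero; it makes no claim about how the paper's measurements were taken.

```go
package main

import (
	"fmt"
	"time"
)

// prefix emits first, then forwards everything it receives.
func prefix(first int, in <-chan int, out chan<- int) {
	out <- first
	for v := range in {
		out <- v
	}
}

// delta copies each input to both outputs, one after the other.
func delta(in <-chan int, a, b chan<- int) {
	for v := range in {
		a <- v
		b <- v
	}
}

// successor forwards each input incremented by one.
func successor(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v + 1
	}
}

// commsTime runs the ring until the consumer has read n values (n > 0).
// It returns the last value read and the mean time per value.
// The ring's goroutines are left blocked afterwards, which is acceptable
// for a short-lived benchmark program.
func commsTime(n int) (last int, perValue time.Duration) {
	a, b, c, d := make(chan int), make(chan int), make(chan int), make(chan int)
	go prefix(0, c, a)
	go delta(a, d, b)
	go successor(b, c)

	start := time.Now()
	for i := 0; i < n; i++ {
		last = <-d
	}
	return last, time.Since(start) / time.Duration(n)
}

func main() {
	last, per := commsTime(100000)
	fmt.Println("last value:", last, "time per value:", per)
}
```

Each value delivered to the consumer costs four channel communications, so the time per value gives a rough figure for channel communication plus context-switch overhead.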
A major research challenge in the architectural design of a software-intensive System-of-Systems (SoS) is to enable the formal modeling of its evolutionary architecture. One of the main issues is that SoS architectures evolve dynamically, during run-time, in unexpected ways while producing emergent behavior. To address this issue, this paper proposes a novel process calculus, called “the π-Calculus for SoS”, defined as a novel variant of the π-Calculus based on concurrent constraints and inferred channel bindings for enabling the formal modeling of software-intensive SoSs, meeting their challenging architectural characteristics.
Many (embedded) systems are designed with timeouts in places where they are not needed, or where they are even wrong. When there is a timeout, it may break the very idea of how a contract should be: without timeout. For example, some response from an internal communication driver (that handles an external connection) may be awaited with a timeout – when it might be better just to wait for a proper response from the driver telling that the connection is indeed broken (i.e. that the driver performs the timeout). Timeout has a dimension of layer associated with it. For the given example, the timeout was properly handled by the driver to detect the broken connection, not by the client.
Timeouts are, of course, appropriate for periodic processes like blinking an LED or pinging a line, where the connotation of a timer is used. However, having timeouts between internal communicating processes quickly makes matters difficult. We define timeout not to be part of a contract per se (even if this is rather contrary to judicial understanding of the term, where an expiration date often is necessary). In design by contract, failed critical assertions may be handled locally and the whole system may restart – best detected during testing, before final release. Of course, formally verifying a system or using deadlock-free patterns to ensure a correct design would be better. Requiring (over an external link that has a timeout of 5 seconds) that a heating element must increase the temperature after the push of a button (within 4 seconds) and a response must be shown on the display (by 3 seconds) should trigger a (hopefully) interesting and useful discussion of the specifications.
We shall also consider the process network shown in Figure 1, where timeouts are indicated by three-sided arrows and labelled “t1”, “t2” etc. The timer “t2” may not be needed and the system much simplified without it.
This abstract is based on the blog note "Timing out design by contract with a stopwatch". Slides used in the presentation can be downloaded from .
This fringe session will present the design progress of our IMS-T425 compatible Transputer design in FPGA. The 32-bit CPU + Memory interface (2x8kB) are in stable working condition. 117 instructions (from 123+7) are almost implemented in 460 lines of uCode: for example, TASM loops including interruptible MOVE(s) can be simulated some 100 clock cycles. Timer(s) are running. The System control unit allows error mode, MOV-bit and events. Some still open questions around scheduler micro-code and link interaction will be discussed. Slides used in the presentation can be downloaded from .
Concurrency is commonly used for coordinating events that are non-deterministic in nature: sensors, transaction systems and games, to mention a few. However, in the era of Big Data, the need for concurrency has taken on a whole new dimension. With thousands of concurrent operations with highly asynchronous completion times, an efficient concurrency control system is essential for performance. This talk will show the challenge and introduce one idea towards concurrency as a mechanism for Big Data access. Slides used in the presentation can be downloaded from .
Graphs, animations, and visualisations are known to be valuable forms of presenting results of experiments, both simulation and real-life experiments. For CSP-based concurrent programs, the state of processes, CSP constructs and channels are relevant to show. For proper feedback, these can best be related to the form in which the program was entered by the user. In the case of the TERRA graphical CSP tool, feedback is given by colouring the diagram elements according to the specific state they are in, see Figure 1. Next to that, a textual log of events is produced, giving more details relevant in the development process. In this Fringe session, we demonstrate this visualisation facility of our graphical CSP tool TERRA. A paper reporting context and technical details of this tool appears elsewhere in these Proceedings. Slides used in the presentation can be downloaded from .
The small, low-cost and tinkering-friendly Raspberry Pi computer board has been used as the basis for a variety of distributed computing clusters built by research groups and individuals for experimental and pedagogical use. The new Raspberry Pi Zero model is smaller, consumes less power, and costs only $5.00 (when supplies are available), but lacks the built-in ethernet interface of its larger predecessors. Making a virtue of necessity, an Altera Cyclone II FPGA on an inexpensive development board can be used to provide the communication fabric for a pocket-sized Raspberry Pi Zero cluster, avoiding the need for bulky network cables and routers, and enabling experimentation with different networking architectures which may be more suited to fine-grained closely-coupled distributed computations than the usual TCP/IP over commodity ethernet. (Work in progress.)