Ebook: Communicating Process Architectures 2005
The awareness of the ideas characterized by Communicating Processes Architecture and their adoption by industry beyond their traditional base in safety-critical systems and security is growing. The complexity of modern computing systems has become so great that no one person – maybe not even a small team – can understand all aspects and all interactions. The only hope of making such systems work is to ensure that all components are correct by design and that the components can be combined to achieve scalability. A crucial property is that the cost of making a change to a system depends linearly on the size of that change – not on the size of the system being changed. Of course, this must be true whether that change is a matter of maintenance (e.g. to take advantage of upcoming multiprocessor hardware) or the addition of new functionality. One key is that system composition (and disassembly) introduces no surprises. A component must behave consistently, no matter the context in which it is used – which means that component interfaces must be explicit, published and free from hidden side-effect. This publication offers strongly refereed high-quality papers covering many differing aspects: system design and implementation (for both hardware and software), tools (concurrent programming languages, libraries and run-time kernels), formal methods and applications.
We are at the start of a new CPA conference. Communicating Process Architectures 2005 marks the first time that this conference has been organized by an industrial company (Philips) in co-operation with a university (Technische Universiteit Eindhoven). We see that this also marks the growing awareness of the ideas characterized by 'Communicating Processes Architecture' and their growing adoption by industry beyond their traditional base in safety-critical systems and security.
The complexity of modern computing systems has become so great that no one person – maybe not even a small team – can understand all aspects and all interactions. The only hope of making such systems work is to ensure that all components are correct by design and that the components can be combined to achieve scalability. A crucial property is that the cost of making a change to a system depends linearly on the size of that change – not on the size of the system being changed. Of course, this must be true whether that change is a matter of maintenance (e.g. to take advantage of upcoming multiprocessor hardware) or the addition of new functionality. One key is that system composition (and disassembly) introduces no surprises. A component must behave consistently, no matter the context in which it is used – which means that component interfaces must be explicit, published and free from hidden side-effect. Our view is that concurrency, underpinned by the formal process algebras of Hoare's Communicating Sequential Processes and Milner's π-Calculus, provides the strongest basis for the development of technology that can make this happen.
Once again we offer strongly refereed high-quality papers covering many differing aspects: system design and implementation (for both hardware and software), tools (concurrent programming languages, libraries and run-time kernels), formal methods and applications. These papers are presented in a single stream so you won't have to miss out on anything. As always we have plenty of space for informal contact and we don't have to worry about the bar closing at half ten!
We are pleased to have keynote speakers such as Ad Peeters of Handshake Solutions and Guy Broadfoot of Verum, proving that you can actually make profitable business using CSP as your guiding principle in the design of concurrent systems, be they hardware or software. The third keynote by IBM Chief Architect Peter Hofstee assures us that CSP was also used in the design of the communication system of the recent Cell processor, jointly developed by IBM, Sony and Toshiba. The fourth keynote talk is by Paul Stravers of Philips Semiconductors on the Wasabi multiprocessor architecture.
We anticipate that you will have a very fruitful get-together and hope that it will provide you with as much inspiration and motivation as we have always experienced.
We thank the authors for their submissions, the Programme Committee for their hard work in reviewing the papers and Harold Weffers and Maggy de Wert (of TUE) in making the arrangements for this meeting. Finally, we are especially grateful to Fred Barnes (of the University of Kent) for his essential technical expertise and time in the preparation of these proceedings.
Herman Roebbers (Philips TASS), Peter Welch and David Wood (University of Kent), Johan Sunter (Philips Semiconductors), Jan Broenink (University of Twente)
Honeysuckle [1] is a new programming language that allows systems to be constructed from processes which communicate under service (client-server or master-servant) protocol [2]. The model for abstraction includes a formal definition of both service and service-network (system or component) [3]. Any interface between two components thus forms a binding contract which will be statically verified by the compiler. An account is given of how such an interface is constructed and expressed in Honeysuckle, including how it may encapsulate state, and how access may be shared and distributed. Implementation is also briefly discussed.
For some years there has been much activity in developing CSP-like extensions to a number of common programming languages. In particular, a number of groups have looked at extensions to Java. Recent developments in the Java platform have resulted in groups proposing more expressive problem solving environments. Groovy is one of these developments. Four constructs are proposed that support the writing of parallel systems using the JCSP package. The use of these constructs is then demonstrated in a number of examples, both concurrent and parallel. A mechanism for writing XML descriptions of concurrent systems is described and it is shown how this is integrated into the Groovy environment. Finally conclusions are drawn relating to the use of the constructs, particularly in a teaching and learning environment.
This paper discusses issues, possibilities and existing approaches for fitting an exception handling mechanism (EHM) in CSP-based process-oriented software architectures. After giving a survey on properties desired for a concurrent EHM, specific problems and a few principal ideas for including exception handling facilities in CSP-designs are discussed. As one of the CSP-based frameworks for concurrent software, we extend CT (Communicating Threads) library with the exception handling facilities. The extensions result in two different EHM models whose compliance with the most important demands of concurrent EHMs (handling simultaneous exceptions, the mechanism formalization and efficient implementation) are observed.
In this paper, we demonstrate a structured approach to proceed from development in a high level-modeling environment to testing on the real hardware. The concept is introduced by taking an example scenario that involves automatic code generation of Handel-C for FPGAs. The entire process is substantiated with a prototype that generates Handel-C code from MATLAB®/Simulink® for most common Simulink® blocks. Furthermore, we establish the potential of the notion by generating Handel-C for an FPGA, which controls the flow of paper through the scanning section of a printer/copier. Additionally, we present another method to generate Handel-C from a state-based specification. Finally, to verify and validate the behavior of the generated code, we execute several levels of simulation, including software-in-the-loop and hardware-in-the-loop simulations.
This paper presents a novel technique for safe partial or complete process network termination. The idea is to have two types of termination messages / poison: LocalPoison and GlobalPoison. Injecting GlobalPoison into a process network results in a safe termination of the whole process network. In contrast, injected LocalPoison only terminates all processes until it is filtered out by Poison-Filtering Channels. This allows the creation of termination domains inside a process network. To make handling of a termination message easy, it is delivered as an exception and not as a normal message. The necessary Poisonable- and Poison-Filtering-Channels have been modelled in CSP and checked using FDR. A proof of concept implementation for Communicating Sequential Processes for Java (JCSP) has been developed and refined. Previously, JCSP offered no safe way to terminate the process network. When the user terminated the program, the Java Virtual Machine (JVM) simply stops all threads (processes), without giving the processes the chance to perform clean up operations. A similar technique is used to perform partial termination of process networks in JCSP, making it unsafe as well. The technique presented in this paper is not limited to JCSP, but can easily be ported to other CSP environments. Partial process network termination can be applied in the area of Software Defined Radio (SDR), because SDR systems need to be able to change their signal processing algorithms during runtime.
The JCSPNet package from Quickstone provides the capability of transparently creating a network of processes that run across a TCP/IP network. The package also contains mechanisms for creating mobile processes and channels through the use of filters and the class Dynamic Class Loader, though their precise use is not well documented. The package jcsp.mobile rectifies this position and provides a set of classes and interfaces that facilitates the implementation of systems that use mobile processes and channels. In addition, the ability to migrate processes and channels from one processor to another is also implemented. The capability is demonstrated using a multi-user game running on an ad-hoc wireless network using a workstation and four PDAs.
CSP++ is a tool that makes specifications written in CSPm executable and extensible. It is the basis for a technique called selective formalism, which allows part of a system to be designed in verifiable CSPm statements, automatically translated into C++, and linked with functions coded in C++. This paper describes in detail the subset of CSPm that can be accurately translated by CSP++, and how the CSP semantics are achieved by the runtime framework. It also explains restrictions that apply to coding in CSPm for software synthesis, and the rationale for those restrictions.
In this paper a data sharing framework for multi-threaded, distributed control programs is described that is realized in C++ by means of only a few, powerful classes and templates. Fast data exchange of entire data structures is supported using sockets as communication medium. Access methods are provided that preserve data consistency and synchronize the data exchange. The framework has been successfully used to build a distributed robot soccer control system running on as many computers as needed.
Multicasting is a very important operation in high performance parallel applications. Making this operation efficient in supercomputers has been a topic of great concern. Much effort has gone into designing special interconnects to support the operation. Today's huge deployment of NoWs (Network of Workstations) has created a high demand for efficient software-based multicast solutions. These systems are often based on low-cost Ethernet interconnects without direct support for group communication. Basically TCP/IP is the only widely supported method of fast reliable communication, though it is possible to improve Ethernet performance at many levels – i.e., by-passing the operating system or using physical broadcasting. Low-level improvements are not likely to be accepted in production environments, which leaves TCP/IP as the best overall choice for group communication.
In this paper we describe a TCP/IP based multicasting algorithm that uses message segmentation in order to lower the propagation delay. Experiments have shown that TCP is very inefficient when a node has many active connections. With this in mind we have designed the algorithm so that it has a worst-case propagation path length of O(log n) with a minimum of connections per node. We compare our algorithm with the binomial tree algorithm often used in TCP/IP MPI implementations.
Cellular automata (CAs) are good examplesof systems in which large numbers of autonomous entities exhibit emergent behaviour. Using the occam-π and JCSP communicating process systems, we show how to construct “lazy” and “just-in-time” models of cellular automata, which permit very efficient parallel simulation of sparse CA populations on shared-memory and distributed systems.
What if the CSP observerwere lazy? This paperconsidersthe consequences of altering the behavior of the CSP observer. Specifically, what implications would this new behavior have on CSP's traces? Laziness turns out to be a useful metaphor. We show how laziness permits transforming CSP into a model of true concurrency (i.e., non-interleavedtrace semantics). Furthermore, the notion of a lazy observer supports tenets of view-centric reasoning (VCR): parallel events (i.e., true concurrency), multiple observers (i.e., different views), and the possibility of imperfect observation. We know from the study of programming languages that laziness is not necessarily a negative quality; it provides the possibility of greater expression and power in the programswe write. Similarly, within the contextof the UnifyingTheoriesof Programming, a model of true concurrency — VCR — becomes possible by permitting (even encouraging) the CSP observer to be lazy.
This paper introduces the philosophy behind a new Grid model, the Minimum intrusion Grid, MiG. The idea behind MiG is to introduce a 'fat' Grid infrastructure which will allow much 'slimmer' Grid installations on both the user and resource side. This paper presents the ideas of MiG, some initial designs and finally a status report of the implementation.
We describe the first proof system for concurrent programs based on Communicating Sequential Processes for Java (JCSP). The system extends a complete calculus for the JavaCard Dynamic Logic with support for JCSP, which is modeled in terms of the CSP process algebra. Together with a novel efficient calculus for CSP, a rule system is obtained that enables JCSP programs to be executed symbolically and to be checked against temporal properties. The proof system has been implemented within the KeY tool and is publicly available.
In this paper we compare the maximum achievable throughput of different memory organisations of the processing elements that constitute a multiprocessor system on chip. This is done by modelling the mapping of a task with input and output channels on a processing element as a homogeneous synchronous dataflow graph, and use maximum cycle mean analysis to derive the throughput. In a HiperLAN\2 case study we show how these techniques can be used to derive the required clock frequency and communication latencies in order to meet the application's throughput requirement on a multiprocessor system on chip that has one of the investigated memory organisations.
occam-π is a programming language based on the CSP process algebra and the pi-calculus, and has a powerful syntax for expressing concurrency. occam-π does not however, come with interfaces to a broad range of standard libraries (such as those used for graphics or mathematics). Programmers wishing to use these must write their own wrappers using occam-π's foreign function interface, which can be tedious and time consuming. SWIG offers automatic generation of wrappers for libraries written in C and C++, allowing access to these for the target languages supported by SWIG. This paper describes the occam-π module for SWIG, which will allow automatic wrapper generation for occam-π, and will ensure that occam-π's library base can be grown in a quick and efficient manner. Access to database, graphics and hardware interfacing libraries can all be provided with relative ease when using SWIG to automate the bulk of the work.
Thispaperdescribesan extensionto the KRoC occam-πsystem that allows processes programmed in C to participate in occam-π style concurrency. The uses of this are wide-ranging, from providing low-level C processes running concurrently as part of an occam-π network, through to concurrent systems programmed entirely in C. The easily extended API for C processes is based on the traditional Inmos C API, used also by CCSP, extended to cover new features of occam-π. One of the motivations for this work is to ease the developmentof low-level network communication infrastructures. A library that provides for networking of channel-bundles over TCP/IP networks is presented, in addition to initial performance figures.
Grid computing is finally starting to provide solutions for capacity-computing, that is problem solving where there is a large number of independent tasks for execution. This paper describes the experiences with using Grid for capability computing, i.e. solving a single task efficiently. The chosen capability application is driving a very large display which requires enormous processing power due to its huge graphic resolution (7168 x 3072 pixels). Though we use an advanced Grid middleware, the conclusion is that new features are required to provide such coordinated calculations as the present application requires.
In this paper we describe a method for modeling channel-based asynchronous circuits using Verilog HDL. We suggest a method to model CSP-like channels in Verilog HDL. This method also describes nonlinear pipelines and high-level channel timing properties, such as forward and backward latencies, minimum cycle time, and slack. Using Verilog enables us to describe the circuit at many levels of abstraction and to use the commercially available CAD tools.
This paper introduces a safe language binding for CSP multiway events (barriers — both static and mobile) that has been built into occam-π (an extension of the classical occam language with dynamic parallelism, mobile processes and mobile channels). Barriers provide a simple way for synchronising multiple processes and are the fundamental control mechanism underlying both CSP (Communicating Sequential Processes) and BSP (Bulk Synchronous Parallelism). Formal semantics (through modelling in classical CSP), implementation details and early performance benchmarks (16 nanoseconds per process per barrier synchronisation on a 3.2 GHz Pentium IV) are presented, along with some likely directions for future research. Applications are outlined for the fine-grained modelling of dynamic systems, where barriers are used for maintaining simulation time and the phased execution of time steps, coordinating safe and desired patterns of communication between millions (and more) of processes. This work forms part of our TUNA project, investigating emergent properties in large dynamic systems (nanite assemblies).
The concept of exception handling is important for building reliable software. An exception construct is proposed in this paper, which implements an exception handling mechanism that is suitable for concurrent software architectures. The aim of this exception construct is to bring exception handling to a high-level of abstraction such that exception handling does scale well with the complexity of the system. This is why the exception construct supports a CSP-based software design approach. The proposed exception construct embraces informal semantics, but which are intuitive and suitable to software engineering. The exception construct is prototyped in the CSP for Java library, called CTJ.
This paper describes the ongoing development of a new FPGA hosted Transputer using a Load Store RISC style Multi Threaded Architecture (MTA). The memory system throughput is emphasized as much as the processor throughput and uses the recently developed Micron 32MByte RLDRAM which can start fully random memory cycles every 3.3ns with 20ns latency when driven by an FPGA controller. The R16 shares an object oriented Memory Manager Unit (MMU) amongst multiple low cost Processor Elements (PEs) until the MMU throughput limit is reached. The PE has been placed and routed at over 300MHz in a Xilinx Virtex-II Pro device and uses around 500 FPGA basic cells and 1 Block RAM. The 15 stage pipeline uses 2 clocks per instruction to greatly simplify the hardware design which allows for twice the clock frequency of other FPGA processors. There are instruction and cycle accurate simulators as well as a C compiler in development. The compiler can now emit optimized small functions needed for further hardware development although compiling itself requires much work. Some occam and Verilog language components will be added to the C base to allow a mixed occam and event driven processing model. Eventually it is planned to allow occam or Verilog source to run as software code or be placed as synthesized co processor hardware attached to the MMU.
Migrating a thread while preserving its state is a useful mechanism to have in situations where load balancing within applications with intensive data processing is required. Strong mobility systems, however, are rarely developed or implemented as they introduce a number of major challenges into the implementation of the system. This is due to the fact that the underlying infrastructure that most computers operate on was never designed to accommodate such a system, and because of this it actually impedes the development of these systems to some degree. Using a system based around a virtual machine, such as Microsoft's Common Language Runtime (CLR), circumnavigates many of these problems by abstracting away system differences. In this paper we outline the architecture of the threading mechanism in the shared source version of the CLR known as the Shared Source Common Language Infrastructure (SSCLI). We also outline how we are porting strong mobility into the SSCLI, taking advantage of its virtual machine.
gCSP is a graphical tool for creating and editing CSP diagrams. gCSP is used in our labs to generate the embedded software framework for our control systems. As a further extension to our gCSP tool, an occam code generator has been constructed. Generating occam from CSP diagrams gives opportunities to use the Raw-Metal occam eXperiment (RMoX) as a minimal operating system on the embedded control PCs in our mechatronics laboratory. In addition, all processors supported by KRoC can be reached from our graphical CSP tool. The commstime benchmark is used to show the trajectory for gCSP code generation for the RMoX operating system. The result is a simple means for using RMoX in our laboratory for our control systems. We want to use RMoX for future research on distributed control and for performance comparisons between a minimal operating system and our CTC++/RT-linux systems.
Network emulation is a technique that allows real-application performance assessment under controllable and reproducible conditions. We designed and implemented a hardware network emulator on an FPGA-based custom-design PCI platform. Implementation was facilitated by the use of the Handel-C programming language, that allows rapid development and fast translation into hardware and has specific constructs for developing systems with concurrent processes. We report on tests performed with web-browsing applications.