Ebook: Engineering Secure and Dependable Software Systems
Almost all technical systems today either interface with software systems or are themselves largely software. Software systems must not harm their environment, yet they are also often vulnerable to security attacks with potentially serious economic, political, and physical consequences. A better understanding of security and safety, and improvements in the quality of complex software systems, are therefore crucial challenges for the functioning of society.
This book presents lectures from the 2018 Marktoberdorf summer school Engineering Secure and Dependable Software Systems, an Advanced Study Institute of the NATO Science for Peace and Security Programme. The lectures give an overview of the state of the art in the construction and analysis of safe and secure systems. Starting from the logical and semantic foundations that enable reasoning about classical software systems, they extend to the development and verification of cyber-physical systems, which combine computational and physical components and have become pervasive in aerospace, automotive, industry automation, and consumer appliances. Safety and security have traditionally been considered separate topics, but several lectures in this summer school emphasize their commonalities and present analysis and construction techniques that apply to both.
The book will be of interest to all those working in the field of software systems, and cyber-physical systems in particular.
Almost all technical systems are nowadays in large part software systems themselves, or interface with software systems. The ubiquity of software systems requires them not to harm their environment (safety), and at the same time makes them vulnerable to security attacks with potentially considerable economic, political, and physical damage. Better understanding security and safety, and improving the general quality of complex software systems (cyber defense, new technologies to support the construction of information technology infrastructure, and the respective development processes and technologies), are crucial challenges for the functioning of society.
Security and safety (or reliability) are both essential facets of the trustworthiness of modern cyber-physical systems. Cyber-physical systems ever more tightly combine and coordinate subsystems consisting of both computational and physical elements. Such systems have become indispensable in the domains of aerospace, automotive, industrial automation, and consumer appliances. Protecting data within these systems from external attackers (security), and protecting the environment from the misbehavior of these systems (safety), are two subjects traditionally considered separate. A closer look, however, reveals that the techniques for the construction and analysis of software-based systems used in security and in safety are not necessarily fundamentally different.
Along these lines, the 2018 Marktoberdorf summer school on software engineering, the 39th of its kind, was concerned with engineering dependable software systems.
JOHN BARAS lectured on Formal Methods and Toolsuites for CPS Security, Safety and Verification.
In these five lectures he presented a general, rigorous methodology for model-based systems engineering (MBSE) of cyber-physical systems (CPS), which uses traditional and novel formal methods in several key steps, together with more specialized applications and deeper results in several areas.
He first presented a rigorous MBSE methodology, framework, and tool-suites for CPS. Advances in information technology have enabled the design of complex engineered systems with a large number of heterogeneous components, capable of multiple complex functions, leading to the now-ubiquitous CPS. He then went on to present several methods addressing the key problem of motion planning and control with safety and temporal constraints. He described the strengths and weaknesses of each method, provided explicit application examples, and emphasized the key challenge of developing an integrated framework for handling finite temporal and finite spatial tolerances (requirements, constraints). Finally, he presented several detailed vignettes on Security and Trust in Networked Systems, Automotive CPS, Stable Path Routing in MANETs, and Composable and Assured Autonomy. Professor Baras introduced novel formal methods employing various partially ordered semirings, which he used for modeling and evaluating trust and for analyzing multi-metric problems on networks and graphs (multigraphs). He showed an example linking the MBSE methodology to hardware design for automotive controllers. He discussed the challenge of, and need for, composable security, and described some initial steps towards achieving this goal.
PATRICK COUSOT lectured on Abstract Interpretation.
He defined the structural rule-based and then fixpoint prefix trace semantics of a simple subset of C. He defined properties, in particular program properties and the collecting semantics. Professor Cousot then formalized the abstraction and approximation of program properties, and showed how a structural rule-based/fixpoint abstract semantics can be derived from the collecting semantics by calculational design. He showed that verification methods and program logics are (non-computable) abstractions of the program's collecting semantics. Then he introduced a few classical, effectively computable abstractions of numerical properties of programs and discussed industrial static analysis applications. Finally, he introduced a few abstractions of symbolic properties of programs and discussed open problems. He concluded with a list of formal methods that can be formalized as abstract interpretations.
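To make the core idea concrete, the following minimal sketch (our illustration, not the lecture material) interprets a tiny expression language over the classical interval domain: each abstract value soundly over-approximates the set of concrete values an expression can take.

```python
# Abstract interpretation with the interval domain, in miniature.
# The expression language and operators here are illustrative only.

class Interval:
    """Abstract value [lo, hi]; a None bound denotes -/+ infinity."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        lo = None if self.lo is None or other.lo is None else self.lo + other.lo
        hi = None if self.hi is None or other.hi is None else self.hi + other.hi
        return Interval(lo, hi)

    def join(self, other):  # least upper bound, used at control-flow merges
        lo = None if self.lo is None or other.lo is None else min(self.lo, other.lo)
        hi = None if self.hi is None or other.hi is None else max(self.hi, other.hi)
        return Interval(lo, hi)

    def __repr__(self):
        return f"[{'-oo' if self.lo is None else self.lo}, {'+oo' if self.hi is None else self.hi}]"

def analyze(expr, env):
    """Abstract semantics of arithmetic expressions over intervals."""
    kind = expr[0]
    if kind == "const":
        return Interval(expr[1], expr[1])
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return analyze(expr[1], env) + analyze(expr[2], env)
    raise ValueError(f"unknown expression: {kind}")

# x in [0, 10], so x + 1 is soundly approximated by [1, 11]:
print(analyze(("add", ("var", "x"), ("const", 1)), {"x": Interval(0, 10)}))
```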
VIJAY GANESH lectured on SAT and SMT Solvers: A Foundational Perspective.
Over the last two decades, software engineering (broadly construed to include verification, testing, analysis, synthesis, and security) has witnessed a silent revolution in the form of SAT and SMT solvers. These tools are now integral to many analysis, synthesis, verification, and testing approaches. In his lectures, Professor Ganesh traced the important technical developments that underpin SAT and SMT solver technology, explained in detail how these solvers work, presented a proof-complexity-theoretic view of solvers as proof systems, and described how users can get the most out of these powerful tools.
SUMIT GULWANI lectured on Programming by Examples.
Programming by Examples (PBE) involves synthesizing intended programs in an underlying domain-specific language (DSL) from example-based specifications. PBE is set to revolutionize the programming experience for both developers and end users. It can provide a 10-100x productivity increase for developers in some task domains, and can enable computer users, 99% of whom are non-programmers, to create small scripts to automate repetitive tasks. Two killer applications for this technology are data wrangling (an activity on which data scientists today spend 80% of their time) and code refactoring (an activity on which developers spend up to 40% of their time in a typical application migration scenario).
Dr. Gulwani discussed some principles behind designing useful DSLs for program synthesis. A key technical challenge in PBE is to search for programs in the underlying DSL that are consistent with the examples provided by the user. He discussed a divide-and-conquer based search paradigm that inductively reduces the problem of synthesizing a program with a certain top-level operator to simpler synthesis problems over its sub-programs by leveraging the operator's inverse semantics. Another challenge in PBE is to resolve the ambiguity in the example-based specification. He discussed two complementary approaches: (a) ranking techniques that can pick an intended program from among those that satisfy the specification, and (b) active-learning based user interaction models. The various concepts were illustrated using Flash Fill, FlashExtract, and FlashRelate—PBE technologies for data manipulation domains. The Microsoft PROSE SDK allows easy construction of such technologies. He did a hands-on exercise that involved building a synthesizer for a small part of the Flash Fill DSL using the PROSE framework.
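The flavor of this divide-and-conquer search can be conveyed in a few lines: to synthesize a Concat expression producing a desired output, one inverts Concat by splitting the output string and recursively synthesizing each half. The toy DSL below (constants, substrings of the input, and one level of concatenation) is our own simplification, not the PROSE implementation.

```python
# Deductive search by inverse semantics, in miniature.

def synth_atom(inp, out):
    """All atomic programs producing `out` on input `inp`."""
    progs = [("const", out)]
    start = inp.find(out)
    while start != -1:                      # every occurrence of out in inp
        progs.append(("substr", start, start + len(out)))
        start = inp.find(out, start + 1)
    return progs

def synth(inp, out):
    """Atomic programs plus one level of Concat, via Concat's inverse."""
    progs = list(synth_atom(inp, out))
    for i in range(1, len(out)):            # split the output string
        for left in synth_atom(inp, out[:i]):
            for right in synth_atom(inp, out[i:]):
                progs.append(("concat", left, right))
    return progs

# Programs consistent with "Sumit Gulwani" -> "SG" include
# Concat(Substr(0,1), Substr(6,7)), the initials extractor:
for p in synth("Sumit Gulwani", "SG"):
    print(p)
```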
ARIE GURFINKEL lectured on Algorithmic Logic-based Verification.
Developing an automated program verifier is an extremely difficult task. By its very nature, a verifier shares many of the complexities of an optimizing compiler and of an efficient automated theorem prover. From the compiler perspective, the issues include idiomatic syntax, parsing, intermediate representation, static analysis, and equivalence-preserving program transformations. From the theorem-proving perspective, the issues include verification logic, verification condition generation, synthesis of sufficient inductive invariants, deciding satisfiability, interpolation, and consequence generation. Luckily, the cores of both compilers and theorem provers are well understood, well defined, and readily available. In these lectures, Professor Gurfinkel examined how to build a state-of-the-art program verifier by reusing much of the existing compiler and SMT-solver infrastructure. The lectures were based on the SeaHorn verification framework, developed in collaboration between the University of Waterloo and SRI International.
JOSEPH HALPERN lectured on “An Epistemic Foundation for Authentication Logics,” “A Knowledge-Based Analysis of the Blockchain Protocol,” and finally “Knowledge and Common Knowledge in a Distributed Environment.”
RUPAK MAJUMDAR lectured on Formal Methods for Software Controlling the Physical World.
A cyber-physical system (CPS) integrates computation and communication with the control of physical processes. CPSs are ubiquitous: automotive control systems, medical devices, and energy grids are all examples. An important question is how to come up with a cost-effective yet high-confidence design and verification methodology for these systems. This is a very broad problem. In this sequence of lectures, Professor Majumdar focused on one particular aspect: the design of reactive controllers for CPSs from declarative specifications. In particular, he described the basis of a control design technique called abstraction-based control design (ABCD). In ABCD, one abstracts a continuous-state, time-sampled dynamical system into a finite two-player game and uses reactive synthesis algorithms for finite-state games. A strategy extracted from the finite game can be refined to a controller for the original system. He described the basics of reactive synthesis, starting with basic algorithms for safety and reachability games, going on to describe general algorithms using the mu-calculus, and showing how abstraction works and how ABCD can be optimized using multi-level abstractions. Along the way, he showed connections to foundational questions in logic and automata theory.
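The synthesis step at the heart of ABCD can be illustrated on a finite safety game: the controller wins from exactly those abstract states where it can forever choose successors inside the safe set. The following sketch (our toy example, not the lecture's tooling) computes this winning region as a greatest fixpoint.

```python
# Solving a safety game on a finite abstraction: repeatedly remove states
# from which safety cannot be maintained for one more step.

def safety_winning_region(states, edges, controlled, safe):
    """
    states:     set of abstract states
    edges:      dict mapping each state to its set of successor states
    controlled: states where the controller picks the successor
    safe:       states satisfying the safety specification
    """
    win = set(safe)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            succs = edges[s]
            ok = (bool(succs & win) if s in controlled   # controller: some safe move
                  else succs <= win)                     # environment: all moves safe
            if not ok:
                win.remove(s)                            # s is losing; shrink and retry
                changed = True
    return win

# Toy game: from 'a' the controller can avoid the unsafe trap 'bad'.
edges = {"a": {"a", "bad"}, "bad": {"bad"}}
print(safety_winning_region({"a", "bad"}, edges,
                            controlled={"a"}, safe={"a"}))   # -> {'a'}
```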
PETER MÜLLER lectured on Building Deductive Program Verifiers.
Deductive program verifiers attempt to construct a proof that a given program satisfies a given specification. Their implementations reflect the semantics of the programming language and the specification language, and often include elaborate proof search strategies to automate verification. Each of these components is intricate, which makes building a verifier from scratch complex and costly.
In this lecture series, Professor Müller presented an approach to building program verifiers as a sequence of translations from the source language and specification, via intermediate languages, down to a logic for which automatic solvers exist. This architecture reduces the overall complexity by dividing the verification process into simpler, well-defined tasks, and enables the reuse of essential elements of a program verifier such as parts of the proof search, specification inference, and counterexample generation. He introduced intermediate verification languages and demonstrated how they can encode interesting verification problems.
CATHERINE MEADOWS lectured on Maude-NPA and Formal Analysis of Cryptographic Protocols With Equational Theories.
Formal analysis of cryptographic protocols has been one of the most successful applications of formal methods to security. It has played a prominent part in the development and validation of security standards, shown most recently by the use of formal methods in the analysis of TLS 1.3 at the invitation of the TLS 1.3 developers.
One long-standing problem in the analysis of cryptographic protocols is reasoning about cryptosystems that satisfy equational properties, such as Diffie-Hellman and exclusive-or. In these cases the equational properties are necessary both to understand potential vulnerabilities, and to correctly represent the actions of the protocol. However, such equational properties can be difficult to incorporate so that the analysis is both tractable and sound. This is especially the case when the properties include associative-commutative rules. The Maude-NRL Protocol Analyzer (Maude-NPA) is a tool that takes a systematic approach to reasoning about protocols that rely on such properties, and it has had a major influence on work in the field.
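A small example shows why such equational properties cannot be ignored. Under exclusive-or, x ^ x = 0 and 0 ^ y = y, so a message that syntactically contains a secret key may semantically reveal something else entirely. The normalizer below (a toy multiset cancellation, not Maude-NPA's variant unification) makes the cancellation explicit.

```python
# Normalizing exclusive-or terms: atoms occurring an even number of
# times cancel, so the normal form keeps only odd-count atoms.

from collections import Counter

def xor_normalize(atoms):
    """Normal form of a1 ^ a2 ^ ... over atomic symbols."""
    counts = Counter(atoms)
    return sorted(a for a, n in counts.items() if n % 2 == 1)

# Syntactically, k ^ n ^ k mentions the secret key k twice, but the two
# occurrences cancel: the attacker effectively sees just the nonce n.
print(xor_normalize(["k", "n", "k"]))   # -> ['n']
```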
This course consisted of two parts: an introduction to formal cryptographic protocol analysis, with emphasis on reasoning about equational properties, and an in-depth presentation on the Maude-NPA cryptographic protocol analysis tool, showing the theoretical and heuristic approaches it applies to reasoning about equational properties.
ANNABELLE MCIVER lectured on Qualitative and quantitative information flow with applications to security.
In this series, she described how to use abstraction and refinement to reason about confidentiality properties in sequential programs. First, a simple model based on Kripke structures was used to give a qualitative semantics to determine whether secret information is leaked inadvertently; those ideas were then generalised to enable the quantitative analysis of how much any leaked information can be used by an adversary.
Throughout the lectures the emphasis was on reasoning rules and the comparison of programs (and specifications) using a refinement order which maintains both functional consistency as well as information flow properties. The approach was illustrated by a number of well-known case studies drawn from the security literature.
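As a concrete taste of the quantitative side, standard quantitative information flow compares an adversary's chance of guessing the secret before and after observing a program's output. The sketch below (our illustration, using the textbook Bayes-vulnerability definitions rather than the lecturer's specific formalism) computes this leakage for a one-bit secret.

```python
# Multiplicative Bayes leakage = posterior vulnerability / prior vulnerability.

def bayes_vulnerability(prior):
    """Best chance of guessing the secret with no observation."""
    return max(prior.values())

def posterior_vulnerability(prior, channel):
    """channel[s][o] = P(output o | secret s); sum over outputs of the
    maximal joint probability, i.e., guess the best secret per output."""
    outputs = {o for row in channel.values() for o in row}
    return sum(max(prior[s] * channel[s].get(o, 0.0) for s in prior)
               for o in outputs)

# Secret bit with uniform prior; the program leaks the bit via its parity.
prior   = {0: 0.5, 1: 0.5}
channel = {0: {"even": 1.0}, 1: {"odd": 1.0}}
prior_v = bayes_vulnerability(prior)               # 0.5: guess blindly
post_v  = posterior_vulnerability(prior, channel)  # 1.0: output reveals the bit
print(prior_v, post_v, post_v / prior_v)           # leakage factor 2.0
```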
MARC POUZET lectured on Synchronous Programming of Cyber-physical Systems.
Synchronous programming has enjoyed great success in the design and implementation of the most critical control software, e.g., aircraft (fly-by-wire, engine control, braking), railways (on board control, interlocking), nuclear plants (control software). Many of these applications are developed with the language SCADE, founded on pioneering work in Lustre, with key additions from Esterel and Lucid Synchrone.
Lustre is a domain specific language for programming the block diagrams that abound in engineering disciplines. It formalizes components as synchronous functions over streams (infinite sequences) expressed by recursively defined sets of dataflow equations. Models can be simulated and formally verified before being automatically compiled into code guaranteed to execute in bounded time and memory. This idea of ‘model based design’, radical for the time, is today at the heart of widely used industrial tools like Simulink.
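The dataflow style is easy to convey: a Lustre node is a set of recursive stream equations such as n = 0 -> pre n + 1. The following Python generators (an illustration only; real Lustre compiles to code with statically bounded time and memory) unfold two such equations.

```python
# Lustre-style stream equations unfolded as Python generators.

from itertools import islice

def counter():
    """The equation  n = 0 -> pre n + 1  as an infinite stream."""
    n = 0
    while True:
        yield n
        n += 1

def running_sum(xs):
    """The equation  s = x -> pre s + x : synchronous running sum."""
    acc = 0
    for x in xs:
        acc += x
        yield acc

print(list(islice(counter(), 5)))               # [0, 1, 2, 3, 4]
print(list(islice(running_sum(counter()), 5)))  # [0, 1, 3, 6, 10]
```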
This lecture presented the foundations of synchronous programming. Professor Pouzet returned to the origins of Lustre, its links with other stream languages, and the interest of a synchronous interpretation. He presented its semantics and modular compilation to software, and showed how these can be specified and verified formally in an interactive theorem prover (Coq). The lecture explained how the subtle mix of stream equations and hierarchical automata was designed, and how it was integrated into SCADE. He showed how type theory can be applied to express several safety properties and to ensure compilation into statically scheduled code that executes in finite memory, and also how those type systems can be extended for applications with multiple execution rates. Finally, he described the design, semantics, and implementation of Zélus, in which discrete controllers can be composed with continuous-time models of a physical environment.
ALEXANDER PRETSCHNER lectured on Accountability.
Accountability is the property of a system that helps answer questions about why specific events have happened. With accountability infrastructures in place, the identified reasons can be used to improve systems and to assign blame. Accountability rests on two pillars: monitoring and causality analyses. In this series of lectures, Professor Pretschner focused on causality analyses and their underlying models, and showed their wide applicability in computer science. He considered causality analyses including spectrum-based fault localization, Granger causality analysis, and model-based diagnosis, and focused on SAT-based and ILP-based approaches to counterfactual reasoning on the grounds of Halpern and Pearl's notion of actual causality. He also discussed the provenance of causal models, such as fault trees, attack trees, and explicit acyclic equations.
We thank all the lecturers, the staff, and hosts in Marktoberdorf. Specifically, Traudl Fichtl was once again instrumental in organizing and running the school. She helped make the 2018 Marktoberdorf summer school a most rewarding experience, both academically and personally.
The Editors
We summarize the material presented in our five lectures at the 2018 Marktoberdorf International Summer School on Engineering Secure and Dependable Software Systems. In these five lectures we presented a general, rigorous methodology for model-based systems engineering of cyber-physical systems, which uses traditional and novel formal methods in several key steps, together with more specialized applications and deeper results in several areas.
We introduce basic concepts of abstract interpretation using the example of arithmetic and boolean expression semantics, properties, verification, static analysis, and their formal calculational design.
Over the last two decades, we have witnessed a silent revolution in software engineering thanks in part to Boolean SAT and SMT solvers. These tools have made it significantly easier to design, build, and maintain myriad kinds of program analysis, testing, synthesis, and verification systems. Further, they have enabled new software engineering methods that were otherwise deemed infeasible. In these lecture notes, we provide an introductory overview of SAT and SMT solvers from both practical and theoretical points of view. Specifically, we describe in detail the architecture of the DPLL, CDCL, and SMT algorithms. Further, we provide a theoretical understanding of these solvers through the lens of proof complexity. Finally, we showcase the power of machine learning techniques to fundamentally transform solver design.
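For readers who have not seen a SAT solver from the inside, the following compact sketch implements the classical DPLL loop of unit propagation plus case splitting, on CNF clauses in DIMACS-style signed-integer notation. CDCL solvers add conflict-driven clause learning, watched literals, and restarts on top of this skeleton; the code is purely illustrative, not a competitive solver.

```python
# DPLL: simplify under the current assignment, propagate units, then split.

def dpll(clauses, assignment=()):
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue                                   # clause already satisfied
        rest = [lit for lit in clause if -lit not in assignment]
        if not rest:
            return None                                # conflict: empty clause
        simplified.append(rest)
    if not simplified:
        return set(assignment)                         # all clauses satisfied
    for clause in simplified:
        if len(clause) == 1:                           # unit propagation
            return dpll(simplified, assignment + (clause[0],))
    lit = simplified[0][0]                             # split on a literal
    return (dpll(simplified, assignment + (lit,))
            or dpll(simplified, assignment + (-lit,)))

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]))   # -> {1, 2, 3}
```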
Programming by Examples (PBE) involves synthesizing intended programs in an underlying domain-specific language (DSL) from example-based specifications. This new frontier in AI enables computer users, 99% of whom are non-programmers, to create scripts to automate repetitive tasks. PBE can provide a 10-100x productivity increase for data scientists, business users, and developers in various task domains such as string/number/date transformations, extraction of structured tables from log files/web pages/PDFs/semi-structured spreadsheets, transforming JSON from one format to another, repetitive text editing, and repetitive code refactoring and formatting. PBE capabilities can be surfaced in GUI-based tools, code editors, or notebooks, and the code can be synthesized in various target languages such as Java or even PySpark to facilitate efficient execution on big data.
There are three key components in a PBE system. (i) A search algorithm that can efficiently search for programs that are consistent with the examples provided by the user. We leverage a divide-and-conquer based deductive search paradigm that inductively reduces the problem of synthesizing a program expression of a certain kind that satisfies a given specification into sub-problems that refer to subexpressions or sub-specifications. (ii) Program ranking techniques to pick an intended program from among the many that satisfy the examples provided by the user. (iii) User interaction models to facilitate usability and debuggability.
Each of these PBE components leverages both symbolic reasoning and heuristics. We make the case for synthesizing these heuristics from training data using appropriate machine learning methods. In particular, we use neural-guided heuristics to resolve any resulting non-determinism in the search process. Similarly, our ML-based ranking techniques, which leverage features of program structure and program outputs, are often able to select an intended program from among the many that satisfy the examples. Finally, our active-learning-based user interaction models, which leverage clustering of input data and semantic differences between multiple synthesized programs, facilitate a bot-like conversation with the user to aid usability and debuggability. That is, our algorithms, which deeply integrate neural techniques with symbolic computation, can not only lead to better heuristics but can also enable easier development, maintenance, and even personalization of a PBE system.
In this paper, we present SEAHORN, a software verification framework. The key distinguishing feature of SEAHORN is its modular design that separates the concerns of the syntax of the programming language, its operational semantics, and the verification semantics. SEAHORN encompasses several novelties: it (a) encodes verification conditions using an efficient yet precise inter-procedural technique, (b) provides flexibility in the verification semantics to allow different levels of abstraction, (c) uses Horn-clauses as an intermediate language to represent verification conditions which simplifies interfacing with multiple verification tools based on Horn-clauses, and (d) leverages the state-of-the-art in software model checking and abstract interpretation for verification. SEAHORN provides users with a powerful verification tool and researchers with an extensible and customizable framework for experimenting with new software verification techniques.
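To illustrate the Horn-clause view of verification conditions, consider a simple counting loop: its safety reduces to the existence of an inductive invariant Inv satisfying three implications. The encoding below is our own worked example in standard constrained-Horn-clause style, not an excerpt from the paper.

```latex
% Verification conditions for
%   assume N >= 0; x := 0; while (x < N) x := x + 1; assert x = N
% as constrained Horn clauses over an unknown invariant Inv:
\begin{align*}
N \ge 0 \land x = 0 &\;\Rightarrow\; \mathit{Inv}(x, N) && \text{(initiation)}\\
\mathit{Inv}(x, N) \land x < N \land x' = x + 1 &\;\Rightarrow\; \mathit{Inv}(x', N) && \text{(consecution)}\\
\mathit{Inv}(x, N) \land x \ge N &\;\Rightarrow\; x = N && \text{(safety)}
\end{align*}
% A Horn solver can discharge these with Inv(x, N) := (x <= N and N >= 0).
```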
Abstraction-based control design (ABCD) is a methodology to algorithmically construct controllers for continuous time dynamical systems for temporal specifications. The ABCD algorithm constructs a finite and discrete-time abstraction of the dynamical system, computes a discrete controller for the finite system using reactive synthesis, and refines the discrete controller to a controller for the original dynamical system. These lecture notes lay out the basic theory for ABCD and describe a recent extension which performs the abstraction in a multi-layered, adaptive way.
Chaum's Dining Cryptographers protocol crystallises the essentials of security just as other famous diners once captured deadlock and livelock: it is a benchmark for security models and their associated verification methods.
Here we give a correctness proof of the Cryptographers in a new style, one in which stepwise refinement plays a prominent role. Furthermore, our proof applies to arbitrarily many Diners: to our knowledge we are only the second group to have done that.
The proof is based on the Shadow Security Model, which integrates non-interference and program refinement: with it, we try to make the case that stepwise development of security protocols is not only possible but actually quite a good idea. It benefits from more than three decades of experience of how layers of abstraction can both simplify the design process and make its outcomes more likely to be correct.
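The protocol itself is short enough to simulate directly, which also shows why it generalizes to arbitrarily many diners: every shared coin appears in exactly two announcements and cancels under XOR. The following simulation is our illustration of the standard protocol, not the refinement-based proof.

```python
# Chaum's Dining Cryptographers for n diners: each adjacent pair shares a
# coin, and each diner announces the XOR of their two coins, flipped if
# they paid. The XOR of all announcements is 1 exactly when some diner
# paid, without revealing who.

import random

def dining_cryptographers(n, payer=None):
    """payer: index of the paying diner, or None if the NSA paid."""
    coins = [random.randint(0, 1) for _ in range(n)]   # coin i: diners i and i+1
    announcements = []
    for i in range(n):
        bit = coins[i] ^ coins[(i - 1) % n]            # my two shared coins
        if i == payer:
            bit ^= 1                                   # the payer flips their bit
        announcements.append(bit)
    someone_paid = 0
    for b in announcements:
        someone_paid ^= b                              # coins cancel in pairs
    return someone_paid

random.seed(1)
print(dining_cryptographers(5, payer=2))     # 1: one of the diners paid
print(dining_cryptographers(5, payer=None))  # 0: nobody at the table paid
```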
This paper describes the latest version (3.x) of the Maude-NRL Protocol Analyzer (Maude-NPA), a tool for the analysis of cryptographic protocols that provides support for equational theories. A key feature is its support for reasoning about a class of theories with disjoint compositions that have the finite variant property, via the use of a unification algorithm called variant unification. These theories include exclusive-or, Abelian groups, and the theories underlying Diffie-Hellman key exchange. In this paper we show how Maude-NPA works, and we explain in detail the variant unification algorithm it uses and how the tool makes use of it. We also describe work on another form of unification, asymmetric unification, which has the potential to improve on the performance of variant unification in some cases.
Deductive program verifiers attempt to construct a proof that a given program satisfies a given specification. Their implementations reflect the semantics of the programming language and the specification language, and often include elaborate proof search strategies to automate verification. Each of these components is intricate, which makes building a verifier from scratch complex and costly.
In these lecture notes, we will present an approach to build program verifiers as a sequence of translations from the source language and specification via intermediate languages down to a logic for which automatic solvers exist. This architecture reduces the overall complexity by dividing the verification process into simpler, well-defined tasks, and enables the reuse of essential elements of a program verifier such as parts of the proof search, specification inference, and counterexample generation. We will use the intermediate verification language Viper to demonstrate how to encode interesting verification problems.
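The translation idea can be sketched in miniature: a small guarded-command program is turned into a logical verification condition by a weakest-precondition calculation, and the resulting formula is what would be handed to an SMT solver. The ad-hoc AST and naive string substitution below are our own simplification, not Viper or Boogie syntax.

```python
# Weakest-precondition calculation over a tiny statement language.

def wp(stmt, post):
    """Weakest precondition of `stmt` with respect to postcondition `post`."""
    kind = stmt[0]
    if kind == "assign":                      # wp(x := e, Q) = Q[e/x]
        _, var, expr = stmt
        return post.replace(var, f"({expr})")  # naive textual substitution
    if kind == "seq":                         # wp(S1; S2, Q) = wp(S1, wp(S2, Q))
        _, first, second = stmt
        return wp(first, wp(second, post))
    if kind == "assert":                      # wp(assert P, Q) = P and Q
        return f"({stmt[1]}) and ({post})"
    raise ValueError(kind)

# x := x + 1; assert x > 0   under postcondition True yields the VC:
prog = ("seq", ("assign", "x", "x + 1"), ("assert", "x > 0"))
print(wp(prog, "True"))   # ((x + 1) > 0) and (True)
```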
The language Lustre was introduced to design and implement real-time control software, modeling it as continuous functions over streams of data. A set of equations written in Lustre defines a restricted class of Kahn process networks which can be executed synchronously: all computations can be dated according to a global time scale so that when a value is produced, it is immediately consumed. This restriction is obtained by associating with every stream a clock that defines when a value is present or absent according to the global time scale. A dedicated type system, the clock calculus, computes a clock for every expression and checks that its actual clock equals its expected clock, and thus that intermediate buffers are not needed.
In these course notes, we present a static and dynamic semantics of synchronous Kahn networks. We consider a first-order functional language of streams reminiscent of Lustre and Lucid Synchrone to which we give several denotational semantics. We show that without imposing restrictions, we get two kinds of bad behavior: some networks may deadlock and some cannot execute without unbounded FIFOs. We introduce a clocked semantics and show that the clocking rules correspond to a type system with dependent types. We then extend the language kernel with an explicit buffer operator to model communication through a FIFO. The clock calculus is extended with a subtyping rule that is applied where the buffer is used and whose size is inferred. To reduce the complexity of the resolution, we present an abstraction of clocks.
Recent formal approaches to causality have made the concept ready for incorporation into the technical world. However, causality reasoning is computationally hard, and no general algorithmic approach exists that efficiently infers the causes of effects. Thus, checking causality in the context of complex, multi-agent, and distributed socio-technical systems is a significant challenge. Therefore, we conceptualize a novel, intelligent algorithmic approach to checking causality in acyclic causal models with binary variables, utilizing the optimization power of solvers for the Boolean satisfiability problem (SAT). We present two SAT encodings and an empirical evaluation of their efficiency and scalability. We show that causality can be computed efficiently, in less than 5 seconds, for models that consist of more than 4000 variables.
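The condition that such SAT encodings check can be stated in a few lines of code for tiny models. The sketch below brute-forces a simplified Halpern-Pearl check (but-for reasoning with contingencies) on the textbook forest-fire model; a real encoding would hand the same condition to a SAT solver instead of enumerating interventions.

```python
# Simplified actual-causality check on a binary structural model:
# forest fire caused by lightning or a dropped match.

def fire(lightning, match):
    return lightning or match          # structural equation: FF = L or MD

def is_cause(candidate, context, effect):
    """Does flipping `candidate`, under some setting of the other variable
    (a contingency), change the effect? A simplified AC2-style check."""
    other = "match" if candidate == "lightning" else "lightning"
    actual = effect(**context)
    for other_val in (context[other], 1 - context[other]):
        flipped = dict(context)
        flipped[candidate] = 1 - context[candidate]
        flipped[other] = other_val
        if effect(**flipped) != actual:
            return True
    return False

context = {"lightning": 1, "match": 1}       # both happened; the forest burned
print(is_cause("lightning", context, fire))  # True: flip L with MD held at 0
print(is_cause("match", context, fire))      # True, symmetrically
```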