Ebook: Software Systems Safety
Until quite recently, the correctness and security of software systems was a largely theoretical problem relevant only for a small group of computer specialists. Today it is a fundamental problem for society at large, with security breaches in banking software, malware attacks and bugs in programs affecting millions of people and making the headlines almost daily.
The computer science community is developing verification and synthesis tools which will mechanize ever more tasks in the design of secure programs. This book presents the papers delivered at the NATO Advanced Study Institute (ASI) Summer School Marktoberdorf 2013 – Software Systems Safety. The participants represented research groups from both industry and academia, and the subjects covered included: software model checking via systematic testing, program synthesis, E voting systems, probabilistic model checking in biology, infinite state model checking, Boolean satisfiability, interactive proof, and software security by information flow control.
The Marktoberdorf Summer School is one of the most renowned international computer science summer schools, and this book, with its detailed overview of current research results with special emphasis on the solving of software systems security problems, will be of interest to all those whose work involves systems security.
Only 20 years ago the correctness and security of software systems was a largely theoretical problem, relevant only for a small group of computer science specialists. Today it has become a fundamental problem of society at large: software bugs in programs used by hundred million people, security breaches in banking software, and malware attacks compromising millions of computers make it to the headlines almost daily. The computer science community is attacking the problem by developing verification and synthesis tools that mechanize more and more tasks in the design of correct and secure programs. The School provided courses about verification and synthesis tools, methodologies (testing, model-checking, synthesis and information-flow analysis), and adaptation and tailoring to application domains.
The lecturers of the NATO Advanced Study Institute “Summer School Marktoberdorf 2013 – Software Systems Safety“ represented research groups from industry as well as universities and showed a detailed and excellent overview on current research results with special emphasis to solve such problems in software systems security.
Software Model Checking via Systematic Testing was the title of PATRICE GODEFROID’s lecture. Over the last two decades, significant progress has been made on how to broaden the scope of model checking from finite-state abstractions to actual software implementations. Adapting model checking into a form of systematic testing that is applicable to industrial-size software might be a solution. His lectures presented a comprehensive overview of this strand of software model checking, by describing the main ideas, techniques and results obtained in this area, including combinations with static program analysis.
ORNA GRUMBERG lectured on SAT-Based Model Checking: Interpolation, IC3 and Beyond. Model checking is an automatic approach to formally verifying that a given system satisfies a given specification. In spite of its great success in verifying hardware and software systems, the applicability of model checking is impeded by its high space and time requirements. We surveyed several approaches to enhancing SAT-based model checking. All these approaches result in efficient and complete SAT-based verification algorithms.
SUMIT GULWANI explained that Program Synthesis is the task of searching for a program in some underlying domain-specific language that matches the user’s intent. His lectures described three key dimensions in program synthesis. The first one was underlying domain-specific language in which programs are synthesized. Second he talked about user intent specification mechanism and interaction model to resolve ambiguities in under-specification. The third topic were search techniques. These dimensions were illustrated via a variety of applications in algorithm discovery (e.g., Bitvector/SIMD/graph algorithms, program inverses) and end-user programming (e.g., Spreadsheet programs, and smartphone scripts). Some surprising applications in the area of computer-aided education were presented (including problem synthesis, solution synthesis, feedback synthesis, and content authoring for a variety of subject domains including math, logic, and programming).
Interactive Proof: Applications to Data Flow Analysis and Security was GERWIN KLEIN’s lecture which was built on the previous introduction to interactive proof. He demonstrated a specific application of interactive proof assistants: the semantics of programming languages. In particular, he taught how to formalize a small imperative programming language in the theorem prover Isabelle/HOL and how to define its semantics. We covered a basic type system with type safety proof, a more complex security system also with soundness proof, and different kinds of data flow analysis, such as liveness analysis.
RALF KÜSTERS’ lecture was on E-Voting Systems. Systems for electronic voting (e-voting), including systems for voting in a voting booth, are employed in many countries and numerous problems with e-voting systems have been reported. Most of the systems used in practice today do not provide a sufficient level of security. Programming errors and malicious behaviour easily go undetected. Ralf covered central security requirements of e-voting systems, and how they can be formally defined and analysed. While analysis is mostly done based on cryptographic models or even more abstract so-called Dolev-Yao models, he also discussed approaches to perform (cryptographic) analysis directly on the implementation-/language-level of a system.
The lecture of MARTA KWIATKOWSKA held was titled Probabilistic Model Checking for Biology. Probabilistic model checking is an automated method for verifying the correctness and performance of probabilistic models. Property specifications are expressed in probabilistic extensions of temporal logic, and the probabilistic model checker can compute the probability of a given event, of its occurrence within a given time interval, or expected cost reward. Discrete- and continuous-time Markov chains were introduced, as well as the logics PCTL and CSL, and the corresponding model checking algorithms. An application of probabilistic model checking to two biological examples was shown: molecular signalling pathways and DNA computation. Marta demonstrated how design errors can be automatically detected in DNA gate designs using probabilistic model checking, analogously to model checking for digital circuits.
In the set of lectures on Infinite-state Model Checking by RUPAK MAJUMDAR looked at the problem of automated analysis of infinite-state systems. Infinite-state systems arise in many program verification problems, due to many different features in the models. For example, systems may contain unbounded data values (integers), data structures (lists, arrays, etch.), real-valued clocks or other physical variables, unbounded execution stacks, or unboundedly many threads of execution. The purpose of these lectures was to understand how techniques from finite-state model checking can be extended to the infinite-state case. He showed both heuristic techniques (for special cases that are decidable). The first part dealt with abstraction of systems and counterexample-guided refinement. He showed how states can be represented as logical constraints and how decision procedures for logical theories can be used to compute finite-state abstractions of systems. He also presented a model checking algorithm based on abstraction and refinement. The second part focused on decidability results for infinite-state systems. He showed a general decidability result for well-structured transition systems using Petri nets. Also an algorithm to verify recursive programs with finite data domains was taught. Finally, he considered asynchronous programs, a programming model including both recursion and unbounded task creation for which safety and liveness properties remain decidable.
Boolean Satisfiability: Solvers and Extensions was the lecture given by SHARAD MALIK. Boolean Satisfiability (SAT) is the problem of checking if a propositional logic formula can ever evaluate to true. Several relatively recent research developments have enabled us to tackle formulas with millions of variables and constraints enabling SAT solvers to be effectively deployed in practical applications including in the analysis and verification of software systems. In the first part of this series of lectures, he covered techniques used in modern SAT solvers. In the second part, he considered extension of these solvers that have proved to be useful in analysis and verification. He showed how these concepts are related, present algorithms to derive them and show their application in design debugging.
The course Interactive Proof: Hands-on Introduction by TOBIAS NIPKOW introduced interactive theorem proving with the Isabelle/HOL system in the following 3 steps. The first one was verified functional programming. The logic HOL contains an ML-style functional programming language. It was shown how to verify functional programs in this language by induction and simplification. The second topic was predicate logic. Formulas of predicate logic and set theory were introduced together with inductively defined predicates. In a third step we talked about structured proofs. The proof language Isar was introduced and it was shown how to write structured proofs that are readable by both the machine and the human. The course assumed basic familiarity with some functional programming language of the ML or Haskell family, in particular with recursive data types and pattern matching. No specific background in logics is necessary beyond the ability to read predicate logic formulas.
DAVID SANDS held a lecture on Software Security by Information Flow Control. A long-term goal is to construct innovative design methods for the construction of secure systems that put security requirements at the heart of the construction process, namely security by design. But to do this we had to understand how to unambiguously formulate the policy aims for secure systems and to develop a technology to integrate these goals into design mechanisms and technologies that enables an efficient construction or verification of systems with the respect to those policies. He managed to better understand the semantics of information flow, explore dynamic and state-dependent policies, and to see how these ideas can be incorporated into a programming language with the support for static verification of a rich variety of information flow policies.
HELMUT SEIDL talked about A General Infrastructure for Interprocedural Analysis of Concurrent C. His tutorial was about infrastructures for general-purpose interprocedural analyses. It consisted of two parts. At first he argued that side-effecting constraint systems might serve as kind of a Swiss army knife for specifying analyses, while secondly he provided an overview on solving techniques for such systems. He reported on techniques for solving side-effecting constraint systems. One major issue here was that non-trivial analysis problems require complete lattices with infinite ascending and descending chains. In order to compute reasonably precise post-fixpoints of the resulting systems of equations, Cousot and Cousot have suggested accelerated fixpoint iteration by means of widening and narrowing.
Practical Program Analysis and Synthesis was the lecture by ERAN YAHAV. For system-level programmers, the transition of hardware providers to multi-core architectures exposes new sources of complexity. Additional complexity is introduced by systems using heterogeneous concurrency and massively data-parallel architectures such as GPUs. For application-level programmers, the proliferation of libraries and frameworks, intended to reduce complexity, often requires programmers to be aware of intricate library internals for effective and correct usage of the library. Furthermore, despite the ability to hide some aspects of concurrency in the library, even application-level programmers might still need to reason about atomicity. Eran taught a framework for synthesizing efficient synchronization in concurrent programs, a task known to be difficult and error-prone when done manually. Given a program, a specification, and an abstraction, we inferred synchronization that avoids all (abstract) interleaving that may violate the specification, but permits as many valid interleaving as possible.
Over the years, the Marktoberdorf Summer School has facilitated its scientifically profitable and well-known tradition and will follow future challenges in formal computer science to ensure cyber security. The Summer School provides two weeks of challenging and fruitful learning, discussion and development of new ideas, and should be a productive and beneficial event, at both the professional and the social level.
From the very first, the ASI “Marktoberdorf” was planned as a periodical forum with leading researchers to teach, learn and discuss the state-of-the-art research with the target to build a special international community. Out series of ASIs has been proved as a high-level scientific nucleus for international scientific networking and is settled as one of the most famous International Computer Science Summer Schools – shown in an impressive way by the high number of applications every year.
We thank all lecturers, the staff, and hosts in Marktoberdorf. Special thanks must be addressed to Dr. Katharina Spies and Silke Müller for their great support.
The Marktoberdorf Summer School was arranged as an Advanced Study Institute of the NATO Science for Peace and Security Programme with support from Deutscher Akademischer Austausch Dienst (DAAD). We thank all authorities involved.
THE EDITORS
Three-valued models, in which properties of a system are either true, false or unknown, have recently been advocated as a better representation for reactive program abstractions generated by automatic techniques such as predicate abstraction. Indeed, for the same cost, model checking three-valued abstractions, also called may/must abstractions, can be used to both prove and disprove any temporal-logic property, whereas traditional conservative abstractions can only prove universal properties. Also, verification results can be more precise with generalized model checking, which checks whether there exists a concretization of an abstraction satisfying a temporal-logic formula. Generalized model checking generalizes both model checking (when the model is complete) and satisfiability (when everything in the model is unknown), probably the two most studied problems related to temporal logic and verification.
This paper presents an introduction to the main ideas behind this framework, namely models for three-valued abstractions, completeness preorders to measure the level of completeness of such models, three-valued temporal logics and generalized model checking. It also discusses algorithms and complexity bounds for three-valued model checking and generalized model-checking for various temporal logics. Finally, it discusses applications to program verification via automatic abstraction.
SAT-based model checking is currently one of the most successful approaches to checking very large systems. In its early days, SAT-based (bounded) model checking was mainly used for bug hunting. The introduction of interpolation and IC3\PDR enable efficient complete algorithms that can provide full verification as well.
In this paper, we survey several approaches to enhancing SAT-based model checking. They are all based on iteratively computing an over-approximation of the set of reachable system states. They use different mechanisms to achieve scalability and faster convergence.
The first one uses interpolation sequence rather than interpolation in order to obtain a more precise over-approximation of the set of reachable states. The other approach integrates lazy abstraction with IC3 in order to achieve scalability. Lazy abstraction, originally developed for software model checking, is a specific type of abstraction that allows hiding different model details at different steps of the verification. We find the IC3 algorithm most suitable for lazy abstraction since its state traversal is performed by means of local reachability checks, each involving only two consecutive sets. A different abstraction can therefore be applied in each of the local checks.
The survey focuses on hardware model checking, but the presented ideas can be extended to other systems as well.
Program Synthesis is the task of searching for programs over some underlying space that realize user's intent. There are three key dimensions in program synthesis: expression of user intent, space of programs over which to search, and the search technique. This article illustrates these dimensions while taking an application centric view.
The traditional application of program synthesis has been in synthesizing tricky programs such as bitvector algorithms to help software developers or algorithm designers. After an initial discussion of this traditional application, we discuss some recent applications of program synthesis techniques to helping end-users with little or no programming background. In particular, we discuss techniques for automating a variety of simple repetitive tasks in spreadsheets using examples.
We then discuss a rather surprising application of synthesis techniques to computer-aided Education including tasks such as problem synthesis, solution synthesis, and feedback synthesis. We illustrate these tasks by means of applications to a variety of subject domains ranging from arithmetic, algebra, geometry, programming, logic, and automata theory.
We show how to formalise a small imperative programming language in the theorem prover Isabelle/HOL, how to define its semantics, and how to prove properties about the language, its type systems, and a number of data flow analyses.
The emphasis is not on formalising a complex language deeply, but to teach a number of formalisation techniques and proof strategies using simple examples. For this purpose, we cover a basic type system with type safety proof, more complex security type systems, also with soundness proofs, and different kinds of data flow analyses, in particular definite initialisation analysis and constant propagation, again with correctness proofs.
Systems for electronic voting (e-voting systems), including systems for voting over the Internet and systems for voting in a voting booth, are employed in many countries. However, most of the systems used in practice today do not provide a sufficient level of security. For example, programming errors and malicious behavior easily go undetected. In fact, numerous problems with e-voting systems have been reported in various countries.
Therefore, in recent years modern e-voting systems have been designed that strive to achieve a rich set of fundamental but at the same time intricate and seemingly contradictory security requirements. For example, besides keeping the votes of individual voters private (privacy of votes), they allow voters to check that their votes were counted correctly, even if voting machines have programming errors or are outright malicious (verifiability/accountability). Some of these systems also try to prevent vote buying and voter coercion (coercion resistance).
In this course, we cover the above mentioned central security requirements of e-voting systems and how they can be formally defined and analyzed. While analysis is mostly done based on cryptographic models or even more abstract so-called Dolev-Yao models, we also discuss approaches to perform (cryptographic) analysis directly on the implementation-/language-level of a system.
Probabilistic model checking is an automated method for verifying the correctness and performance of probabilistic models. Property specifications are expressed in probabilistic temporal logic, denoting, for example, the probability of a given event, the probability of its occurrence within a given time interval, or expected number of times it has occurred in a time period. This chapter focuses on the application of probabilistic model checking to biological systems modelled as continuous-time Markov chains, illustrating the usefulness of these techniques through relevant case studies performed with the probabilistic model checker PRISM. We begin with an introduction to discrete-time Markov chains and the corresponding model checking algorithms. Then continuous-time Markov chain models are defined, together with the logic CSL (Continuous Stochastic Logic), and an overview of model checking for CSL is given, which proceeds mainly by reduction to discrete-time Markov chains. The techniques are illustrated with examples of biochemical reaction networks, which are verified against quantitative temporal properties. Next a biological case study analysing the Fibroblast Growth Factor (FGF) molecular signalling pathway is summarised, highlighting how probabilistic model checking can assist in scientific discovery. Finally, we consider DNA computation, and specifically the DSD formalism (DNA Strand Displacement), and show how errors can be detected in DNA gate designs, analogous to model checking for digital circuits.
These lecture notes give an introduction to the field of infinite-state model checking. We take a language-theoretic view, and focus on a few foundational results.
Contemporary satisfiability solvers are the corner-stone of many successful applications in domains such as automated verification and artificial intelligence. The impressive advances of SAT solvers, achieved by clever engineering and sophisticated algorithms, enable us to tackle Boolean Satisfiability (SAT) problem instances with millions of variables – which was previously conceived as a hopeless problem. We provide an introduction to contemporary SAT-solving algorithms, covering the fundamental techniques that made this revolution possible. Further, we present a number of extensions of the SAT problem, such as the enumeration of all satisfying assignments (ALL-SAT) and determining the maximum number of clauses that can be satisfied by an assignment (MAX-SAT). We demonstrate how SAT solvers can be leveraged to solve these problems. We conclude the chapter with an overview of applications of SAT solvers and their extensions in automated verification.
This tutorial provides an introduction to Paragon, a programming language that allows specification of security concerns on the data that is manipulated, as an integral part of the code. The target audience are programmers, as the end users of the programming language. The goal of the tutorial is to learn how to write information-flow secure programs in Paragon, without prior knowledge of theories, enforcement mechanisms and meta-properties underlying the field of information flow research.
This tutorial is about infra-structures for general-purpose interprocedural analyses. It consists of two parts. The first part, following the lines of [2], argues that side-effecting constraint systems may serve as kind of swiss army knife for specifying analyses, while the second part provides an overview on solving techniques for such systems. Side-effecting constraint systems were originally introduced for the analysis of multi-threaded code by Müller-Olm, Seidl and Vene in [41]. Here, we explain how this formalism provides a unified framework for realizing efficient interprocedural analyses of programs, possibly with dynamic function calls, where the amount of context-sensitivity can be tweaked and where the context-sensitive analyses of local properties can be combined with flow-insensitive analyses of global properties, e.g., about the heap. One infrastructure realizing this intermediate format, is the analyzer generator GOBLINT, which we use to practically evaluate this approach on real-world examples.
The second part, following [3], reports on techniques for solving side-effecting constraint systems. One major issue here is that non-trivial analysis problems require complete lattices with infinite ascending and descending chains. In order to compute reasonably precise post-fixpoints of the resulting systems of equations, Cousot and Cousot have suggested to accelerate fixpoint iteration by means of widening and narrowing [10, 14]. The strict separation into a widening phase followed by a narrowing phase, however, may unnecessarily give up precision that cannot be recovered later. While widening is also applicable if equations are non-monotonic, this is no longer the case for narrowing. A narrowing iteration to improve a given post-fixpoint, additionally, must assume that all right-hand sides are monotonic. The latter assumption, though, is not met in presence of widening. It is also not met by constraint systems corresponding to context-sensitive interprocedural analyses, possibly combining context-sensitive analysis of local information with flow-insensitive analysis of globals. As a remedy, we present a novel operator that combines a given widening operator with a given narrowing operator. We present adapted versions of round-robin as well as of worklist iteration, local, and side-effecting solving algorithms for the combined operator
Software is becoming increasingly complex. For system-level programmers, the transition of hardware providers to multi-core architectures exposes new sources of complexity. Additional complexity is introduced by systems using heterogenous concurrency and massively data-parallel architectures such as GPUs. For application-level programmers, the proliferation of libraries and frameworks, intended to reduce complexity, often requires programmers to be aware of intricate library internals for effective and correct usage of the library. Furthermore, despite the ability to hide some aspects of concurrency in the library, even application-level programmers might still need to reason about atomicity.
Despite significant progress in automatic checking and verification tools, such tools can only be applied after the code is written and may be broken in a fundamental manner. This motivates us to explore practical software synthesis techniques that assist a programmer during the development process. In this chapter, we survey two lines of work: one that addresses synthesis of synchronization and assists system-level programmers, and another that targets application-level programmers and leverages millions of examples to simplify programming with libraries.