Ebook: Dependable Software Systems Engineering
In the last few years we have all become daily users of Internet banking, social networks and cloud services. Preventing malfunctions in these services and protecting the integrity of private data from cyber attack are both current preoccupations of society at large. While modern technologies have dramatically improved the quality of software, the computer science community continues to address the problems of security by developing a theory of formal verification: a body of methodologies, algorithms and software tools for finding and eliminating bugs and security hazards.
This book presents lectures delivered at the NATO Advanced Study Institute (ASI) School Marktoberdorf 2015 – ‘Verification and Synthesis of Correct and Secure Systems’. During this two-week summer school, held in Marktoberdorf, Germany, in August 2015, the lecturers provided a comprehensive view of the current state-of-the-art in a large variety of subjects, including: models and techniques for analyzing security protocols; parameterized verification; synthesis of reactive systems; software model checking; compositional model checking; programming by examples; verification of concurrent software; two-player zero-sum games played on graphs; software security by information flow; equivalence – combinatorics, algebra, proofs; and analysis and synthesis with ‘Big Code’.
The Marktoberdorf ASIs have become a nucleus of the international scientific network on formal methods, and one of the major international computer science summer schools. This book will be of interest to all those seeking an overview of current theories and applications in formal verification and security.
In recent years, average citizens have become daily users of Internet banking, social networks, and cloud services. Preventing malfunctions of these services and protecting the integrity of private data from cyber-attacks have become central challenges for society at large. Computer science is in charge not only of developing new technologies and systems to face these challenges, but also of developing their scientific foundations. Approaches and technologies developed over the past years have dramatically improved the quality of software. The computer science community is attacking the problem by developing verification and synthesis tools that mechanize more and more tasks in the design of safe and secure programs. Moreover, progress is being made in understanding the fundamentals, for example in testing (efficient safety checks of single program executions), in model checking (more expensive, complete checks of software models), in synthesis (development of software that is correct by construction), and in information-flow analysis (detection of information leaks).
During the last decades computer scientists have developed, and continue to develop, a theory of formal verification: a body of methodologies, algorithms, and software tools for finding bugs or security hazards in existing software, or for ensuring, with the rigor of mathematical proof, that they do not exist. The 2015 edition of the NATO Advanced Study Institute “Marktoberdorf Summer School”, under the title Verification and Synthesis of Correct and Secure Systems, offered its participants a comprehensive view of the state-of-the-art in this area.
In her tutorial Probabilistic Model Checking, Christel Baier introduced the principles of discrete-time, finite-state Markovian models and their quantitative analysis against temporal logic specifications. She presented the basic principles of the automata-based approach to model-checking probabilistic temporal logics, techniques for tackling the state-explosion problem, and algorithms for dealing with fairness assumptions and computing conditional probabilities and quantiles.
Nikolaj Bjørner lectured on algorithms for solving satisfiability problems modulo theories (SMT). In his tutorial SMT Solvers: Foundations and Applications he also introduced a large collection of application scenarios, including program verification, network analysis, symbolic model checking, test-case generation, and white-box fuzzing.
Security protocols should guarantee private communication over untrusted networks but, due to mistakes, are often subject to attacks. In her lectures on Models and Techniques for Analyzing Security Protocols, Véronique Cortier described and discussed decision techniques based on formal methods to automatically verify properties of security protocols, such as authentication or confidentiality.
In his lecture series on Parameterized Verification of Crowds of Anonymous Processes, Javier Esparza explained that systems composed of a finite but possibly arbitrary number of identical components occur everywhere, from hardware design to distributed applications. Parameterized verification is the task of verifying the correctness of such systems regardless of the number of their components. He analyzed the decidability and complexity of parameterized verification when all processes execute the same code and have no identities.
More than fifty years after its introduction by Alonzo Church, the synthesis problem is still one of the most intriguing challenges in the theory of reactive systems. The series of lectures by Bernd Finkbeiner on Synthesis of Reactive Systems presented an overview on the automata- and game-theoretic foundations, explored the spectrum of logics and algorithms for the synthesis of reactive systems, and discussed the ideas behind recent efficient approaches like bounded synthesis.
Over the last two decades, significant progress has been made on how to broaden the scope of model checking from finite-state abstractions to actual software implementations. In his lectures on Software Model Checking, Patrice Godefroid presented his work on combining model checking and testing techniques, and reported on their remarkable success at Microsoft.
Model checking is a technique for automating high-quality assurance of software. Despite its success, it still suffers from the state explosion problem, which refers to the worst-case exponential growth of the state space of a program with the number of concurrent components. Orna Grumberg's tutorial on Compositional Model Checking presented techniques to break up the global verification of a program into a collection of more manageable, local verification problems about its individual components.
Sumit Gulwani lectured on Programming by Examples (PBE), a collection of techniques for synthesizing programs in an underlying domain-specific language from example-based specifications. He explained how PBE has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create scripts for automating repetitive tasks.
Developing reliable concurrent programs is a difficult and specialized task, requiring highly skilled engineers, most of whose efforts are spent on the testing and validation phases. As a result, there is a strong economic and strategic incentive for software companies to automate parts of the verification process. Daniel Kröning lectured on Verification of Concurrent Software, and described tools that automatically check that all possible behaviors of a concurrent program satisfy a given specification.
Two-player games can be used to mathematically describe non-terminating interactions between a system and its environment. In his tutorial Two-Player Zero Sum Games Played on Graphs: Omega Regular and Quantitative Objectives, Jean-François Raskin introduced algorithms and complexity results for the most relevant classes of two-player games, including reachability, Büchi, and parity games, as well as games with quantitative measures, like energy and mean-payoff games.
In his course on Software Security by Information Flow Control, David Sands lectured on security by design, the long-term research goal of developing methods for the design of secure systems that treat security requirements as first-class citizens. In the course he introduced his work on security and privacy policies for data manipulated by software.
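A classical way to make such policies checkable is noninterference: public outputs must not depend on secret inputs. The following is a minimal sketch of checking noninterference by self-composition with an SMT solver; the toy program, the variable names, and the use of the Z3 Python bindings (pip install z3-solver) are illustrative assumptions, not material from the course.

    # Noninterference via self-composition: run the program twice with the same
    # low (public) input but two arbitrary high (secret) inputs, and ask the
    # solver whether the observable output can differ.
    from z3 import Ints, Solver, If, sat

    low, high1, high2 = Ints('low high1 high2')

    def program(low, high):
        # Invented toy program: leaks one bit of the secret into the output.
        return low * 2 + If(high > 0, 1, 0)

    s = Solver()
    s.add(program(low, high1) != program(low, high2))  # same low, differing high
    if s.check() == sat:
        print('interference witness:', s.model())  # the secret influences output
    else:
        print('noninterferent: output depends only on low inputs')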
Helmut Seidl presented a series of lectures on Equivalence – Combinatorics, Algebra, Proofs. He studied the problem of proving that two programs are equivalent, where the notion of equivalence is relativized w.r.t. the observable behavior of the code. He presented a collection of decidability and complexity results on different aspects of the problem.
The title of Eran Yahav's series of lectures was Analysis and Synthesis with ‘Big Code’. He considered semantic representations based on symbolic automata, tracelets, numerical abstractions, and textual descriptions, and notions of code similarity based on these representations. He presented a number of prediction techniques, including statistical language models, variable-order Markov models, and other distance-based and model-based sequence classification techniques.
The 2015 Marktoberdorf Summer School of the Advanced Study Institute was a forum for challenging, exciting, and intense learning, and for discussion and development of cutting-edge ideas. It was a very productive and beneficial event, at both the professional and social levels.
We thank all lecturers, the staff of the organization, and our hosts in Marktoberdorf. Special thanks go to Dr. Katharina Spies, Maximilian Irlbeck and Florian Schulz for their great support.
The Marktoberdorf Summer School was an event of the Advanced Study Institute of the NATO Science for Peace and Security Programme. We thank all authorities that made it possible.
The Editors
Probabilistic model checking is a fully automated method for quantitative system analysis against temporal logic specifications. This article provides an overview of the main model-checking concepts for finite-state discrete-time Markov chains.
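As a minimal, illustrative sketch of the core computation (not an example from the article): reachability probabilities in such a chain satisfy a simple fixed-point equation and can be approximated by value iteration. The three-state chain below is invented.

    # Pr(eventually reach the goal state) in a discrete-time Markov chain,
    # approximated by value iteration on x(s) = sum_t P(s,t) * x(t).
    P = {                                  # P[s] = [(successor, probability)]
        's0': [('s0', 0.5), ('s1', 0.3), ('s2', 0.2)],
        's1': [('s1', 1.0)],               # goal state (absorbing)
        's2': [('s2', 1.0)],               # failure state (absorbing)
    }
    goal = {'s1'}

    x = {s: (1.0 if s in goal else 0.0) for s in P}
    for _ in range(1000):                  # iterate to (near) convergence
        x = {s: (1.0 if s in goal else sum(p * x[t] for t, p in P[s]))
             for s in P}
    print(x['s0'])                         # ~0.6 = 0.3 / (1 - 0.5)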
Satisfiability Modulo Theories (SMT) solvers are used in many modern program verification, analysis and testing tools. They owe their scale and efficiency to advances in the search algorithms underlying modern SAT solvers and first-order theorem provers, and their versatility in software development applications to specialized algorithms supporting theories, such as numbers and algebraic data types, of relevance for software engineering. This lecture introduces the algorithmic principles of SMT solving, taking as its basis modern SAT solvers and their integration with specialized theory solvers and quantifier reasoning. We detail some of the algorithms for the main theories used in current SMT solvers and survey newer theories and approaches to integrating solvers. The lectures also outline some application scenarios where SMT solvers have found use, including program verification, network analysis, symbolic model checking, test-case generation, and white-box fuzzing.
These lecture notes highlight the main algorithm used in modern SAT and SMT solvers, the Conflict-Directed Clause-Learning (CDCL) algorithm, and cast it as a process of searching for a model and building a partial refutation proof in tandem. This theme is common to many of the algorithms used in theory solvers.
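The following small script (an invented illustration using the z3-solver Python bindings) shows this duality at the solver's surface: a satisfiable query yields a model, while an unsatisfiable query over tracked assumptions yields an unsat core, a compact refutation.

    from z3 import Int, Bool, Solver, Implies, sat, unsat

    x, y = Int('x'), Int('y')
    s = Solver()
    s.add(x + y == 10, x > 0)      # satisfiable: the solver finds a model
    assert s.check() == sat
    print(s.model())               # e.g. [x = 1, y = 9]

    # Name the next constraints so a refutation can be reported as a core.
    a1, a2 = Bool('a1'), Bool('a2')
    s.add(Implies(a1, y > 10), Implies(a2, y < 5))
    if s.check(a1, a2) == unsat:
        print(s.unsat_core())      # [a1, a2]: these assumptions conflict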
Security protocols are distributed programs that aim at securing communications by means of cryptography. They are used, for instance, to secure electronic payments, home banking and, more recently, electronic elections. Given the financial and societal impact in case of failure, and the long history of design flaws in such protocols, formal verification is a necessity. A major difference from other safety-critical systems is that the properties of security protocols must hold in the presence of an arbitrary adversary. The aim of these notes is to provide a tutorial on some modern approaches for formally modeling protocols and their goals, and for automatically verifying them.
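As a minimal sketch of the symbolic (Dolev–Yao) attacker model underlying these approaches, the following invented example saturates the attacker's knowledge under decomposition rules and asks whether a secret becomes derivable; real tools additionally handle composition rules, equational theories, and active attackers.

    # Passive Dolev-Yao deduction: terms are ('pair', a, b), ('enc', msg, key),
    # or atomic strings; saturate knowledge under projection and decryption.
    def saturate(knowledge):
        known, changed = set(knowledge), True
        while changed:
            changed = False
            for t in list(known):
                new = set()
                if isinstance(t, tuple) and t[0] == 'pair':
                    new = {t[1], t[2]}                 # projection
                elif isinstance(t, tuple) and t[0] == 'enc' and t[2] in known:
                    new = {t[1]}                       # decryption, key known
                if not new <= known:
                    known |= new
                    changed = True
        return known

    intercepted = {('enc', 'secret', 'k1'), ('pair', 'k1', 'nonce')}
    print('secret' in saturate(intercepted))   # True: k1 leaks, then secret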
We survey some results on the automatic verification of parameterized programs without identities. These are systems composed of arbitrarily many components, all of them running exactly the same finite-state program. We discuss the complexity of deciding that no component reaches an unsafe state. An earlier version of this paper was published in the Proceedings of STACS 2014 [8].
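A minimal sketch of the key "copycat" phenomenon behind such results (the rendezvous rules below are invented): since arbitrarily many anonymous copies run the same program, any reachable local state can be occupied by as many processes as needed, so the set of states coverable by some process can be computed by plain saturation.

    # Coverability for a crowd of identical, anonymous finite-state processes
    # communicating by pairwise rendezvous rules (s, t) -> (s', t').
    rules = [(('init', 'init'), ('a', 'b')),
             (('a', 'b'), ('a', 'unsafe'))]

    def coverable(init, rules):
        states, changed = {init}, True
        while changed:
            changed = False
            for (s, t), (s2, t2) in rules:
                if s in states and t in states and not {s2, t2} <= states:
                    states |= {s2, t2}   # clones of s and t can meet and move
                    changed = True
        return states

    print('unsafe' in coverable('init', rules))   # True: safety is violated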
These lecture notes trace the developments triggered by Church's classic synthesis problem from the early solutions in the 1960s to the practical tools that have come out in the past few years. The article gives an overview on the automata- and game-theoretic foundations of the synthesis problem. We then explore the spectrum of logics for the synthesis of reactive systems, from reduced logics, like GR(1), to advanced logics such as strategy and coordination logic. Finally, we discuss the ideas behind recent synthesis approaches, like bounded synthesis and symbolic and SAT-based methods.
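As a minimal, self-contained illustration of the game-theoretic core of synthesis (with an invented game graph, not an example from the notes): computing the attractor of a goal set yields the vertices from which the system player can force the play into the goal, and hence a winning strategy for a simple reachability objective.

    # Attractor computation in a two-player game on a finite graph.
    # owner[v] says who picks the outgoing edge at v: 'sys' or 'env'.
    edges = {'v0': ['v1', 'v2'], 'v1': ['goal'], 'v2': ['v2'], 'goal': ['goal']}
    owner = {'v0': 'env', 'v1': 'sys', 'v2': 'sys', 'goal': 'sys'}

    def attractor(goal_set):
        attr, changed = set(goal_set), True
        while changed:
            changed = False
            for v, succs in edges.items():
                if v in attr:
                    continue
                wins = (any(w in attr for w in succs) if owner[v] == 'sys'
                        else all(w in attr for w in succs))
                if wins:
                    attr.add(v)
                    changed = True
        return attr

    print(attractor({'goal'}))   # {'goal', 'v1'}: from v0, ENV can escape to v2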
Dynamic software model checking consists of adapting model checking into a form of systematic testing that is applicable to industrial-size software. Over the last two decades, dozens of tools following this paradigm have been developed for checking concurrent and data-driven software. Compared to traditional software testing, dynamic software model checking provides better coverage, but is more computationally expensive. Compared to more general forms of program verification like interactive theorem proving, this approach provides more limited verification guarantees, but is cheaper due to its higher level of automation. Dynamic software model checking thus offers an attractive practical trade-off between testing and formal verification. This paper presents a brief introduction to dynamic software model checking.
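A minimal sketch of the systematic-testing idea (far simpler than, but in the spirit of, the tools discussed in the paper): enumerate every interleaving of a small concurrent program and check an assertion on each. The racy two-thread counter below is invented.

    # Exhaustively explore all interleavings of two threads, each doing a
    # non-atomic increment (read, then write) of a shared counter x.
    from itertools import permutations

    def read(tid):  return lambda sh, loc: loc.__setitem__(tid, sh['x'])
    def write(tid): return lambda sh, loc: sh.__setitem__('x', loc[tid] + 1)
    steps = {0: [read(0), write(0)], 1: [read(1), write(1)]}

    for order in sorted(set(permutations([0, 0, 1, 1]))):   # all schedules
        sh, loc, pc = {'x': 0}, {0: 0, 1: 0}, {0: 0, 1: 0}
        for tid in order:
            steps[tid][pc[tid]](sh, loc)
            pc[tid] += 1
        if sh['x'] != 2:   # assertion: both increments must take effect
            print('lost update under schedule', order, '-> x =', sh['x'])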
This work presents a novel approach for applying compositional model checking of behavioral UML models, based on learning. The Unified Modeling Language (UML) is a widely accepted modeling language for embedded and safety-critical systems. As such, the correct behavior of systems represented as UML models is crucial. Model checking is a successful automated verification technique for checking whether a system satisfies a desired property. However, its applicability is often impeded by its high time and memory requirements. A successful approach to tackle this limitation is compositional model checking. Recently, great advancements have been made in this direction via automatic learning-based Assume-Guarantee reasoning.
In this work we propose a framework for automatic Assume-Guarantee reasoning for behavioral UML systems. We apply an off-the-shelf learning algorithm for incrementally generating environment assumptions that guarantee satisfaction of the property. A unique feature of our approach is that the generated assumptions are UML state machines. Moreover, our Teacher works at the UML level: all queries from the learning algorithm are answered by generating and verifying behavioral UML systems.
Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create scripts for automating repetitive tasks. PBE involves synthesizing intended programs in an underlying domain-specific language (DSL) from example-based specifications (Ispec). We formalize the notion of Ispec and discuss some principles behind designing useful DSLs for synthesis.
A key technical challenge in PBE is to search for programs that are consistent with the Ispec provided by the user. We present a divide-and-conquer based search paradigm that leverages deductive rules and version space algebras for manipulating sets of programs.
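As a minimal sketch of the search problem (using naive enumeration over an invented three-operator DSL, rather than the deductive, version-space-based search just described):

    # Naive PBE: enumerate DSL programs until one is consistent with all
    # input/output examples. The micro-DSL below is invented for illustration.
    def run(prog, s):
        op, arg = prog
        if op == 'prefix': return s[:arg]
        if op == 'suffix': return s[arg:]
        if op == 'split':  return s.split(arg)[0]

    def synthesize(examples):
        candidates = ([('prefix', n) for n in range(1, 10)] +
                      [('suffix', n) for n in range(1, 10)] +
                      [('split', c) for c in ' @.-_'])
        for prog in candidates:
            if all(run(prog, i) == o for i, o in examples):
                return prog

    # Ispec: extract the user name from an email address, by example.
    examples = [('ada@lovelace.org', 'ada'), ('alan@turing.com', 'alan')]
    print(synthesize(examples))   # ('split', '@')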
Another technical challenge in PBE is to resolve the ambiguity that is inherent in the Ispec. We show how machine learning based ranking techniques can be used to predict an intended program within a set of programs that are consistent with the Ispec. We also present some user interaction models including program navigation and active-learning based conversational clarification that communicate actionable information to the user to help resolve ambiguity in the Ispec.
The above-mentioned concepts are illustrated using practical PBE systems for data wrangling (including FlashFill, FlashExtract, FlashRelate), several of which have already been deployed in the real world.
We provide a tutorial on the verification of concurrent software. We first discuss semantics of modern shared-variable concurrent software. We then provide an overview of algorithmic methods for analysing such software. We begin with Bounded Model Checking, a technique that performs an analysis of program paths up to a user-specified length. We then discuss methods that are, in principle, able to provide an unbounded analysis. We discuss how to apply predicate abstraction to concurrent software and then present an extension of lazy abstraction with interpolants to such software.
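A minimal sketch of the Bounded Model Checking step for a sequential loop (the concurrent case additionally encodes the scheduling choices symbolically): unroll the transition relation k times and ask an SMT solver whether the assertion can fail within the bound. The program, the bound, and the use of the z3-solver Python bindings are invented for illustration.

    # BMC: from x = 0, repeatedly do x := x + 2; can the assertion x != 7
    # fail within K steps?
    from z3 import Int, Solver, Or, sat

    K = 10
    x = [Int(f'x_{i}') for i in range(K + 1)]       # one copy of x per step

    s = Solver()
    s.add(x[0] == 0)                                # initial state
    for i in range(K):
        s.add(x[i + 1] == x[i] + 2)                 # transition relation
    s.add(Or([x[i] == 7 for i in range(K + 1)]))    # negated assertion

    if s.check() == sat:
        print('counterexample within bound:', s.model())
    else:
        print('assertion holds on all paths of length <=', K)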
In this tutorial, we first review the definitions of mean-payoff and energy games. We then present simple and self-contained proofs of their main properties. We also provide a pseudo-polynomial time algorithm to solve them. Pointers to more advanced materials in the literature are also provided to the reader.
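As a minimal, self-contained sketch of the value-iteration idea behind such pseudo-polynomial algorithms (the weighted game below and the exact cutoff bound are invented simplifications): compute, for every vertex, the least initial credit with which the maximizer can keep the accumulated energy non-negative forever.

    # Energy game: MAX picks edges at her vertices and must keep
    # credit + accumulated weight >= 0 forever; MIN tries to prevent this.
    edges = {'a': [(-2, 'b'), (1, 'a')], 'b': [(3, 'a'), (-1, 'b')]}
    owner = {'a': 'max', 'b': 'min'}

    top = sum(-w for vs in edges.values() for w, _ in vs if w < 0) + 1

    f = {v: 0 for v in edges}           # least-fixpoint iteration from below
    changed = True
    while changed:
        changed = False
        for v, succs in edges.items():
            need = [max(0, f[w_succ] - w) for w, w_succ in succs]
            new = min(min(need) if owner[v] == 'max' else max(need), top)
            if new != f[v]:
                f[v], changed = new, True

    print({v: ('MAX loses' if c >= top else c) for v, c in f.items()})
    # {'a': 0, 'b': 'MAX loses'}: at 'a', MAX can loop on the +1 self-edge,
    # while at 'b', MIN can drain the energy via the -1 self-loop.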
One particular instance of reasoning about the semantic correctness of programs is to prove that two programs are in fact equivalent, where the notion of equivalence may be relativized w.r.t. the set of observations of interest to be made about the behavior of the code.
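A minimal sketch of the simplest instance (two invented straight-line programs, checked with the z3-solver Python bindings): two programs are observationally equivalent w.r.t. their final output if no input distinguishes them, which an SMT solver can confirm by refuting the existence of a distinguishing input.

    from z3 import Int, Solver, unsat

    def prog1(x):          # reference version
        return x * x + x

    def prog2(x):          # 'optimized' version
        return x * (x + 1)

    x = Int('x')
    s = Solver()
    s.add(prog1(x) != prog2(x))        # ask for a distinguishing input
    print('equivalent' if s.check() == unsat
          else 'counterexample: %s' % s.model())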
In general, proving equivalence is an intriguing problem. The tutorial at MOD 2015 gave a non-exhaustive overview of various techniques in the field. In a first lecture, the tutorial considered the simplest form of programs, namely straight-line programs, which are the basis of, e.g., grammar-based compression schemes.
The second and third lecture then considered document processors which transform structured input into (possibly unstructured) output. Even if these processors are finite-state, elaborate techniques from commutative algebra are required to obtain effective decision procedures.
Finally, the fourth lecture considered observational equivalence of programs in general. Based on a formulation as a hyper-safety property, a framework was presented that allows techniques from relational abstract interpretation to be applied to infer equivalences in non-trivial cases.
In order to provide a homogeneous presentation of original results, we concentrate in these lecture notes on formalisms where effective decision procedures can be provided, namely, on deterministic top-down tree transducers with and without structured output.
The vast amount of code available on the web is increasing on a daily basis. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this lecture series, I will cover recent research trends on leveraging such “big code” for program analysis, program synthesis and reverse engineering. We will consider a range of semantic representations based on symbolic automata [55,63], tracelets [28], numerical abstractions [61,58], and textual descriptions [82,1], as well as different notions of code similarity based on these representations. To leverage these semantic representations, we will consider a number of prediction techniques, including statistical language models [66,73], variable order Markov models [18], and other distance-based and model-based sequence classification techniques. Finally, we discuss applications of these techniques including semantic code search in both source code [55] and stripped binaries [28], code completion and reverse engineering [43].
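As a minimal sketch of the simplest of these statistical models (trained on a tiny, invented "corpus"): even a bigram model over code tokens already supports ranked code completion.

    # Bigram language model over code tokens for next-token prediction.
    from collections import Counter, defaultdict

    corpus = ['for i in range ( n ) :'.split(),
              'for x in items :'.split(),
              'for i in range ( 10 ) :'.split()]

    counts = defaultdict(Counter)
    for tokens in corpus:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def complete(prev, k=2):
        total = sum(counts[prev].values())   # rank next tokens by frequency
        return [(tok, c / total) for tok, c in counts[prev].most_common(k)]

    print(complete('in'))   # [('range', 2/3), ('items', 1/3)]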