Ebook: Handbook of Satisfiability
“Satisfiability (SAT) related topics have attracted researchers from various disciplines: logic, applied areas such as planning, scheduling, operations research and combinatorial optimization, but also theoretical issues on the theme of complexity and much more; they all are connected through SAT. My personal interest in SAT stems from actual solving: The increase in power of modern SAT solvers over the past 15 years has been phenomenal. It has become the key enabling technology in automated verification of both computer hardware and software. Bounded Model Checking (BMC) of computer hardware is now probably the most widely used model checking technique. The counterexamples that it finds are just satisfying instances of a Boolean formula obtained by unwinding to some fixed depth a sequential circuit and its specification in linear temporal logic. Extending model checking to software verification is a much more difficult problem on the frontier of current research. One promising approach for languages like C with finite word-length integers is to use the same idea as in BMC but with a decision procedure for the theory of bit-vectors instead of SAT. All decision procedures for bit-vectors that I am familiar with ultimately make use of a fast SAT solver to handle complex formulas. Decision procedures for more complicated theories, like linear real and integer arithmetic, are also used in program verification. Most of them use powerful SAT solvers in an essential way. Clearly, efficient SAT solving is a key technology for 21st century computer science. I expect this collection of papers on all theoretical and practical aspects of SAT solving will be extremely useful to both students and researchers and will lead to many further advances in the field.”--Edmund Clarke (FORE Systems University Professor of Computer Science and Professor of Electrical and Computer Engineering at Carnegie Mellon University, winner of the 2007 A.M. Turing Award)
Propositional logic has been recognized throughout the centuries as one of the cornerstones of reasoning in philosophy and mathematics. Throughout history, its formalization into Boolean algebra was gradually accompanied by the recognition that a wide range of combinatorial problems can be expressed as propositional satisfiability (SAT) problems. Because of these two roles, SAT has become a mature multi-faceted scientific discipline. Initiated by the work of Cook, who established its NP-complete status in 1971, SAT has become a reference problem for an enormous variety of complexity statements.
Simultaneously, many real-world problems were formalized as SAT decision problems using different encoding techniques. This includes verification problems in hardware and software, planning, scheduling and combinatorial design.
Due to the potential practical implications, an intensive search has been underway since the early days of computing for ways to solve SAT problems in an automated fashion. In 1957, Allen Newell and Herbert Simon introduced one of the first AI programs, the Logic Theory Machine, to prove propositional theorems from Whitehead and Russell's “Principia mathematica”. Shortly after, in 1960, Martin Davis and Hilary Putnam introduced their now famous decision procedure for propositional satisfiability problems (a more space-efficient version due to Martin Davis, George Logemann and Donald Loveland followed in 1962).
Asked by the editors of this handbook to reflect, as a witness, on his role in this development, Martin Davis wrote:
“I felt delighted and honored when I was asked to write a foreword to this Handbook. But when I examined the table of contents and especially, when I read the initial historical article, I concluded that it is an honor that I hardly deserve. When Hilary Putnam and I began our work on automated theorem proving half a century ago, we mainly thought of it as a way to get funding for work on our real passion, Hilbert's tenth problem. We hardly imagined that SAT might have an independent importance aside from its role in automated deduction. It is wonderful to see what a flourishing area of research satisfiability has become.
Of course, what is regarded as the most important open problem in theoretical computer science, the P versus NP question, is a question about SAT.”
The topics of the handbook span practical and theoretical research on SAT and its applications and include search algorithms, heuristics, analysis of algorithms, hard instances, randomized formulae, problem encodings, industrial applications, solvers, simplifiers, tools, case studies and empirical results. SAT is interpreted in a rather broad sense: besides propositional satisfiability, it includes the domain of quantified Boolean formulae (QBF), constraint programming (CSP) techniques for word-level problems and their propositional encoding, and particularly satisfiability modulo theories (SMT).
The handbook aims to capture the full breadth and depth of SAT and to bundle significant progress and advances in automated solving. It covers the main notions and techniques and introduces various formal extensions. Each area is dealt with in a survey-like style, where some details may be neglected in favor of coverage. The extensive bibliography concluding each chapter will help interested readers find their way to the necessary details.
The intended audience of the handbook consists of researchers, graduate students, upper-year undergraduates, and practitioners who wish to learn about the state of the art in actual solving. Limited prior knowledge about the field is assumed. The handbook also aims to assist researchers from other fields in applying ideas and methods to their own work. We thus hope the handbook will provide a means for cross-fertilization.
The start and completion of the underlying project would not have been possible without the support and encouragement of many people. As such, this handbook is a community effort, and we take this opportunity to express our gratitude to this community as a whole, without singling out individual contributions. We are grateful to IOS Press for their interest in turning all these efforts into the book you are holding now. We are indebted to Martin Davis for his contribution to this preface and to Edmund Clarke for his support and recommendation, which you may find on the back cover of this copy.
Along with the completion of this handbook, a similar project was started by Teofilo Gonzalez. That handbook, on NP-Completeness, Theory and Applications, will be finished shortly and can be considered a very welcome companion for the concepts touching propositional proof complexity; such proof-system related issues are certainly not fully covered by us. There, too, various practical contributions on SAT solving will find updates which continue to be relevant as the discipline moves forward.
Finally, we take the opportunity to spend a few words on a particularly inspiring event that took place during recent SAT conferences: the SAT Competition. Starting at the Cincinnati conference in 2002, Daniel Le Berre and Laurent Simon put an incredible amount of work into establishing the format of rules, benchmarks and solver-performance evaluations which made SAT a competitive area where especially young researchers feel encouraged to contribute. As a consequence, one of the most frequently used references in this handbook is undoubtedly their competition web page.
Arjen van Lith designed the cover of our Handbook. It is a challenge to recognize various SAT research related patterns hidden there.
Armin Biere, Marijn Heule, Hans van Maaren, Toby Walsh
This chapter traces the links between the notion of Satisfiability and the attempts by mathematicians, philosophers, engineers, and scientists over the last 2300 years to develop effective processes for emulating human reasoning and scientific discovery, and for assisting in the development of electronic computers and other electronic components. Satisfiability was present implicitly in the development of ancient logics such as Aristotle's syllogistic logic, its extensions by the Stoics, and Lull's diagrammatic logic of the medieval period. From the Renaissance to Boole, algebraic approaches to effective process replaced the logics of the ancients and all but enunciated the meaning of Satisfiability for propositional logic. Clarification of the concept is credited to Tarski, who worked out necessary and sufficient conditions for “p is true” for any formula p in first-order syntax. At about the same time, the study of effective process increased in importance with the resulting development of the lambda calculus, recursive function theory, and Turing machines, all of which became the foundations of computer science and are linked to the notion of Satisfiability. Shannon provided the link to the computer age, and Davis and Putnam directly linked Satisfiability to automated reasoning via an algorithm which remains the backbone of most modern SAT solvers. These events propelled the study of Satisfiability for the next several decades, reaching “epidemic proportions” in the 1990s and 2000s, and the chapter concludes with a brief history of each of the major Satisfiability-related research tracks that developed during that period.
Before a combinatorial problem can be solved by current SAT methods, it must usually be encoded in conjunctive normal form, which facilitates algorithm implementation and allows a common file format for problems. Unfortunately there are several ways of encoding most problems and few guidelines on how to choose among them, yet the choice of encoding can be as important as the choice of search algorithm. This chapter reviews theoretical and empirical work on encoding methods, including the use of Tseitin encodings, the encoding of extensional and intensional constraints, the interaction between encodings and search algorithms, and some common sources of error. Case studies are used for illustration.
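To make the idea of a Tseitin encoding concrete, here is a minimal sketch in Python: each gate of a formula receives a fresh auxiliary variable, and a handful of clauses assert the equivalence between the auxiliary and the gate's inputs. The helper names and variable numbering are illustrative, not taken from the chapter.

```python
# Literals follow the DIMACS convention: positive integers are variables,
# negative integers their negations.

def tseitin_and(out, a, b):
    """Clauses asserting out <-> (a AND b)."""
    return [[-out, a], [-out, b], [-a, -b, out]]

def tseitin_or(out, a, b):
    """Clauses asserting out <-> (a OR b)."""
    return [[-out, a, b], [out, -a], [out, -b]]

# Example: encode f = (x1 AND x2) OR x3 using auxiliaries 4 (the AND gate)
# and 5 (the OR gate), then assert that f holds.
clauses = tseitin_and(4, 1, 2) + tseitin_or(5, 4, 3) + [[5]]
```

The payoff is a CNF whose size is linear in the size of the original formula, at the cost of introducing auxiliary variables; that trade-off is among the questions the chapter examines.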
Complete SAT algorithms form an important part of the SAT literature. From a theoretical perspective, complete algorithms can be used as tools for studying the complexities of different proof systems. From a practical point of view, these algorithms form the basis for tackling SAT problems arising from real-world applications. The practicality of modern, complete SAT solvers undoubtedly contributes to the growing interest in the class of complete SAT algorithms. We review these algorithms in this chapter, including Davis-Putnam resolution, Stålmarck's algorithm, symbolic SAT solving, the DPLL algorithm, and modern clause-learning SAT solvers. We also discuss the issue of certifying the answers of modern complete SAT solvers.
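For orientation, a bare-bones DPLL procedure can be sketched in a few lines of Python: unit propagation to a fixpoint, then splitting on a literal. This is a teaching sketch only; the naive branching choice and list-based simplification stand in for the far more refined techniques the chapter surveys.

```python
def simplify(clauses, lit):
    """Assume lit is true: drop satisfied clauses, remove the false literal."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses):
    """Return True iff the clause set (lists of nonzero ints) is satisfiable."""
    while True:                              # unit propagation to fixpoint
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        clauses = simplify(clauses, units[0])
    if not clauses:
        return True                          # every clause satisfied
    if [] in clauses:
        return False                         # empty clause: conflict
    lit = clauses[0][0]                      # naive branching choice
    return dpll(simplify(clauses, lit)) or dpll(simplify(clauses, -lit))
```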
One of the most important paradigm shifts in the use of SAT solvers for solving industrial problems has been the introduction of clause learning. Clause learning entails adding a new clause for each conflict during backtrack search. This new clause prevents the same conflict from occurring again during the search process. Moreover, sophisticated techniques, such as the identification of unique implication points in a graph of implications, allow creating clauses that more precisely identify the assignments responsible for conflicts. Learned clauses often have a large number of literals. As a result, another paradigm shift has been the development of new data structures, namely lazy data structures, which are particularly effective at handling large clauses. These data structures are called lazy because, in general, they cannot provide the actual status of a clause. Efficiency concerns and the use of lazy data structures motivated the introduction of dynamic heuristics that do not require knowing the precise status of clauses. This chapter describes the ingredients of conflict-driven clause learning SAT solvers, namely conflict analysis, lazy data structures, search restarts, conflict-driven heuristics and clause deletion strategies.
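The flavor of such a lazy data structure can be conveyed by a simplified two-watched-literal propagation loop, sketched below in Python under several simplifying assumptions (all clauses have at least two literals and are distinct, and the caller has already recorded the newly assigned literal in `assign`). The representation is hypothetical; production solvers are considerably more involved.

```python
def value(assign, lit):
    """Return True/False if lit is assigned, None if its variable is free."""
    v = assign.get(abs(lit))
    return None if v is None else (v if lit > 0 else not v)

def propagate(watches, assign, lit):
    """Two-watched-literal unit propagation. `lit` has just been set true;
    watches[l] lists the clauses currently watching literal l, and each
    clause keeps its two watched literals in positions 0 and 1.
    Returns a conflicting clause, or None if propagation succeeds."""
    pending = [lit]
    while pending:
        false_lit = -pending.pop()           # this literal just became false
        for clause in watches[false_lit][:]:
            if clause[0] == false_lit:       # keep the false watch in slot 1
                clause[0], clause[1] = clause[1], clause[0]
            if value(assign, clause[0]) is True:
                continue                     # clause already satisfied
            for i in range(2, len(clause)):  # look for a replacement watch
                if value(assign, clause[i]) is not False:
                    clause[1], clause[i] = clause[i], clause[1]
                    watches[false_lit].remove(clause)
                    watches[clause[1]].append(clause)
                    break
            else:                            # no replacement found
                if value(assign, clause[0]) is False:
                    return clause            # all literals false: conflict
                assign[abs(clause[0])] = clause[0] > 0   # unit: force clause[0]
                pending.append(clause[0])
    return None
```

Because only the clauses watching the falsified literal are ever visited, and the watches need not be updated on backtracking, most clauses are never touched on a given assignment; that is what makes the scheme "lazy".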
The chapter on look-ahead architecture based solvers provides a state-of-the-art description of how heuristics, data structures and learning in this context have evolved over the past years.
Contributions by the various scientific teams working with this architecture are described and unified. The chapter also provides insight into the architecture's weaknesses when applied to certain types of problems for which it is less suited, and describes its complementary role to that of conflict-driven solving mechanisms.
Research on incomplete algorithms for satisfiability testing led to some of the first scalable SAT solvers in the early 1990s. Unlike systematic solvers, which are often based on an exhaustive branching and backtracking search, incomplete methods are generally based on stochastic local search. On problems from a variety of domains, such incomplete methods for SAT can significantly outperform DPLL-based methods. While the early greedy algorithms already showed promise, especially on random instances, the introduction of randomization and so-called uphill moves during the search significantly extended the reach of incomplete algorithms for SAT. This chapter discusses such algorithms, along with a few key techniques that helped boost their performance, such as focusing on variables appearing in currently unsatisfied clauses, devising methods to efficiently pull the search out of local minima through clause re-weighting, and adaptive noise mechanisms. The chapter also briefly discusses a formal foundation for some of the techniques based on the discrete Lagrangian method.
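The core loop of such a local search procedure, in a WalkSAT-flavoured style, can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: the set of unsatisfied clauses is naively recomputed at every flip, whereas real implementations maintain incremental counts, and the greedy move here minimizes the total number of unsatisfied clauses rather than the exact break count.

```python
import random

def local_search(clauses, n_vars, max_flips=100000, noise=0.5):
    """Start from a random assignment; repeatedly pick an unsatisfied clause
    and flip one of its variables, either at random (an "uphill" move, with
    probability `noise`) or greedily. Returns a model or None (incomplete)."""
    assign = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                     # found a satisfying assignment
        clause = random.choice(unsat)         # focus on an unsatisfied clause
        if random.random() < noise:
            var = abs(random.choice(clause))  # random walk (uphill) move
        else:
            def after_flip(v):                # clauses unsatisfied if v flips
                assign[v] = not assign[v]
                n = sum(1 for c in clauses if not any(sat(l) for l in c))
                assign[v] = not assign[v]
                return n
            var = min((abs(l) for l in clause), key=after_flip)
        assign[var] = not assign[var]
    return None                               # give up; no claim of UNSAT
```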
“Search trees”, “branching trees”, “backtracking trees” or “enumeration trees” are at the heart of many (complete) approaches towards hard combinatorial problems, constraint problems, and, of course, SAT problems. Given many choices for branching, the fundamental question is how to guide the choices so that the resulting trees are (relatively) small. Despite (or perhaps because of) its apparently more narrow scope, in the SAT area especially, several approaches from theory and applications have come together, and the rudiments of a theory of branching heuristics have emerged. This chapter gives the first systematic treatment.
Thus a general theory of heuristics guiding the construction of “branching trees” is developed, ranging from a general theoretical analysis to the analysis of the historical development of branching heuristics for SAT solvers, and on to heuristics beyond SAT solving.
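As a small concrete instance of such a heuristic, the classic Jeroslow-Wang rule scores each literal over the clauses it appears in, weighting short clauses exponentially more heavily. The sketch below is illustrative Python, not taken from the chapter.

```python
from collections import defaultdict

def jeroslow_wang(clauses):
    """Score each literal by the sum of 2^-|c| over clauses c containing it,
    then branch on the highest-scoring literal."""
    score = defaultdict(float)
    for c in clauses:
        for lit in c:
            score[lit] += 2.0 ** -len(c)
    return max(score, key=score.get)
```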
In the last twenty years a significant amount of effort has been devoted to the study of randomly generated satisfiability instances. While a number of generative models have been proposed, uniformly random k-CNF formulas are by now the dominant and most studied model. One reason for this is that such formulas enjoy a number of intriguing mathematical properties, including the following: for each k ≥ 3, there is a critical value, r_k, of the clauses-to-variables ratio, r, such that for r < r_k a random k-CNF formula is satisfiable with probability tending to 1 as the number of variables n → ∞, while for r > r_k it is unsatisfiable with probability tending to 1 as n → ∞.
Algorithmically, even at densities much below r_k, no polynomial-time algorithm is known that can find any solution even with constant probability, while for all densities greater than r_k, the length of every resolution proof of unsatisfiability is exponential (and, thus, so is the running time of every DPLL-type algorithm). By now, the study of random k-CNF formulas has also attracted attention in areas such as mathematics and statistical physics and is at the center of intense research activity. At the same time, random k-SAT instances are a popular benchmark for testing and tuning satisfiability algorithms. Indeed, some of the better practical ideas in use today come from insights gained by studying the performance of algorithms on them. We review old and recent mathematical results about random k-CNF formulas, demonstrating that the connection between computational complexity and phase transitions is both deep and highly nuanced.
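In symbols, the threshold phenomenon described above reads as follows (the value quoted for k = 3 is the statistical-physics prediction, not a proven constant):

```latex
\lim_{n\to\infty} \Pr\bigl[F_k(n, rn)\ \text{is satisfiable}\bigr] =
\begin{cases}
  1 & \text{if } r < r_k,\\
  0 & \text{if } r > r_k,
\end{cases}
\qquad \text{with } r_3 \approx 4.267 \ \text{(predicted)},
```

where F_k(n, m) denotes a k-CNF formula whose m clauses are drawn uniformly at random over n variables.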
It has become well known over time that the performance of backtrack-style complete SAT solvers can vary dramatically depending on “little” details of the heuristics used, such as the way one selects the next variable to branch on and in what order the possible values are assigned to the variable. Extreme variations can result even from simple tie-breaking mechanisms necessarily employed in all SAT solvers. The discovery of this extreme runtime variation has been both a stumbling block and an opportunity. This chapter focuses on providing an understanding of this intriguing phenomenon, particularly in terms of the so-called heavy-tailed nature of the runtime distributions of systematic SAT solvers. It describes a simple formal model based on expensive mistakes to explain runtime distributions seen in practice, and discusses randomization and restart strategies that can be used to effectively overcome the negative impact of heavy-tailed behavior. Finally, the chapter discusses the notion of backdoor variables, which explain the unexpectedly short runs one also often sees in practice.
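One widely used restart schedule in this literature is the Luby sequence; a small Python sketch (illustrative, not the chapter's code) computing its i-th element is:

```python
def luby(i):
    """The Luby restart sequence 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, ...
    A solver restarts after base * luby(i) conflicts on its i-th run."""
    k = 1
    while (1 << k) - 1 < i:                  # smallest k with 2^k - 1 >= i
        k += 1
    if (1 << k) - 1 == i:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

# e.g. [luby(i) for i in range(1, 10)] == [1, 1, 2, 1, 1, 2, 4, 1, 1]
```

This schedule is known to be within a logarithmic factor of the optimal universal restart strategy for Las Vegas algorithms, which is what makes it attractive against heavy-tailed runtime distributions.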
Symmetry is at once a familiar concept (we recognize it when we see it!) and a profoundly deep mathematical subject. At its most basic, a symmetry is some transformation of an object that leaves the object (or some aspect of the object) unchanged. For example, a square can be transformed in eight different ways that leave it looking exactly the same: the identity “do-nothing” transformation, 3 rotations, and 4 mirror images (or reflections). In the context of decision problems, the presence of symmetries in a problem's search space can frustrate the hunt for a solution by forcing a search algorithm to fruitlessly explore symmetric subspaces that do not contain solutions. Recognizing that such symmetries exist, we can direct a search algorithm to look for solutions only in non-symmetric parts of the search space. In many cases, this can lead to significant pruning of the search space and yield solutions to problems which are otherwise intractable.
This chapter explores the symmetries of Boolean functions, particularly the symmetries of their conjunctive normal form (CNF) representations. Specifically, it examines what those symmetries are, how to model them using the mathematical language of group theory, how to derive them from a CNF formula, and how to utilize them to speed up CNF SAT solvers.
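To make this concrete, a permutation of literals is a (syntactic) symmetry of a CNF formula exactly when it maps the clause set onto itself. The hypothetical Python check below illustrates the definition; real tools instead discover such permutations, typically via graph automorphism algorithms.

```python
def is_symmetry(perm, clauses):
    """Check whether a literal permutation maps a CNF onto itself.
    `perm` maps literal -> literal and must commute with negation,
    i.e. perm[-l] == -perm[l] for every literal l."""
    canon = {frozenset(c) for c in clauses}
    return all(frozenset(perm[l] for l in c) in canon for c in clauses)

# Example: swapping variables 1 and 2 is a symmetry of (x1 v x2) ^ (~x1 v ~x2).
clauses = [[1, 2], [-1, -2]]
perm = {1: 2, 2: 1, -1: -2, -2: -1}
assert is_symmetry(perm, clauses)
```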
Minimal unsatisfiability describes the reduced kernel of unsatisfiable formulas. The investigation of this property is very helpful in understanding the reasons for unsatisfiability as well as the behaviour of SAT solvers and proof calculi. Moreover, for propositional formulas and quantified Boolean formulas, the computational complexity of various SAT-related problems is strongly related to the complexity of minimal unsatisfiable formulas.
While “minimal unsatisfiability” studies the structure of problem instances without redundancies, the study of “autarkies” considers the redundancies themselves, in various guises related to partial assignments which satisfy some part of the problem instance while leaving the rest “untouched”. As it turns out, autarky theory creates many bridges to combinatorics, algebra and logic, and the second part of this chapter provides a solid foundation for the basic ideas and results of autarky theory: the basic algorithmic problems, the algebra involved, and relations to various combinatorial theories (e.g., matching theory, linear programming, graph theory, the theory of permanents). The general theory of autarkies, as a kind of combinatorial “meta-theory”, is also sketched in its basic notions.
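The defining property is easy to state operationally: a partial assignment is an autarky if every clause it touches is satisfied by it, so all touched clauses can be removed while preserving satisfiability. A minimal illustrative check in Python (the names are hypothetical):

```python
def is_autarky(assign, clauses):
    """`assign` maps var -> True/False for the assigned variables only.
    Returns True iff every clause containing an assigned variable is
    satisfied by the partial assignment."""
    for c in clauses:
        touched = any(abs(l) in assign for l in c)
        satisfied = any(abs(l) in assign and assign[abs(l)] == (l > 0)
                        for l in c)
        if touched and not satisfied:
            return False
    return True

# Example: x3 = True is an autarky for (x1 v x3) ^ (x3 v ~x2) ^ (x1 v x2):
assert is_autarky({3: True}, [[1, 3], [3, -2], [1, 2]])
```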
The chapter is a survey of ideas and techniques behind satisfiability algorithms with the currently best asymptotic upper bounds on the worst-case running time. The survey also includes related structural-complexity topics such as Schaefer's dichotomy theorem, reductions between various restricted cases of SAT, the exponential time hypothesis, etc.
Parameterized complexity is a new theoretical framework that considers, in addition to the overall input size, the effects on computational complexity of a secondary measurement, the parameter. This two-dimensional viewpoint allows a fine-grained complexity analysis that takes structural properties of problem instances into account. The central notion is “fixed-parameter tractability” which refers to solvability in polynomial time for each fixed value of the parameter such that the order of the polynomial time bound is independent of the parameter. This chapter presents main concepts and recent results on the parameterized complexity of the satisfiability problem and it outlines fundamental algorithmic ideas that arise in this context. Among the parameters considered are the size of backdoor sets with respect to various tractable base classes and the treewidth of graph representations of satisfiability instances.
One of the most important industrial applications of SAT is currently Bounded Model Checking (BMC). This technique is typically used for formal hardware verification in the context of Electronic Design Automation, but BMC has successfully been applied to many other domains as well. In practice, BMC is mainly used for falsification, i.e., for finding violations of temporal properties. In addition, a considerable part of this chapter discusses complete extensions, including k-induction and interpolation, which make it possible to prove properties as well.
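The core unrolling behind BMC, also alluded to in the foreword, can be written as a single formula: with initial-state predicate I, transition relation T, safety property P and bound k,

```latex
\mathrm{BMC}_k \;=\; I(s_0) \;\wedge\; \bigwedge_{i=0}^{k-1} T(s_i, s_{i+1})
\;\wedge\; \bigvee_{i=0}^{k} \neg P(s_i),
```

so any satisfying assignment of BMC_k is a counterexample trace of length at most k.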
The planning problem in Artificial Intelligence was the first application of SAT to reasoning about transition systems and a direct precursor to the use of SAT in a number of other applications, including bounded model-checking in computer-aided verification.
This chapter presents the main ideas about encoding goal reachability problems as a SAT problem, including parallel plans and different forms of constraints for speeding up SAT solving, as well as algorithms for solving the AI planning problem with a SAT solver. Finally, more general planning problems that require the use of QBF or other generalizations of SAT are discussed.
This chapter covers an application of propositional satisfiability to program analysis. We focus on the discovery of programming flaws in low-level programs, such as embedded software. The loops in the program are unwound together with a property to form a formula, which is then converted into CNF. The method supports low-level programming constructs such as bit-wise operators or pointer arithmetic.
The theory of combinatorial designs has always been a rich source of structured, parametrized families of SAT instances. On one hand, design theory provides interesting problems for testing various SAT solvers; on the other hand, high-performance SAT solvers provide an alternative tool for attacking open problems in design theory, simply by encoding problems as propositional formulas and then searching for their models using off-the-shelf general-purpose SAT solvers. This chapter presents several case studies of using SAT solvers to solve hard design theory problems, including quasigroup problems, Ramsey numbers, Van der Waerden numbers, covering arrays, Steiner systems, and Mendelsohn designs. It is shown that over a hundred previously open design theory problems were solved by SAT solvers, thus demonstrating the significant power of modern SAT solvers. Moreover, the chapter provides a list of 30 open design theory problems for the developers of SAT solvers to test their new ideas and tools on.
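As an example of how direct such encodings can be, here is a hypothetical Python sketch generating a CNF for a Van der Waerden-style problem: the formula is satisfiable exactly when {1, ..., n} can be 2-colored with no monochromatic arithmetic progression of length 3, so, since W(2, 3) = 9, it is satisfiable precisely for n ≤ 8.

```python
def vdw_cnf(n, k=3):
    """CNF that is satisfiable iff {1..n} admits a 2-coloring with no
    monochromatic arithmetic progression of length k. Variable i is read
    as "element i is colored red"."""
    clauses = []
    for start in range(1, n + 1):
        for d in range(1, n):
            ap = [start + j * d for j in range(k)]  # arithmetic progression
            if ap[-1] > n:
                break
            clauses.append([-i for i in ap])        # not all red
            clauses.append([i for i in ap])         # not all blue
    return clauses
```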
This chapter surveys a part of the intense research activity that has been devoted by theoretical physicists to the study of randomly generated k-SAT instances. It can at first sight be surprising that there is a connection between physics and computer science. However, low-temperature statistical mechanics concerns precisely the behaviour of the low-lying configurations of an energy landscape, in other words the optimization of a cost function. Moreover, the ensemble of random k-SAT instances exhibits phase transitions, a phenomenon mostly studied in physics (think, for instance, of the transition between liquid and gaseous water). Besides the introduction of general concepts of statistical mechanics and their translations into computer science language, the chapter presents results on the location of the satisfiability transition, the detailed picture of the satisfiable regime and the various phase transitions it undergoes, and algorithmic issues for random k-SAT instances.
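The dictionary between the two fields is compact enough to state: the physicists' energy of an assignment σ is simply the number of clauses of the formula F that σ falsifies,

```latex
E_F(\sigma) \;=\; \sum_{c \in F} \mathbf{1}\!\left[\sigma \not\models c\right],
\qquad
F \text{ is satisfiable} \iff \min_{\sigma} E_F(\sigma) = 0,
```

so zero-energy ground states are exactly the satisfying assignments, and minimizing the energy corresponds to MaxSAT.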
MaxSAT solving is becoming a competitive generic approach for solving combinatorial optimization problems, partly due to the development of new solving techniques that have recently been incorporated into modern MaxSAT solvers, and to the challenge problems posed at the MaxSAT Evaluations. In this chapter we present the most relevant results on both approximate and exact MaxSAT solving, and survey in more detail the techniques that have proven to be useful in branch and bound MaxSAT and Weighted MaxSAT solvers. Among such techniques, we pay special attention to the definition of good quality lower bounds, powerful inference rules, clever variable selection heuristics and suitable data structures. Moreover, we discuss the advantages of dealing with hard and soft constraints in the Partial MaxSAT formalism, and present a summary of the MaxSAT Evaluations that have been organized so far as affiliated events of the International Conference on Theory and Applications of Satisfiability Testing.
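A skeletal branch-and-bound MaxSAT solver shows where the surveyed ingredients plug in: the trivial lower bound below (clauses already falsified by the partial assignment) is exactly what the chapter's good-quality lower bounds replace, and the fixed variable order is what its selection heuristics improve. An illustrative Python sketch, not taken from any particular solver:

```python
def maxsat_bb(clauses, n_vars):
    """Minimize the number of falsified clauses by exhaustive branch and
    bound over variables 1..n_vars. Returns the minimum cost."""
    best = [len(clauses)]                    # current upper bound

    def falsified(assign):                   # clauses with all literals false
        return sum(1 for c in clauses
                   if all(abs(l) in assign and assign[abs(l)] != (l > 0)
                          for l in c))

    def search(assign, var):
        lb = falsified(assign)               # trivial lower bound
        if lb >= best[0]:
            return                           # prune: cannot improve
        if var > n_vars:
            best[0] = lb                     # complete assignment: exact cost
            return
        for value in (True, False):
            assign[var] = value
            search(assign, var + 1)
            del assign[var]

    search({}, 1)
    return best[0]

# e.g. maxsat_bb([[1], [-1]], 1) == 1: one of the two clauses must be falsified
```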
Model counting, or counting the number of solutions of a propositional formula, generalizes SAT and is the canonical #P-complete problem. Surprisingly, model counting is hard even for some polynomial-time solvable cases like 2-SAT and Horn-SAT. Efficient algorithms for this problem will have a significant impact on many application areas that are inherently beyond SAT, such as bounded-length adversarial and contingency planning, and, perhaps most importantly, general probabilistic inference. Model counting can be solved, in principle and to an extent in practice, by extending the two most successful frameworks for SAT algorithms, namely, DPLL and local search. However, scalability and accuracy pose a substantial challenge. As a result, several new ideas have been introduced in the last few years that go beyond the techniques usually employed in most SAT solvers. These include division into components, caching, compilation into normal forms, exploitation of solution sampling methods, and certain randomized streamlining techniques using special constraints. This chapter discusses these techniques, exploring both exact methods as well as fast estimation approaches, including those that provide probabilistic or statistical guarantees on the quality of the reported lower or upper bound on the model count.
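In its simplest form, exact model counting extends DPLL by summing the counts of the two branches instead of OR-ing their results, with each unconstrained variable contributing a factor of two. The hypothetical Python sketch below shows only this skeleton; component decomposition and caching, which the chapter covers, are what make such counters practical.

```python
def count_models(clauses, variables):
    """Exact #SAT by DPLL-style splitting. `variables` is the set of
    variables the count ranges over, so variables not occurring in any
    remaining clause contribute a factor of 2 each."""
    if [] in clauses:
        return 0                             # conflict: no models here
    if not clauses:
        return 2 ** len(variables)           # all remaining variables free
    lit = clauses[0][0]
    rest = variables - {abs(lit)}

    def reduce(l):                           # assign l := True
        return [[x for x in c if x != -l] for c in clauses if l not in c]

    return count_models(reduce(lit), rest) + count_models(reduce(-lit), rest)

# e.g. count_models([[1], [2, 3]], {1, 2, 3}) == 3
```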
When studying the propositional satisfiability problem (SAT), that is, the problem of deciding whether a propositional formula is satisfiable, it is typically assumed that the formula is given in conjunctive normal form (CNF). Most software tools for deciding satisfiability of a formula (SAT solvers) also assume that their input is in CNF. An important reason for this is that it is simpler to develop efficient data structures and algorithms for CNF than for arbitrary formulas. On the other hand, using CNF makes efficient modeling of an application cumbersome. Therefore one often employs a more general formula representation in modeling and then transforms the formula into CNF for SAT solvers. Transforming a propositional formula into CNF either increases the formula size exponentially or requires the use of auxiliary variables, which can have a negative effect on the performance of a SAT solver in the worst case. Moreover, by translating to CNF one often loses information about the structure of the original problem. In this chapter we survey methods for solving propositional satisfiability problems when the input formula is not given in CNF but as a general formula or, even more compactly, as a Boolean circuit. We show how the techniques applied in the CNF-level Davis-Putnam-Logemann-Loveland algorithm generalize to Boolean circuits and how the problem structure available in the circuit form can be exploited. Then we consider the closely related area of automatic test pattern generation (ATPG) for digital circuits and review classical ATPG algorithms, the formulation of ATPG as a SAT problem, and advanced techniques for SAT-based ATPG.
Pseudo-Boolean and cardinality constraints are a natural generalization of clauses. While a clause expresses that at least one literal must be true, a cardinality constraint expresses that at least n literals must be true, and a pseudo-Boolean constraint states that a weighted sum of literals must be at least some constant. These constraints have a high expressive power, have been intensively studied in 0/1 programming, and are close enough to the satisfiability problem to benefit from the recent advances in this field. Besides, optimization problems are naturally expressed in the pseudo-Boolean context.
This chapter presents inference rules on pseudo-Boolean constraints and demonstrates their increased inference power in comparison with resolution. It also shows how modern satisfiability algorithms can be extended to deal with pseudo-Boolean constraints.
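The generalization hierarchy is visible on three variables at a glance (with x_i ∈ {0, 1}, and a negated literal written as 1 - x):

```latex
\underbrace{x_1 + x_2 + x_3 \ge 1}_{\text{clause } (x_1 \lor x_2 \lor x_3)}
\qquad
\underbrace{x_1 + x_2 + x_3 \ge 2}_{\text{cardinality constraint}}
\qquad
\underbrace{2x_1 + 3x_2 + x_3 \ge 4}_{\text{pseudo-Boolean constraint}}
```

Cutting-plane style inference on such constraints can simulate resolution and is strictly stronger in general, which is the comparison the chapter develops.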