Ebook: Handbook of Satisfiability
Propositional logic has been recognized throughout the centuries as one of the cornerstones of reasoning in philosophy and mathematics. Over time, its formalization into Boolean algebra was accompanied by the recognition that a wide range of combinatorial problems can be expressed as propositional satisfiability (SAT) problems. Because of this dual role, SAT developed into a mature, multi-faceted scientific discipline, and from the earliest days of computing a search was underway to discover how to solve SAT problems in an automated fashion.
This book, the Handbook of Satisfiability, is the second, updated and revised edition of the book first published in 2009 under the same name. The handbook aims to capture the full breadth and depth of SAT and to bring together significant progress and advances in automated solving. Topics covered span practical and theoretical research on SAT and its applications and include search algorithms, heuristics, analysis of algorithms, hard instances, randomized formulae, problem encodings, industrial applications, solvers, simplifiers, tools, case studies and empirical results. SAT is interpreted in a broad sense: as well as propositional satisfiability, there are chapters covering the domain of quantified Boolean formulae (QBF), constraint programming (CSP) techniques for word-level problems and their propositional encoding, and satisfiability modulo theories (SMT). An extensive bibliography completes each chapter.
This second edition of the handbook will be of interest to researchers, graduate students, final-year undergraduates, and practitioners using or contributing to SAT, and will provide both an inspiration and a rich resource for their work.
Edmund Clarke, 2007 ACM Turing Award Recipient:
"SAT solving is a key technology for 21st century computer science."
Donald Knuth, 1974 ACM Turing Award Recipient:
"SAT is evidently a killer app, because it is key to the solution of so many other problems."
Stephen Cook, 1982 ACM Turing Award Recipient:
"The SAT problem is at the core of arguably the most fundamental question in computer science: What makes a problem hard?"
When the first edition of this handbook was published in 2009, advances in SAT were mostly known only to experts, and SAT was seen as a key technology only in certain applications. Since then the number of practical applications has exploded, along with greater awareness of the usefulness of SAT in general. In the last century, showing that a problem is as hard as SAT was the end of the story: trying to solve it directly seemed hopeless. Thanks to the continuing improvements in practical SAT solving, it is now widely accepted that being able to encode a problem into SAT is also highly likely to lead to a practical solution. This “SAT Revolution” started at the end of the last century and continues to produce amazing new practical and theoretical results.
Accordingly, this second edition contains several updates, including a completely revamped Chapter 4 on conflict-driven clause learning. Half the chapters were updated, extended, or both. We also took into account comments by Donald Knuth, collected while he was preparing the section on Satisfiability in volume 4 of “The Art of Computer Programming”. That section appeared in 2015 as fascicle 6a, running to more than 300 pages, and is a major milestone in the SAT literature published since the first edition of this Handbook.
Three important topics that were present in the first edition but deserve chapters of their own are now given that space and discussed in detail: the new Chapter 7 on proof complexity, the new Chapter 9 on preprocessing, and the new Chapter 12 on automated configuration and solver selection. Additionally, the new Chapter 15 covers proofs of unsatisfiability, one of the main recent developments in practical SAT solving. These proofs are essential in solving long-standing mathematical problems.
Besides these four completely new chapters, there are three new chapters covering topics already discussed in the first edition. These three chapters focus on new aspects and new technology and are written by authors who made fundamental contributions to changing the state of the art. First, recent developments regarding quantified Boolean formulas, including a discussion of proof systems, are covered in the new Chapter 31. Second, the focus of the new Chapter 24 is on core-based methods for maximum satisfiability, which have improved scalability considerably. Finally, the new Chapter 26 covers a novel research direction on practical and approximate model counting with strong statistical guarantees.
Research in SAT has established itself as a vibrant cross-community effort. Besides the main SAT conference, other major conferences and journals in diverse fields, from automated reasoning, verification, and hardware and software engineering to complexity theory, algorithms, and of course artificial intelligence, count SAT and its extensions among their areas of interest. It is also worth mentioning that competitions continue to serve as a showcase for, as well as a motivation of, the field.
One could argue that SAT is now very widespread across these fields. One example, and maybe the most striking, is the adoption of SAT in other reasoning disciplines: SMT solving of course, the use of lazy clause generation in constraint programming, SAT-based splitting in first-order theorem proving, and finally the use of hammers in interactive theorem proving for higher-order logic, which rely directly or indirectly on SAT.
As the first edition appears to have done, we hope that this second edition of the Handbook will also serve researchers and practitioners using or contributing to SAT, and provide both an inspiration and a rich resource for their own work.
Armin Biere
Marijn Heule
Hans van Maaren
Toby Walsh
This chapter traces the links between the notion of Satisfiability and the attempts by mathematicians, philosophers, engineers, and scientists over the last 2300 years to develop effective processes for emulating human reasoning and scientific discovery, and for assisting in the development of electronic computers and other electronic components. Satisfiability was present implicitly in the development of ancient logics such as Aristotle’s syllogistic logic, its extensions by the Stoics, and Lull’s diagrammatic logic of the medieval period. From the Renaissance to Boole, algebraic approaches to effective process replaced the logics of the ancients and all but enunciated the meaning of Satisfiability for propositional logic. Clarification of the concept is credited to Tarski in working out necessary and sufficient conditions for “p is true” for any formula p in first-order syntax. At about the same time, the study of effective process increased in importance with the resulting development of lambda calculus, recursive function theory, and Turing machines, all of which became the foundations of computer science and are linked to the notion of Satisfiability. Shannon provided the link to the computer age, and Davis and Putnam directly linked Satisfiability to automated reasoning via an algorithm which is the backbone of most modern SAT solvers. These events propelled the study of Satisfiability for the next several decades, reaching “epidemic proportions” in the 1990s and 2000s, and the chapter concludes with a brief history of each of the major Satisfiability-related research tracks that developed during that period.
Before a combinatorial problem can be solved by current SAT methods, it must usually be encoded in conjunctive normal form, which facilitates algorithm implementation and allows a common file format for problems. Unfortunately there are several ways of encoding most problems and few guidelines on how to choose among them, yet the choice of encoding can be as important as the choice of search algorithm. This chapter reviews theoretical and empirical work on encoding methods, including the use of Tseitin encodings, the encoding of extensional and intensional constraints, the interaction between encodings and search algorithms, and some common sources of error. Case studies are used for illustration.
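To make the encoding pipeline concrete, here is a minimal Python sketch (the function names and list-of-integers clause representation are illustrative, not taken from the chapter) of the standard Tseitin clauses for single AND and OR gates, printed in the common DIMACS CNF file format that solvers read.

```python
def tseitin_and(g, a, b):
    """Clauses for g <-> (a AND b); literals are DIMACS-style integers."""
    return [[-g, a], [-g, b], [g, -a, -b]]

def tseitin_or(g, a, b):
    """Clauses for g <-> (a OR b)."""
    return [[-g, a, b], [g, -a], [g, -b]]

def to_dimacs(clauses, num_vars):
    """Render a clause list in the DIMACS CNF file format."""
    lines = ["p cnf %d %d" % (num_vars, len(clauses))]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

# Encode g3 <-> (x1 AND x2) and assert g3: satisfiable only if x1 = x2 = true.
print(to_dimacs(tseitin_and(3, 1, 2) + [[3]], num_vars=3))
```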
Complete SAT algorithms form an important part of the SAT literature. From a theoretical perspective, complete algorithms can be used as tools for studying the complexities of different proof systems. From a practical point of view, these algorithms form the basis for tackling SAT problems arising from real-world applications. The practicality of modern, complete SAT solvers undoubtedly contributes to the growing interest in the class of complete SAT algorithms. We review these algorithms in this chapter, including Davis-Putnam resolution, Stålmarck’s algorithm, symbolic SAT solving, the DPLL algorithm, and modern clause-learning SAT solvers. We also discuss the issue of certifying the answers of modern complete SAT solvers.
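As a minimal illustration of the classical complete approach, the following Python sketch (ours, not the chapter's code) implements the core of DPLL: unit propagation followed by branching on both polarities of a chosen literal.

```python
def dpll(clauses, assignment=()):
    """Return a satisfying list of literals, or None if unsatisfiable.
    Clauses are lists of DIMACS-style integer literals."""
    clauses = [list(c) for c in clauses]
    assignment = list(assignment)
    # Unit propagation: repeatedly assign the literal of any unit clause.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment.append(unit)
        new_clauses = []
        for c in clauses:
            if unit in c:
                continue                      # clause satisfied, drop it
            reduced = [l for l in c if l != -unit]
            if not reduced:
                return None                   # conflict: clause falsified
            new_clauses.append(reduced)
        clauses = new_clauses
    if not clauses:
        return assignment                     # all clauses satisfied
    # Branch on the first literal of the first clause, trying both polarities.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        result = dpll(clauses + [[choice]], assignment)
        if result is not None:
            return result
    return None

print(dpll([[1, 2], [-1, 2], [-2, 3]]))       # [1, 2, 3]: satisfiable
```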
One of the most important paradigm shifts in the use of SAT solvers for solving industrial problems has been the introduction of clause learning. Clause learning entails adding a new clause for each conflict during backtrack search. This new clause prevents the same conflict from occurring again during the search process. Moreover, sophisticated techniques, such as the identification of unique implication points in the implication graph, allow the creation of clauses that more precisely identify the assignments responsible for conflicts. Learned clauses often have a large number of literals. As a result, another paradigm shift has been the development of new data structures, namely lazy data structures, which are particularly effective at handling large clauses. These data structures are called lazy because they are in general unable to provide the actual status of a clause. Efficiency concerns and the use of lazy data structures motivated the introduction of dynamic heuristics that do not require knowing the precise status of clauses. This chapter describes the ingredients of conflict-driven clause learning SAT solvers, namely conflict analysis, lazy data structures, search restarts, conflict-driven heuristics, and clause deletion strategies.
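The following Python sketch (illustrative, heavily simplified from real solver code) shows the lazy two-watched-literal scheme for unit propagation: each clause watches two literals, and a clause is revisited only when one of its watches becomes false, so the exact status of most clauses is never computed.

```python
from collections import defaultdict

def propagate(clauses, units):
    """Unit propagation with two watched literals. Clauses are mutable lists
    of DIMACS-style integers, each of length >= 2 (unit clauses go in `units`).
    Returns the set of implied literals, or None on conflict."""
    watches = defaultdict(list)              # literal -> clauses watching it
    for c in clauses:
        for lit in c[:2]:                    # watch the first two literals
            watches[lit].append(c)
    assigned, queue = set(), list(units)
    while queue:
        lit = queue.pop()
        if -lit in assigned:
            return None                      # conflict
        if lit in assigned:
            continue
        assigned.add(lit)
        for c in watches[-lit][:]:           # only clauses whose watch became false
            for i in range(2, len(c)):       # look for a replacement watch
                if -c[i] not in assigned:
                    j = c.index(-lit)        # position (0 or 1) of the false watch
                    c[j], c[i] = c[i], c[j]
                    watches[-lit].remove(c)
                    watches[c[j]].append(c)
                    break
            else:                            # no replacement: other watch is forced
                queue.append(c[0] if c[1] == -lit else c[1])
    return assigned

# x1 = true forces x2 through (-1 2), and then x3 through (-2 3).
print(propagate([[-1, 2], [-2, 3]], units=[1]))     # {1, 2, 3}
```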
The chapter on look-ahead architecture based solvers provides a state-of-the-art description of how heuristics, data structures, and learning in this context have evolved over the years. Contributions and results of the various scientific teams working with this architecture are described and unified. The chapter also provides insight into the architecture's weaknesses on certain types of problems that it is not well suited to solve, and it describes the complementary roles of this architecture and of conflict-driven solving mechanisms.
Research on incomplete algorithms for satisfiability testing led to some of the first scalable SAT solvers in the early 1990s. Unlike systematic solvers, which are often based on an exhaustive branching and backtracking search, incomplete methods are generally based on stochastic local search. On problems from a variety of domains, such incomplete methods for SAT can significantly outperform DPLL-based methods. While the early greedy algorithms already showed promise, especially on random instances, the introduction of randomization and so-called uphill moves during the search significantly extended the reach of incomplete algorithms for SAT. This chapter discusses such algorithms, along with a few key techniques that helped boost their performance, such as focusing on variables appearing in currently unsatisfied clauses, devising methods to efficiently pull the search out of local minima through clause re-weighting, and adaptive noise mechanisms. The chapter also briefly discusses a formal foundation for some of the techniques, based on the discrete Lagrangian method.
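A minimal WalkSAT-style sketch in Python (illustrative only; real implementations maintain break counts incrementally instead of rescanning all clauses) shows the interplay of greedy flips and uphill noise moves described above.

```python
import random

def walksat(clauses, num_vars, noise=0.5, max_flips=100000):
    """Stochastic local search for a satisfying assignment; may give up."""
    assign = {v: random.choice([True, False]) for v in range(1, num_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                          # all clauses satisfied
        clause = random.choice(unsat)              # focus on an unsatisfied clause
        if random.random() < noise:
            var = abs(random.choice(clause))       # uphill "noise" move
        else:
            # Greedy move: flip the variable leaving the fewest clauses unsatisfied.
            def broken(v):
                assign[v] = not assign[v]
                n = sum(not any(sat(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return n
            var = min((abs(l) for l in clause), key=broken)
        assign[var] = not assign[var]
    return None                                    # gave up: method is incomplete

print(walksat([[1, 2], [-1, 2], [-2, 3]], num_vars=3))
```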
This chapter gives an overview of proof complexity and connections to SAT solving, focusing on proof systems such as resolution, Nullstellensatz, polynomial calculus, and cutting planes (corresponding to conflict-driven clause learning, algebraic approaches using linear algebra or Gröbner bases, and pseudo-Boolean solving, respectively). There is also a discussion of extended resolution (which is closely related to DRAT proof logging) and Frege and extended Frege systems more generally. An ample supply of references for further reading is provided, including for some topics omitted in this chapter.
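As a small worked example of the resolution proof system that underlies CDCL solvers, the following Python sketch (ours) derives the empty clause from a tiny unsatisfiable formula.

```python
def resolve(c1, c2, var):
    """Resolvent of clauses c1, c2 (sets of integer literals) on variable var."""
    assert var in c1 and -var in c2
    return (c1 - {var}) | (c2 - {-var})

# Refutation of the unsatisfiable formula (x1) & (-x1 | x2) & (-x2):
step1 = resolve({1}, {-1, 2}, 1)        # {2}
step2 = resolve(step1, {-2}, 2)         # set(): the empty clause, so UNSAT
print(step1, step2)
```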
“Search trees”, “branching trees”, “backtracking trees” or “enumeration trees” are at the heart of many (complete) approaches towards hard combinatorial problems, constraint problems, and, of course, SAT problems. Given many choices for branching, the fundamental question is how to guide the choices so that the resulting trees are (relatively) small. Despite (or perhaps because of) its apparently narrower scope, in the SAT area especially, several approaches from theory and applications have come together, and the rudiments of a theory of branching heuristics have emerged. This chapter gives the first systematic treatment: a general theory of heuristics guiding the construction of “branching trees” is developed, ranging from a general theoretical analysis to the analysis of the historical development of branching heuristics for SAT solvers, and on to heuristics beyond SAT solving.
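As a simple concrete instance of a branching heuristic, here is a sketch (illustrative) of the classical MOMS rule: branch on the variable occurring most often in the shortest clauses, since those clauses constrain the search most.

```python
from collections import Counter

def moms(clauses):
    """Maximum Occurrences in clauses of Minimum Size: pick a branch variable."""
    shortest = min(len(c) for c in clauses)
    counts = Counter(abs(l) for c in clauses if len(c) == shortest for l in c)
    return counts.most_common(1)[0][0]

print(moms([[1, 2, 3], [-1, 2], [2, -3]]))   # 2: most frequent in the binary clauses
```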
Preprocessing has become a key component of the Boolean satisfiability (SAT) solving workflow. In practice, preprocessing is situated between the encoding phase and the solving phase, with the aim of decreasing the total solving time by applying efficient simplification techniques on SAT instances to speed up the search subsequently performed by a SAT solver. In this chapter, we overview key preprocessing techniques proposed in the literature. While the main focus is on techniques applicable to formulas in conjunctive normal form (CNF), we also selectively cover main ideas for preprocessing structural and higher-level SAT instance representations.
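As one concrete example of such a simplification technique, the following Python sketch (illustrative; real preprocessors use occurrence lists and clause signatures to avoid the quadratic scan shown here) removes subsumed clauses: if clause C is a subset of clause D, then D is redundant and can be deleted.

```python
def remove_subsumed(clauses):
    """Drop every clause that is a proper superset (or later duplicate) of another."""
    clauses = [frozenset(c) for c in clauses]
    kept = []
    for i, d in enumerate(clauses):
        subsumed = any(c < d or (c == d and j < i)
                       for j, c in enumerate(clauses) if j != i)
        if not subsumed:
            kept.append(sorted(d, key=abs))
    return kept

# (x1 | x2) subsumes (x1 | x2 | x3); the duplicate of (x1 | x2) is also dropped.
print(remove_subsumed([[1, 2], [1, 2, 3], [1, 2]]))   # [[1, 2]]
```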
In the last twenty years a significant amount of effort has been devoted to the study of randomly generated satisfiability instances. While a number of generative models have been proposed, uniformly random k-CNF formulas are by now the dominant and most studied model. One reason for this is that such formulas enjoy a number of intriguing mathematical properties, including the following: for each k≥3, there is a critical value, rk, of the clauses-to-variables ratio, r, such that for r<rk a random k-CNF formula is satisfiable with probability that tends to 1 as n→∞, while for r>rk it is unsatisfiable with probability that tends to 1 as n→∞. Algorithmically, even at densities much below rk, no polynomial-time algorithm is known that can find any solution even with constant probability, while for all densities greater than rk, the length of every resolution proof of unsatisfiability is exponential (and, thus, so is the running time of every DPLL-type algorithm). By now, the study of random k-CNF formulas has also attracted attention in areas such as mathematics and statistical physics and is at the center of an area of intense research activity. At the same time, random k-SAT instances are a popular benchmark for testing and tuning satisfiability algorithms. Indeed, some of the better practical ideas in use today come from insights gained by studying the performance of algorithms on them. We review old and recent mathematical results about random k-CNF formulas, demonstrating that the connection between computational complexity and phase transitions is both deep and highly nuanced.
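The model itself is easy to state in code; the following Python sketch (ours) draws a uniformly random k-CNF formula with m = r * n clauses, each over k distinct variables with random polarities.

```python
import random

def random_kcnf(n, r, k=3):
    """Uniformly random k-CNF with n variables and m = int(r * n) clauses."""
    m = int(r * n)
    return [[v * random.choice([1, -1])
             for v in random.sample(range(1, n + 1), k)]
            for _ in range(m)]

# Near the experimentally observed 3-SAT threshold r ~ 4.27, instances are hardest.
print(random_kcnf(n=10, r=4.27))
```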
It has become well known over time that the performance of backtrack-style complete SAT solvers can vary dramatically depending on “little” details of the heuristics used, such as the way one selects the next variable to branch on and in what order the possible values are assigned to the variable. Extreme variations can result even from simple tie-breaking mechanisms necessarily employed in all SAT solvers. The discovery of this extreme runtime variation has been both a stumbling block and an opportunity. This chapter focuses on providing an understanding of this intriguing phenomenon, particularly in terms of the so-called heavy-tailed nature of the runtime distributions of systematic SAT solvers. It describes a simple formal model based on expensive mistakes to explain runtime distributions seen in practice, and discusses randomization and restart strategies that can be used to effectively overcome the negative impact of heavy-tailed behavior. Finally, the chapter discusses the notion of backdoor variables, which explain the unexpectedly short runs one also often sees in practice.
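One widely used restart schedule in this context is the Luby sequence, which is within a logarithmic factor of an optimal universal restart strategy for Las Vegas algorithms (Luby, Sinclair, and Zuckerman, 1993). A minimal Python sketch:

```python
def luby(i):
    """The i-th term (1-based) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:                 # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - ((1 << (k - 1)) - 1))   # recurse into the repeated prefix

# A solver restarts after base * luby(i) conflicts on its i-th run.
print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```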
This chapter provides an introduction to the automated configuration and selection of SAT algorithms and gives an overview of the most prominent approaches. Since the early 2000s, these so-called meta-algorithmic approaches have played a major role in advancing the state of the art in SAT solving, giving rise to new ways of using and evaluating SAT solvers. At the same time, SAT has proven to be particularly fertile ground for research and development in the area of automated configuration and selection, and methods developed there have meanwhile achieved impact far beyond SAT, across a broad range of computationally challenging problems. Conceptually more complex approaches that go beyond “pure” algorithm configuration and selection are also discussed, along with some open challenges related to meta-algorithmic approaches, to the tools based on them, and to their effective application.
Symmetry is at once a familiar concept (we recognize it when we see it!) and a profoundly deep mathematical subject. At its most basic, a symmetry is some transformation of an object that leaves the object (or some aspect of the object) unchanged. For example, a square can be transformed in eight different ways that leave it looking exactly the same: the identity “do-nothing” transformation, 3 rotations, and 4 mirror images (or reflections). In the context of decision problems, the presence of symmetries in a problem’s search space can frustrate the hunt for a solution by forcing a search algorithm to fruitlessly explore symmetric subspaces that do not contain solutions. Recognizing that such symmetries exist, we can direct a search algorithm to look for solutions only in non-symmetric parts of the search space. In many cases, this can lead to significant pruning of the search space and yield solutions to problems which are otherwise intractable. This chapter explores the symmetries of Boolean functions, particularly the symmetries of their conjunctive normal form (CNF) representations. Specifically, it examines what those symmetries are, how to model them using the mathematical language of group theory, how to derive them from a CNF formula, and how to utilize them to speed up CNF SAT solvers.
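In the CNF setting, a symmetry can be checked purely syntactically: a permutation of the literals is a symmetry if it maps the clause set to itself. The following Python sketch (illustrative; real tools reduce symmetry detection to graph automorphism and call tools such as saucy or bliss) verifies this condition for a given variable permutation.

```python
def is_symmetry(clauses, perm):
    """perm maps each positive variable to a signed literal; it is extended
    to negative literals by perm(-x) = -perm(x)."""
    apply = lambda lit: perm[lit] if lit > 0 else -perm[-lit]
    image = {frozenset(apply(l) for l in c) for c in clauses}
    return image == {frozenset(c) for c in clauses}

# Swapping x1 and x2 leaves (x1 | x2) & (-x1 | x3) & (-x2 | x3) unchanged.
clauses = [[1, 2], [-1, 3], [-2, 3]]
print(is_symmetry(clauses, {1: 2, 2: 1, 3: 3}))   # True
```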
Minimal unsatisfiability describes the reduced kernel of unsatisfiable formulas. The investigation of this property is very helpful in understanding the reasons for unsatisfiability as well as the behaviour of SAT solvers and proof calculi. Moreover, for propositional formulas and quantified Boolean formulas the computational complexity of various SAT-related problems is strongly related to the complexity of minimal unsatisfiable formulas. While “minimal unsatisfiability” studies the structure of problem instances without redundancies, the study of “autarkies” considers the redundancies themselves, in various guises related to partial assignments which satisfy some part of the problem instance while leaving the rest “untouched”. As it turns out, autarky theory creates many bridges to combinatorics, algebra and logic, and the second part of this chapter provides a solid foundation of the basic ideas and results of autarky theory: the basic algorithmic problems, the algebra involved, and relations to various combinatorial theories (e.g., matching theory, linear programming, graph theory, the theory of permanents). The general theory of autarkies, as a kind of combinatorial “meta-theory”, is also sketched with regard to its basic notions.
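The defining condition for autarkies is simple to state and check for a given partial assignment, as the following Python sketch (ours) shows: every clause touched by the assignment must be satisfied by it, so the untouched remainder is satisfiability-equivalent to the original formula.

```python
def is_autarky(clauses, assignment):
    """assignment: a set of literals assumed true (at most one per variable)."""
    assigned_vars = {abs(l) for l in assignment}
    touched = lambda c: any(abs(l) in assigned_vars for l in c)
    satisfied = lambda c: any(l in assignment for l in c)
    return all(satisfied(c) for c in clauses if touched(c))

clauses = [[1, 2], [-1, 2], [3, 4]]
print(is_autarky(clauses, {2}))   # True: both touched clauses are satisfied
print(is_autarky(clauses, {1}))   # False: touches (-x1 | x2) without satisfying it
```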
Satisfiability (SAT) solvers have become complex tools, which raises the question of whether we can trust their results. This question is particularly important when the solvers are used to determine the correctness of hardware and software and when they are used to produce mathematical results. To deal with this issue, solvers can provide proofs of unsatisfiability to certify the correctness of their answers. This chapter presents the history and state of the art of producing and validating proofs of unsatisfiability. The chapter covers the most popular proof formats with and without hints to speed up certification. Hints in proofs make validation easy, which has resulted in several efficient formally-verified checkers. Various proof systems are discussed, ranging from resolution to the recent propagation redundancy system. The chapter also describes techniques to compress and optimize proofs.
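The core of clause-based proof validation is reverse unit propagation (RUP), used in DRUP/DRAT checking: a learned clause C is accepted if asserting the negation of C and unit-propagating the formula yields a conflict. A minimal (and deliberately inefficient) Python sketch of this check:

```python
def rup_check(clauses, learned):
    """True if `learned` follows from `clauses` by unit propagation alone."""
    formula = [list(c) for c in clauses] + [[-l] for l in learned]
    assigned = set()
    while True:
        unit = None
        for c in formula:
            # Literals whose negation is not yet assigned (i.e., not falsified).
            unassigned = [l for l in c if -l not in assigned]
            if not unassigned:
                return True                    # conflict: clause fully falsified
            if len(unassigned) == 1 and unassigned[0] not in assigned:
                unit = unassigned[0]
                break
        if unit is None:
            return False                       # propagation got stuck, no conflict
        assigned.add(unit)

# (x1 | x2) & (-x1 | x2) lets a checker accept the learned clause (x2).
print(rup_check([[1, 2], [-1, 2]], [2]))       # True
```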
The chapter is a survey of ideas and techniques behind satisfiability algorithms with the currently best asymptotic upper bounds on the worst-case running time. The survey also includes related structural-complexity topics such as Schaefer’s dichotomy theorem, reductions between various restricted cases of SAT, the exponential time hypothesis, etc.
Parameterized complexity is a new theoretical framework that considers, in addition to the overall input size, the effects on computational complexity of a secondary measurement, the parameter. This two-dimensional viewpoint allows a fine-grained complexity analysis that takes structural properties of problem instances into account. The central notion is “fixed-parameter tractability” which refers to solvability in polynomial time for each fixed value of the parameter such that the order of the polynomial time bound is independent of the parameter. This chapter presents main concepts and recent results on the parameterized complexity of the satisfiability problem and it outlines fundamental algorithmic ideas that arise in this context. Among the parameters considered are the size of backdoor sets with respect to various tractable base classes and the treewidth of graph representations of satisfiability instances.
One of the most important industrial applications of SAT is currently Bounded Model Checking (BMC). This technique is typically used for formal hardware verification in the context of Electronic Design Automation, but BMC has successfully been applied to many other domains as well. In practice, BMC is mainly used for falsification, which is concerned with violations of temporal properties. In addition, a considerable part of this chapter discusses complete extensions, including k-induction and interpolation, which also make it possible to prove properties.
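The shape of a BMC query is easy to show on a toy example. The following Python sketch (ours; a brute-force enumeration stands in for a SAT solver) unrolls a one-bit system whose state toggles each step, starting at 0, and asks whether the “bad” state x = 1 is reachable in exactly k steps.

```python
from itertools import product

def bmc_cnf(k):
    """CNF for: init(x0) & trans(x0,x1) & ... & trans(x_{k-1},x_k) & bad(x_k)."""
    x = lambda i: i + 1                       # variable for state bit x at step i
    clauses = [[-x(0)]]                       # init: x0 = 0
    for i in range(k):                        # trans: x_{i+1} <-> NOT x_i
        clauses += [[x(i), x(i + 1)], [-x(i), -x(i + 1)]]
    clauses.append([x(k)])                    # bad state at step k: x_k = 1
    return clauses, k + 1

def brute_force_sat(clauses, num_vars):
    """Tiny stand-in for a SAT solver: try all assignments."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

for k in range(4):
    print(k, brute_force_sat(*bmc_cnf(k)))    # bad state reachable exactly at odd k
```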
The planning problem in Artificial Intelligence was the first application of SAT to reasoning about transition systems and a direct precursor to the use of SAT in a number of other applications, including bounded model checking in computer-aided verification. This chapter presents the main ideas for encoding goal reachability problems as SAT problems, including parallel plans and different forms of constraints for speeding up SAT solving, as well as algorithms for solving the AI planning problem with a SAT solver. Finally, more general planning problems that require the use of QBF or other generalizations of SAT are discussed.
This chapter covers an application of propositional satisfiability to program analysis. We focus on the discovery of programming flaws in low-level programs, such as embedded software. The loops in the program are unwound together with a property to form a formula, which is then converted into CNF. The method supports low-level programming constructs such as bit-wise operators or pointer arithmetic.
The theory of combinatorial designs has always been a rich source of structured, parametrized families of SAT instances. On one hand, design theory provides interesting problems for testing various SAT solvers; on the other hand, high-performance SAT solvers provide an alternative tool for attacking open problems in design theory, simply by encoding problems as propositional formulas and then searching for their models using off-the-shelf general-purpose SAT solvers. This chapter presents several case studies of using SAT solvers to solve hard design theory problems, including quasigroup problems, Ramsey numbers, Van der Waerden numbers, covering arrays, Steiner systems, and Mendelsohn designs. It is shown that over a hundred previously open design theory problems were solved by SAT solvers, thus demonstrating the significant power of modern SAT solvers. Moreover, the chapter provides a list of 30 open design theory problems for the developers of SAT solvers to test their new ideas and weapons.
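As a flavor of such encodings, the following Python sketch (ours) generates the classical CNF for Van der Waerden numbers W(2; k): variable i is true iff number i receives color 1, and each arithmetic progression of length k contributes two clauses forbidding it from being monochromatic.

```python
def vdw_cnf(n, k):
    """CNF over variables 1..n that is satisfiable iff {1..n} can be 2-colored
    with no monochromatic arithmetic progression of length k."""
    clauses = []
    for start in range(1, n + 1):
        for step in range(1, (n - start) // (k - 1) + 1):
            ap = [start + j * step for j in range(k)]
            clauses.append(ap)                    # forbid all of ap in color 0
            clauses.append([-i for i in ap])      # forbid all of ap in color 1
    return clauses

# W(2; 3) = 9: vdw_cnf(8, 3) is satisfiable, vdw_cnf(9, 3) is not.
print(len(vdw_cnf(9, 3)), "clauses over 9 variables")
```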
This chapter surveys a part of the intense research activity that theoretical physicists have devoted to the study of randomly generated k-SAT instances. It can at first sight be surprising that there is a connection between physics and computer science. However, low-temperature statistical mechanics concerns precisely the behaviour of the low-lying configurations of an energy landscape, in other words the optimization of a cost function. Moreover, the ensemble of random k-SAT instances exhibits phase transitions, a phenomenon mostly studied in physics (think, for instance, of the transition between liquid and gaseous water). Besides introducing general concepts of statistical mechanics and their translation into computer-science language, the chapter presents results on the location of the satisfiability transition, the detailed picture of the satisfiable regime and the various phase transitions it undergoes, and algorithmic issues for random k-SAT instances.