View: session overviewtalk overview
Invited talk
SAT and applications
Theorem Proving Unleashed
14:00 | First Experiments with Neural CVC5 ABSTRACT. The CVC5 solver is today one of the strongest systems for solving first order problems with theories but also without them. In this work we equip the enumeration-based instantiation method with a neural network that guides the choice of instantiated quantifiers and their ground instances. For that we develop a relatively fast graph neural network that repeatedly scores all available instantiation options with respect to the available formulas. The network runs directly on a CPU without the need for any special hardware. We train the neural guidance on a large set of proofs generated by the e-matching instantiation strategy and evaluate its performance on a set of previously unseen problems. |
14:20 | Waste Reduction: Experiments in Sharing Clauses between Runs of a Portfolio of Strategies (Experimental Paper) ABSTRACT. Most high-end automated theorem provers for first order logic (FOL) split available time between short runs of a large portfolio of search strategies. These runs are typically independent and can be parallelized to exploit all the available processor cores. We explore several avenues of re-using clauses generated by earlier runs and present experimental results of their usefulness or lack thereof. |
14:40 | Automated Theorem Provers Help Improve Large Language Model Reasoning ABSTRACT. In this paper we demonstrate how logic programming systems and Automated first-order logic Theorem Provers (ATPs) can improve the accuracy of Large Language Models (LLMs) for logical reasoning tasks where the baseline performance is given by direct LLM solutions. We first evaluate LLM reasoning on steamroller problems using the PRONTOQA benchmark. We show how accuracy can be improved with a neuro-symbolic architecture where the LLM acts solely as a front-end for translating a given problem into a formal logic language and an automated reasoning engine is called for solving it. However, this approach critically hinges on the correctness of the LLM translation. To assess this translation correctness, we secondly define a framework of syntactic and semantic error categories. We implemented the framework and used it to identify errors that LLMs make in the benchmark domain. Based on these findings, we thirdly extended our method with capabilities for automatically correcting syntactic and semantic errors. For semantic error correction we integrate first-order logic ATPs, which is our main and novel contribution. We demonstrate that this approach reduces semantic errors significantly and further increases the accurracy of LLM logical reasoning. |
15:10 | Prover9 Unleashed: Automated Configuration for Enhanced Proof Discovery ABSTRACT. While many of the state-of-art Automated Theorem Provers (ATP) like E and Vampire, were subject to extensive tuning of strategy schedules in the last decade, the classical ATP prover Prover9 has never been optimized in this direction. Both E and Vampire provide the user with an automatic mode to select good proof search strategies based on the properties of the input problem, while Prover9 provides by default only a relatively weak auto mode. Interestingly, Prover9 provides more varied means for proof control than its competitors. These means, however, must be manually investigated and that is possible only by experienced Prover9 users with a good understanding of how Prover9 works. In this paper, we investigate the possibilities of automatic configuration of Prover9 for user-specified benchmark problems. We employ the automated strategy invention system Grackle to generate Prover9 strategies with both basic and advanced proof search options which require sophisticated strategy space features for Grackle. We test the strategy invention on AIM train/test problem collection and we show that Prover9 can outperform both E and Vampire on these problems. To test the generality of our approach we train and evaluate strategies also on TPTP problems, showing that Prover9 can achieve reasonable complementarity with other ATPs. |
15:30 | Rewriting and Inductive Reasoning PRESENTER: Marton Hajdu ABSTRACT. Rewriting techniques based on reduction orderings generate “just enough” consequences to retain first-order completeness. This is ideal for superposition-based first-order theorem proving, but for at least one approach to inductive reasoning we show that we are missing crucial consequences. We therefore extend the superposition calculus with rewriting-based techniques to generate sufficient consequences for automating induction in saturation. When applying our work within the unit-equational fragment, our experiments with the theorem prover Vampire show significant improvements for inductive reasoning. |
Verification
16:30 | Sorting without Sorts ABSTRACT. We present a reasoning framework in support of establishing the functional correctness of programs with recursive data structures with automated first-order theorem proving. Specifically, we focus on functional programs implementing sorting algorithms. We formalize the semantics of recursive programs in many-sorted first-order logic while introducing sortedness/permutation properties as part of our first-order formalization. Rather than focusing on sorting lists of elements of specific first-order theories such as integer arithmetic, our list formalization relies on a sort parameter abstracting (arithmetic) theories and, hence, concrete sorts. Additionally we formalize the permutation property of lists in first-order logic in an effective way that allows to automatically discharge verification conditions of such algorithms purely by means of first-order theorem proving. Proof search is powered by automated theorem proving: we adjust recent efforts for automating inductive reasoning in saturation-based first-order theorem proving. Importantly, we advocate a compositional reasoning approach for automating proofs by induction required to verify functional programs implementing and preserving sorting and permutation properties over parameterized list structures. We showcase the applicability of our framework over recursive sorting algorithms, including Mergesort and Quicksort; to this end, we turn first-order theorem proving into an automated verification engine by guiding automated inductive reasoning with manual proof splits. |
17:00 | Efficient Simulation for Hardware Model Checking PRESENTER: Joseph Tafese ABSTRACT. Simulation is an important aspect of model checking, serving as an invaluable pre-processing step that can quickly generate a set of reachable states. This is evident in model checking tools at the Hardware Model Checking Competitions, where Bᴛᴏʀ2 is used to represent verification problems. Recently, Bᴛᴏʀ2MLIR was introduced as a novel format for representing safety and correctness constraints for hardware circuits. It provides an executable semantics for circuits represented in Bᴛᴏʀ2 by producing an equivalent program in LLVM-IR. One challenge in simulating Bᴛᴏʀ2 circuits is the use of persistent (i.e., immutable) arrays to represent memory. Persistent arrays work well for symbolic reasoning in Sᴍᴛ but they require copy-on-write semantics when being simulated natively. We provide an algorithm for converting persistent arrays to transient (i.e., mutable) arrays with efficient native execution. This approach is implemented in Bᴛᴏʀ2MLIR, which opens the door for rapid prototyping, dynamic verification techniques and random testing using established tool chains such as LibFuzzer and KLEE. Our evaluation shows that our approach, when compared with BᴛᴏʀSɪᴍ, has a speedup of three orders of magnitude when safety properties are trivial, and at least one order of magnitude when constraints are disabled. |
17:20 | Alternate Semantics of the Guarded Conditional ABSTRACT. The various possible semantics of Dijkstra’s guarded commands are elucidated by means of a new graph-based low-level, generic specification language designed for asynchronous nondeterministic algorithms. The graphs incorporate vertices for actions and edges indicating control flow alongside edges for data flow. This formalism allows one to specify, not just input-output behavior nor only what actions take place, but also precisely how often and in what order they transpire. |
17:30 | Verifying SQL queries using theories of tables and relations - virtual talk over Zoom PRESENTER: Mudathir Mahgoub Yahia Mohamed ABSTRACT. We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems. |