CAV2024: 36TH INTERNATIONAL CONFERENCE ON COMPUTER AIDED VERIFICATION
PROGRAM FOR FRIDAY, JULY 26TH
Days:
previous day
next day
all days

View: session overviewtalk overview

08:30-09:00Breakfast
09:00-10:30 Session 11: Learning
Location: MB 1.210
09:00
Bisimulation Learning

ABSTRACT. We introduce for the first time a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. We specialise our novel technique to stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. We present a counterexample-guided inductive synthesis procedure that learns a candidate state classifier and candidate ranking functions from a finite dataset of sample states; then, it checks whether these generalise to the entire state space using satisfiability modulo theory solving. Upon the affirmative answer, the procedure concludes that the classifier constitutes a valid stutter-insensitive bisimulation of the system. Upon a negative answer, the solver produces a counterexample state for which the classifier violates the claim, adds it to the dataset, and repeats learning and checking in a loop until a valid bisimulation is found. We demonstrate on a range of benchmarks from reactive verification and software model checking that our method yields faster verification results than alternative state-of-the-art tools in practice. Our method produces abstractions that are succinct, enable an effective verification of linear temporal logic without next operator, and are interpretable for system diagnostics.

09:20
LTL learning on GPUs

ABSTRACT. Linear temporal logic (LTL) is widely used in industrial verification. LTL formulae can be learned from traces. Scaling LTL formula learning is an open problem. We implement the first GPU-based LTL learner using a novel form of enumerative program synthesis. The learner is sound and complete. Our benchmarks indicate that it handles traces at least 2048 times more numerous, and on average at least 46 times faster than existing state-of-the-art learners. This is achieved with, among others, novel branch-free LTL semantics that has $O(\log n)$ time complexity, where $n$ is trace length, while previous implementations are $O(n^2)$ or worse (assuming bitwise boolean operations and shifts by powers of 2 have unit costs---a realistic assumption on modern processors).

09:40
(Distinguished Paper) Regular Reinforcement Learning

ABSTRACT. Modeling complex systems in terms of formal languages and their transformations was pioneered in the context of formal verification by the body of work now known as regular model checking, in which dynamical systems are represented by encoding their states as regular languages and their state-transitions as rational transductions. The work in this area leverages inherent symmetries found in these languages and transformations towards developing algorithmic methods for verifying asymptotic properties of systems, such as those with infinite state spaces, that lie beyond the scope of traditional model checking. Inspired by the language-theoretic design paradigm of regular model checking, we investigate the use of regular languages and rational transductions as the basic building blocks for reinforcement learning systems. More specifically, we study reinforcement learning problems taking place in Markov decision processes with state spaces comprising families of regular languages and state-transition mappings taking the form of rational transductions. We call this setting regular reinforcement learning.

10:00
Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models

ABSTRACT. We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent's exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent's reward performance.

10:20
Monitoring Unmanned Aircraft: Specification, Integration, and Lessons-learned

ABSTRACT. This paper reports on the integration of runtime monitoring into fully-electric aircraft designed by Volocopter, a German aircraft manufacturer of electric multi-rotor helicopters. The runtime monitor recognizes hazardous situations and system faults. Since the correct operation of the monitor is critical for the safety of the aircraft, the development of the monitor must follow strict aeronautical standards. This includes the integration of the monitor into different development environments, including log-file analysis, hardware/software-in-the-loop testing, and test flights. We have used the stream-based monitoring framework RTLola to generate monitors for a range of requirements. In this paper, we present representative monitoring specifications and our lessons learned from integrating the generated monitors. Our main finding is that the specification and the integration need to be decoupled, because the specification remains stable throughout the development process, whereas the different development stages require a separate integration of the monitor into each environment. We achieve this decoupling with a novel abstraction layer in the monitoring framework that adapts the monitor to each environment without affecting the core component generated from the specification. The decoupling of the integration has also allowed us to react quickly to the frequent changes in the hardware and software environment of the monitor due to the fast-paced development of the aircraft in a startup company.

10:30-11:00Coffee Break
12:00-14:00Lunch (Not Provided)
14:00-15:30 Session 13: Software Verification 2
Location: MB 1.210
14:00
The Top-Down Solver Verified: Building Confidence in Static Analyzers

ABSTRACT. The top-down solver (TD) is a local fixpoint algorithm for arbitrary equation systems. It considers the right-hand sides as black boxes and detects dependencies between unknowns on the fly—features that significantly increase both its usability and practical efficiency. At the same time, the recursive evaluation strategy of the TD, combined with the non-local destabilization mechanism, obfuscates the correctness of the computed solution. To strengthen the confidence in tools relying on the TD as their fixpoint engine, we provide a first machine-checked proof of the partial correctness of the TD. Our proof builds on the observation that the TD can be obtained from a considerably simpler recursive fixpoint algorithm by applying an optimization that neither affects the termination behavior nor the computed result. Accordingly, we break down the proof into a proof of the plain TD, which is only then extended to include the optimization. The backbone of our proof is a mutual induction following the solver’s computation trace. We establish sufficient invariants about the solver state to conclude the correctness of its optimization, i.e., the plain TD terminates if and only if the full TD terminates, and they return the identical assignment. The proof is written using Isabelle/HOL and is available in the archive of formal proofs.

14:20
(Recorded) hevm, a Fast Symbolic Execution Framework for EVM Bytecode

ABSTRACT. We present hevm, a symbolic execution engine for the EVM. hevm can prove safety properties for EVM bytecode or verify seman- tic equivalence between two bytecode objects. It exposes a user-friendly API in Solidity that allows end-users to define symbolic tests using al- most exactly the same syntax as they would for their usual unit tests. We evaluate our framework against state-of-the-art tools, using a com- prehensive set of benchmarks. Our empirical findings demonstrate that hevm outperforms its counterparts, effectively solving a greater number of problems within competitive timeframes.

14:30
SolTG: A CHC-based Solidity Test Case Generator

ABSTRACT. Test coverage is an important metric for the development of blockchain smart contracts. In this paper, we present SolTG, an automated CHC-based test case generator for Solidity smart contracts. SolTG exhaustively enumerates symbolic formulas and makes calls to SMT solver to find input values under which the contract exhibits the corresponding behavior. Test cases synthesized by SolTG have the form of a sequence of function calls over concrete values of input parameters, which lead to a specific execution scenario. The tool supports multiple Solidity-specific features and is capable of generating high coverage for industrial-grade Solidity code. We present a detailed architecture of SolTG and the translation of smart contracts into their CHC representation that the tool relies on. We also present the experimental results for test generation on the regression and industrial benchmarks.

14:40
(Recorded) Breaking the Mold: Nonlinear Ranking Function Synthesis without Templates

ABSTRACT. This paper studies the problem of synthesizing (lexicographic) polynomial ranking functions for loops that can be described in polynomial arithmetic over integers and reals. While the analogous ranking function synthesis problem for linear arithmetic is decidable, even checking whether a given function ranks an integer loop is undecidable in the nonlinear setting. We side-step the decidability barrier by working within the theory of linear integer/real rings (LIRR) rather than the standard model of arithmetic. We develop a termination analysis that is guaranteed to succeed if a loop (expressed as an LIRR formula) admits a (lexicographic) polynomial ranking function. In contrast to template-based ranking function synthesis in real arithmetic, our completeness result holds for lexicographic ranking functions of unbounded length and degree, and effectively subsumes linear lexicographic ranking function synthesis for linear integer loops.

15:00
(Recorded) End-to-end Mechanized Proof of a JIT-accelerated eBPF Virtual Machine for IoT

ABSTRACT. Modern operating systems have adopted Berkeley Packet Filters (BPF) as a mechanism to extend kernel functionalities dynamically, e.g., Linux's eBPF or RIOT's rBPF. The just-in-time (JIT) compilation of eBPF introduced in Linux eBPF for performance has however led to numerous critical issues. Instead, RIOT's rBPF uses a slower but memory-isolating interpreter (a virtual machine) that implements a defensive semantics of BPF but can hardly translate to compact JITed code. To address this problem, this paper presents a fully verified JIT implementation for RIOT's rBPF, consisting of: i/ an end-to-end co-verification methodology for both proving the JIT correct from an abstract specification and by deriving a verified concrete C implementation; ii/ a symbolic CompCert interpreter for executing jited binary code; iii/ a verified JIT compiler for rBPF; iv/ a verified hybrid rBPF virtual machine. Our core contribution is, to the best of our knowledge, the first and fully verified rBPF JIT compiler with correctness guarantees from high-level specification to low-level implementation. Benchmarks on microcontrollers hosting the RIOT operating system demonstrate significant performance improvements over the existing implementations of rBPF, even in worst-case application scenarios.

15:20
(Distinguished Paper) (Recorded) Interactive Theorem Proving modulo Fuzzing

ABSTRACT. Interactive theorem provers (ITPs) exploit the collaboration between humans and computers, enabling proof of complex theorems. Further, ITPs allow extraction of provably correct implementations from proofs. However, often, the extracted code interface with external libraries containing real-life complexities---proprietary library calls, remote/cloud APIs, complex models like ML models, inline assembly, highly non-linear arithmetic, vector instructions etc. We refer to such functions/operations as closed-box components. For such components, the user has to provide appropriate assumed lemmas to model the behavior of these functions. However, we found instances where these assumed lemmas are inconsistent with the actual semantics of these closed-box components. Hence, even correct-by-construction code extracted from an ITP may still behave incorrectly when interfaced with such closed-box components.

To this end, we propose StarFuzz, that allows the F* interactive theorem prover to provide better end-to-end assurance on the application--- even when interfaced with the closed-box components. Under the hood, StarFuzz rides on Sādhak, an SMT solving that combines fuzz testing to allow satisfiability checking over closed-box components. On the F* library that includes external implementations in OCaml, StarFuzz discovered four bugs---one bug that revealed an error on the assumed lemmas for a closed-box function, and three bugs in the external implementations of these components.

15:30-16:00Coffee Break
16:00-17:00 Session 14: Hardware Model Checking
Location: MB 1.210
16:00
Toward Liveness Proofs at Scale

ABSTRACT. While the problem of mechanized proof of liveness of reactive programs has been studied for decades, there is currently no method of proving liveness that is is conceptually simple to apply in practice to realistic problems, can be scaled to large problems without modular decomposition, and does not fail unpredictably due to the use of fragile heuristics. We introduce a method of liveness proof by relational rankings, implement it, and show that it meets these criteria in a realistic industrial case study involving a model of a memory subsystem in hardware.

16:20
Avoiding the Shoals - A New Approach to Liveness Checking

ABSTRACT. We present rlive, a new SAT-based model checking algorithm for the verification of liveness properties of finite-state symbolic transition systems. Like other recent approaches, rlive works by reducing liveness checking to a sequence of safety checks. Similarly to FAIR, it incrementally strengthens the input system using constraints obtained by refuting candidate counterexamples to the input liveness property, assumed (w.l.o.g.) to be of the form FGq. Differently from FAIR (and crucially), however, instead of directly searching for lasso-shaped counterexamples visiting ¬q infinitely-often, rlive searches for counterexamples incrementally, via a recursive chain of safety checks, each of which tries to determine whether it is possible to reach a ¬q-state from a given ¬q-state (which was previously determined to be reachable), in a manner similar to k-Liveness. When the current candidate counterexample is refuted, rlive exploits the inductive invariants generated by the (recursive) safety checks to restrict the search space, until either no more reachable ¬q-states remain, or a real lasso-shaped counterexample is found. In this paper, we describe rlive in detail, prove its soundness and completeness, and compare it against the state of the art both theoretically and empirically. Our experimental results show that our implementation of rlive outperforms state-of-the-art implementations of FAIR, k-Liveness, and other SAT-based liveness checking algorithms on a wide range of benchmarks from the literature.

16:40
SMLP: Symbolic Machine Learning Prover

ABSTRACT. SMLP is a tool and a library for system exploration based on data samples obtained by simulating or executing the system on a number of input vectors. SMLP aims at exploring the system based on this data by taking a grey-box approach by combining statistical methods of data exploration with building and exploring machine learning models in close feedback loop with the system's response, and exploring these models by combining probabilistic and formal methods.

SMLP has been applied in industrial setting at Intel for analyzing and optimizing hardware designs at the analog level. SMLP is a general purpose tool and can be applied to other systems that can be simulated and modelled by machine learning models.

16:50
Symbolic Model-Checking Intermediate-Language Tool Suite

ABSTRACT. We release the first tool suite implementing MoXI (Model eXchange Interlingua), an intermediate language for symbolic model checking designed to be an international research-community standard and developed by a widespread collaboration under a National Science Foundation CISE Community Research Infrastructure initiative. Although we focus here on hardware verification, the MoXI language is useful for software model checking and verification of infinite-state systems in general. MoXI builds on elements of SMT-LIB 2; it is easy to add new theories and operators. Our contributions include: (1) introducing the first tool suite of automated translators into and out of the new model-checking intermediate language; (2) composing an initial example benchmark set enabling the model-checking research community to build future translations; (3) compiling details for utilizing, extending, and improving upon our tool suite, including usage characteristics and initial performance data. Experimental evaluations demonstrate that compiling SMV-language models through MoXI to perform symbolic model checking with the tools from the last Hardware Model Checking Competition performs competitively with model checking directly via nuXmv.

17:00-18:10 Session 15: Runtime Verification and Monitoring
Location: MB 1.210
17:00
Predictive Monitoring with Strong Trace Prefixes

ABSTRACT. Runtime predictive analyses enhance coverage of traditional dynamic analysis based bug detection techniques by identifying a space of feasible reorderings of the observed execution and determining if any of these witnesses the violation of some desired safety property. The most popular approach for modeling the space of feasible reorderings is through Mazurkiewicz's trace equivalence. The simplicity of the framework also gives rise to efficient predictive analyses, and has been the de facto means for obtaining space and time efficient algorithms for monitoring concurrent programs.

In this work, we attempt to enhance the predictive power of trace-based reasoning, while still retaining the algorithmic benefits it offers. Towards this, we extend trace theory by embedding a class of prefixes, which we call \emph{strong trace prefixes}. We formally characterize strong trace prefixes using an enhanced dependence relation, study its predictive power and establish a tight connection to the previously proposed notion of synchronization preserving correct reorderings developed in the context of data race and deadlock prediction. We then show that despite the enhanced predictive power, strong trace prefixes continue to enjoy the algorithmic benefits of Mazurkiewicz traces in the context of prediction against co-safety properties. As a consequence, we also derive constant space and linear time algorithms for detecting synchronization preserving data races and deadlocks, improving upon the previously proposed linear space algorithms. We also show that strong trace prefixes can be instrumental in finding more violations of pattern languages efficiently. We implement our proposed algorithms for predictive monitoring using prefix traces, and our evaluation confirms the practical utility of reasoning based on strong trace prefixes.

17:20
General Anticipatory Runtime Verification
PRESENTER: Hannes Kallwies

ABSTRACT. Runtime verification is a technique for monitoring a system’s behavior against a formal specification. Monitors must produce verdicts that are sound with respect to the specification. The concept of anticipation involves the ability of a monitor to immediately produce verdicts when it can confidently predict the inevitability of the verdict. Anticipation has been studied for temporal logics on infinite traces. Stream runtime verification is a specialized form of runtime verification tailored to the continuous monitoring and verification of data streams. In this paper we study anticipatory monitoring for stream runtime verification. More specifically, we present an algorithm with anticipation for monitoring of Lola specifications, which we then extend to exploit assumptions and tolerate uncertainties. As perfect anticipation is in general not computable, we use techniques from abstract interpretation, especially widening, to approximate anticipatory monitoring verdicts. Finally, we have implemented a symbolic instantiation of our approach. A first evaluation shows promising results.

17:40
Proactive Real-Time First-Order Enforcement

ABSTRACT. Modern software systems must comply with increasingly intricate regulations in domains ranging from industrial automation to data protection. Runtime enforcement addresses this challenge by empowering systems to not only observe, but actively control the behavior of target systems by modifying their actions to ensure policy compliance.

We propose a novel approach to the proactive real-time enforcement of policies expressed in metric first-order temporal logic (MFOTL). We introduce a new system model, define an expressive MFOTL fragment that is enforceable in that model, and develop a sound enforcement algorithm for this fragment. We implement this algorithm in a new tool called WhyEnf and carry out a case study on enforcing GDPR-related policies.

Our tool can enforce all policies from the study in real-time with modest overhead. Our work thus constitutes the first tool-supported proactive real-time policy enforcement approach for expressive first-order policies.

18:00
Testing the migration from analog to software-based Railway Interlocking Systems

ABSTRACT. We work in the context of a tool set developed for the Italian Railway Network supporting the migration of legacy relay-based interlocking systems to a new software-based implementation. We propose to generate test cases from the analogical implementation in a way that they are significant for a comparison with a cycle-based computational model, by leveraging stable states abstraction. Our methodology found actual bugs in the new code that were missed by other analyses, and aids in documenting the expected differences with the legacy behaviors.

19:30-21:00 CAV Banquet

CAV Banquet at Vieux Port Steakhouse located at 405 rue Saint-Jean-Baptiste, Montréal, QC, H2Y 2Z7

map

Note that this is a different entrance than the main entrace for the restaurant.