Using Four-Valued Signal Temporal Logic for Incremental Verification of Hybrid Systems
ABSTRACT. Hybrid systems are often safety-critical and at the same time difficult to formally verify due to their mixed discrete and continuous behavior.
To address this issue, we propose a novel incremental verification algorithm for hybrid systems based on online monitoring techniques and reachability analysis.
To this end, we develop a four-valued semantics for signal temporal logic that allows us to distinguish two types of uncertainty: one arising from set-based evaluation and another one from the incremental nature of our algorithm.
Using these semantics to continuously update the verification verdict, our verification algorithm is the first to run alongside the reachability analysis of the system to be verified.
This makes it possible to stop the reachability analysis as soon as we obtain a conclusive verdict.
We demonstrate the usefulness of our novel approach by several experiments.
Optimization-Based Model Checking for Complex STL Specifications
ABSTRACT. We present a bounded model checking algorithm for signal temporal logic (STL) that exploits mixed-integer linear programming (MILP). A key technical element is our novel MILP encoding of the STL semantics; it follows the idea of stable partitioning from the recent work on SMT-based STL model checking. Assuming that our (continuous-time) system models can be encoded to MILP---typical examples are rectangular hybrid automata (precisely) and hybrid dynamics with closed-form solutions (approximately)---our MILP encoding yields an optimization-based model checking algorithm that is scalable, is anytime/interruptible, and accommodates parameter mining. Experimental evaluation shows our algorithm's performance advantages especially for complex STL formulas, demonstrating its practical relevance e.g. in the automotive domain.
soid: A Tool for Legal Accountability for Automated Decision Making
ABSTRACT. We present soid, a tool for interrogating the decision making of autonomous agents using SMT-based automated reasoning. Relying on the Z3 SMT solver and KLEE symbolic execution engine, soid allows investigators to receive rigorously proven answers to factual and counterfactual queries about agent behavior, enabling effective legal and engineering accountability for harmful or otherwise incorrect decisions. We evaluate soid qualitatively and quantitatively on a pair of examples, i) a buggy implementation of a classic decision tree inference benchmark from the explainable AI (XAI) literature; and ii) a car crash in a simulated physics environment. For the latter, we also contribute the soid-gui, a domain-specific, web-based example interface for legal and other practitioners to specify factual and counterfactual queries without requiring sophisticated programming or formal methods expertise.
(Recorded) Inner-approximate Reachability Computation via Zonotopic Boundary Analysis
ABSTRACT. Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability analysis problem based on the set-boundary reachability method for systems modelled by ordinary differential equations, in which the computed inner-approximations are represented with zonotopes. The set-boundary reachability method computes an inner-approximation by excluding states reached from the initial set's boundary. The effectiveness of this method is highly dependent on the efficient extraction of the exact boundary of the initial set. To address this, we propose methods leveraging boundary and tiling matrices that can efficiently extract and refine the exact boundary of the initial set represented by zonotopes. Additionally, we enhance the exclusion strategy by contracting the outer-approximations in a flexible way, which allows for the computation of less conservative inner-approximations. To evaluate the proposed method, we compare it with state-of-the-art methods against a series of benchmarks. The numerical results demonstrate that our method is not only more efficient but also more accurate in computing inner-approximations.
(Recorded) Scenario-based Flexible Modeling and Scalable Falsification for Reconfigurable CPSs
ABSTRACT. Cyber-physical systems (CPSs) are used in many safety-critical areas, making it crucial to ensure their safety. However, with CPSs increasingly dynamically deployed and reconfigured during runtime, their safety analysis becomes challenging. For one thing, reconfigurable CPSs usually consist of multiple agents dynamically connected during runtime. Their highly dynamic system topologies are too intricate for traditional modeling languages, which, in turn, hinders formal analysis. For another, due to the growing size and uncertainty of reconfigurable CPSs, their system models can be huge and even unavailable at design time. This calls for runtime analysis approaches with better scalability and efficiency.
To address these challenges, we propose a scenario-based hierarchical modeling language for reconfigurable CPS. It provides template models for agent inherent features, together with an instantiation mechanism to activate single agent's runtime behavior, communication configurations for multiple agents' connected behaviors, and scenario task configurations for their dynamic topologies. We also present a path-oriented falsification approach to falsify system requirements. It employs classification-model-based optimization to explore search space effectively and cut unnecessary system simulations and robustness calculations for efficiency. Our modeling and falsification are implemented in a tool called SNIFF. Experiments have shown that it can largely reduce modeling time and improve modeling accuracy, and perform scalable CPS falsification with high success rates (99.7% on average) in seconds.
ABSTRACT. SMT (Satisfiability Modulo Theories) solving is a technology for the fully automated solution of logical formulas. SMT solvers can be used as general-purpose off-the-shelf tools. Due to their impressive efficiency, they are nowadays frequently used in a wide variety of applications. A typical application encodes real-world problems as logical formulas, whose solutions can be decoded to solutions of the real-world problem.
Besides its unquestionable practical impact, SMT solving has another great merit: it inspired truly elegant ideas, which do not only enable the construction of efficient software tools, but provide also interesting theoretical insights.In this talk we give an introduction to the mechanisms of SAT and SMT solving, discuss some interesting ideas behind recent developments, and illustrate the usage of SMT solvers on a few application examples.
(Recorded) Probabilistic Access Policies with Automated Reasoning Support
ABSTRACT. Existing access policy languages equipped with SMT-based automated reasoning
capabilities are effective in providing formal guarantees about the policies.
However, this scheme only supports access control based on deterministic
information. Observing that certain information useful for access control
could be described by random variables, we are motivated to extend the current
access control scheme by adding support for talking about uncertainty in the
policies, or more precisely probability of random events. To compute these
probabilities, we invoke probabilistic programs when performing access
control. Additionally, we show that the probabilistic part of these policies
can actually be encoded in linear real arithmetic, which enables practical
automated reasoning tasks such as proving relative permissiveness between
policies. We demonstrate the advantages of the proposed probabilistic policies
over existing access control methods through two case studies on real-world
datasets.
ABSTRACT. The de-facto standard approach in MDP verification is based on value iteration (VI). We propose compositional VI, a framework for model checking compositional MDPs, that addresses efficiency while maintaining soundness. Concretely, compositional MDPs naturally arise from the combination of individual components, and their structure can be expressed using, e.g., string diagrams. Towards efficiency, we observe that compositional VI repeatedly verifies individual components. We propose a technique called Pareto caching that allows to reuse verification results, even for previously unseen queries. Towards soundness, we present two stopping criteria: one generalizes the optimistic value iteration paradigm and the other uses Pareto caches in conjunction with recent baseline algorithms. Our experimental evaluations shows the promise of the novel algorithm and its variations, and identifies challenges for future work.
Approximate Relational Reasoning for Quantum Programs
ABSTRACT. Quantum computation is inevitably subject to imperfections in its implementation. These imperfections arise from various sources, including environmental noise at the hardware level and the introduction of approximate implementations by quantum algorithm designers, such as lower-depth computations. Given the significant advantage of relational logic in program reasoning and the importance of assessing the robustness of quantum programs between their ideal specifications and imperfect implementations, we design a proof system to verify the approximate relational properties of quantum programs. We demonstrate the effectiveness of our approach by providing the first formal verification of the renowned low-depth approximation of the quantum Fourier transform. Furthermore, we validate the approximate correctness of the repeat-until-success algorithm. From the technical point of view, we develop approximate quantum coupling as a fundamental tool to study approximate relational reasoning for quantum programs, a novel generalization of the widely used approximate probabilistic coupling in probabilistic programs, answering a previously posed open question for projective predicates.
QReach: A Reachability Analysis Tool for Quantum Markov Chains
ABSTRACT. We present QReach, the first reachability analysis tool for quantum Markov chains based on decision diagrams CFLOBDD (presented at CAV 2023). QReach provides a novel framework for finding reachable subspaces, as well as a series of model-checking subprocedures like image computation. Experiments indicate its practicality in verification of quantum circuits and algorithms. QReach is expected to play a central role in future quantum model checkers.
(Recorded) Measurement-based Verification of Quantum Markov Chains
ABSTRACT. Model-checking techniques have been extended to analyze quantum programs and communication protocols represented as quantum Markov chains, an extension of classical Markov chains. To specify qualitative temporal properties, a subspace-based quantum temporal logic is used, which is built on Birkhoff–von Neumann atomic propositions. These propositions determine whether a quantum state is within a subspace of the entire state space. In this paper, we propose the measurement-based linear-time temporal logic MLTL to check quantitative properties. MLTL builds upon classical linear-time temporal logic (LTL) but introduces quantum atomic propositions that reason about the probability distribution after measuring a quantum state. To facilitate verification, we extend the symbolic dynamics-based techniques for stochastic matrices described by Agrawal et al. (JACM 2015) to handle more general quantum linear operators (super-operators) through eigenvalue analysis. This extension enables the development of an efficient algorithm for approximately model checking a quantum Markov chain against an MLTL formula. To demonstrate the utility of our model-checking algorithm, we use it to simultaneously verify linear-time properties of both quantum and classical random walks. Through this verification, we confirm the previously established advantages discovered by Ambainis et al. (STOC 2001) of quantum walks over classical random walks and discover new phenomena unique to quantum walks.
Guiding Enumerative Program Synthesis with Large Language Models
ABSTRACT. Pre-trained Large Language Models (LLMs) are beginning to dominate the discourse around automatic code generation with natural language specifications. In contrast, the best-performing synthesizers in the domain of formal synthesis with precise logical specifications are still based on enumerative algorithms. In this paper, we evaluate the abilities of LLMs to solve formal synthesis benchmarks by carefully crafting a library of prompts for the domain. When one-shot synthesis fails, we propose a novel enumerative synthesis algorithm, which integrates calls to an LLM into a weighted probabilistic search. This allows the synthesizer to provide the LLM with information about the progress of the enumerator, and the LLM to provide the enumerator with syntactic guidance in an iterative loop. We evaluate our techniques on benchmarks from the Syntax-Guided Synthesis competition. We find that GPT-3.5 as a stand-alone tool for formal synthesis is easily outperformed by state-of-the-art algorithms, but our approach integrating the LLM into an enumerative synthesis algorithm shows significant performance gains over both the LLM and the enumerative synthesizer alone and the winning SyGuS competition tool.
Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems
ABSTRACT. The rapid advance of deep reinforcement learning techniques enables the oversight of safety-critical systems through the utilization of Deep Neural Networks (DNNs). This underscores the pressing need to promptly establish certified safety guarantees for such DNN-controlled systems. Most of the existing verification approaches rely on qualitative approaches, predominantly employing reachability analysis. However, qualitative verification proves inadequate for DNN-controlled systems as their behaviors exhibit stochastic tendencies when operating in open and adversarial environments. In this paper, we propose a novel framework for unifying both qualitative and quantitative safety verification problems of DNN-controlled systems. This is achieved by formulating the verification tasks as the synthesis of valid neural barrier certificates (NBCs). Initially, the framework seeks to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons. To facilitate the synthesis of NBCs, we introduce their k-inductive variants. We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds. We prototype our approach into a tool called uniqq and showcase its efficacy on four classic DNN-controlled systems.
Certified Robust Accuracy of Neural Networks Are Bounded due to Bayes Errors
ABSTRACT. Adversarial examples pose a security threat to many critical systems built on neural networks. While certified training improves robustness, it also decreases accuracy noticeably. Despite various proposals for addressing this issue, the significant accuracy drop remains. More importantly, it is not clear whether there is a certain fundamental limit on achieving robustness whilst maintaining accuracy. In this work, we offer a novel perspective based on Bayes errors. By adopting Bayes error to robustness analysis, we investigate the limit of certified robust accuracy, taking into account data distribution uncertainties. We first show that the accuracy inevitably decreases in the pursuit of robustness due to changed Bayes error in the altered data distribution. Subsequently, we establish an upper bound for certified robust accuracy, considering the distribution of individual classes and their boundaries. Our theoretical results are empirically evaluated on real-world datasets and are shown to be consistent with the limited success of existing certified training results, e.g., for CIFAR10, our analysis results in an upper bound (of certified robust accuracy) of 67.49%, meanwhile existing approaches are only able to increase it from 53.89% in 2017 to 62.84% in 2023.
Boosting Few-Pixel Robustness Verification via Covering Verification Designs
ABSTRACT. Proving local robustness is crucial to increase the reliability of neural networks. While many verifiers prove robustness in L_\infty \epsilon-balls, very little work deals with robustness verification in L_0 \epsilon-balls, capturing robustness to few pixel attacks. This verification introduces a combinatorial challenge, because the space of perturbations is discrete and of exponential size. A previous work relies on covering designs to identify sets for defining L_\infty neighborhoods, which if proven robust imply that the L_0 \epsilon-ball is robust. However, the number of neighborhoods to verify remains very high, leading to a high analysis time. We propose covering verification designs, a combinatorial design that tailors effective but analysis-incompatible coverings to L_0 robustness verification. The challenge is that computing a covering verification design introduces a high time and memory overhead, which is intensified in our setting, where multiple candidate coverings are required to identify how to reduce the overall analysis time. The challenge is that computing a covering verification design introduces a high time and memory overhead, which is intensified in our setting, where multiple candidate coverings are required to identify how to reduce the overall analysis time. We introduce CoVerD, an L_0 robustness verifier that selects between different candidate coverings without constructing them, but by predicting their set sizes' distribution. This prediction relies on a theorem providing closed-form expressions for the mean and variance of this distribution. For the chosen covering, CoVerD constructs the covering verification design on-the-fly, while keeping the memory consumption minimal and enabling to parallelize the analysis. The experimental results show that CoVerD reduces the verification time on average by up to 5.1x compared to prior work, as well as scales to larger L_0 \epsilon-balls.
(Recorded) Verifying Global Two-Safety Properties in Neural Networks with Confidence
ABSTRACT. We present the first automated verification technique for confidence-based 2-safety properties, such as global robustness and global fairness, in deep neural networks (DNNs). Our approach combines self-composition to leverage existing reachability analysis techniques and a novel abstraction of the softmax function, which is amenable to automated verification. We characterize and prove the soundness of our static analysis technique. Furthermore, we implement it on top of Marabou, a state-of-the-art safety analysis tool for neural networks, conducting a performance evaluation on several publicly available benchmarks for DNN verification.
(Recorded) Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification
ABSTRACT. Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. While existing automated approaches are limited in their versatility, i.e., they either focus only on synthesizing loop invariants for numerical programs, or are tailored for specific types of programs or invariants. Programs involving multiple complicated data types (e.g., arrays, pointers) and code structures (e.g., nested loops, function calls) are often beyond their capabilities. To help bridge this gap, we present AutoSpec, an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It is driven by static analysis and program verification, and is empowered by large language models (LLMs). AutoSpec addresses the practical challenges in three ways: (1) driving \name by static analysis and program verification, LLMs serve as generators to generate candidate specifications, (2) programs are decomposed to direct the attention of LLMs, and (3) candidate specifications are validated in each round to avoid error accumulation during the interaction with LLMs. In this way, AutoSpec can incrementally and iteratively generate satisfiable and adequate specifications. The evaluation shows its effectiveness and usefulness, as it outperforms existing works by successfully verifying 79% of programs through automatic specification synthesis, a significant improvement of 1.592x. It can also be successfully applied to verify the programs in a real-world X509-parser project.