| 10:45 | Student Forum PRESENTER: Lee Barnett |
| 14:00 | A Formal Y86 Simulator with CHERI Features PRESENTER: William D. Young ABSTRACT. We present a formal executable model of CHERI architectural features integrated with a formalized Y86 ISA. CHERI is an extension of conventional hardware ISAs centered on capabilities: hardware-level descriptions of permissions that can be used in place of pointers. CHERI enables fine-grained memory protection and highly scalable software compartmentalization to mitigate security vulnerabilities beyond what current architectures can achieve. We use our formal model to prove the memory-protection and security properties of CHERI itself, prototype extensions to CHERI, and verify the correctness of machine code involving capabilities. Since it is executable, our model also serves as a simulator of a CHERI-augmented x86-like processor. We are motivated by the adoption of CHERI by industry leaders that design real-world hardware, the need for industrial-strength tools to perform formal CHERI analyses, and the ongoing development of CHERI extensions to x86 platforms. We build our model in an all-in-one first-order logic and programming system. Our model enables formal verification of CHERI designs and rapid prototyping of new CHERI architectural features, and we apply existing industry push-button verification tools and custom heuristics to prove the correctness of CHERI artifacts. |
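The core CHERI idea the abstract describes can be illustrated with a toy sketch: a memory access succeeds only if the capability used in place of a pointer is tagged, carries the right permission, and bounds the address. All names here (`Capability`, `checked_load`) are illustrative, not the paper's formal model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    base: int    # lower bound of the addressable region
    length: int  # size of the region in addresses
    perms: set   # e.g. {"load", "store"}
    tag: bool    # hardware validity tag

def checked_load(mem: dict, cap: Capability, addr: int) -> int:
    """Dereference addr through cap, enforcing CHERI-style checks."""
    if not cap.tag:
        raise PermissionError("untagged capability")
    if "load" not in cap.perms:
        raise PermissionError("no load permission")
    if not (cap.base <= addr < cap.base + cap.length):
        raise PermissionError("out-of-bounds dereference")
    return mem[addr]

mem = {100: 42, 101: 7}
cap = Capability(base=100, length=2, perms={"load"}, tag=True)
print(checked_load(mem, cap, 101))  # in-bounds, permitted load -> 7
```

A formal model like the paper's would state and prove that every access the simulator performs passes checks of this shape.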
| 14:30 | On Hyperproperty Verification, Quantifier Alternations, and Games under Partial Information ABSTRACT. Hyperproperties are system properties that relate multiple execution traces in a system and thus capture, e.g., information-flow and robustness properties. Logics for expressing temporal hyperproperties -- such as HyperLTL -- extend LTL by quantifying over executions of a system. Many properties require one or more \emph{quantifier alternations}, which presents a major challenge for verification. Complete verification methods require a system complementation for each quantifier alternation, making them infeasible in practice. A cheaper method interprets the verification of a HyperLTL formula as a two-player parity game between universal and existential quantifiers. The game-based approach is significantly cheaper, facilitates interactive proofs, and allows for easy-to-check certificates of satisfaction. It is, however, limited to $\forall^*\exists^*$ properties, leaving important properties out of reach. In this paper, we show that we can use games to verify \emph{arbitrary} HyperLTL formulas by utilizing multiplayer games under \emph{partial information}. While multiplayer games under partial information are, in general, undecidable, we show that our game is played under hierarchical information and thus falls in a decidable class of games. We discuss completeness of the game and study prophecy variables in the setting of partial information. |
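A textbook instance of the quantifier alternation discussed above is generalized noninterference (GNI), which has one alternation and thus sits inside the $\forall^*\exists^*$ fragment the game-based method already handles; here $h$ denotes high-security inputs and $l$ low-observable outputs:

```latex
\forall \pi_1.\, \forall \pi_2.\, \exists \pi_3.\;
  \mathbf{G}\,(h_{\pi_1} = h_{\pi_3}) \wedge \mathbf{G}\,(l_{\pi_2} = l_{\pi_3})
```

Properties with further alternations (e.g., $\forall^*\exists^*\forall^*$) are what the partial-information multiplayer games of this paper target.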
| 15:00 | Scalable MLTL Runtime Monitoring and Satisfiability via Bit-Vector Encoding PRESENTER: Christopher Johannsen ABSTRACT. We introduce a novel encoding for Mission-time Linear Temporal Logic (MLTL), inspired by the recently proposed Temporal Exponential Propagation method for learning finite-trace Linear Temporal Logic (LTLf) formulas. Our encoding translates MLTL formulas into bit-vector expressions whose size is $O(n \log_2 K)$, where $n$ and $K$ are proportional to the formula size and the largest temporal interval in the formula, respectively. We demonstrate how this encoding can be leveraged for two core applications: satisfiability checking and runtime monitoring. For satisfiability, our approach complements existing techniques by efficiently solving instances that are challenging for traditional encodings. For runtime monitoring, we show that our method achieves significantly higher throughput than current state-of-the-art tools, enabling faster and more scalable runtime verification in time-sensitive systems. Experimental results confirm that the proposed encoding and monitoring algorithm provide a practical and effective foundation for reasoning about MLTL properties, achieving 10.52 times higher throughput than state-of-the-art online techniques over a sample of pattern-instantiated formulas. |
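A toy sketch (not the paper's encoding) of the bit-vector flavor of this approach: pack a finite trace of one atomic proposition into a Python integer, with bit $i$ recording whether it held at step $i$, and evaluate $\mathbf{F}_{[a,b]}\,p$ at every step at once. The doubling loop in `window_or` uses $O(\log K)$ shifts for a window of size $K$, mirroring the logarithmic dependence on interval size; end-of-trace effects are ignored here for simplicity.

```python
def trace_to_bits(trace):
    """Pack a list of booleans into an int, bit i = trace[i]."""
    bits = 0
    for i, v in enumerate(trace):
        if v:
            bits |= 1 << i
    return bits

def window_or(x: int, w: int) -> int:
    """Bit i of result = OR of bits i .. i+w-1 of x, via doubling."""
    covered, r = 1, x
    while covered < w:
        shift = min(covered, w - covered)
        r |= r >> shift
        covered += shift
    return r

def eventually(bits: int, a: int, b: int) -> int:
    """Bit i = 1 iff F[a,b] p holds at step i (end of trace ignored)."""
    return window_or(bits >> a, b - a + 1)

trace = [False, False, True, False, False, False]
verdicts = eventually(trace_to_bits(trace), 0, 2)
print(bool(verdicts & 1))  # F[0,2] p at step 0: p holds at step 2 -> True
```

The same shape works for $\mathbf{G}_{[a,b]}$ with AND instead of OR (plus masking for positions shifted past the trace end).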
| 16:00 | Quantifying Robustness of 3D BioMedical Image Segmentation Networks Using TensorStars ABSTRACT. Deep learning techniques are increasingly used in medical imaging for several critical tasks such as segmentation, classification, and registration. It has already been demonstrated that deep learning models lack generalizability to out-of-distribution samples. Therefore, it is important to demonstrate the robustness of deep learning solutions for medical imaging problems to gain the trust of clinicians, patients, model developers, and certification authorities. In this paper, we present a framework for quantifying robustness of U-Nets, a widely used neural network architecture in medical image segmentation. We propose TensorStars, a new representation to model the perturbation of a given medical image, and present techniques to model perturbations specific to medical images using TensorStars. We then propose techniques to overapproximate the effect of these perturbations on the segmentation performed by the U-Net. Given the large dimensionality of medical images, a naive implementation of overapproximation results in poor scalability of robustness verification. We propose two main improvements: (1) leveraging hardware acceleration from GPUs for performing tensor operations, and (2) leveraging CPU parallelism to accelerate the ReLU overapproximation. We observe that these two enhancements improve the runtime by 15x when implemented on a 32-core CPU with an NVIDIA GeForce RTX GPU with 8 GB of VRAM. To the best of the authors' knowledge, this is the first work to quantify the robustness of a neural network on 3-dimensional image data. We perform robustness analysis on a U-Net that performs segmentation of kidneys with cross-entropy loss of less than 0.01. Our evaluation demonstrates that introducing noise in less than 3% of input voxels can result in misclassification of up to 20% of voxels in the segmentation output. |
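The overapproximation step the abstract refers to can be illustrated with the simplest set representation, elementwise intervals; TensorStars are a tighter representation, so this is only a sketch of the sound-bounds idea, with all shapes and names illustrative:

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Propagate elementwise bounds through x -> W @ x + b."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def relu_bounds(lo, hi):
    """ReLU is monotone, so bounds map through directly."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
x = rng.normal(size=3)
eps = 0.01  # perturb each input coordinate by at most eps
lo, hi = affine_bounds(x - eps, x + eps, W, b)
lo, hi = relu_bounds(lo, hi)
y = np.maximum(W @ x + b, 0)  # unperturbed layer output
# Soundness: the true output lies inside the computed bounds.
assert np.all(lo <= y + 1e-9) and np.all(y <= hi + 1e-9)
```

On 3D volumes these bound tensors have one entry per voxel per channel, which is why the paper's GPU tensor operations and parallel ReLU handling matter for scalability.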
| 16:30 | Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision PRESENTER: Samuel Teuber ABSTRACT. As neural networks (NNs) become increasingly prevalent in safety-critical, neural-network-controlled cyber-physical systems (NNCSs), formally guaranteeing their safety becomes crucial. In these systems, safety must be ensured throughout their entire operation, necessitating infinite-time horizon verification. To verify the infinite-time horizon safety of NNCSs, recent advances leverage Differential Dynamic Logic (dL). However, these dL-based guarantees rely on idealized, real-valued NN semantics and fail to account for the roundoff errors introduced by finite-precision implementations. This paper bridges the gap between theoretical guarantees and real-world implementations by incorporating robustness under finite-precision perturbations -- in sensing, actuation, and computation -- into the safety verification. We model the problem as a hybrid game between a good Demon, responsible for control actions, and a bad Angel, introducing perturbations. This formulation enables formal proofs of robustness under a given perturbation bound. Leveraging this bound, we employ state-of-the-art mixed-precision fixed-point tuners to synthesize sound and efficient implementations, thus providing a complete end-to-end solution. We evaluate our approach on case studies from the automotive and aeronautics domains, producing efficient NN implementations with rigorous infinite-time horizon safety guarantees. |
| 17:00 | Can Large Language Models Autoformalize Kinematics? PRESENTER: Aditi Kabra ABSTRACT. Autonomous cyber-physical systems like robots and self-driving cars could greatly benefit from using formal methods to reason reliably about their control decisions. However, before a problem can be solved, it needs to be stated. This requires writing a formal physics model of the cyber-physical system, which is a complex task that traditionally requires human expertise and becomes a bottleneck. This paper experimentally studies whether Large Language Models (LLMs) can automate the formalization process. A 20-problem benchmark suite is designed, drawing from undergraduate-level physics kinematics problems. In each problem, the LLM is provided with a natural language description of the objects' motion and must produce a model in differential game logic (dGL). The model is (1) syntax checked and iteratively refined based on parser feedback, and (2) semantically evaluated by checking whether symbolically executing the dGL formula recovers the solution to the original physics problem. A success rate of 70% (best over 5 samples) is achieved. We analyze failing cases, identifying directions for future improvement. This provides a first quantitative baseline for LLM-based autoformalization from natural language to a hybrid games logic with continuous dynamics. |
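The parse-and-refine loop described in step (1) can be sketched schematically. `query_llm` and `parse_dgl` are hypothetical stand-ins, not real APIs: the first calls any LLM, the second returns an error message, or `None` for a syntactically valid dGL model.

```python
def autoformalize(problem: str, query_llm, parse_dgl, max_rounds: int = 5):
    """Sample a dGL model and refine it until the parser accepts it."""
    prompt = f"Formalize in dGL:\n{problem}"
    for _ in range(max_rounds):
        candidate = query_llm(prompt)
        error = parse_dgl(candidate)
        if error is None:
            return candidate  # syntax-checked model
        # Feed the parser error back so the next sample can repair it.
        prompt = (f"Formalize in dGL:\n{problem}\n"
                  f"Previous attempt:\n{candidate}\nParser error: {error}")
    return None  # give up after max_rounds

# Toy stand-ins whose second sample passes the toy "parser":
attempts = iter(["<x:=v>x>0", "[x:=v;]x>=0"])
result = autoformalize("car motion", lambda p: next(attempts),
                       lambda s: None if s.startswith("[") else "bad modality")
print(result)  # the second candidate is returned
```

The paper's semantic evaluation, step (2), would then symbolically execute the accepted formula and compare against the known physics solution.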
| 17:20 | "How Does my Circuit Work?": Local Explanations for the Behavior of Sequential Circuits ABSTRACT. There has been a massive amount of work on algorithms to verify and synthesize systems from temporal specifications. In contrast, there has been less work devoted to the problem of helping engineers understand how and why their systems exhibit certain behaviors. Such understanding is important for them to debug, validate, and modify their implementations in response to changing needs. In this paper, we present one possible formalization of this problem as the task of recovering specifications that locally describe the behavior of individual parts of the circuit, given LTL specifications that globally describe the behavior of the entire circuit. We study the theoretical properties of these \emph{temporal subspecifications}, and show that they are not always expressible in LTL, but can always be described by $\omega$-regular languages. We show that our algorithm can efficiently generate compact subspecifications when applied to benchmarks from the SYNTCOMP 2023 competition. Finally, through a user study, we show that subspecifications improve the accuracy of engineers by a factor of 17 when answering questions about these circuits. |