A Formal Approach for Safe Reinforcement Learning: A Rate-Adaptive Pacemaker Case Study
ABSTRACT. As learning-enabled Cyber-Physical Systems (CPSs) are increasingly used in safety-critical settings, there is a growing need to ensure their safety. For example, in rate-adaptive pacemakers that correct Sinus Node Dysfunction, a Reinforcement Learning (RL) approach may be used to mimic the natural pacing rhythm of a person's heart. However, this is currently not done, and there are no known approaches that ensure the safety of combining RL with conventional pacing algorithms. While there is growing interest in ensuring the safety of AI-enabled CPS, the issue of safe RL for CPS using lightweight formal methods has drawn scant attention. Therefore, we present an approach that combines Runtime Enforcement with RL. To guarantee safety throughout the RL agent's learning and execution stages, an enforcer is constructed from a set of key safety policies expressed using a variant of timed automata. The enforcer observes the RL agent's outputs and ensures that only safe actions are delivered to the environment by correcting any outputs that would violate the safety policies. To evaluate the proposed approach, we have implemented a rate-adaptive pacemaker that learns the natural pacing rhythm through RL, allowing it to pace appropriately during disease. This model is executed in closed loop with a real-time heart model that can exhibit various diseases in order to test its efficacy. Furthermore, we illustrate the benefits of this system by contrasting it with traditional pacing techniques and with RL without an enforcer.
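As a rough illustration of the enforcement idea described in this abstract (not the authors' implementation), the sketch below shows an enforcer that sits between an RL agent and the plant and corrects unsafe pacing actions. The policy names, timing thresholds, and the random stand-in for the RL policy are assumptions made purely for illustration.

# Minimal sketch, assuming two illustrative timing policies: never pace sooner
# than min_interval_ms after the last pace, and never let more than
# max_interval_ms elapse without a pace. Thresholds are hypothetical.
import random

class PacingEnforcer:
    def __init__(self, min_interval_ms=300, max_interval_ms=1500):
        self.min_interval_ms = min_interval_ms
        self.max_interval_ms = max_interval_ms
        self.since_last_pace_ms = 0

    def enforce(self, agent_wants_pace: bool, dt_ms: int) -> bool:
        """Return the (possibly corrected) action actually sent to the heart."""
        self.since_last_pace_ms += dt_ms
        action = agent_wants_pace
        if action and self.since_last_pace_ms < self.min_interval_ms:
            action = False   # suppress an unsafely early pace
        if not action and self.since_last_pace_ms >= self.max_interval_ms:
            action = True    # insert a pace the agent failed to deliver
        if action:
            self.since_last_pace_ms = 0
        return action

# Usage: the enforcer wraps a (here: random) learning agent in closed loop.
enforcer = PacingEnforcer()
for step in range(20):
    proposed = random.random() < 0.1          # stand-in for the RL policy output
    delivered = enforcer.enforce(proposed, dt_ms=100)
    if proposed != delivered:
        print(f"step {step}: corrected {proposed} -> {delivered}")

The key design point is that the enforcer only ever corrects individual outputs; the learning loop itself is left untouched, so safety holds during both training and deployment.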
ABSTRACT. Stream-based monitoring is a runtime verification approach for cyber-physical systems that translates streams of input data, such as sensor readings, into streams of aggregate statistics and verdicts about the safety of the running system. It is usually assumed that the values on the input streams represent fully accurate measurements of the physical world. In reality, however, physical sensors are prone to measurement noise and errors. These errors are further amplified by the processing and aggregation steps within the monitor. This paper introduces RLola, a robust extension of the stream-based specification language Lola. RLola incorporates the concept of slack variables, which symbolically represent measurement noise while avoiding the aliasing problem of interval arithmetic. With RLola, standard sensor error models can be expressed directly in the specification. While the monitoring of RLola specifications may, in general, require an unbounded amount of memory, we identify a rich fragment of RLola that can automatically be translated into precise monitors with guaranteed constant-memory consumption.
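To illustrate the aliasing problem that slack variables avoid, the following sketch (not the RLola implementation) represents a noisy reading as a nominal value plus a named, symbolic noise term. Because the noise symbol is tracked by name, correlated occurrences cancel, whereas plain interval arithmetic treats every occurrence as independent. The class and symbol names are hypothetical.

# Minimal sketch of a slack-variable-style representation, assuming an
# affine-form-like encoding: value = nominal + sum(coefficient * noise symbol).
from collections import defaultdict

class Noisy:
    def __init__(self, nominal, slacks=None):
        self.nominal = nominal                             # measured value
        self.slacks = defaultdict(float, slacks or {})     # noise symbol -> coefficient

    def __add__(self, other):
        out = defaultdict(float, self.slacks)
        for name, c in other.slacks.items():
            out[name] += c
        return Noisy(self.nominal + other.nominal, out)

    def __sub__(self, other):
        out = defaultdict(float, self.slacks)
        for name, c in other.slacks.items():
            out[name] -= c
        return Noisy(self.nominal - other.nominal, out)

    def bounds(self, radii):
        """Concretize to an interval, given an error radius per noise symbol."""
        spread = sum(abs(c) * radii[name] for name, c in self.slacks.items())
        return (self.nominal - spread, self.nominal + spread)

# A sensor reading x with +/-1.0 measurement error:
x = Noisy(5.0, {"eps_x": 1.0})
print((x - x).bounds({"eps_x": 1.0}))   # (0.0, 0.0): the slack cancels exactly
# Interval arithmetic on [4, 6] - [4, 6] would instead report [-2, 2].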
Dynamic, Multi-Objective Specification and Falsification of Autonomous CPS
ABSTRACT. Simulation-based falsification has proved to be an effective verification method for cyber-physical systems. Traditional approaches to falsification take as input a single temporal property or a set of properties, all of which must be satisfied by the system at all times. In this paper, we consider falsification of a more complex specification with two dimensions: multiple objectives with relative priorities and the evolution of these objectives characterized by time-varying priorities. We introduce the concept of dynamic rulebooks as a way to specify a prioritized multi-objective specification and its evolution over time. We develop a novel algorithm for falsifying a dynamic rulebook specification on a cyber-physical system. To evaluate our approach, we define scenarios and dynamic rulebook specifications for the domains of autonomous driving and human-robot interaction. Our experiments demonstrate that integrating dynamic rulebooks allows us to capture counterexamples more accurately and efficiently than static rulebooks. Moreover, our falsification framework identifies more, and more significant, counterexamples than previous approaches.
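The following sketch illustrates, under assumptions of our own (the paper's formalization differs in detail), what a dynamic rulebook looks like operationally: rules are ordered by priority, the ordering switches over time, and candidate behaviors are compared lexicographically on per-priority violation scores. Rule names and the phase schedule are hypothetical.

# Illustrative sketch only: three toy rules scored over a trace under a
# time-varying priority ordering.
def no_collision(state):      # violation score: 1.0 if too close, else 0.0
    return 1.0 if state["distance"] < 2.0 else 0.0

def stay_in_lane(state):
    return 1.0 if abs(state["lane_offset"]) > 1.5 else 0.0

def comfort(state):
    return min(1.0, abs(state["accel"]) / 5.0)

# Static rulebooks: rules listed from highest to lowest priority.
NOMINAL = [no_collision, stay_in_lane, comfort]
EVASIVE = [no_collision, comfort, stay_in_lane]   # lane keeping demoted

def dynamic_rulebook(t):
    """Time-varying priorities: switch rulebooks depending on the phase."""
    return EVASIVE if 5 <= t < 10 else NOMINAL

def score(trace):
    """Accumulate violations per priority level under the active rulebook."""
    totals = [0.0, 0.0, 0.0]
    for t, state in enumerate(trace):
        for level, rule in enumerate(dynamic_rulebook(t)):
            totals[level] += rule(state)
    return totals   # compare lexicographically: higher-priority violations first

trace = [{"distance": 5.0, "lane_offset": 0.2, "accel": 1.0} for _ in range(15)]
print(score(trace))

A falsifier would then search over scenario inputs for traces whose prioritized violation vector is lexicographically worst.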
ABSTRACT. When monitoring a cyber-physical system (CPS) from a remote server, keeping the monitored data secret is crucial, particularly when it contains sensitive information, e.g., biological or location data. Recently, Banno et al. (CAV'22) proposed a protocol for online LTL monitoring that keeps data concealed from the server using Fully Homomorphic Encryption (FHE). We build on this protocol to allow arithmetic operations over encrypted values, e.g., to compute a safety measurement combining distance, velocity, and so forth. Overall, our protocol enables oblivious online monitoring of discrete-time real-valued signals against signal temporal logic (STL) formulas. Our protocol combines two FHE schemes, CKKS and TFHE, leveraging their respective strengths. We employ CKKS to evaluate arithmetic predicates in STL formulas while utilizing TFHE to process the resulting truth values with a DFA derived from the STL formula. We conducted case studies on monitoring blood glucose levels and vehicles' behavior against the Responsibility-Sensitive Safety (RSS) rules. Our results suggest the practical relevance of our protocol.
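To make the two-stage structure of such a protocol concrete, the sketch below shows the pipeline in plain (unencrypted) Python: stage one evaluates an arithmetic predicate over each real-valued sample (done under CKKS in the paper), and stage two feeds the resulting Boolean into a DFA derived from the STL formula (done under TFHE in the paper). Encryption is omitted entirely here, and the predicate, signal, and DFA are made-up examples.

# Plain-text sketch of the predicate-then-DFA monitoring pipeline.
def predicate(sample):
    # e.g. a safety measurement combining distance and velocity
    distance, velocity = sample
    return distance - 0.5 * velocity > 1.0

# A tiny DFA for "the predicate never fails twice in a row" (illustrative):
# states: 0 = ok, 1 = one recent failure, 2 = violation (sink)
DFA = {
    (0, True): 0, (0, False): 1,
    (1, True): 0, (1, False): 2,
    (2, True): 2, (2, False): 2,
}

def monitor(signal):
    state = 0
    for sample in signal:
        state = DFA[(state, predicate(sample))]
    return state != 2   # True = no violation observed

signal = [(5.0, 2.0), (2.0, 2.5), (4.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
print(monitor(signal))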
Sampling-based and Gradient-based Efficient Scenario Generation
ABSTRACT. Safety-critical autonomous systems often operate in highly uncertain environments. These environments consist of many agents, some of which are being designed and some of which represent the uncertain aspects of the environment. Testing autonomous systems requires generating diverse scenarios; however, the space of scenarios is very large, and many scenarios do not represent edge cases of the system. We aim to develop a framework for automatically generating interesting scenarios. We propose to describe scenarios using a formal language, and we show how interesting scenarios can be extracted from sampling-based scenario generation using scenario specifications. We also introduce a second technique for edge-case scenario generation that uses gradient computation over STL. We demonstrate the scenario-generation capability of our framework in two case studies of autonomous systems: the autonomous driving domain and the safety of human-robot systems in an industrial manufacturing context.
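The sketch below illustrates the gradient-based idea in miniature, under assumptions of our own (the paper's framework uses a richer scenario language and differentiable STL semantics): a scenario is parameterized by two numbers, a toy simulator produces a distance trace, the STL property "always distance > 1.0" is scored by its robustness, and finite-difference gradient steps drive the search toward low-robustness edge cases. All names and dynamics are hypothetical.

# Illustrative sketch of gradient-guided edge-case search over STL robustness.
def simulate(params, steps=50, dt=0.1):
    init_gap, lead_brake = params
    gap, ego_v, lead_v = init_gap, 10.0, 10.0
    trace = []
    for _ in range(steps):
        lead_v = max(0.0, lead_v - lead_brake * dt)          # lead vehicle brakes
        ego_v = max(0.0, ego_v - 2.0 * dt) if gap < 5.0 else ego_v
        gap += (lead_v - ego_v) * dt
        trace.append(gap)
    return trace

def robustness(trace, d_min=1.0):
    # Robustness of G(distance > d_min): the worst-case margin over the trace.
    return min(g - d_min for g in trace)

def gradient_step(params, lr=0.2, h=1e-3):
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h
        grads.append((robustness(simulate(bumped)) - robustness(simulate(params))) / h)
    return [p - lr * g for p, g in zip(params, grads)]       # descend robustness

params = [8.0, 3.0]          # initial gap [m], lead braking [m/s^2] (made up)
for _ in range(20):
    params = gradient_step(params)
print("edge-case candidate:", params, "robustness:", robustness(simulate(params)))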
HyperPart-X: Probabilistic Guarantees for Parameter Mining of Signal Temporal Logic Formulas in Cyber-Physical Systems
ABSTRACT. Optimization-based falsification is an automatic test generation method for evaluating the safety of Cyber-Physical Systems (CPS) against formal requirements. In this work, we focus on temporal logic requirements and, more specifically, on Parametric Signal Temporal Logic (pSTL). pSTL generalizes STL by allowing the quantifiers of the logical operators to be expressed as variables. This extends falsification from searching for the inputs that violate the requirement to searching for the inputs and formula parametrizations that result in unsafe behaviors. Part-X, a recent state-of-the-art algorithm for falsification, offers probabilistic falsification level sets and confidence-bounded results on inputs but is not tailored to pSTL. Our approach, HyperPart-X, builds on Part-X and solves the problem while also providing probabilistic guarantees on the estimated level set by adaptively branching the parameter space and intelligently sampling from both the parameter and input spaces in a coordinated, hierarchical manner. HyperPart-X is compared against uniform random sampling and Part-X on synthetic functions and CPS benchmarks. Empirical results demonstrate that HyperPart-X returns level set estimates and guarantees in cases where Part-X fails to find a solution. Results also show that it can achieve the level set with the same associated confidence level as uniform random sampling using orders of magnitude fewer samples.
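The following is a highly simplified sketch of the underlying idea (not the HyperPart-X algorithm itself, and without its probabilistic guarantees): the pSTL parameter range is branched adaptively, and each subregion is classified by sampling both parameter values and system inputs. The toy system, the parametric requirement, and all thresholds are made up for illustration.

# Simplified sketch of coordinated parameter/input sampling with adaptive branching.
import random

def system(u):                       # toy "CPS": output depends on the input u
    return 2.0 * u + 0.5

def robustness(u, theta):            # pSTL requirement "output < theta"
    return theta - system(u)         # negative robustness = violation

def classify(theta_lo, theta_hi, n=50):
    """Sample parameters and inputs; report whether a violation was found."""
    violated = any(
        robustness(random.uniform(0.0, 1.0), random.uniform(theta_lo, theta_hi)) < 0
        for _ in range(n)
    )
    return "falsified" if violated else "no violation found"

def branch_and_sample(theta_lo, theta_hi, depth=0, max_depth=4):
    label = classify(theta_lo, theta_hi)
    if label == "falsified" and depth < max_depth:
        mid = (theta_lo + theta_hi) / 2.0       # refine: branch the parameter space
        branch_and_sample(theta_lo, mid, depth + 1, max_depth)
        branch_and_sample(mid, theta_hi, depth + 1, max_depth)
    else:
        print(f"theta in [{theta_lo:.3f}, {theta_hi:.3f}]: {label}")

branch_and_sample(0.0, 4.0)   # estimate which parametrizations are falsifiable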