View: session overviewtalk overviewside by side with other conferences

09:00-10:30 Session 83H: Markov decision processes 1
Conditioning and quantiles in Markovian models

ABSTRACT. The classical multi-objective analysis of Markov decision processes (MDP) and other stochastic models with nondeterminism seeks for policies to resolve the nondeterministic choices that achieve Pareto-optimality or maximize/minimize a linear goal function, while satisfying a list of probability or expectation constraints. Other approaches to reason about the trade-off between multiple cost and reward functions rely on conditioning or quantiles. The former, for instance, can serve to synthesize policies that maximize the expected utility in energy-critical situations or to minimize the cost of repair mechanism in exceptional error scenarios. Quantiles can, e.g., be used to identify policies with a minimal cost bound for meeting a utility constraint with sufficiently high probability. While algorithms for conditional probabilities in MDPs can rely on a fairly simple model transformation, reasoning about conditional expected accumulated costs and quantiles is computationally more difficult. The talk presents an overview of recent results in these directions.

Multiple Objectives and Cost Bounds in MDP

ABSTRACT. We summarize our recent TACAS paper, where we provide an efficient algorithm for multi-objective model-checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. Reachability and expected cost objectives are covered and can be mixed. Empirical evaluation shows the algorithm’s scalability. We discuss the need for output beyond Pareto curves and exploit the available information from the algorithm to support decision makers.

10:30-11:00Coffee Break
11:00-12:30 Session 86I: Games and synthesis
Constraint Problem for Weak Subgame Perfect Equilibria with omega-regular Boolean Objectives

ABSTRACT. We study n-player turn-based games played on a finite directed graph. For each play, each player receives a gain that he wants to maximise. Instead of the well-known notions of Nash equilibrium (NE) and subgame perfect equilibrium (SPE), we focus on the recent notion of weak subgame perfect equilibrium (weak SPE), a refinement of SPE. In this setting, players who deviate cannot use the full class of strategies but only a subclass with a finite number of deviation steps. We are interested in the constraint problem: given an upper and a lower thresholds x, y in {0,1}^n, we want to determine if there exists a weak SPE such the gain of each player is componentwise between the two bounds. The goal of our ongoing research is to characterise the complexity of the constraint problem for the class of games with Boolean omega-regular objectives such that Muller, Büchi, Reachability ...

Parameterized complexity of games with monotonically ordered omega-regular objectives

ABSTRACT. In recent years, two-player zero-sum games with multiple objectives have received a lot of interest as a model for the synthesis of complex reactive systems. In this framework, Player 1 wins if he can ensure that all objectives are satisfied against any behavior of Player 2. When this is not possible to satisfy all the objectives at once, an alternative is to use some preorder on the objectives according to which subset of objectives Player 1 wants to satisfy. For example, it is often natural to provide more significance to one objective over another, a situation that can be modelled with lexicographically ordered objectives for instance. Inspired by recent work on concurrent games with multiple omega-regular objectives by Bouyer et al., we investigate in detail turned-based games with monotonically ordered and omega-regular objectives. We study the threshold problem which asks whether Player 1 can ensure a payoff greater than or equal to a given threshold w.r.t. a given monotonic preorder. As the number of objectives is usually much smaller than the size of the game graph, we provide a parametric complexity analysis and we show that our threshold problem is fixed parameter tractable for all monotonic preorders and all classical types of omega-regular objectives. We also provide polynomial time algorithms for Buchi, coBuchi and explicit Muller objectives for a large subclass of monotonic preorders that includes among others the lexicographic preorder. In the particular case of lexicographic preorder, we also study the complexity of computing the values and the memory requirements of optimal strategies.

Comfort and Energy Optimization for Floor Heating Systems
SPEAKER: Marco Muniz

ABSTRACT. Controller synthesis for stochastic hybrid switched systems, like e.g. a floor heating system in a house, is a complex computational task that cannot be solved by an exhaustive search though all the control options. The state-space to be explored is in general uncountable due to the presence of continuous variables (e.g. temperature readings in the different rooms) and even after discretization, the state-space remains huge and cannot be fully explored.

In previous work, we proposed an on-line synthesis methodology, where we periodically compute the controller only for the near future based on the current sensor readings. This computation is itself done by employing machine learning in the tool Uppaal Stratego. We focused on optimizing a single objective, the comfort of the users. Experiments have shown that our approach can lead to enormous improvement over the current controller. Our approach is now being implemented in a real house in Aalborg, Denmark.

In this work we focus on a more ambitious multi-objective problem, optimize the comfort of the users while reducing energy consumption. We apply our on-line synthesis methodology in this setting. We show the new challenges that arise, and propose alternatives to address these challenges.

12:30-14:00Lunch Break
14:00-15:30 Session 87I: Timed and hybrid systems
A journey through negatively-weighted timed games: undecidability, decidability, approximability

ABSTRACT. Weighted timed games are zero-sum games played by two players on a timed automaton equipped with weights, where one player wants to minimise the accumulated weight while reaching a target. Used in a reactive synthesis perspective, this quantitative extension of timed games allows one to measure the quality of controllers in real-time systems. Weighted timed games are notoriously difficult and quickly undecidable, even when restricted to non-negative weights. However, for a few years now, we explored, and we continue to explore, the world of weighted timed games with negative weights too, in order to get a more useful modelling mechanism. This gave rise to stronger undecidability results, but we also discovered new decidable fragments. In this talk, I will survey these results: decidability when limiting the number of distinct weights in the game, using corner-point abstraction techniques; decidability for a large fragment of one-clock weighted timed games, and for the so-called divergent weighted timed games, using value iteration schemes; approximability in the case of the so-called almost-divergent weighted timed games.

Stochastic o-minimal hybrid systems

ABSTRACT. Timed automata and hybrid systems are important frameworks for the analysis of continuous-time systems. In this ongoing work, we study a stochastic extension of the latter. There are clear challenges regarding decidability: (i) the reachability problem is already quickly undecidable for non-stochastic hybrid systems; (ii) even in the simpler setting of timed automata, the finite abstraction known as the region graph does not preserve its correctness when lifting it to the stochastic setting. Our goal is to define the class of stochastic o-minimal hybrid systems (SoHSs) and to show that it ensures good properties while being reasonably rich (e.g., it encompasses continuous-time Markov chains). We look for decisiveness of SoHSs (as introduced by Adbulla et al.) and definability in our o-minimal structure. Hopefully, approximation of reachability probabilities could be obtainable in appropriate (and still quite rich) structures.

15:30-16:00Coffee Break
16:00-18:00 Session 88G: Markov decision processes 2
Multi-Scenario Uncertainty in Markov Decision Processes

ABSTRACT. Markov decision processes (MDPs) are a well-known formalism for mod- eling discrete event systems. Several extensions of the model have been proposed in order to encompass additional model-level uncertainty in MDPs. Here, we introduce a multi-scenario uncertainty model which has been proposed in the author’s PhD thesis and possible future research.

Safe and Optimal Scheduling of Hard and Soft Tasks

ABSTRACT. We consider a stochastic scheduling problem with both hard and soft tasks. Each task is described by a discrete probability distribution over possible execution times, a deadline and a distribution over inter-arrival times. Our scheduling problem is non-clairvoyant in the sense that the execution and inter-arrival times are subject to uncertainty modeled by stochastic distributions. Given an instance of our problem, we want to synthesize a schedule that is safe and efficient: safety imposes that deadline of hard tasks are never violated while efficient means that soft tasks should be completed as often as possible. To enforce that property, we impose a cost when a soft task deadline is violated and we are looking for a schedule that minimizes the expected value of the limit average of this cost.

First, we show that the dynamics of such a system can be modeled as a finite Markov Decision Process (MDP). Second, we show that the problem of deciding the existence of a safe and efficient scheduler is PP-hard and in EXPTIME. Third, we have implemented our synthesis algorithm that partly relies on the probabilistic model-checker Storm to analyze the underlying MDP. Given a set of both hard and soft tasks, we first compute the set of safe schedules, i.e., the ones in which the hard tasks always finish their execution, then we compute the scheduler that minimizes the limit average cost among the set of all safe schedules. Finally, we compare the optimal solutions obtained with our procedure against an adaptation of the earliest deadline first (EDF) algorithm to account for the soft task. We show that this EDF strategy can be arbitrarily worse in terms of the expected limit average cost for missing the deadlines of the soft tasks when compared to our optimal procedure.

Logically-Constrained Reinforcement Learning

ABSTRACT. We propose a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the property into a Limit Deterministic Buchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP - a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.

19:00-21:30 Workshops dinner at Keble College

Workshops dinner at Keble College. Drinks reception from 7pm, to be seated by 7:30 (pre-booking via FLoC registration system required; guests welcome).

Location: Keble College