OVERLAY 2025: 7TH INTERNATIONAL WORKSHOP ON ARTIFICIAL INTELLIGENCE AND FORMAL VERIFICATION, LOGIC, AUTOMATA, AND SYNTHESIS
PROGRAM FOR SUNDAY, OCTOBER 26TH

09:00-09:15 Session 1: Opening
Chairs:
Angelo Montanari (University of Udine, Italy)
Andrea Orlandini (CNR, Italy)
Nicola Saccomanno (University of Udine, Italy)
Stefano Tonetta (FBK, Italy)
09:15-10:30 Session 2: Verification of learning-based and multi-agent systems
Chair:
Stefano Tonetta (FBK, Italy)
09:15
Asger Horn Brorholt (Aalborg University, Denmark)
Kim Guldstrand Larsen (Aalborg University, Denmark)
Christian Schilling (Aalborg University, Denmark)
Compositional Shielding and Reinforcement Learning for Multi-Agent Systems

ABSTRACT. Deep reinforcement learning has emerged as a powerful tool for obtaining high-performance policies. However, the safety of these policies has been a long-standing issue. One promising paradigm to guarantee safety is a shield, which “shields” a policy from taking unsafe actions. However, computing a shield scales exponentially in the number of state variables. This is a particular concern in multi-agent systems with many agents. In this work, we propose a novel approach for multi-agent shielding. We address scalability by computing individual shields for each agent. The challenge is that typical safety specifications are global properties, but the shields of individual agents only ensure local properties. The key to overcoming this challenge is assume-guarantee reasoning. Specifically, we present a sound proof rule that decomposes a (global, complex) safety specification into (local, simple) obligations for the shields of the individual agents. Moreover, we show that applying the shields during reinforcement learning significantly improves the quality of the policies obtained for a given training budget. We demonstrate the effectiveness and scalability of our multi-agent shielding framework in two case studies, reducing the computation time from hours to seconds and achieving fast learning convergence.
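A minimal sketch of the shielding idea, assuming a per-agent shield is available as a predicate over (state, action) pairs; all names here are hypothetical, and the paper's actual shields are derived from assume-guarantee obligations:

```python
# Sketch: per-agent shielding during exploration (names hypothetical).
import random

def shielded_step(agent_state, actions, local_shield, policy):
    """Pick an action, but only from those the agent's local shield allows."""
    allowed = [a for a in actions if local_shield(agent_state, a)]
    if not allowed:  # a well-constructed shield should never be empty
        raise RuntimeError("shield is not live in this state")
    # The policy proposes an action; fall back to a random allowed one if unsafe.
    proposed = policy(agent_state)
    return proposed if proposed in allowed else random.choice(allowed)
```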

09:30
Stefano Calzavara (Università Ca' Foscari Venezia, Italy)
Lorenzo Cazzaro (Università Ca' Foscari Venezia, Italy)
Claudio Lucchese (Università Ca' Foscari Venezia, Italy)
Giulio Ermanno Pibiri (Università Ca' Foscari Venezia, Italy)
Verifiable Boosted Tree Ensembles
PRESENTER: Lorenzo Cazzaro

ABSTRACT. Verifiable learning advocates for training machine learning models amenable to efficient security verification. Prior research demonstrated that a specific class of decision tree ensembles – called large-spread ensembles – allow for robustness verification in polynomial time against any norm-based attacker. This study expands prior work on verifiable learning from basic ensemble methods based on hard majority voting to state-of-the-art boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results indicate that robustness verification is achievable in polynomial time for large-spread boosted ensembles when considering attackers based on the L∞-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the Lp-norm for any p ∈ N ∪ {0}, which in practice grants excellent performance and enables verification methods outperforming the state of the art in terms of analysis times. Our experimental evaluation on public datasets shows that large-spread boosted ensembles are accurate enough for practical adoption, while being amenable to efficient security verification. Moreover, our techniques scale to challenging security datasets and their associated security properties proposed in prior work.
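For intuition, a minimal sketch of interval-based traversal of a single decision tree under an L∞ attacker: every leaf whose path is consistent with some point of the perturbation ball is reachable. The dict-based tree encoding is hypothetical; the paper's verifier additionally exploits the large-spread property to remain polynomial on whole ensembles:

```python
# Sketch: leaves of one tree reachable under an L-infinity ball of radius eps.
def reachable_leaves(node, x, eps):
    if node["leaf"] is not None:        # leaf node: yield its score
        yield node["leaf"]
        return
    f, t = node["feature"], node["threshold"]
    lo, hi = x[f] - eps, x[f] + eps     # interval for feature f under the attack
    if lo <= t:                         # some perturbed point goes left
        yield from reachable_leaves(node["left"], x, eps)
    if hi > t:                          # some perturbed point goes right
        yield from reachable_leaves(node["right"], x, eps)

tree = {"feature": 0, "threshold": 0.5, "leaf": None,
        "left": {"leaf": -1.0}, "right": {"leaf": +1.0}}
print(list(reachable_leaves(tree, [0.45], 0.1)))  # both leaves reachable
```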

09:45
Greta Dolcetti (Ca' Foscari University of Venice, Italy)
Vincenzo Arceri (University of Parma - Department of Mathematical, Physical, and Computer Sciences, Italy)
Agostino Cortesi (Universita' Ca' Foscari di Venezia, Italy)
Enea Zaffanella (University of Parma, Italy)
On the Verification of ML Systems and Models
PRESENTER: Greta Dolcetti

ABSTRACT. The role and impact of machine learning systems and models are growing in every economic and social sector. The problem of guaranteeing the reliability and correctness of the underlying software therefore becomes increasingly relevant. In this article we identify the elements that characterize these systems and that pose challenges to the application of state-of-the-art verification techniques, and we highlight the advantages and limitations of a set of formal techniques that can be combined to achieve this goal. Overall, we advocate not only a deeper adoption of formal methods in machine learning development and deployment, but also a more systematic and holistic approach.

10:00
Faezeh Labbaf (University of Lugano, Switzerland)
Tomáš Kolárik (University of Lugano, Switzerland)
Martin Blicha (University of Lugano, Switzerland)
Grigory Fedyukovich (Florida State University, United States)
Michael Wand (SUPSI, IDSIA, Switzerland)
Natasha Sharygina (University of Lugano, Switzerland)
Space Explanations of Neural Network Classification
PRESENTER: Faezeh Labbaf

ABSTRACT. We present a novel logic-based concept called Space Explanations for neural network classifiers, which gives provable guarantees on the behavior of the network in continuous areas of the input feature space. To automatically generate space explanations, we leverage a range of flexible Craig interpolation algorithms and unsatisfiable core generation. Based on real-life case studies of small, medium, and large size, we demonstrate that the generated explanations are more meaningful than those computed by state-of-the-art approaches.

10:15
Dennis Groß (Simula Research Laboratory, Norway)
Helge Spieker (Simula Research Laboratory, Norway)
Arnaud Gotlieb (Simula Research Laboratory, Norway)
Verifying Memoryless Sequential Decision-making of Large Language Models
PRESENTER: Dennis Groß

ABSTRACT. We introduce the first framework for formal verification of LLM policies in memoryless sequential decision-making. Starting from a Markov decision process and a PCTL safety property, our tool (i) encodes each state as a prompt, (ii) parses the LLM’s reply into an action, (iii) incrementally explores only the states reachable under that policy, and (iv) checks the resulting model with the Storm model checker.
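A minimal sketch of steps (i)-(iv); `mdp`, `llm_action`, and `encode_prompt` are hypothetical placeholders, not the tool's API:

```python
# Sketch: explore only the states reachable under the LLM's decisions,
# then hand the induced model to a probabilistic model checker (e.g. Storm).
def build_induced_model(mdp, llm_action, encode_prompt, initial_state):
    frontier, seen, transitions = [initial_state], set(), {}
    while frontier:
        s = frontier.pop()
        if s in seen:
            continue
        seen.add(s)
        a = llm_action(encode_prompt(s))       # (i)+(ii): prompt and parse
        transitions[s] = mdp.successors(s, a)  # (iii): expand the chosen action only
        for s_next, prob in transitions[s]:
            if s_next not in seen:
                frontier.append(s_next)
    return transitions                         # (iv): check e.g. with Storm
```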

11:00-12:30 Session 3: Automata, formal languages and logic
Chair:
Angelo Montanari (University of Udine, Italy)
11:00
Charles Pert (Imperial College London, UK)
Dalal Alrajeh (Imperial College London, UK)
Alessandra Russo (Imperial College London, UK)
RNN Generalization to Omega-Regular Languages
PRESENTER: Charles Pert

ABSTRACT. Büchi automata (BAs) recognize omega-regular languages defined by formal specifications like linear temporal logic (LTL) and are commonly used in the verification of reactive systems. However, BAs face scalability challenges when handling and manipulating complex system behaviors. As neural networks are increasingly used to address these scalability challenges in areas like model checking, investigating their ability to generalize beyond training data becomes necessary. This work presents the first study investigating whether recurrent neural networks (RNNs) can generalize to omega-regular languages derived from LTL formulas. We train RNNs on ultimately periodic omega-word sequences to replicate target BA behavior and evaluate how well they generalize to out-of-distribution sequences. Through experiments on LTL formulas corresponding to automata of varying structural complexity, from 3 to over 100 states, we show that RNNs achieve high accuracy on their target omega-regular languages when evaluated on sequences up to 8 times longer than training examples, with 92.6% of tasks achieving perfect or near-perfect generalization. These results establish the feasibility of neural approaches for learning complex omega-regular languages, suggesting their potential as components in neurosymbolic verification methods.
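A minimal sketch of how such training data could be produced, assuming a membership oracle for the target omega-regular language (e.g. backed by a Büchi automaton library); all names are illustrative:

```python
# Sketch: finite unrollings of an ultimately periodic omega-word u(v)^w,
# labelled by a (placeholder) Buechi-acceptance oracle.
def unroll(u, v, repetitions):
    """Finite prefix u + v*repetitions of the omega-word u(v)^w."""
    return list(u) + list(v) * repetitions

def make_samples(pairs, accepts, max_reps=8):
    # `accepts(u, v)` should decide membership of u(v)^w in the target language.
    return [(unroll(u, v, k), accepts(u, v))
            for (u, v) in pairs for k in range(1, max_reps + 1)]
```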

11:15
Jaouhar Slimi (Université Paris-Saclay, CEA, List, France)
Tristan Le Gall (Université Paris-Saclay, CEA, List, France)
Augustin Lemesle (Université Paris-Saclay, CEA, List, France)
Passive Learning of Lattice Automata from Recurrent Neural Networks
PRESENTER: Jaouhar Slimi

ABSTRACT. We present a passive automata learning algorithm that can extract automata from recurrent networks with very large or even infinite alphabets. Our method combines overapproximations from the field of Abstract Interpretation and passive automata learning from the field of Grammatical Inference. We evaluate our algorithm by first comparing it with the state-of-the-art automata extraction algorithm from Recurrent Neural Networks trained on Tomita grammars. Then, we extend these experiments to regular languages with infinite alphabets, which we propose as a novel benchmark.

11:30
Riccardo Borsetto (University of Verona, Italy)
Margherita Zorzi (University of Verona, Italy)
NAMOR: a New Agda Library for Modal Extended Sequents

ABSTRACT. This paper introduces NAMOR (New Agda MOdal Realization), a comprehensive Agda library for mechanizing modal proof theory through extended sequents. Our approach employs position-based reasoning that mirrors first-order logic structures while maintaining implicit accessibility relations. Building on our previous investigation of extended deductive systems, NAMOR provides a unified framework for formalizing proof-theoretical properties across normal extensions of modal logic K, from basic K through S5. The library is founded on our novel sequent calculus $E_{pos}$, which exhibits remarkable modularity: all captured logics share identical inference rules, differing mainly in their modal constraints. This enables systematic verification of meta-theoretical properties while preserving the intuitive structure of paper-and-pencil proofs. By leveraging Agda’s expressive type system, NAMOR establishes a direct correspondence between formal proofs and their mathematical counterparts, creating a robust platform for symbolic reasoning in domains where modal logic naturally applies.

11:45
José João Ferreira (INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal)
Nuno Policarpo (INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal)
José Fragoso Santos (INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal)
Alcino Cunha (INESC TEC / University of Minho, Portugal)
Alessandro Gianola (INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal)
First-Order Linear Temporal Logic for Testing Distributed Protocols

ABSTRACT. Distributed systems depend on correct protocol implementations, but testing them remains complex due to concurrency and data-dependent behaviors. We explore the use of First-Order Linear Temporal Logic (FOLTL) for both offline and online monitoring, enabling precise specification of system properties over execution traces. While, in general, FOLTL is undecidable, recent advances in automated reasoning tools make reasoning in practical fragments feasible, opening new directions for validating real-world distributed systems. We present example specifications of real-world distributed protocols and outline research avenues that could facilitate the use of FOLTL fragments in testing such systems.
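As a flavor of such specifications, two illustrative FOLTL properties for a hypothetical distributed lock service (not taken from the paper): every held lock is eventually released, and no two distinct nodes hold it simultaneously:

```latex
\mathsf{G}\,\forall n.\,\big(\mathit{holds}(n) \rightarrow \mathsf{F}\,\mathit{releases}(n)\big)
\qquad
\mathsf{G}\,\forall n\,\forall m.\,\big(\mathit{holds}(n) \wedge \mathit{holds}(m) \rightarrow n = m\big)
```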

12:00
Luca Geatti (University of Udine, Italy)
Automata Cascades for Model Checking

ABSTRACT. The cascade product is a method for combining two automata in which the first automaton reads symbols from an alphabet Σ, while the second automaton reads symbols from the Cartesian product of Σ and the set of states of the first automaton. This capability of the second automaton to process not only input symbols but also the current state of the first automaton has made the cascade product a fundamental tool in theoretical computer science, with the most notable result being the Krohn–Rhodes Cascade Decomposition Theorem. In this paper, we propose the use of the cascade product in the context of formal verification, specifically for safety model checking. We show that specifying the undesired behaviors of a system through a cascade of automata enables an approach to model checking that is incremental with respect to the specification, where verification proceeds component by component, potentially reusing information obtained from the verification of previous steps. We also demonstrate how to represent a cascade of automata symbolically, thereby making it possible to perform symbolic verification of such properties. A proof-of-concept on a set of simple models, evaluated experimentally, indicates that this approach is promising.
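The transition structure described above is direct to state in code; a minimal sketch with the two transition functions passed as callables:

```python
# Sketch of the cascade product's transitions: the second automaton reads
# pairs (symbol, current state of the first automaton).
def cascade_step(delta1, delta2, state, symbol):
    q1, q2 = state
    return (delta1(q1, symbol), delta2(q2, (symbol, q1)))

def run_cascade(delta1, delta2, init, word):
    state = init
    for symbol in word:
        state = cascade_step(delta1, delta2, state, symbol)
    return state
```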

12:15
Hazem Dewidar (Sapienza University of Rome, Italy)
Elena Umili (Sapienza University of Rome, Italy)
Fully Learnable Neural Reward Machines
PRESENTER: Elena Umili

ABSTRACT. Non-Markovian Reinforcement Learning (RL) tasks present significant challenges, as agents must reason over entire trajectories of state-action pairs to make optimal decisions. A common strategy to address this is through symbolic formalisms, such as Linear Temporal Logic (LTL) or automata, which provide a structured way to express temporally extended objectives. However, these approaches often rely on restrictive assumptions, such as the availability of a predefined Symbol Grounding (SG) function mapping raw observations to high-level symbolic representations, or prior knowledge of the temporal task. In this work, we propose a fully learnable version of Neural Reward Machines (NRM), which can learn both the SG function and the automaton end-to-end, removing any reliance on prior knowledge. Our approach is therefore as easily applicable as classic deep RL (DRL) approaches, while being far more explainable because of the finite and compact nature of automata. Furthermore, we show that integrating Fully Learnable Neural Reward Machines (FLNRM) with DRL outperforms previous approaches based on Recurrent Neural Networks (RNNs).

14:00-14:45 Session 4: Applications of formal verification and AI
Chair:
Nicola Saccomanno (University of Udine, Italy)
14:00
Ishan Saxena (German Aerospace Center (DLR) e.V., Germany)
Dominik Grundt (German Aerospace Center (DLR) e.V., Germany)
Eike Möhlmann (German Aerospace Center (DLR) e.V., Germany)
Bernd Westphal (German Aerospace Center (DLR) e.V., Germany)
Towards Runtime Detection of Novel Traffic Situations
PRESENTER: Ishan Saxena

ABSTRACT. Developers of automated vehicles need to define an Operational Design Domain (ODD) within which such vehicles can operate safely. To extend a defined ODD, developers base their decisions on detailed analysis of recorded data from multiple data collection drives. For the acquired data, it is important to know whether it describes known traffic situations (inside the automated vehicle’s ODD) or novel ones that can be used to expand the ODD. The large amount of data generated by a modern vehicle’s sensors makes storing and efficiently analyzing it for expanding ODDs hardly feasible: most current approaches record all sensor data, post-process it using AI-based methods, and finally perform manual checks to find the novel data. Hence, there is a need to classify traffic situations as novel at system runtime, for an appropriately abstract notion of novelty, so that the conceptually same traffic situation, e.g. on two similar days, is not considered novel merely because of the different date. We propose a new methodology for detecting novel traffic situations at system runtime. The methodology is based on a traffic catalogue consisting of abstract traffic situation descriptions, which are formalized representations of sets of concrete traffic situations. Continuous, automatic checks of the current traffic situation against the traffic catalogue provide verdicts about its novelty. Using an illustrative example, we show how domain experts can use the detected novelties to extend the traffic catalogue so that they are classified as known in the future. The proposed method does not require any pre-training of an AI-based classifier and is human-understandable, explainable, and traceable.
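A minimal sketch of the catalogue check, under the (hypothetical) simplification that an abstract description is a set of qualitative predicates and a concrete situation counts as known when some entry's predicates all hold in it:

```python
# Sketch: novelty as failure of subsumption by any catalogue entry.
def is_novel(situation_predicates, catalogue):
    return not any(entry <= situation_predicates for entry in catalogue)

catalogue = [{"two_lanes", "vehicle_ahead"}, {"pedestrian_crossing"}]
now = {"two_lanes", "vehicle_ahead", "rain"}
print(is_novel(now, catalogue))  # False: the first entry subsumes it
```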

14:15
Lorenzo Balboni (University of Ferrara, Italy)
Federico Manzella (University of Ferrara, Italy)
Guido Sciavicco (Università di Ferrara, Italy)
Let the Music Flow Where the Modal Branches Lead
PRESENTER: Lorenzo Balboni

ABSTRACT. This paper presents a novel approach to symbolic music generation using Modal Decision Trees (MDTs), a class of interpretable, rule-based models that integrate modal logic into the decision-tree algorithm. We construct a dataset from multitrack MIDI files, representing each track as a binary pianoroll matrix, and frame the generation task as a binary classification problem: predicting whether a pitch should be played at each time step based on a local window of context. The performed experiments demonstrate that MDTs, despite not leveraging explicit pitch identities, achieve strong performance, with an F1 score of 0.926 and balanced accuracy of 0.943. The generated music closely follows the input melodies, with some modifications that reflect the model’s learned patterns. While MDTs do not yet match the performance of deep learning models for temporal sequence modeling, our results highlight their potential as interpretable tools for symbolic music generation. We discuss future directions, including modal decision forests, multi-track analysis, and controlled randomness to enhance generative diversity.
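A minimal sketch of the windowed classification setup described above; the window sizes and feature layout are illustrative, not the paper's exact configuration:

```python
# Sketch: for each (time, pitch) cell of a binary pianoroll, features are a
# local context window and the label is whether that cell is active.
import numpy as np

def windows(pianoroll, t_ctx=4, p_ctx=3):
    T, P = pianoroll.shape
    X, y = [], []
    for t in range(t_ctx, T):
        for p in range(p_ctx, P - p_ctx):
            ctx = pianoroll[t - t_ctx:t, p - p_ctx:p + p_ctx + 1]
            X.append(ctx.ravel())      # flattened context window
            y.append(pianoroll[t, p])  # binary target: pitch p played at time t?
    return np.array(X), np.array(y)
```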

14:30
Kristina Gogoladze (Universiteit Utrecht, Netherlands)
Natasha Alechina (Open Universiteit, Netherlands)
Romy van Jaarsveld (UMC Utrecht, Netherlands)
Ronald De Jong (Technical University Eindhoven, Netherlands)
Yasmina Al Khalil (Technical University Eindhoven, Netherlands)
Gino Kuiper (University Medical Centre Utrecht, Netherlands)
Brian Logan (Utrecht University, Netherlands)
Jelle P. Ruurda (University Medical Centre Utrecht, Netherlands)
Run-time verification of robot-assisted surgery using visual input

ABSTRACT. We present a preliminary study of the feasibility of using run-time verification to provide intraoperative warnings during robot-assisted surgery for cancer. Robot-assisted surgery offers several advantages over traditional surgery; however, it can be challenging for surgeons, particularly if they lack experience in the procedure being performed. We report the results of a study to elicit from surgeons which clinically relevant properties should be monitored, and present a proof-of-concept evaluation of the feasibility of monitoring using the video image available to the surgeon as input, which suggests that real-time monitoring for property violations is possible.

14:45-15:30 Session 5: LLMs and Neurosymbolic AI for knowledge acquisition
Chair:
Andrea Orlandini (CNR, Italy)
14:45
Arshad Beg (Maynooth University, Ireland)
Diarmuid O'Donoghue (Maynooth University, Ireland)
Rosemary Monahan (Maynooth University, Ireland)
Leveraging LLMs for Formal Software Requirements: Challenges and Prospects
PRESENTER: Arshad Beg

ABSTRACT. Software correctness is ensured mathematically through formal verification, which requires a formal requirement specification and an implementation to be verified against it. Tools such as model checkers and theorem provers ensure software correctness by verifying the implementation against the specification. Formal methods are regularly mandated in the development of safety-critical systems, e.g. in aerospace, medical devices, and autonomous systems. Generating these specifications from informal and ambiguous natural language requirements remains the key challenge. Our project, VERIFYAI, aims to investigate automated and semi-automated approaches to bridge this gap, using techniques from Natural Language Processing (NLP), ontology-based domain modelling, artefact reuse, and large language models (LLMs). This position paper presents a preliminary synthesis of relevant literature to identify recurring challenges and prospective research directions in the generation of verifiable specifications from informal requirements.

15:00
Lucas Fortunato Das Neves (INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Chrysoula Zerva (Instituto de Telecomunicações/Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Alessandro Gianola (INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Portugal)
A Proposal For Handling Query Ambiguity For Process Mining Tasks

ABSTRACT. Process mining plays a critical role in extracting insights from organizational processes by analyzing event log data. However, it requires the engagement of domain experts and process analysts. While Large Language Models have enabled conversational agents for process mining tasks, reducing dependence on analysts, their adoption introduces new challenges. Users without an understanding of process mining often formulate ambiguous or ill-defined queries due to limited domain knowledge, hindering accurate analysis. To address this, we propose multiple approaches, to be studied further, for reducing ambiguity by relying on AI techniques such as Retrieval-Augmented Generation and Chain-of-Thought. By reducing ambiguity, human interaction with conversational agents becomes more intuitive, further bridging the existing gap. We also introduce the datasets Query-PM-LLM and AmbQuery-PM-LLM, which can be used as benchmarks for future conversational agents capable of solving process mining tasks.

15:15
Celeste Veronese (University of Verona, Italy)
Daniele Meli (University of Verona, Italy)
Alessandro Farinelli (Computer Science Department, Verona University, Italy)
Scaling DRL Training via Symbolic Knowledge Transfer
PRESENTER: Celeste Veronese

ABSTRACT. We propose a neuro-symbolic DRL approach that integrates background symbolic knowledge to improve sample efficiency and generalization to more challenging, unseen tasks. Partial logical policies defined for simple domain instances, where high performance is easily attained, are transferred as useful priors to perform online reasoning and accelerate learning in more complex settings through two mechanisms: (i) biasing the action distribution during exploration, and (ii) rescaling Q-values during exploitation. This neuro-symbolic integration enhances interpretability and trustworthiness while accelerating convergence, particularly in sparse-reward environments and tasks with long planning horizons, while maintaining the same optimality guarantees of the original DRL algorithm. Moreover, it enables generalization to new training settings without re-tuning the underlying DRL algorithm. We empirically validate our methodology on challenging variants of gridworld environments, both in the fully observable and partially observable settings, showing improved performance over a state-of-the-art reward-shaping baseline, especially when the symbolic knowledge is imperfect or incomplete.
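A minimal sketch of mechanism (ii), with an additive bonus standing in for the paper's actual rescaling rule; `prior_ok` is a hypothetical predicate encoding the transferred partial logical policy:

```python
# Sketch: bias exploitation toward actions the symbolic prior recommends.
import numpy as np

def rescale_q(q_values, actions, prior_ok, state, bonus=1.0):
    boost = np.array([bonus if prior_ok(state, a) else 0.0 for a in actions])
    return np.asarray(q_values) + boost   # illustrative additive rescaling

def greedy_action(q_values, actions, prior_ok, state):
    return actions[int(np.argmax(rescale_q(q_values, actions, prior_ok, state)))]
```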

16:00-16:30 Session 6: Games and reasoning
Chair:
Andrea Orlandini (CNR, Italy)
16:00
Shenghui Chen (University of Texas at Austin, United States)
Shufang Zhu (University of Liverpool, UK)
Giuseppe De Giacomo (University of Oxford, UK)
Ufuk Topcu (University of Texas at Austin, United States)
Learning to Coordinate without Communication under Incomplete Information

ABSTRACT. Achieving seamless coordination in cooperative games is a crucial challenge in artificial intelligence, particularly when players operate under incomplete information. While communication helps, it is not always feasible. In this paper, we explore how effective coordination can be achieved without verbal communication, relying solely on observing each other's actions. Our method enables an agent to develop a strategy by interpreting its partner’s action sequences as intent signals, constructing a finite-state transducer built from deterministic finite automata, one for each possible action the agent can take. Experiments show that these strategies significantly outperform uncoordinated ones and closely match the performance of coordinating via direct communication.

16:15
Giuseppe De Giacomo (University of Oxford & Sapienza Univ. Roma, UK)
Yves Lespérance (York University, Canada)
Matteo Mancanelli (University of Rome La Sapienza, Italy)
Strategic Reasoning over Golog Programs in the Nondeterministic Situation Calculus - Extended Abstract

ABSTRACT. Automata-based synthesis has seen increasing interest in the last decade, mostly focused on declarative specifications. Here, we consider procedural programs that specify high-level agent behaviors over nondeterministic domains. Specifically, we tackle the problem of synthesizing strategies that guarantee the successful execution of a Golog program δ within a nondeterministic basic action theory (NDBAT). Our approach constructs symbolic program graphs that capture the control flow of δ independently of the domain, enabling reactive synthesis via their cross product with the domain model. We formally relate graph-based transitions to standard Golog semantics and show how our framework modularly separates agent and environment behaviors. This flexibility supports strategic reasoning and synthesis in complex nondeterministic settings, with programs acting as independent controllers for both agent and environment.

16:30-17:00 Session 7: Spotlight presentations of Work in Progress
Chair:
Nicola Saccomanno (University of Udine, Italy)
16:30
Melissa Antonelli (University of Helsinki and HIIT, Finland)
Arnaud Durand (Université Paris Cité, France)
Juha Kontinen (University of Helsinki, Finland)
Circuit Complexity Meets Discrete Ordinary Differential Equations: An Overview

ABSTRACT. Boolean and arithmetic circuits are receiving increasing attention in AI. In particular, due to the connection with neural networks, the evaluation of their computational resources, as studied in circuit complexity, has been thoroughly investigated to address concrete issues, such as scalability. In this paper, we outline an original approach to explore this area using recursion schemas defined by discrete ordinary differential equations (ODEs). Relying on them, we provide new characterizations of various small circuit classes, offering a uniform and comprehensive framework to potentially unveil the relationships between them and clarify notions like counting in this context. Here, we present a systematic and integrated overview of the main results obtained thus far, namely the introduction of algebras characterized by different constraints on the linearity of their defining ODE schemas and capable of capturing function classes that range from those computed by unbounded fan-in circuits working in constant depth (FAC0) to those computed by bounded fan-in Boolean circuits working in logarithmic depth (FNC1), including those with counting (FTC0) and counting modulo 2 gates (FACC[2]). This provides a starting point for discussing the various ongoing directions of this research and the potential challenges associated with this new perspective on the study of circuit complexity.
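For orientation, the basic objects of this approach as they appear in the discrete-ODE literature (illustrative, not the paper's exact schemas): the discrete derivative, and a linear ODE schema defining a function by iteration from an initial condition:

```latex
\frac{\partial f(x,\vec{y})}{\partial x} \;=\; f(x+1,\vec{y}) - f(x,\vec{y}),
\qquad
\frac{\partial f(x,\vec{y})}{\partial x} \;=\; A\big(x,\vec{y}\big)\cdot f(x,\vec{y}),
\quad f(0,\vec{y}) = g(\vec{y})
```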

16:33
Cosimo Perini Brogi (IMT School for Advanced Studies Lucca, Italy)
From Applicative Programming to Verification-based Knowledge: A Curry-Howard-Lambek Reading

ABSTRACT. We discuss a modular natural deduction framework for intuitionistic modal logics dealing with applicative functors and verificationist knowledge, extending a calculus for belief/applicative programming (IEL⁻) to one (IEL) for knowledge in mathematics and other contexts as well. Through the proofs-as-programs paradigm, we map these systems into modal type theories and prove strong normalisation via CPS translations, uniformly handling detours and permutations. Finally, we give a categorical interpretation of the modal operator as a monoidal functor, reflecting the rewrite structure of the calculi.

16:36
Raik Dankworth (University of Lübeck, Germany)
Gesina Schwalbe (University of Lübeck, Germany)
Attack logics, not outputs: Towards efficient robustification of deep neural networks by falsifying concept-based properties
PRESENTER: Raik Dankworth

ABSTRACT. Deep neural networks (NNs) for computer vision are vulnerable to adversarial attacks, i.e., minuscule malicious changes to inputs may induce unintuitive outputs. One key approach to verify and mitigate such robustness issues is to falsify expected output behavior. This allows, e.g., to locally prove security, or to (re)train NNs on obtained adversarial input examples. Due to the black-box nature of NNs, current attacks only falsify a class of the final output, such as flipping from stop_sign to ¬stop_sign. In this paper we generalize this to search for generally illogical behavior: falsifying constraints (concept-based properties) involving further human-interpretable concepts, like red ∧ octagonal → stop_sign. For this, an easy implementation of concept-based properties on already trained NNs is proposed, using techniques from explainable artificial intelligence. Further, we sketch a theoretical proof that attacks on concept-based properties are expected to have a reduced search space compared to simple class falsification, while arguably being more aligned with intuitive robustness targets. As an outlook, we hypothesize that this approach has the potential to efficiently and simultaneously improve logical compliance and robustness.
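A minimal sketch of falsifying such a concept-based property; the concept probes, classifier, and perturbation routine are hypothetical placeholders (the paper obtains concept probes via XAI techniques):

```python
# Sketch: search for inputs that break red AND octagonal -> stop_sign.
def violates(x, concepts, classifier):
    return (concepts["red"](x) and concepts["octagonal"](x)
            and classifier(x) != "stop_sign")

def falsify(x0, perturb, concepts, classifier, budget=1000):
    """Search the neighbourhood of x0 for an input breaking the property."""
    for _ in range(budget):
        x = perturb(x0)
        if violates(x, concepts, classifier):
            return x  # adversarial w.r.t. the *logic*, not just a class flip
    return None
```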

16:39
Ana Maria Gomez-Ruiz (Verimag, Université Grenoble Alpes, France)
Alexandre Donzé (Verimag, Université Grenoble Alpes, France)
Thao Dang (CNRS/VERIMAG, France)
RLROM: Monitoring and Training Reinforcement Learning Agents using Signal Temporal Logic
PRESENTER: Alexandre Donzé

ABSTRACT. Reinforcement learning (RL) excels at solving complex tasks, but its application presents challenges in interpretability, safety and reliability. This is in part due to the dynamic nature of the RL problems, for which timing interactions are crucial. RLROM is a novel tool that integrates Signal Temporal Logic (STL) with common RL frameworks. STL, a formal language for time-dependent properties, enables RLROM to monitor agent behavior against specified requirements, identifying both intended and unexpected behaviors. STL properties can be robustly monitored online, generating quantitative satisfaction values that can serve as new observations or reward components during training. This enhances RL problem design, interpretability, and the development of safe, reliable, and efficient agents. We demonstrate RLROM's utility using the highway-env environment, showing its ability to influence driving agent behavior, such as speed, safety and rule adherence.
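To make the quantitative-semantics idea concrete, a hand-rolled robustness computation for the STL property G(speed ≤ limit) over a finite trace; RLROM itself wires a full STL monitor into the RL loop, and this sketch shows only the underlying arithmetic:

```python
# Sketch: quantitative semantics of "always (speed <= limit)": the minimum
# over time of the margin (limit - speed); positive means satisfied with slack.
def always_leq_robustness(speeds, limit):
    return min(limit - s for s in speeds)

print(always_leq_robustness([22.0, 27.5, 29.0], 30.0))  # 1.0, usable as reward
```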

16:42
Leonardo Ceragioli (Università degli Studi di Milano (Statale), Italy)
Giuseppe Primiero (Università degli Studi di Milano (Statale), Italy)
A Proof System with Causal Labels (Part I): checking Individual Fairness and Intersectionality

ABSTRACT. In this article we propose an extension to the typed natural deduction calculus TNDPQ to model verification of individual fairness and intersectionality in probabilistic classifiers. Their interpretation is obtained by formulating specific conditions for the application of the structural rule of Weakening. Such restrictions are given by causal labels used to check for conditional independence between protected and target variables.

16:45
Leonardo Ceragioli (Università di Milano (Statale), Italy)
Giuseppe Primiero (University of Milan, Italy)
A Proof System with Causal Labels (Part II): checking Counterfactual Fairness

ABSTRACT. In this article we propose an extension to the typed natural deduction calculus TNDPQ to model verification of counterfactual fairness in probabilistic classifiers. This is obtained by formulating specific structural conditions for causal labels and checking that evaluation is robust under their variation.

16:48
Mohammad Afzal (TCS Research Pune and IIT Bombay India, India)
Ashutosh Gupta (IIT Bombay, India)
R Venkatesh (TCS Research Pune India, India)
S. Akshay (IIT Bombay, India)
False Positives in Robustness Checking of Neural Networks
PRESENTER: Mohammad Afzal

ABSTRACT. Neural networks are increasingly used in safety-critical systems. To ensure their trustworthy deployment, we need to verify that they are robust against minor perturbations. In doing so, however, the counterexamples found by state-of-the-art neural network verifiers for robustness may not be truly meaningful: a counterexample reported by a verifier as a robustness violation may not be a violation in the view of a domain expert. We refer to such cases as false positives (FP). In this work, we propose a new approach to evaluate the robustness of neural networks by taking the view of a domain expert into account. Our goal is to evaluate the presence or absence of such FPs in state-of-the-art verifiers. However, doing this manually does not scale. Thus, in our experiments, we evaluate the local robustness property based on the notions of FP and TP (True Positives), while approximating the domain expert with an ensemble of state-of-the-art neural network classifiers.
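A minimal sketch of the FP/TP split, approximating the domain expert by an ensemble majority vote as described above; all callables are hypothetical:

```python
# Sketch: a verifier's counterexample x_adv (for original label y) counts as
# a true positive only if the expert proxy (ensemble vote) still reads it as y.
from collections import Counter

def classify_counterexample(x_adv, y, model, ensemble):
    assert model(x_adv) != y  # the verifier claims a robustness violation
    vote = Counter(m(x_adv) for m in ensemble).most_common(1)[0][0]
    return "TP" if vote == y else "FP"  # FP: the perturbation changed the class
```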

16:51
Alessandro Gianola (INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Portugal)
Chrysoula Zerva (Instituto de Telecomunicações (IT), IST, Lisbon, Portugal)
Compliance Checking for Public Administration Processes using Retrieval-Augmented Generation in LLMs: Novel Directions and Challenges

ABSTRACT. Public administration processes operate within complex environments, where they need to comply with evolving legal norms. Nevertheless, their specifications and guidelines documentation are often unstructured and hard to analyze automatically. This position paper proposes a novel hybrid approach that combines Large Language Models, Retrieval-Augmented Generation, and symbolic reasoning to align process specifications with legal guidelines. We outline key research questions and introduce a pipeline for extracting, structuring, and aligning business-related, textual content expressed in natural language with regulatory guidelines, to support automated compliance checking in public administration contexts.

16:54
Yilian Huang (IMT School for Advanced Studies Lucca, Italy)
Cosimo Perini Brogi (IMT School for Advanced Studies Lucca, Italy)
Rocco De Nicola (Institute for Informatics and Telematics - CNR Pisa, Italy)
Reasoning on Privacy Policies
PRESENTER: Yilian Huang

ABSTRACT. Privacy policies stated in natural language are often ambiguous and not amenable to direct machine processing. Controlled Natural Languages (CNLs) provide a simplified, structured way to express policy rules that is both easier for natural language processing (NLP) to parse and closer to the formal representations used in automated reasoning (AR). By further translating CNL statements into logical systems, we can connect NLP with rigorous AR to verify privacy policies and related regulations. This paper proposes a three-layer methodology, based on NLP, CNL, and AR, for the formal verification of privacy policies, and describes our ongoing development of this approach.

16:57
Sandamali Yashodhara Wickramasinghe (City St George's, University of London, UK)
Jacob M. Howe (City St George's, University of London, UK)
Laure Daviaud (The University of East Anglia, UK)
Extracting Weighted Finite Automata from RNNs via iterative partitioning and spectral learning

ABSTRACT. This paper proposes an algorithm to extract a Weighted Finite Automaton (WFA) from a Recurrent Neural Network (RNN), by adapting the L* algorithm in combination with spectral learning using Singular Value Decomposition (SVD). The method introduces a discrete abstraction of the RNN’s hidden state space to enable approximate equivalence queries within the L* framework. Through these queries, counterexamples are identified and used to iteratively refine the prefix and suffix sets that define the Hankel matrix, thereby improving the accuracy of successive WFA hypotheses. This work lays the groundwork for integrating discrete clustering with spectral learning in the context of WFA extraction from RNNs—an area that remains underexplored—with potential applications in sequence modelling and analysis. Future work will focus on implementing and empirically validating the proposed approach.
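A minimal numpy sketch of the spectral step, following the standard recipe from the spectral-learning literature rather than the paper's full algorithm (the Hankel layout and names are assumptions):

```python
# Sketch: factor the Hankel matrix by a truncated SVD and read off WFA
# parameters. H[p, s] = f(ps); H_sigma[a][p, s] = f(p a s);
# h_prefixes[p] = f(p) (column at empty suffix); h_suffixes[s] = f(s).
import numpy as np

def spectral_wfa(H, H_sigma, h_prefixes, h_suffixes, rank):
    U, S, Vt = np.linalg.svd(H)
    U, Lam, V = U[:, :rank], np.diag(S[:rank]), Vt[:rank, :].T
    P_inv = np.linalg.inv(Lam) @ U.T          # pseudo-inverse of the forward map
    A = {a: P_inv @ Ha @ V for a, Ha in H_sigma.items()}  # transition weights
    alpha = h_suffixes @ V                    # initial weight vector
    omega = P_inv @ h_prefixes                # final weight vector
    return alpha, A, omega

def weight(word, alpha, A, omega):
    v = alpha
    for a in word:
        v = v @ A[a]
    return float(v @ omega)
```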

17:45-17:50 Session 9: Closing remarks
Chairs:
Angelo Montanari (University of Udine, Italy)
Andrea Orlandini (CNR, Italy)
Nicola Saccomanno (University of Udine, Italy)
Stefano Tonetta (FBK, Italy)