Welcome from Prof. Sara Lombardo, Head of School of Mathematical and Computer Sciences, Heriot-Watt University
Abstract: In this talk, we explore the transformative potential of custom reinforcement learning (RL) algorithms in accelerating solutions to complex, research-level mathematical challenges. We begin by illustrating how these algorithms have achieved a 10X improvement in areas where previous advances of the same magnitude required many decades. A comparative analysis of different network architectures is presented to highlight their performance in this context. We then delve into the application of RL algorithms to exceptionally demanding tasks, such as those posed by the Millennium Prize Problems and the smooth Poincaré conjecture in four dimensions. Drawing on our experiences, we discuss the prerequisites for developing new RL algorithms and architectures that are tailored to these high-level challenges. This talk is largely based on recent work with A. Shehper, A. Medina-Mardones, L. Fagan, B. Lewandowski, A. Gruen, Y. Qiu, P. Kucharski, and Z. Wang.
Bio: After receiving his PhD from Princeton University, Sergei Gukov spent five years at Harvard University as a research fellow of the Clay Mathematics Institute and two years at the School of Mathematics at the Institute for Advanced Study. His passion is building new bridges between different areas of mathematical physics and pure mathematics, such as quantum topology, mirror symmetry, and gauge theory. His more recent interests involve new connections between mathematics and machine learning.
Contributed Talks devoted to general methodological issues in applying machine learning to proof finding and proof generation
10:40 | Autoformalization and autoinformalization via probabilistic reasoning ABSTRACT. Recent advances in the control of large language models via probabilistic programming allow us to imagine new workflows in theorem proving and formal mathematics. In order to spark a broader conversation about the role of probabilistic reasoning, we propose several speculative approaches: a reweighted wake-sleep algorithm for improving model quality, an involutive approach to autoformalization and autoinformalization, and a coarse-to-fine approach to proof synthesis.
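As a purely illustrative aside, the involutive idea above can be pictured as a round-trip check: formalize a statement, informalize it back, and test agreement. The sketch below is not from the talk; the `llm`, `formalize`, and `informalize` helpers are hypothetical stand-ins for any model backend, and the exact-match comparison is a crude placeholder for a real semantic similarity check.

```python
# Hypothetical round-trip consistency check between autoformalization
# and autoinformalization. `llm` stands in for any completion backend.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model backend here")

def formalize(statement_nl: str) -> str:
    """Translate an informal statement into (say) Lean syntax."""
    return llm(f"Formalize in Lean 4:\n{statement_nl}\nLean:")

def informalize(statement_fm: str) -> str:
    """Translate a formal statement back into natural language."""
    return llm(f"State informally:\n{statement_fm}\nEnglish:")

def round_trip_score(statement_nl: str, n_samples: int = 4) -> float:
    """Crude proxy for formalization quality: how often does the
    round trip reproduce the input? A real system would score
    paraphrases with a learned similarity measure instead."""
    hits = 0
    for _ in range(n_samples):
        recovered = informalize(formalize(statement_nl))
        hits += int(recovered.strip().lower() == statement_nl.strip().lower())
    return hits / n_samples
```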
11:10 | Frontier of Formal Theorem Proving with Large Language Models: Insights from the DeepSeek-Prover ABSTRACT. Recent advances in large language models have markedly influenced mathematical reasoning and automated theorem proving within artificial intelligence. Yet, despite their success in natural language tasks, these models face notable obstacles in formal theorem proving environments such as Lean and Isabelle, where exacting derivations must adhere to strict formal specifications. Even state-of-the-art models encounter difficulty generating accurate and complex formal proofs, revealing the degree of mathematical rigor this domain demands. In the DeepSeek-Prover series (V1 and V1.5), we have explored specialized methodologies aimed at addressing these challenges. This talk will delve into three foundational areas: the synthesis of training data through autoformalization, reinforcement learning that utilizes feedback from proof assistants, and test-time optimization using Monte Carlo tree search. I will also provide insights into current model capabilities, persistent challenges, and the future potential of large language models in automated theorem proving.
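For readers unfamiliar with learning from proof-assistant feedback, the schematic below (not DeepSeek-Prover's actual code) shows the expert-iteration shape of the idea: sample candidate proofs, keep only those the proof assistant verifies, and reuse them as training data. `generate` and `lean_check` are hypothetical stand-ins for the prover model and the Lean checker.

```python
# Schematic expert-iteration loop driven by binary verifier reward.

def generate(theorem: str, temperature: float) -> str:
    raise NotImplementedError  # e.g. sample from a fine-tuned prover model

def lean_check(theorem: str, proof: str) -> bool:
    raise NotImplementedError  # run the Lean checker on the candidate proof

def expert_iteration(theorems: list[str], rounds: int = 3,
                     samples_per_thm: int = 16) -> list[tuple[str, str]]:
    """Collect verified (theorem, proof) pairs over several rounds.
    A full system would fine-tune the model on `dataset` after each
    round (the RL-from-verifier-feedback step, omitted here)."""
    dataset: list[tuple[str, str]] = []
    for _ in range(rounds):
        for thm in theorems:
            for _ in range(samples_per_thm):
                proof = generate(thm, temperature=1.0)
                if lean_check(thm, proof):  # binary reward from the ITP
                    dataset.append((thm, proof))
                    break
    return dataset
```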
11:40 | Intrinsic Motivation To Construct Terms In A Dependent Type Theory PRESENTER: Guy Axelrod ABSTRACT. See attached file.
12:00 | Structure-Aware Neural Representations of Agda Programs PRESENTER: Konstantinos Kogkalidis ABSTRACT. We introduce an ML-ready dataset for dependently-typed program-proofs in the Agda proof assistant, and present the underlying extraction tool. We then demonstrate how a bidirectional Transformer encoder retrofitted with structure-specific adjustments can achieve high performance in a realistic premise selection setup. Unlike most go-to LLM architectures, both the dataset and the neural model work in tandem to maximally preserve the typing information offered by Agda. The resulting pipeline is agnostic and impervious to most syntactic confounds, producing its inferences on the basis of type-structure alone. We provide empirical evidence that this makes for a general, robust, and efficient modeling approach.
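As a rough illustration of the premise-selection setup the abstract mentions, the sketch below ranks candidate premises by embedding similarity to the goal. The `encode` function stands in for a structure-aware encoder; both it and the cosine ranking are assumptions for illustration, not the paper's pipeline.

```python
# Hypothetical premise-selection sketch: embed goal and premises,
# rank premises by cosine similarity to the goal.
import math

def encode(term: str) -> list[float]:
    raise NotImplementedError  # structure-aware encoder over Agda types

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / ((nu * nv) or 1.0)  # guard against zero vectors

def select_premises(goal: str, premises: list[str], k: int = 10) -> list[str]:
    """Return the k premises whose embeddings are closest to the goal."""
    g = encode(goal)
    scored = [(cosine(g, encode(p)), p) for p in premises]
    return [p for _, p in sorted(scored, reverse=True)[:k]]
```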
12:20 | Abstraction Logic Is All You Need ABSTRACT. Abstraction logic is a minimal yet maximally expressive logical system that serves both as a logic and a logical framework. While existing logics often trade simplicity for expressiveness, abstraction logic achieves both, making it an ideal foundation for machine-learning approaches to theorem proving. Its simplicity enables ML systems to manipulate and adapt it easily, while its expressiveness ensures no practical limitations. We present the core principles of abstraction logic and discuss its unique characteristics that make it well suited for AITP (artificial intelligence theorem proving).
The Generative AI Laboratory (GAIL) at the University of Edinburgh is a centre for excellence dedicated to researching all aspects of generative artificial intelligence (AI) in society. Uniting the diverse research expertise across the University with generative AI at its core, GAIL taps into a thriving AI landscape with recognised strengths in natural language processing, machine learning, and data-driven innovation.
This interdisciplinary initiative focuses on advancing generative AI techniques in the key areas of future health and care, climate and sustainability, and economic growth, and is committed to innovating in generative AI through partnerships with global stakeholders, driving new collaborations and dynamic projects across government, industry, and the public sector.
To learn more about GAIL, see here: https://gail.ed.ac.uk/
Contributed Talks devoted to Large Language Models for translating between different provers, proof formats, and mathematical proofs written in natural language
13:30 | MiniF2F in Rocq: Automatic Translation Between Proof Assistants — A Case Study PRESENTER: Jules Viennot ABSTRACT. In this work, we conduct an experiment using state-of-the-art LLMs to translate MiniF2F into Rocq. The translation task focuses on generating a Rocq theorem based on three sources: a natural language description, the Lean formalization, and the Isabelle formalization. We conduct the experiment in three stages of increasing complexity, from basic one-shot prompting to multi-turn conversations that incorporate feedback from unsuccessful attempts. At each stage, we perform multiple rounds of translation using increasingly advanced models: GPT-4o mini, Claude 3.5 Sonnet, o1 mini, and o1. We successfully translated 478 out of 488 theorems.
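The multi-turn stage can be pictured as a compile-and-repair loop. The sketch below is an illustration under assumptions, not the authors' code: `llm_chat` is a hypothetical chat backend, and candidate theorems are checked by invoking `coqc` on a temporary file.

```python
# Hypothetical multi-turn translation loop with compiler feedback.
import pathlib
import subprocess
import tempfile

def llm_chat(messages: list[dict]) -> str:
    raise NotImplementedError  # any chat-completion backend

def rocq_compiles(src: str) -> tuple[bool, str]:
    """Try to compile a candidate Rocq theorem; return (ok, stderr)."""
    with tempfile.TemporaryDirectory() as d:
        f = pathlib.Path(d) / "Candidate.v"
        f.write_text(src)
        r = subprocess.run(["coqc", str(f)], capture_output=True, text=True)
        return r.returncode == 0, r.stderr

def translate(nl: str, lean: str, isabelle: str,
              max_turns: int = 5) -> str | None:
    messages = [{"role": "user", "content":
                 f"Translate to a Rocq theorem.\nInformal: {nl}\n"
                 f"Lean: {lean}\nIsabelle: {isabelle}"}]
    for _ in range(max_turns):
        candidate = llm_chat(messages)
        ok, err = rocq_compiles(candidate)
        if ok:
            return candidate
        # Feed the compiler error back for the next attempt.
        messages += [{"role": "assistant", "content": candidate},
                     {"role": "user", "content": f"coqc error:\n{err}\nFix it."}]
    return None
```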
14:00 | NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving PRESENTER: Guillaume Baudart ABSTRACT. Thanks to recent advances in LLM capabilities, we believe natural language can serve as a universal interface for reasoning about formal proofs. In this work, we develop an LLM-based agent that can interact with the Rocq proof assistant. We present the following contributions: 1) Pétanque: a new fast and lightweight environment to interact with the Rocq theorem prover. 2) An interactive proof protocol leveraging natural language reasoning: hierarchical proof templating. 3) A search algorithm leveraging feedback from the ITP and natural language to rerank proof candidates. Using our method with state-of-the-art LLMs, we can successfully synthesize proofs for 46-58% of the first 100/260 lemmas from the newly published Busy Beaver proofs.
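The reranking step (contribution 3) can be illustrated as follows. This is a hypothetical sketch, not the paper's implementation: `sample_tactics` and the `ProverEnv` class stand in for an LLM sampler and a Pétanque-like interface, and "fewer open goals" is one simple reranking signal among many possible.

```python
# Hypothetical candidate reranking against prover feedback.

def sample_tactics(goal: str, n: int) -> list[str]:
    raise NotImplementedError  # n candidate tactics from an LLM

class ProverEnv:
    """Stand-in for a Pétanque-like prover interface."""
    def try_tactic(self, tactic: str) -> tuple[bool, int]:
        """Apply a tactic; return (succeeded, number of open goals)."""
        raise NotImplementedError

def best_tactic(env: ProverEnv, goal: str, n: int = 8) -> str | None:
    """Rerank sampled tactics by how far they advance the proof state."""
    outcomes = []
    for t in sample_tactics(goal, n):
        ok, open_goals = env.try_tactic(t)
        if ok:
            outcomes.append((open_goals, t))  # fewer open goals = better
    return min(outcomes)[1] if outcomes else None
```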
Abstract: AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven discovery in science, engineering, and beyond. In the recent past, there have been many efforts on AI4Math that mirror techniques in NLP, in particular, by training large language models on textual math datasets. As a complementary yet less explored avenue, formal mathematical reasoning is grounded in formal systems such as proof assistants, which can verify the correctness of reasoning and provide automatic feedback. In this talk, we advocate for formal mathematical reasoning as a task for LLMs and argue that it is indispensable for advancing AI4Math to the next level. In recent years, we have seen steady progress in using AI to perform formal reasoning, including core tasks such as theorem proving and autoformalization, as well as emerging applications such as verifiable generation of code and hardware designs. However, significant challenges remain to be solved for AI to truly master mathematics and achieve broader impact. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success. We end by calling on the research community to come together to drive transformative advancements in this field.
Bio: Swarat Chaudhuri is a Professor of Computer Science and the director of the Trishul laboratory at UT Austin, as well as a Visiting Researcher at Google DeepMind, London. His research lies at the interface of machine learning, programming languages, and formal methods. Through a synthesis of ideas from these areas, he seeks to develop a new generation of intelligent systems that are reliable, transparent, and secure by construction and can solve mathematical reasoning, scientific discovery, and system engineering tasks beyond the scope of contemporary AI. Dr. Chaudhuri has received the NSF CAREER award, the ACM SIGPLAN John Reynolds Dissertation award, Meta and Google Research awards, and several ACM SIGPLAN and SIGSOFT distinguished paper awards. He also served as a Program Chair for CAV 2016 and ICLR 2024.
Contributed Talks devoted to applications of Large Language Models in formal verification for software, hardware, and systems engineering, including industrial applications
16:00 | Are LLMs Ready for Software Verification? ABSTRACT. Although formal methods can deliver reliable software, they have hardly been adopted in mainstream programming. With the advent of large language models, it becomes more feasible to automatically generate code along with verification guarantees. This research explores whether LLMs can produce verified code from a textual description and a partial specification. We achieved a 63% success rate with Nagini and 40% with Verus on the HumanEval benchmark.
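The underlying generate-and-verify loop can be sketched as below. This is an illustration under assumptions, not the authors' system: `llm` is a hypothetical backend, and the `nagini` command line may differ across installations.

```python
# Hypothetical generate-and-verify loop: keep sampling until the
# Nagini verifier accepts the candidate program.
import pathlib
import subprocess
import tempfile

def llm(prompt: str) -> str:
    raise NotImplementedError  # any completion backend

def nagini_verifies(src: str) -> bool:
    """Run the Nagini verifier on a candidate Python file."""
    with tempfile.TemporaryDirectory() as d:
        f = pathlib.Path(d) / "candidate.py"
        f.write_text(src)
        r = subprocess.run(["nagini", str(f)], capture_output=True, text=True)
        return r.returncode == 0

def verified_code(description: str, spec: str,
                  attempts: int = 8) -> str | None:
    for _ in range(attempts):
        candidate = llm(f"Write Python with Nagini contracts.\n"
                        f"Task: {description}\nSpec: {spec}")
        if nagini_verifies(candidate):  # accept only machine-checked code
            return candidate
    return None
```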
16:30 | Towards LLM-support for Deductive Verification of Java Programs PRESENTER: Samuel Teuber ABSTRACT. Along with the success of Large Language Models (LLMs) in other domains, a quickly developing body of research is investigating the use of LLMs to process program code. While much of this research focuses on code generation, our work explores the use of LLMs for generating (auxiliary) specifications. In contrast to pure code generation, where LLMs could produce buggy code, specification generation allows us to rigorously check consistency between pre-existing source code and generated annotations using theorem provers. To this end, we present our efforts to couple the deductive verification tool KeY with GPT to automate the process of annotating Java code with JML specifications.
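The annotate-then-check pattern differs from the previous sketch in that the code is fixed and only the specifications are generated. The sketch below is an assumption-laden illustration, not the authors' tool: `llm` is a hypothetical backend, and the KeY invocation is abstracted behind `key_verifies` since its command-line interface varies by version.

```python
# Hypothetical annotate-then-check loop: generate JML annotations for
# existing Java source, keep them only if KeY verifies the result.
import pathlib

def llm(prompt: str) -> str:
    raise NotImplementedError  # any completion backend

def key_verifies(java_file: pathlib.Path) -> bool:
    raise NotImplementedError  # invoke the KeY prover; CLI varies by version

def annotate(java_src: str, workdir: pathlib.Path,
             attempts: int = 5) -> str | None:
    for _ in range(attempts):
        annotated = llm("Add JML specifications (requires/ensures clauses) "
                        "to this Java code:\n" + java_src)
        f = workdir / "Candidate.java"
        f.write_text(annotated)
        if key_verifies(f):  # consistency check between code and specs
            return annotated
    return None
```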
Contributed Talks devoted to ethical and philosophical issues arising in the use of generative AI in theorem proving
Further Venue details: https://museum.rcsed.ac.uk/
Prior registration for the dinner is essential.