Online Machine Learning Techniques for Coq: A Comparison

ABSTRACT. We present a comparison of several online machine learning techniques for
tactical learning and proving in the Coq proof assistant. This work builds on
top of Tactician, a plugin for Coq that learns from proofs written by the user
to synthesize new proofs. Learning happens in an online manner, meaning that
Tactician's machine learning model is updated immediately every time the user
performs a step in an interactive proof. This has important advantages
compared to the more studied offline learning systems: (1) it provides the
user with a seamless, interactive experience with Tactician and, (2) it takes
advantage of locality of proof similarity, which means that proofs similar to
the current proof are likely to be found close by. We implement two online
methods, namely approximate k-nearest neighbors based on locality sensitive
hashing forests and random decision forests. Additionally, we conduct
experiments with gradient boosted trees in an offline setting using XGBoost.
We compare the relative performance of Tactician using these three learning
methods on Coq's standard library.

Towards Math-Term Disambiguation Using Machine Learning

ABSTRACT. Word disambiguation has been an important task in natural language processing.However, the problem of disambiguation is still less explored in mathematical text.Similar to natural languages, some math terms are not assigned with a unique interpre-tation. As math text is an important part of the scientific literature, an accurate andefficient way of performing disambiguation of math terms will be a significant contribu-tion. In this paper, we present some investigations on math-term disambiguation usingmachine learning. All experimental data are selected from the DLMF dataset. Ourexperiments consist of 3 steps: (1) create a labeled dataset of math equations (fromthe DLMF) where the instances are (math token, token meaning) pairs, grouped byequation; (2) build machine learning models and train them using our labeled dataset,and (3) evaluate and compare the performance of our models using different evaluationmetrics. Our results show that machine learning is an effective approach to math-termdisambiguation. The accuracy of models ranges from 70% to 85%. There is potentialfor considerable improvements once we have much larger labeled datasets with morebalanced classes.

Heterogeneous Heuristic Optimisation and Selection for First-Order Theorem Proving

ABSTRACT. Good heuristics are essential for successful proof search in first-order automated theorem proving. As a result, state-of-the-art theorem provers offer a range of options for tuning the proof search process to specific problems. However, the vast configuration space makes it exceedingly challenging to construct effective heuristics.
In this paper we present a new approach called HOS-ML, for automatically discovering new heuristics and mapping problems into optimised local schedules comprising of these heuristics. Our approach is based on interleaving Bayesian hyper-parameter optimisation for discovering promising heuristics and dynamic clustering to make optimisation efficient on heterogeneous problems. HOS-ML also use constraint programming to devise locally optimal schedules and machine learning for mapping unseen problems into such schedules. We evaluated HOS-ML on a theorem prover iProver and demonstrated that it can discover new heuristics that considerably improve performance and can solve problems that have not been solved previously by any other system.

A Heuristic Prover for Elementary Analysis in Theorema (System Description)

ABSTRACT. We present a plug-in to the Theorema system, which generates proofs similar to those produced by humans for theorems in elementary analysis and is based on heuristic techniques combining methods from automated reasoning and computer algebra.
The prover is able to construct automatically natural-style proofs for various examples related to convergence of sequences as well as limits, continuity, and uniform continuity of functions.
Additionally to general inference rules for predicate logic, the techniques used are: the S-decomposition method for formulae with alternating quantifiers, quantifier elimination by cylindrical algebraic decomposition, analysis of terms behavior in zero, bounding the $\epsilon$-bounds, semantic simplification of expressions involving absolute value, polynomial arithmetic, usage of equal arguments to arbitrary functions, and automatic reordering of proof steps in order to check the admissibility of solutions to the meta-variables.
The problem of proving such theorems directly without using refutation and clausification is logically equivalent to the problem of satisfiability modulo the theory of real numbers, thus these techniques are relevant for SMT solving also.

Extending GeoGebra/realgeom with QEPCAD B to obtain proofs on geometric inequalities

ABSTRACT. We report on our recent improvements of the GeoGebra/realgeom system that is able to symbolically compare geometric quantities of a given planar geometric figure. QEPCAD B is connected as an external tool now to solve the geometric problem that was translated
into an algebraic form first, and by using real quantifier elimination, a generally true/false answer can be given. Our implementation allows free access to this toolset, not just for researchers but teachers and students too, on all major platforms including Windows, Mac and Linux.

Structure in Theorem Proving: Analyzing and Improving the Isabelle Archive of Formal Proofs

ABSTRACT. The Isabelle Archive of Formal Proofs has grown to a significant size in the past years.
It makes up for an impressive body of research,
which enables a number of statistical approaches to various aspects in theorem proving,
and has not yet been utilized exhaustively.
However, the growing size also poses some challenges to address:
Material becomes increasingly harder to find,
reusability and ease of understanding become more important.
This thesis abstract summarizes my research plans on those topics.

Reasoning Support for Undefinedness and Soft Typing in Formal Mathematics

ABSTRACT. Two aspects that are ubiquitous in mathematics as performed by mathematicians, but as of today still not very prevalent in formal systems are undefinedness and soft typing.

Undefinedness is the property of formal systems that certain syntactically well-formed terms can not be assigned a "natural" value.
Soft types (as opposed to hard types) are systems in which not all typing information is available from the introduction but can be added or derived later.

My goal in my PhD is to extend the MMT system such that both of these features can be easily added to any ongoing formalisation effort and also add automated reasoning support (using the ELPI system) to solve any small or cumbersome proof obligation arising from their inclusion.