previous day
next day
all days

View: session overviewtalk overview

13:00-14:30 Session 5: Formalization Languages, Ontologies
A Survey of Languages for Formalizing Mathematics
PRESENTER: Cezary Kaliszyk

ABSTRACT. In order to express mathematics formally, it is necessary to specify it in a language that allows both checking of the scripts and creation of the content. In this paper we survey various languages that have been created for this purpose, including logical languages of proof assistants and other formal systems, semi-formal intermediate languages created with the purpose of managing and exchanging mathematical knowledge, as well as language frameworks which allow expressing the syntax and semantics of other languages and provide functionality like disambiguation.

Representing Structural Language Features in Formal Meta-Languages
PRESENTER: Dennis Müller

ABSTRACT. Structural language features are those that introduce new kinds of declarations as opposed to those that only add expressions. They pose a significant challenge when representing languages in meta-languages such as standard formats like OMDoc or logical frameworks like LF. It is desirable to use shallow representations where a structural language feature is represented by the analogous feature of the meta-language, but the richness of structural language features in practical languages makes this difficult. Therefore, the current state of the art is to encode unrepresentable structural language features in terms of more elementary ones, but that makes the representations difficult to reuse and verify. This challenge is exacerbated by the fact that many languages allow users to add new structural language features that are elaborated into a small trusted kernel, which allows for a large and growing set of features.

In this paper we extend the MMT representation framework with a generic concept of structural features. This allows defining exactly the language features needed for elegant shallow embeddings of object languages. The key achievement here is to make this concept expressive enough to cover complex practical features while retaining the simplicity of existing meta-languages. We exemplify our framework with representations of various important structural features including datatype definitions and theory instantiations.

Formalization of elementary number theory in Mizar

ABSTRACT. In this paper we present a dataset based on the Mizar formalization of selected problems related to elementary number theory. The dataset comprises proofs of problems on several levels of difficulty. They are available in the form of full proofs, proof sketches, as well as bare statements equipped with suitable environments importing necessary notions from the Mizar Mathematical Library. The elementary character of the underlying theory makes the data particularly suitable as a starting point for developing courses in interactive theorem proving based mathematics education and recreational mathematics activities.

Interpreting Mathematical Texts in Naproche-SAD (and: From LaTeX to Lean with a controlled natural language)
PRESENTER: Peter Koepke

ABSTRACT. We have extended the natural proof assistant Naproche-SAD by a new parsing process. The natural input language ForTheL has been integrated into LaTeX which, besides high-quality mathematical typesetting, provides useful text structurings and disambiguations. Separating parsing from further logical processing we can also translate into other formal languages besides FOL. This yields correct Lean code from ForTheL statements which may be useful for writing readable fabstracts.

15:00-16:00 Session 6: Formally Verified Tools
(Invited talk) Formally Verified Constraints Solvers: a Guided Tour

ABSTRACT. Constraint solvers are complex tools implementing tricky algorithms and heuristics manipulating intricate data structures. It is well-known that they have bugs. Certifying the output of such tools is extremely important in particular when they are used for critical systems or in verification tools. There are mainly two ways for having confidence in the computed results: making the solver produce not only the output but also proof logs that can be easily verified by an external checker or proving the correctness of the solver itself. The former approach is widespread in the Boolean satisfiability community through formats such as DRAT which can be considered as a standard. The latter approach has been followed for example for developing Compcert and sel4 respectively a C compiler developed and formally verified with the help of the Coq proof assistant Coq and a micro-kernel developed and formally verified with the proof assistant Isabelle/HOL.

This talk focusses on Constraint Programming (CP) solvers on finite domains (FD) and explores the work developed for some years by the author and some other researchers around the Coq formalization of such CP(FD) solvers. The starting point is the development, in 2012, of a formally verified CP(FD) solver limited to binary constraints implementing a classical filtering algorithm, AC3 and one of its extension AC2001, both looking for arc consistency. It is proved sound and complete. A variant implementing bound consistency has also been formalized and proved sound and complete. Recent work1 concerns the Coq formalization of a pre-processor that transforms a constraint satisfaction problem containing non-binary constraints into an equivalent binary constraint satisfaction problem, allowing for resolution of more complex CP programs. Different representations of domains are investigated, simple ordered lists or lists of intervals. The Coq formalization of Regin’s filtering algorithm for the global constraint alldifferent that enforces some variables to be assigned to distinct values is under progress. But here, contrary to the previous formalizations, a large amount of results from graph theory (maximal matching, augmenting path, etc) is needed and requires an intensive proof effort because there is no corresponding off-the-shelf Coq library.

Many ingredients are thus present to build a reference implementation that could be used as a second shot verifier in some critical cases or to validate existing solvers.

The talk discusses the main challenges of such a research work: efficiency, genericity, development of mathematical theorems or reuse of such theorems formalized in other proof assistants (like Berge’s theorem recently formalized by Abdulaziz and al in in Isabelle/HOL, required in the alldifferent context), etc. A last, but not least, important challenge is to convince the CP community to use such formal tools (The work presented in concerns ternary constraints but has been recently extended to n-ary constraints.).

16:30-18:30 Session 7: Managing Mathematical Libraries and Data
Maintaining a Library of Formal Mathematics
PRESENTER: Gabriel Ebner

ABSTRACT. The Lean mathematical library \textsf{mathlib} is developed by a community of users with very different backgrounds and levels of experience. To lower the barrier of entry for contributors and to lessen the burden of reviewing contributions, we have developed a number of tools for the library. These tools check proof developments for subtle mistakes in the code and generate documentation suited for our varied audience.

AutoMSC: Automatic Assignment of Mathematicas Subject Classification Labels
PRESENTER: Moritz Schubotz

ABSTRACT. Authors of research papers in the fields of mathematics, and other math-heavy disciplines commonly employ the Mathematics Subject Classification (MSC) scheme to search for relevant literature. The MSC is a hierarchical alphanumerical classification scheme that allows librarians to specify one or multiple codes for publications. Digital Libraries in Mathematics, as well as reviewing services, such as zbMATH and Mathematical Reviews (MR) rely on these MSC labels in their workflows to organize abstracts and reviews. Therefore, editors manually assign a coarse-grained classification, which determines the subject editor who then performs the fine-grained labeling.

In this paper, we investigate the feasibility of automatically assigning a coarse-grained primary classification using the MSC scheme, by regarding the problem as a multi-class classification machine learning task. We find that our method achieves an F_1-score of over 77%, which is remarkably close to the agreement of zbMATH and MR (F_1-score of 81\%). Moreover, we find that the method's confidence score allows reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles.

Leveraging Information Contained in Theory Presentations
PRESENTER: Yasmine Sharoda

ABSTRACT. A theorem prover without an extensive library is much less useful to its potential users. Algebra, the study of algebraic structures, is a core component of such libraries. Algebraic theories also are themselves structured, the study of which was started as Universal Algebra. Various constructions (homomorphism, term algebras, products, etc) and their properties are both universal and constructive. Thus they are ripe for being automated. Unfortunately, current practice still requires library builders to write these by hand.

We first highlight specific redundancies in libraries of existing systems. Then we describe a framework for generating these derived concepts from theory definitions. We demonstrate the usefulness of this framework on a test library of 170 theories.

A Contextual and Labeled Math-Dataset Derived from NIST's DLMF
PRESENTER: Abdou Youssef

ABSTRACT. Machine Learning (ML) and Natural Language Processing (NLP) have started to be applied to math language processing and math knowledge discovery. To fully utilize ML in those areas, there is a press- ing need for Math labeled datasets. This paper presents a new dataset that we have derived from the widely used Digital Library of Mathemat- ical Functions (DLMF) of NIST. The dataset is structured and labeled in a specic way. For each math equation and expression in the DLMF, there is a record that provides annotational and contextual elements. An accompanying dataset is also generated from the DLMF. It con- sists of \Simple XML" les, each organized as marked-up sentences within a marked-up hierarchy of paragraphs/subsections/sections. The math in each sentence is marked up in a way that enables users to ex- tract the actual context of math elements, at various levels of granular- ity, for contextualized processing. This context-rich, sentence-oriented, equation/expression-centered, symbol-labeled dataset is motivated by the fact that much of ML-based NLP algorithms are sentence oriented.

TGView3D: a System for 3-Dimensional Visualization of Theory Graphs
PRESENTER: Richard Marcus

ABSTRACT. We describe the TGView3D system, an interactive graph viewer optimized for exploring mathematical knowledge as 3D graphs. To exploit all three spatial dimensions, it extends the commonly-used forcedirected layout algorithms with hierarchical components that are more suitable for the typical structure of mathematical knowledge. TGView3D can also communicate with OMDoc-based knowledge management tools in order to offer semantic, mathematics-specific interaction with the graphs.