View: session overviewtalk overview

13:00 | A Survey of Languages for Formalizing Mathematics PRESENTER: Cezary Kaliszyk ABSTRACT. In order to express mathematics formally, it is necessary to specify it in a language that allows both checking of the scripts and creation of the content. In this paper we survey various languages that have been created for this purpose, including logical languages of proof assistants and other formal systems, semi-formal intermediate languages created with the purpose of managing and exchanging mathematical knowledge, as well as language frameworks which allow expressing the syntax and semantics of other languages and provide functionality like disambiguation. |

13:30 | Representing Structural Language Features in Formal Meta-Languages PRESENTER: Dennis Müller ABSTRACT. Structural language features are those that introduce new kinds of declarations as opposed to those that only add expressions. They pose a significant challenge when representing languages in meta-languages such as standard formats like OMDoc or logical frameworks like LF. It is desirable to use shallow representations where a structural language feature is represented by the analogous feature of the meta-language, but the richness of structural language features in practical languages makes this difficult. Therefore, the current state of the art is to encode unrepresentable structural language features in terms of more elementary ones, but that makes the representations difficult to reuse and verify. This challenge is exacerbated by the fact that many languages allow users to add new structural language features that are elaborated into a small trusted kernel, which allows for a large and growing set of features. In this paper we extend the MMT representation framework with a generic concept of structural features. This allows defining exactly the language features needed for elegant shallow embeddings of object languages. The key achievement here is to make this concept expressive enough to cover complex practical features while retaining the simplicity of existing meta-languages. We exemplify our framework with representations of various important structural features including datatype definitions and theory instantiations. |

14:00 | Formalization of elementary number theory in Mizar ABSTRACT. In this paper we present a dataset based on the Mizar formalization of selected problems related to elementary number theory. The dataset comprises proofs of problems on several levels of difficulty. They are available in the form of full proofs, proof sketches, as well as bare statements equipped with suitable environments importing necessary notions from the Mizar Mathematical Library. The elementary character of the underlying theory makes the data particularly suitable as a starting point for developing courses in interactive theorem proving based mathematics education and recreational mathematics activities. |

14:15 | Interpreting Mathematical Texts in Naproche-SAD (and: From LaTeX to Lean with a controlled natural language) PRESENTER: Peter Koepke ABSTRACT. We have extended the natural proof assistant Naproche-SAD by a new parsing process. The natural input language ForTheL has been integrated into LaTeX which, besides high-quality mathematical typesetting, provides useful text structurings and disambiguations. Separating parsing from further logical processing we can also translate into other formal languages besides FOL. This yields correct Lean code from ForTheL statements which may be useful for writing readable fabstracts. |

16:30 | Maintaining a Library of Formal Mathematics PRESENTER: Gabriel Ebner ABSTRACT. The Lean mathematical library \textsf{mathlib} is developed by a community of users with very different backgrounds and levels of experience. To lower the barrier of entry for contributors and to lessen the burden of reviewing contributions, we have developed a number of tools for the library. These tools check proof developments for subtle mistakes in the code and generate documentation suited for our varied audience. |

17:00 | AutoMSC: Automatic Assignment of Mathematicas Subject Classification Labels PRESENTER: Moritz Schubotz ABSTRACT. Authors of research papers in the fields of mathematics, and other math-heavy disciplines commonly employ the Mathematics Subject Classification (MSC) scheme to search for relevant literature. The MSC is a hierarchical alphanumerical classification scheme that allows librarians to specify one or multiple codes for publications. Digital Libraries in Mathematics, as well as reviewing services, such as zbMATH and Mathematical Reviews (MR) rely on these MSC labels in their workflows to organize abstracts and reviews. Therefore, editors manually assign a coarse-grained classification, which determines the subject editor who then performs the fine-grained labeling. In this paper, we investigate the feasibility of automatically assigning a coarse-grained primary classification using the MSC scheme, by regarding the problem as a multi-class classification machine learning task. We find that our method achieves an F_1-score of over 77%, which is remarkably close to the agreement of zbMATH and MR (F_1-score of 81\%). Moreover, we find that the method's confidence score allows reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles. |

17:30 | Leveraging Information Contained in Theory Presentations PRESENTER: Yasmine Sharoda ABSTRACT. A theorem prover without an extensive library is much less useful to its potential users. Algebra, the study of algebraic structures, is a core component of such libraries. Algebraic theories also are themselves structured, the study of which was started as Universal Algebra. Various constructions (homomorphism, term algebras, products, etc) and their properties are both universal and constructive. Thus they are ripe for being automated. Unfortunately, current practice still requires library builders to write these by hand. We first highlight specific redundancies in libraries of existing systems. Then we describe a framework for generating these derived concepts from theory definitions. We demonstrate the usefulness of this framework on a test library of 170 theories. |

18:00 | A Contextual and Labeled Math-Dataset Derived from NIST's DLMF PRESENTER: Abdou Youssef ABSTRACT. Machine Learning (ML) and Natural Language Processing (NLP) have started to be applied to math language processing and math knowledge discovery. To fully utilize ML in those areas, there is a press- ing need for Math labeled datasets. This paper presents a new dataset that we have derived from the widely used Digital Library of Mathemat- ical Functions (DLMF) of NIST. The dataset is structured and labeled in a specic way. For each math equation and expression in the DLMF, there is a record that provides annotational and contextual elements. An accompanying dataset is also generated from the DLMF. It con- sists of \Simple XML" les, each organized as marked-up sentences within a marked-up hierarchy of paragraphs/subsections/sections. The math in each sentence is marked up in a way that enables users to ex- tract the actual context of math elements, at various levels of granular- ity, for contextualized processing. This context-rich, sentence-oriented, equation/expression-centered, symbol-labeled dataset is motivated by the fact that much of ML-based NLP algorithms are sentence oriented. |

18:15 | TGView3D: a System for 3-Dimensional Visualization of Theory Graphs PRESENTER: Richard Marcus ABSTRACT. We describe the TGView3D system, an interactive graph viewer optimized for exploring mathematical knowledge as 3D graphs. To exploit all three spatial dimensions, it extends the commonly-used forcedirected layout algorithms with hierarchical components that are more suitable for the typical structure of mathematical knowledge. TGView3D can also communicate with OMDoc-based knowledge management tools in order to offer semantic, mathematics-specific interaction with the graphs. |