Improving Statistical Linguistic Algorithms for Parsing Mathematics

10 pages. Published: September 27, 2016


In this paper we describe our combined statistical/semantic parsing method, which is based on the CYK chart-parsing algorithm augmented with limited internal typechecking and external ATP filtering. This method was previously evaluated on parsing ambiguous mathematical expressions over the informalized Flyspeck corpus of 20,000 theorems. We first discuss the motivation for and the drawbacks of the first version of the CYK-based component of the algorithm, and then propose and implement a more sophisticated approach based on a better statistical model of mathematical data structures.
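For readers unfamiliar with CYK chart parsing, the following is a minimal illustrative sketch of a probabilistic CYK parser over a toy grammar in Chomsky normal form. It is not the authors' implementation: the grammar, the probabilities, and the names `LEXICAL`, `BINARY`, and `cyk_best_parse` are hypothetical, and the sketch includes neither the typechecking nor the ATP filtering mentioned in the abstract.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustration only).
# Lexical rules: token -> list of (nonterminal, probability).
LEXICAL = {
    "x": [("E", 0.4)],
    "y": [("E", 0.3)],
    "+": [("PLUS", 1.0)],
}
# Binary rules: (parent, left child, right child, probability).
BINARY = [
    ("E", "E", "R", 0.3),
    ("R", "PLUS", "E", 1.0),
]


def cyk_best_parse(tokens, start="E"):
    """Return (probability, tree) of the best parse rooted at `start`, or None."""
    n = len(tokens)
    # chart[(i, j)] maps a nonterminal to its best (probability, tree)
    # over the token span tokens[i:j].
    chart = defaultdict(dict)

    # Fill spans of width 1 from the lexical rules.
    for i, tok in enumerate(tokens):
        for nt, p in LEXICAL.get(tok, []):
            chart[(i, i + 1)][nt] = (p, (nt, tok))

    # Combine adjacent spans bottom-up, keeping the most probable
    # derivation for each nonterminal in each cell.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        lp, lt = chart[(i, k)][left]
                        rp, rt = chart[(k, j)][right]
                        prob = p * lp * rp
                        best = chart[(i, j)].get(parent, (0.0, None))[0]
                        if prob > best:
                            chart[(i, j)][parent] = (prob, (parent, lt, rt))

    return chart[(0, n)].get(start)


if __name__ == "__main__":
    print(cyk_best_parse(["x", "+", "y"]))
```

Calling `cyk_best_parse(["x", "+", "y"])` returns the highest-probability tree for the whole input together with its probability, while an input with no derivation from the start symbol yields `None`. A longer input such as `x + y + x` admits more than one derivation under this grammar, which is the kind of ambiguity a statistical model is meant to rank.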

Keyphrases: computational linguistics, automated reasoning, Flyspeck, HOL Light, parsing mathematics, type checking

In: Boris Konev, Stephan Schulz and Laurent Simon (editors). IWIL-2015. 11th International Workshop on the Implementation of Logics, vol 40, pages 27--36

