[C2] A text-based syntax completion method using LR parsing
ABSTRACT. This paper presents a text-based syntax completion method using an LR parser. We propose formal definitions of candidate text to be completed based on the sentential forms, and we design algorithms for computing candidates through reductions in the LR parsing. This is in contrast to the existing methods that have not clearly stated what candidates they intend to produce. This is also different from a transformation approach using an LR parser, which transforms the grammar of the programming language, a burdensome task at this moment. The advantage of our method is that LR parsers can be adopted without modification, and a syntax completion system can be built using them, without incurring efforts. We implemented the algorithms as an Emacs server to demonstrate the feasibility of their application.
この論文はACM SIGPLAN 2021 Workshop on Partial Evaluation and Program Manipulation (PEPM 2021)で発表済です。
Lexical analysis, which converts input text into a list
of tokens, plays an important role in many applications, including
compilation and data extraction from texts. To recognize token
patterns, a lexer incorporates a sequential computation model
— automaton as its basic building component. As such, it is
considered difficult to parallelize due to the inherent data dependency.
Much work has been done to accelerate lexical analysis
through parallel techniques. Unfortunately, existing attempts
mainly rely on language-specific remedies for input segmentation,
which makes it not only tricky for language extension, but also
challenging for automatic lexer generation.
This paper presents Plex — an automated tool for generating
parallel lexers from user-defined grammars. To overcome the
inherent sequentiality, Plex applies a fast prescanning phase
to collect context information prior to scanning. To reduce
the overheads brought by prescanning, Plex adopts a special
automaton, which is derived from that of the scanner, to avoid
backtracking behavior and exploits data-parallel techniques. The
evaluation under several languages shows that the prescanning
overhead is small, and consequently Plex is scalable and achieves
9.8-11.5X speedups using 18 threads.
ABSTRACT. We introduce context-free grammars with lookahead. The grammars are an extension of both context-free grammars and parsing expression grammars, hence we can handle the two grammars in a unified way. To accommodate lookahead, we use a language with lookahead, which is a set of string pairs. We considered the grammar as a system of equations and give the language with lookahead by the limit of iterations from the empty set. The language class is closed under union, intersection, complement, and a weak version of concatenation and Kleene star.