Program for Thursday, September 5th

PROGRAM FOR THURSDAY, SEPTEMBER 5TH

Days:

Contributed talks

09:00	Alexander Roodt, Brendan Watling, Willem Bester, Brink van der Merwe, Sicheol Sung and Yo-Sub Han Benchmarking Regular Expression Matching ABSTRACT. We benchmark matching time of regular expression matching engines when they use either the Thompson or Glushkov regular expressions to state machine conversion algorithms, with or without using memoisation while matching, and doing matching either by using a lockstep or a Spencer type scheduler. We conduct our empirical investigation by expanding on the virtual machine for regular expressions matching approach, introduced by Russ Cox.
09:30	Antoine Martin, Alexandre Duret-Lutz and Etienne Renault Translation of Semi-Extended Regular Expressions using Derivatives ABSTRACT. We generalize Antimirov's notion of \emph{linear form} of a regular expression, to the Semi-Extended Regular Expressions typically used in the Property Specification Language or SystemVerilog Assertions. Doing so requires extending the construction to handle more operators, and it requires dealing with expression are over alphabets $\Sigma=2^\AP$ of valuations of atomic propositions. Using linear forms to construct automata labeled by Boolean expressions suggests some heuristic that we evaluate. Finally, we also study a variant of this translation to automata with transition-based acceptance: this construction is more natural and provide smaller automata.
10:00	Robert Baumgartner and Sicco Verwer PDFA distillation with error bound guarantees ABSTRACT. Active learning algorithms to infer probabilistic finite automata (PFA) have gained interest recently, due to their ability to provide surrogate models for some types of neural networks (NNs). However, recent approaches either cannot guarantee determinism, which makes the automaton harder to understand and compute, or they rely on techniques that bound errors on individual transitions, possibly through quantization of distributions. In this work we propose a derivative of the recent L\# algorithm to learn deterministic PFA (PDFA) from systems that can return a distribution over a set of tokens given an input string. Along determinism, we can give error bounds on probabilities assigned to whole strings with an easy to understand approach. We show formal correctness of our algorithm and test it on neural networks trained to model three datasets from computer- and network-systems respectively. We show that the algorithm can learn the network's behaviour closely, and provide an example application of how the model can be used to interpret the network. We note that while we do not test this, our approach is in theory applicable in general to learn deterministic weighted finite automata (WFA). We provide the source code of our algorithm and relevant scripts on our public repository.

11:00-12:00 Session 11

Invited talk

11:00

Sang-Ki Ko

Neuro-Symbolic Approach for Efficient Regular Expression Inference

ABSTRACT. Due to the practical importance of regular expressions (regexes, for short), much research has been done on automatically generating regexes from positive and negative string examples. We tackle the problem of learning to generate regexes faster and more precisely from positive and negative examples by relying on a novel approach called `neural example splitting.' Using a neural network trained to group similar substrings from positive strings, our approach splits each positive example string into multiple parts. We propose an effective regex synthesis framework called `SplitRegex' that synthesizes subregexes from `split' positive substrings and produces the final regex by concatenating the synthesized subregexes. To verify the correctness of generated independent subregexes during the subregex synthesis process, we exploit negative strings by matching against the generated subregexes. As a result, we guarantee that the final regex rejects all negative strings. SplitRegex is a divided-and-conquer framework for learning target regexes by the following process: split (=divide) positive strings, infer partial regexes for multiple parts, which is much more accurate than the whole regex inference, and concatenate (=conquer) inferred regexes. We empirically demonstrate that the proposed SplitRegex framework improves the previous regex synthesis approaches over two practical datasets and one synthesized dataset.

12:00-17:30

Excursion to Kakunodate by bus (lunch, visit to a sake brewery and sightseeing)

18:30-21:00

Conference dinner