Tags:Formal Verification, Frequency Moments, Isabelle/HOL and Randomized Algorithms
Abstract:
In 1999 Alon et. al. introduced the still active research topic of approximating the frequency moments of a data stream using randomized algorithms with minimal space usage. This includes the problem of estimating the cardinality of the stream elements - the zeroth frequency moment. But also higher-order frequency moments that provide information about the skew of the data stream, which is for example critical information for parallel processing. (The k-th frequency moment of a data stream is the sum of the k-th powers of the occurrence counts of each element in the stream.) They introduce both lower bounds and upper bounds, which were later improved by newer publications. The algorithms have guaranteed success probabilities and accuracies without making any assumptions on the input distribution. They are an interesting use case for formal verification because their correctness proofs require a large body of deep results from algebra, analysis and probability theory. This work reports on the formal verification of three algorithms for the approximation of F_0, F_2 and F_k for k ≥ 3. The results include the identification of significantly simpler algorithms with the same runtime and space complexities as the previously known ones as well as the development of several reusable components, such as a formalization of universal hash families, amplification methods for randomized algorithms, a model for one-pass data stream algorithms or a generic flexible encoding library for the verification of space complexities.
Formalization of Randomized Approximation Algorithms for Frequency Moments