Tags:coupling invariant, coupling predicate, MapReduce framework, program equivalence, program transformation, Relational Equivalence Verification and relational reasoning

Abstract:

Distributed programming frameworks like MapReduce, Spark and Thrill, are widely used for the implementation of algorithms operating on large datasets. However, implementing in these frameworks is more demanding than coming up with sequential implementations. One way to achieve correctness of an optimized implementation is by deriving it from an existing imperative sequential algorithm description through a sequence of behavior-preserving transformations.

We present a novel approach for proving equivalence between imperative and MapReduce algorithms based on partitioning the equivalence proof into a sequence of equivalence proofs between intermediate programs with smaller differences. Our approach is based on the insight that proofs are best conducted using a combination of two kinds of steps: (1) uniform context-independent rewriting transformations; and (2) context-dependent flexible transformations that can be proved using relational reasoning with coupling invariants.

We demonstrate the feasibility of our approach by evaluating it on two prototypical algorithms commonly used as examples in MapReduce frameworks: k-means and PageRank. To carry out the proofs, we use a higher-order theorem prover with partial proof automation. The results show that our approach and its prototypical implementation enable equivalence proofs of non-trivial algorithms and could be automated to a large degree.

Relational Equivalence Proofs Between Imperative and MapReduce Algorithms