Tags: CNF, Competition, Evaluation, SAT, Scrambling
Abstract:
There has been a long-standing debate about how SAT solvers, and new or different algorithms in general, should be evaluated and compared, both in competitions and, more importantly, in papers. Evaluations are usually performed on existing benchmarks, and cross-validation or other means to avoid over-fitting are rarely used. In this paper we revisit the old idea of scrambling benchmarks, which was also used in early competitions. The goal of scrambling is to make the results of such evaluations more robust. We present a new method for scrambling CNFs, which allows the effect of scrambling to be increased gradually, from keeping the scrambled CNF close to the original CNF to a completely random permutation of variables and clauses and random flipping of literal phases. We used this method to scramble benchmarks from the last two SAT competitions and solved them with the best solvers from the main track of the last SAT competition. As expected, our experimental results suggest that scrambling has a substantial effect on the performance of individual solvers, but, surprisingly, it has little effect on the ranking among solvers. As a consequence we argue that using our method of scrambling alone is not enough to increase the robustness of competitions and evaluations in general.
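To make the general idea concrete, the following is a minimal Python sketch of gradual CNF scrambling in the spirit described above. It is not the method proposed in the paper: the strength parameter p, the function name scramble_cnf, and the specific way partial permutations and phase flips are drawn are illustrative assumptions only.

```python
import random

def scramble_cnf(clauses, num_vars, p, seed=0):
    """Scramble a CNF given as a list of clauses of DIMACS literals.

    p in [0, 1] controls scrambling strength (an assumed knob, not the
    paper's mechanism): p = 0 leaves the formula unchanged, p = 1 yields
    a fully random permutation of variables and clauses plus random
    phase flips.
    """
    rng = random.Random(seed)

    # Variable renaming: with probability p each variable joins a pool
    # that is randomly permuted; all other variables keep their index.
    moved = [v for v in range(1, num_vars + 1) if rng.random() < p]
    targets = moved[:]
    rng.shuffle(targets)
    rename = dict(zip(moved, targets))

    # Phase flips: with probability p/2 negate every occurrence of a
    # variable (so at p = 1 each variable flips with probability 1/2).
    flip = {v for v in range(1, num_vars + 1) if rng.random() < p / 2}

    def map_lit(lit):
        v, sign = abs(lit), (1 if lit > 0 else -1)
        v = rename.get(v, v)
        if v in flip:
            sign = -sign
        return sign * v

    scrambled = [[map_lit(l) for l in cl] for cl in clauses]

    # Clause reordering: shuffle only a p-fraction of clause positions,
    # leaving the remaining clauses where they are.
    idx = list(range(len(scrambled)))
    chosen = [i for i in idx if rng.random() < p]
    perm = chosen[:]
    rng.shuffle(perm)
    for i, j in zip(chosen, perm):
        idx[i] = j
    return [scrambled[i] for i in idx]
```

For example, scramble_cnf(clauses, num_vars, p=0.1) perturbs only a small fraction of variables and clause positions, keeping the scrambled instance close to the original, while p=1.0 corresponds to a fully scrambled instance.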