Calculating Sufficient Reasons for Random Forest Classifiers

Title:Calculating Sufficient Reasons for Random Forest Classifiers

Conference:POS-22

Tags:CNF Encoding, Incremental SAT Solving, Prime Implicants, Python Scikit-learn, Random Forest Classifier and Sufficient Reasons

Abstract:

In this paper, we formalize the implementations of decision tree and random forest classifiers of the Python Scikit-learn package, and we present a CNF encoding which can be used to calculate sufficient reasons a.k.a. prime implicants for them. Our decision tree encoding resembles a monotonic combinational circuit with pure input variables, of which we take advantage in our method for incremental enumeration of its prime implicants. Encoding the combination of several decision trees in a random forest would add a non-monotonic evaluation function to the root of the encoded circuit. In our approach, we solve an auxiliary SAT problem for enumerating valid leaf-node combinations of the random forest. Each valid leaf-node combination is used to incrementally update the original monotonic circuit encoding by extending the DNF at its root with a new term. Preliminary results show that enumeration of prime implicants by incrementally updating the encoding is by order of magnitudes faster than one-shot solving of the monolithic formula. We present preliminary runtime data, and some initial data about the size and number of samples found when translating the prime implicants back into database queries.