Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy. A commonly criticised point, however, is that the resulting trees may not necessarily be the best representation of the data in terms of accuracy and size. In recent years, this motivated the development of optimal classification tree algorithms that globally optimise the decision tree in contrast to heuristic methods that perform a sequence of locally optimal decisions.

In this talk I will explore the history of building decision trees, from greedy heuristic methods to modern optimal approaches.

In particular I will discuss a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our algorithm supports constraints on the depth of the tree and number of nodes. The success of our approach is attributed to a series of specialised techniques that exploit properties unique to classification trees. Whereas algorithms for optimal classification trees have traditionally been plagued by high runtimes and limited scalability, we show in a detailed experimental study that our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances, providing several orders of magnitude improvements and notably contributing towards the practical realisation of optimal decision trees.

I will briefly consider what remains to explore in creating optimal decision trees.

ABSTRACT. Decision trees are one of the most interpretable machine learning models and due to this property have gained increased attention in the past years. In this context, decision trees of small size and depth are of particular interest and new methods have been developed that can induce decision trees that are both small and accurate.

In this talk, I will give an overview of SAT-based methods that have been proposed, starting with general encodings for inducing decision trees and then discussing options for scaling them to larger instances in a heuristic manner. I will give particular focus to our algorithm DT-SLIM, which reduces the size and depth of a given decision tree by using a SAT solver to repeatedly improve localized parts of the tree.

Joint work with Stefan Szeider.

12:30-14:00Lunch Break

Lunches will be held in Taub hall and in The Grand Water Research Institute.

ABSTRACT. Experimental data is often given as bit vectors, with vectors
corresponding to observations, and coordinates to attributes, with a
bit being true if the corresponding attribute was observed.
Observations are usually grouped, e.g.\ into positive and negative
samples. Among the essential tasks on such data, we have
compression, the construction of classifiers for assigning new data,
and information extraction.

Our system, MCP, approaches these tasks by propositional logic.
For each group of observations, MCP constructs a (usually small)
conjunctive formula that is true for the observations of the group,
and false for the others. Depending on the settings, the formula
consists of Horn, dual-Horn, bijunctive or general clauses. To reduce
its size, only relevant subsets of the attributes are
considered. The formula is a (lossy) representation of the original
data and generalizes the observations, as it is usually satisfied by
more bit vectors than just the observations. It thus may serve as a
classifier for new data. Moreover, (dual-)Horn clauses, when read as
if-then rules, make dependencies between attributes explicit. They
can be regarded as an explanation for classification decisions.

A Rate-Distortion Framework for Explaining Model Decisions and Application to CartoonX

ABSTRACT. I will present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The method extracts the relevant part of the signal that led the model to make its decision. As an application, I will present CartoonX, an RDE method based on wavelet representations for explaining image classification. Natural images can be well approximated by piecewise smooth signals — also called cartoon images — and tend to be sparse in the wavelet domain. Motivated by this, CartoonX requires its explanations to be sparse in the wavelet domain, thus extracting the relevant piecewise smooth part of an image. In practice, CartoonX provides not only highly interpretable explanations but also turned out to be particularly apt at explaining misclassifications.

ABSTRACT. Explaining the decisions made by machine learning models is a fundamental challenge given their involvement in almost all areas of our lives, especially those where the stakes are high, such as healthcare, judicial systems, or financial institutions. However, explanations can be misleading or confusing when they don't have a clear interpretation (an explanation without a clear interpretation might need itself to be explained!). The area of Formal eXplainable AI (XAI) studies explanations under a formal theoretical lens, where explanations have precise definitions and thus induce concrete computational problems. In this talk, I will summarize theoretical results about the computational complexity of formal explanations, as well as directions for obtaining them in practice through formal methods.

Verification of Binarized Neural Networks: Challenges and Opportunities

ABSTRACT. Analyzing the behavior of neural networks is one of the most pressing challenges in deep learning. Binarized Neural Networks are an important class of networks that allow equivalent representation in Boolean logic and can be analyzed formally with logic-based reasoning tools like SAT solvers. Such tools can be used to answer existential and probabilistic queries about the network, perform explanation generation, etc. However, the main bottleneck for all methods is their ability to reason about large BNNs efficiently. In this work, we analyze architectural design choices of BNNs and discuss how they affect the performance of logic-based reasoners. We discuss changes to the BNN architecture and the training procedure to get a simpler network for SAT solvers without sacrificing accuracy on the primary task. We will also discuss new results on Binarized Neural Networks and their training procedures.

ABSTRACT. After the discovery of adversarial examples, provable robustness guarantees for neural networks have received increasing attention from the research community. While these efforts have been successful for relatively small or heavily regularized models with limited standard accuracy, more precise models have proven much harder to analyze. Recently two paradigms have emerged to tackle this challenge: while Branch-and-Bound-based approaches break down a hard verification problem into increasingly many easier ones until they can be solved, Multi-Neuron constraints allow the verification of individual harder problems. In this talk, we discuss these two paradigms and how they can be combined to obtain a novel and highly effective verification framework.

Bio: Mark N. Müller is a Ph.D. student in the Secure, Reliable, and Intelligent Systems Lab at ETH Zurich, advised by Prof. Martin Vechev. His research focuses on different approaches to obtaining accurate models with formal robustness guarantees.