Tags:Formal verification, Neural networks and Poisoning attacks
Abstract:
Neural networks are successfully used in many domains including safety and security critical applications. As a result researchers have proposed formal verification techniques for verifying neural network properties. A large majority of previous efforts have focused on checking local robustness in neural networks. We instead focus on another neural network security issue, namely data poisoning, whereby an attacker inserts a trigger into a subset of the training data, in such a way that at test time, this trigger causes the classifier to predict some target class. In this paper, we show how to formulate absence of data poisoning as a property that can be checked with off-the-shelf verification tools, such as Marabou and nneum. Counterexamples of failed checks constitute potential triggers that we validate through testing. We further show that the discovered triggers are ‘transferable’ from a small model to a larger, better-trained model, allowing us to analyze state-of-the art performant models trained for image classification tasks.