What's in the Box? Explaining Neural Networks with Robust Rules

Abstract. We propose a novel method for exploring how neurons within neural networks interact. In particular, we consider activation values of a network for given data, and propose to mine noise-robust rules of the form \(X \rightarrow Y\), where \(X\) and \(Y\) are sets of neurons in different layers. We identify the best set of rules by the Minimum Description Length principle, as those rules that together are most descriptive of the activation data. To learn good rule sets in practice, we propose the unsupervised ExplaiNN algorithm. Extensive evaluation shows that the patterns it discovers give clear insight into how networks perceive the world: they identify shared and class-specific traits, compositionality, as well as locality in convolutional layers. Moreover, they are not only easily interpretable, but also super-charge prototyping by identifying which neurons to consider in unison.
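To make the rule form \(X \rightarrow Y\) concrete, the following minimal sketch scores one candidate rule on binarized activations. It is not the ExplaiNN algorithm and it does not compute an MDL score. The function names `binarize` and `rule_stats`, the threshold, the `min_frac` noise tolerance, and the toy activation values are all assumptions made for illustration.

```python
# Illustrative sketch only: scores a single candidate rule X -> Y.
# All names, thresholds, and data here are hypothetical, not from the paper.

def binarize(activations, threshold=0.0):
    """Turn real-valued activations into on/off values per neuron."""
    return [[a > threshold for a in row] for row in activations]

def rule_stats(layer_x, layer_y, X, Y, min_frac=1.0):
    """Count how often the rule X -> Y holds over the samples.

    The head holds when every neuron in X is active. The tail holds when
    at least min_frac of the neurons in Y are active, which is a simple
    stand-in for tolerating noise in the tail.
    """
    support = hits = 0
    for row_x, row_y in zip(layer_x, layer_y):
        if all(row_x[i] for i in X):
            support += 1
            active = sum(row_y[j] for j in Y)
            if active >= min_frac * len(Y):
                hits += 1
    confidence = hits / support if support else 0.0
    return support, hits, confidence

# Toy activations: rows are samples, columns are neurons of one layer.
bx = binarize([[0.9, 0.4, 0.0], [0.7, 0.2, 0.0], [0.0, 0.5, 0.3], [0.8, 0.6, 0.1]])
by = binarize([[0.5, 0.3, 0.0], [0.6, 0.0, 0.0], [0.0, 0.0, 0.9], [0.4, 0.2, 0.7]])

strict = rule_stats(bx, by, X=[0, 1], Y=[0, 1], min_frac=1.0)
robust = rule_stats(bx, by, X=[0, 1], Y=[0, 1], min_frac=0.5)
```

On this toy data the strict variant misses the second sample, where only one of the two tail neurons fires, while the tolerant variant counts it. This shows in miniature why a noise-robust rule can describe activation data with fewer exceptions.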

Implementation

The C++ source code (June 2021) by Jonas Fischer and Anna Oláh.

Related Publications

Fischer, J, Oláh, A & Vreeken, J. What's in the Box? Explaining Neural Networks with Robust Rules. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021. (21.4% acceptance rate)