When considering a data set it is often unknown how complex it is, and as a result it is difficult to choose a model that describes its main characteristics without overfitting. Often these choices are left to the domain expert, which is highly unsatisfactory in practice. The Minimum Description Length (MDL) principle answers the model selection problem from a clear and intuitively appealing viewpoint. In a nutshell, it asserts that the best model is the one that compresses both the data and model best. In this tutorial we do not only give an introduction to the very basics of model selection, show important properties of MDL-based modelling, successful examples as well as pitfalls for how to apply MDL to solve data mining problems, but also introduce advanced topics on important new concepts in modern MDL and emerging applications in dynamic settings. Throughout, our goal is to make sure that audience grasps not only the basic theory, but also see how this can be put to practice.
Selecting a model for a given set of data is at the heart of what data analysts do, whether they are statisticians, machine learners or data miners. However, the philosopher Hume already pointed out that the "Problem of Induction" is unsolvable; there are infinitely many functions that touch any finite set of points. So, it is not surprising that there are many different principled approaches to guide the search for a good model. Well-known examples are Bayesian Statistics and Statistical Learning Theory.
In the last decade information theoretic methods for selecting the best model slowly but surely became popular in the data mining community, and have led to state-of-the-art solutions in areas as diverse as pattern based modelling, change detection, and causal inference. In this tutorial we will review the state-of-the-art in information-theoretic model selection based on the Minimum Description Length principle and its implication and applications in data mining.
|Part 1: Introduction to MDL
The first part of the tutorial gives a gentle introduction to the Minimum Description Length principle. We will discuss topics such as model selection, Occam's Razor, Two-Part MDL and its relation to AIC, BIC, and Kolmogorov Complexity. We will discuss the strengths and weaknesses of two-part MDL, after which we introduce the modern concept of Refined MDL.
|Part 2: MDL in Action
In the second part of the tutorial we will showcase examples of how we can solve modern data mining problems using MDL. Amongst others we will discuss tasks such as pattern-based modelling, anomaly detection, causal inference, and model selection in deep learning.
|Part 3: Stochastic Complexity
In the third part we will focus on more advanced recent insights in MDL. In particular, we will discuss the concept of stochastic complexity, the normalized maximum likelihood (NML) codelength, and how to compute it efficiently for a wide range of model classes, and how to apply them to latent variable model selection.
|Part 4: MDL in Dynamic Action
In the fourth part we will showcase examples of how modern MDL insights allow us to solve data mining problems in dynamic settings. This includes computing change statistics, dynamic model selection, failure detection, etc.