Discovering Interpretable Data-to-Sequence Generators

Abstract. We study the problem of predicting an event sequence given some meta data. In particular, we are interested in learning easily interpretable models that can accurately generate a sequence based on an attribute vector. To this end, we propose to learn a sparse event-flow graph over the training sequences, and statistically robust rules that use meta data to determine which paths to follow. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we identify the best model as the one that compresses the data best. As the resulting optimization problem is NP-hard, we propose the efficient Consequence algorithm to discover good event-flow graphs from data.

Through an extensive set of experiments, including a case study, we show that Consequence discovers compact, interpretable and accurate models for the generation and prediction of event sequences from data, has a low sample complexity, and is particularly robust against noise.
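To make the MDL idea from the abstract concrete, the following is a minimal sketch of a two-part score, L(M) + L(D | M), for a single node of a toy event-flow graph. It is not the encoding used by Consequence, which is not specified on this page. The function names, the flat `bits_per_rule` model cost, and the example attributes and events are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def data_cost_bits(next_events):
    """Bits needed to encode the observed next events using their
    empirical distribution (a simple stand-in for L(D | M))."""
    counts = Counter(next_events)
    total = len(next_events)
    return -sum(c * math.log2(c / total) for c in counts.values())


def total_cost_bits(observations, rule=None, bits_per_rule=8.0):
    """Toy two-part score L(M) + L(D | M) for one graph node.

    observations: list of (attributes, next_event) pairs seen at the node.
    rule: optional predicate on the attributes; if given, observations are
          split by the rule and each group gets its own distribution.
    bits_per_rule: hypothetical flat model cost per rule (an assumption).
    """
    if rule is None:
        # No rule: model cost is taken as zero, all events share one code.
        return data_cost_bits([event for _, event in observations])
    groups = defaultdict(list)
    for attrs, event in observations:
        groups[bool(rule(attrs))].append(event)
    return bits_per_rule + sum(data_cost_bits(g) for g in groups.values())


# Hypothetical data: the next event is fully determined by one attribute.
observations = (
    [({"priority": "high"}, "express_ship")] * 8
    + [({"priority": "low"}, "standard_ship")] * 8
)

no_rule = total_cost_bits(observations)
with_rule = total_cost_bits(
    observations, rule=lambda attrs: attrs["priority"] == "high"
)
print(f"without rule: {no_rule:.1f} bits, with rule: {with_rule:.1f} bits")
```

In this toy setting the rule pays for itself: the bits spent on describing it are fewer than the bits it saves when encoding the data, so the model with the rule compresses better. On noisy data with no real dependence on the attribute, the same rule would cost more than it saves and would be rejected.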

Implementation

the source code on GitHub
the Python source code (December 2021), by Boris Wiegand.
Readme, including a link to the Docker instantiation

Related Publications

Wiegand, B, Klakow, D & Vreeken, J. Mining Interpretable Data-to-Sequence Generators. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2022. (15.0% acceptance rate)