Hopper: Mining Sequential Patterns with Reliable Prediction Delays

Abstract. Summarizing sequential data with serial episodes allows non-trivial insight into the data generating process. Existing methods penalize gaps in pattern occurrences equally, regardless of where in the pattern these occur. This results in a strong bias against long inter-event delays, as well as that patterns that regularity in terms of delays is not rewarded or discovered—even though both aspects provide key insight.

In this paper we tackle both these problems by explicitly modeling inter-event delay distributions. That is, we are not only interested in discovering the patterns, but also in describing how many times steps typically occur between their individual events. We formalize the problem in terms of the Minimum Description Length principle, by which we say the best set of patterns is the one that compresses the data best. The resulting optimization problem does not lend itself to exact optimization, and hence we propose Hopper to heuristically mine high quality patterns. Extensive experiments show that Hopper efficiently recovers the ground truth, discovers meaningful patterns from real-world data, and outperforms existing methods in discovering long-delay patterns.

Implementation

the Python source code (January 2024) by Joscha Cueppers and Paul Krieger.

Related Publications

Cueppers, J, Krieger, P & Vreeken, J Discovering Sequential Patterns with Predictable Inter-Event Delays. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024. (23.8% acceptance rate)