Keeping it Short and Simple
Summarising Complex Event Sequences with Multivariate Patterns

Abstract. We study how to obtain concise descriptions of discrete multivariate sequential data in terms of rich multivariate sequential patterns that can capture potentially highly interesting (cor)relations between sequences. To this end, we allow our patterns to span the alphabets (domains) of all sequences, to overlap temporally, and to contain gaps in their occurrences. We formalise our goal using the Minimum Description Length (MDL) principle, under which our objective is to discover the set of patterns that provides the most succinct description of the data. To discover good pattern sets, we introduce Ditto, an efficient algorithm that approximates the ideal result. We support our claims with experiments on both synthetic and real data.
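To make the MDL objective concrete, here is a minimal sketch of the two-part score commonly minimised in MDL-based pattern set mining: the total length L(M) + L(D | M), where L(M) is the cost of describing the pattern set and L(D | M) is the cost of describing the data with it. This is not Ditto's actual encoding, which must also handle multiple sequences and gaps. The names `Entry`, `data_length`, `model_length` and `total_length`, the per-entry model costs, and the toy usage counts are all hypothetical assumptions chosen for illustration. Each use of an entry is charged a Shannon-optimal code length of -log2(usage / total usage) bits.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry of a hypothetical toy code table (not Ditto's encoding)."""
    name: str
    usage: int         # how often this entry is used to describe the data
    model_bits: float  # hypothetical cost of writing the entry itself down


def data_length(table):
    """L(D | M): each use of an entry costs -log2(usage / total usage) bits."""
    total = sum(e.usage for e in table)
    return sum(e.usage * -math.log2(e.usage / total) for e in table if e.usage > 0)


def model_length(table):
    """L(M): sum of the (hypothetical) per-entry model costs."""
    return sum(e.model_bits for e in table)


def total_length(table):
    """Two-part MDL score L(M) + L(D | M); smaller is better."""
    return model_length(table) + data_length(table)


# Toy data: event a occurs 6 times, event b 4 times, and every b co-occurs with an a.
singletons = [Entry("a", 6, 2.0), Entry("b", 4, 2.0)]
# Adding the pattern "ab" covers the 4 co-occurrences, leaving 2 uses of a and none of b.
with_pattern = [Entry("a", 2, 2.0), Entry("b", 0, 2.0), Entry("ab", 4, 4.0)]

for label, table in [("singletons", singletons), ("with ab", with_pattern)]:
    print(f"{label}: {total_length(table):.2f} bits")
```

Running it prints 13.71 bits for the singletons and 13.51 bits for the set that includes "ab": the extra model cost of the pattern is outweighed by the shorter description of the data, so MDL prefers the richer pattern set. Ditto searches for such pattern sets heuristically rather than by enumerating candidates like this.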

Implementation

The C++ source code (Feb 2016), by Roel Bertens.

Related Publications

Bertens, R., Vreeken, J. & Siebes, A. Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns. Technical Report 1512.07056, arXiv, 2016.
Bertens, R., Vreeken, J. & Siebes, A. Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pp. 735-744, ACM, 2016. (Oral presentation, 8.9% acceptance rate; overall 18.1%.)