Efficiently Summarising Event Sequences with Rich Interleaving Patterns

Abstract. An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern mining techniques do not achieve this goal: because of the pattern explosion, they typically return very large numbers of highly redundant patterns.

We pursue this ideal for sequential data by employing a pattern set mining approach, in which, instead of ranking patterns individually, we consider the results as a whole. Pattern set mining has been successfully applied to transactional data, but has been surprisingly understudied for sequential data.

In this paper, we employ the MDL principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategies, we propose two algorithms: the first selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experiments on synthetic and real data demonstrate that we efficiently discover small sets of informative patterns.
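To make the idea of using encoded length as a quality score concrete, the following is a minimal sketch of a two-part MDL score combined with greedy selection from a candidate set. It is an illustration only, not the encoding or the algorithms from the paper: it matches patterns only as contiguous runs (no gaps, no interleaving), and the function names `cover`, `total_length` and `greedy_select` are hypothetical. It also assumes every event in a candidate pattern occurs somewhere in the sequence.

```python
import math
from collections import Counter


def cover(seq, patterns):
    """Cover seq left to right, preferring longer patterns.

    Simplifying assumption: a pattern matches only as a contiguous run.
    Returns how often each pattern or singleton event was used.
    """
    ordered = sorted(patterns, key=len, reverse=True)
    usage = Counter()
    i = 0
    while i < len(seq):
        for p in ordered:
            if tuple(seq[i:i + len(p)]) == p:
                usage[p] += 1
                i += len(p)
                break
        else:
            # No pattern matches here: encode the single event.
            usage[(seq[i],)] += 1
            i += 1
    return usage


def total_length(seq, patterns):
    """Two-part score in bits: L(M) + L(D | M)."""
    usage = cover(seq, patterns)
    total = sum(usage.values())
    # L(D | M): each used code costs -log2 of its relative usage.
    data_bits = -sum(u * math.log2(u / total) for u in usage.values())
    # L(M): spell out each pattern using codes based on raw event frequencies.
    # Assumes every event in a pattern occurs in seq.
    freq = Counter(seq)
    model_bits = sum(-math.log2(freq[e] / len(seq)) for p in patterns for e in p)
    return data_bits + model_bits


def greedy_select(seq, candidates):
    """Keep a candidate only if it shortens the total encoded length."""
    chosen = []
    best = total_length(seq, chosen)
    for cand in candidates:
        trial = total_length(seq, chosen + [cand])
        if trial < best:
            chosen.append(cand)
            best = trial
    return chosen, best


seq = list("abc" * 10 + "xyz")
candidates = [("a", "b", "c"), ("c", "a")]
chosen, best = greedy_select(seq, candidates)
print(chosen, round(best, 2))
```

On this toy sequence the sketch keeps `("a", "b", "c")`, which compresses the data well, and rejects `("c", "a")`, which is never used in the cover and therefore only adds model cost.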

Implementation

The C++ source code (July 2018), by Apratim Bhattacharyya.

Related Publications

Bhattacharyya, A. & Vreeken, J. Efficiently Summarising Event Sequences with Rich Interleaving Patterns. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 795-803, SIAM, 2017. (selected in the top 10 papers of SDM'17, 2.7% acceptance rate; overall 25%)
Bhattacharyya, A. & Vreeken, J. Efficiently Summarising Event Sequences with Rich Interleaving Patterns. Technical Report 1701.08096, arXiv, 2017.
Bhattacharyya, A. Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns. M.Sc. Thesis, Saarland University, 2016.