Discovering Significant Patterns under Sequential False Discovery Control

Abstract. We are interested in discovering those patterns in data whose empirical frequency differs significantly from what is expected. To avoid spurious results, yet achieve high statistical power, we propose to sequentially control for false discoveries during the search. To avoid redundancy, we propose to update our expectations whenever we discover a significant pattern. To efficiently consider the exponentially sized search space, we employ an easy-to-compute upper bound on significance, and propose an effective search strategy for sets of significant patterns. Through an extensive set of experiments on synthetic data, we show that our method, Spass, recovers the ground truth reliably, efficiently, and without redundancy. On real-world data, we show that it works well for both single and multiple classes and for both low- and high-dimensional data, and through case studies that it discovers meaningful results.
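To make the two central ideas of the abstract concrete, here is a minimal sketch in Python. It is not the Spass implementation. It assumes a simple independence model as the expectation, an exact two-sided binomial test for significance, and an alpha-investing-style rule (in the spirit of Foster and Stine) as one possible way to control false discoveries sequentially. All function and parameter names are hypothetical, and the sketch keeps the expectation fixed rather than updating it after each discovery.

```python
from math import comb


def expected_frequency(rows, pattern):
    """Expected frequency of `pattern` if its items occurred independently.

    Assumption: independence of items is the baseline expectation here.
    """
    n = len(rows)
    p = 1.0
    for item in pattern:
        p *= sum(item in row for row in rows) / n
    return p


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def alpha_investing(p_values, wealth=0.05, payout=0.05):
    """Sequentially test p-values with an alpha-investing-style rule.

    Each test spends half of the current wealth. A rejection earns
    `payout`; a non-rejection costs alpha / (1 - alpha).
    Returns the indices of the rejected hypotheses.
    """
    discoveries = []
    for i, p in enumerate(p_values):
        alpha = wealth / (2 + wealth)  # so that alpha / (1 - alpha) == wealth / 2
        if p <= alpha:
            discoveries.append(i)
            wealth += payout
        else:
            wealth -= alpha / (1 - alpha)
    return discoveries


# Toy data: items "a" and "b" always occur together, in half of the rows.
rows = [{"a", "b"}] * 20 + [set()] * 20
pattern = {"a", "b"}
observed = sum(pattern <= row for row in rows)
p_value = binomial_two_sided_p(observed, len(rows), expected_frequency(rows, pattern))
```

In this toy data the pattern occurs twice as often as independence predicts, so its p-value is small. Feeding a stream of such p-values to `alpha_investing` in the order the patterns are found gives a significance threshold that adapts during the search instead of being fixed in advance.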

Implementation

The replication package, including code and data, for Dalleiger & Vreeken (KDD 2022).

The Python source code by Sebastian Dalleiger.

The datasets used, pre-processed by Sebastian Dalleiger.

Related Publications

Dalleiger, S & Vreeken, J. Discovering Significant Patterns under Sequential False Discovery Control. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 263-272, ACM, 2022. (15.0% acceptance rate)