Flexibly Mining Better Subgroups

Abstract. In supervised pattern mining, also known as subgroup discovery, a crucial step is to discover high-quality one-dimensional subgroups, as well as high-quality refinements of existing subgroups; these are commonly known as binary features. For nominal attributes, this can be done by directly considering individual attribute values. The task is more challenging for numerical attributes: individual numeric values are not reliable statistics on their own, so adjacent values are combined into bins. Existing binning strategies, however, do not directly optimize the quality of the bins for subgroup discovery, which degrades the quality of the final output.

In this paper, we propose Flexi to address this issue. In short, Flexi uses optimal binning to find high-quality binary features for both numeric and ordinal attributes. We instantiate Flexi with various quality measures and show how to compute each of them efficiently. Experiments on both synthetic and real-world data sets validate the benefits of our method.
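
To make the idea of scoring bins of adjacent values concrete, the following is a minimal illustrative sketch. It is not the Flexi algorithm and is not taken from the released source code. It assumes a binary target and uses weighted relative accuracy (WRAcc), a standard subgroup quality measure, as a stand-in for the quality measures mentioned above. The class name IntervalSketch and the methods wracc and bestInterval are hypothetical. Instead of optimal binning, it exhaustively scores every interval of adjacent distinct values, which takes quadratic time.

```java
import java.util.Arrays;

/** Illustrative sketch only: brute-force search for one interval, not Flexi's optimal binning. */
final class IntervalSketch {

    /** WRAcc = coverage * (positive rate inside the subgroup - overall positive rate). */
    static double wracc(int covered, int coveredPos, int total, int totalPos) {
        if (covered == 0) return 0.0;
        double coverage = (double) covered / total;
        return coverage * ((double) coveredPos / covered - (double) totalPos / total);
    }

    /**
     * Scores every interval [lo, hi] whose endpoints are observed values of x
     * and returns {lo, hi, quality} for the interval with the highest WRAcc.
     */
    static double[] bestInterval(double[] x, boolean[] y) {
        int n = x.length;

        // Sort row indices by attribute value so that a bin is a contiguous range.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(x[a], x[b]));

        // prefixPos[k] = number of positive rows among the first k sorted rows.
        int[] prefixPos = new int[n + 1];
        for (int i = 0; i < n; i++) {
            prefixPos[i + 1] = prefixPos[i] + (y[order[i]] ? 1 : 0);
        }
        int totalPos = prefixPos[n];

        double[] best = {Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY};
        for (int i = 0; i < n; i++) {
            // An interval may only start at the first row of a distinct value.
            if (i > 0 && x[order[i]] == x[order[i - 1]]) continue;
            for (int j = i; j < n; j++) {
                // An interval may only end at the last row of a distinct value.
                if (j < n - 1 && x[order[j]] == x[order[j + 1]]) continue;
                double q = wracc(j - i + 1, prefixPos[j + 1] - prefixPos[i], n, totalPos);
                if (q > best[2]) best = new double[] {x[order[i]], x[order[j]], q};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        boolean[] y = {false, false, true, true, true, false};
        System.out.println(Arrays.toString(bestInterval(x, y)));
    }
}
```

On this toy input, the single values 3, 4 and 5 each cover only one positive row, while the interval [3, 5] covers all three positives and scores a WRAcc of 0.25. This shows why a bin of adjacent values can be a more reliable binary feature than any individual numeric value.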

Implementation

The Java source code (October 2015) by Hoang Vu Nguyen.

Related Publications

Nguyen, H-V & Vreeken, J. Flexibly Mining Better Subgroups. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 585-593, SIAM, 2016. (overall 25% acceptance rate)
Nguyen, H-V & Vreeken, J. Flexibly Mining Better Subgroups. Technical Report 1510.08382, arXiv, 2015.