Abstract. In supervised pattern mining, also known as subgroup discovery, a crucial step is to discover high-quality one-dimensional subgroups and high-quality refinements of existing subgroups, commonly known as binary features. For nominal attributes, this can be done by directly considering individual attribute values. The task is more challenging for numerical attributes: individual numeric values are not reliable statistics on their own, so one instead considers combinations (bins) of adjacent values. Existing binning strategies, however, do not directly optimize the quality of the bins for subgroup discovery, which degrades the quality of the final output.
In this paper, we propose Flexi to address this issue. In short, Flexi uses optimal binning to find high-quality binary features for both numeric and ordinal attributes. We instantiate Flexi with various quality measures and show how to achieve efficiency for each of them. Experiments on both synthetic and real-world data sets validate the benefits of our method.
Flexibly Mining Better Subgroups. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 585-593, SIAM, 2016. (overall 25% acceptance rate)