Abstract. Finding and describing sub-populations that are exceptional regarding a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results.
To address these limitations, we propose Syflow, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions, and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic and real-world data, including a case study, that Syflow reliably finds highly exceptional subgroups accompanied by insightful descriptions.
Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2024. (spotlight, 3.5% acceptance rate; 27.5% overall)

Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. Technical Report 2402.12930, arXiv, 2024.