Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent

Abstract. Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we continuously relax BMF using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that our algorithm converges quickly, recovers the ground truth precisely, and estimates the simulated rank robustly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.

Implementation

the datasets and Julia source code (November 2022) by Sebastian Dalleiger.
the replication package including code and data for Dalleiger & Vreeken (NeurIPS 2022).

Related Publications

Dalleiger, S & Vreeken, J Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent. In: Proceedings of Neural Information Processing Systems (NeurIPS), PMLR, 2022. (25.7% acceptance rate)
Miettinen, P & Vreeken, J mdl4bmf: Minimal Description Length for Boolean Matrix Factorization. Transactions on Knowledge Discovery from Data vol.8(4), pp 1-30, ACM, 2014. (IF 1.68)
Miettinen, P & Vreeken, J Model Order Selection for Boolean Matrix Factorization. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 51-59, ACM, 2011.