Causal Inference

SpaceTime: Causal Discovery from Non-Stationary Time Series

Given multiple time series datasets, SpaceTime discovers temporal causal graphs, changepoints, and groups of datasets and time periods with similar causal mechanisms, such as repeating regimes or regions with similar geographical characteristics. More information here.

Mameche, S, Cornanguer, L, Ninad, U & Vreeken, J SpaceTime: Causal Discovery from Non-Stationary Time Series. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2025.

Cascade: Causal Discovery from Event Sequences by Local Cause-Effect Attribution

How can we efficiently learn a fully oriented causal network from event sequence data, permitting both delayed and instantaneous effects? With Cascade. More information here.

Cueppers, J, Xu, S, Musa, A & Vreeken, J Causal Discovery from Event Sequences by Local Cause-Effect Attribution. In: Proceedings of Neural Information Processing Systems (NeurIPS), PMLR, 2024.

Learning Causal Networks from Episodic Data

Continent is a framework for discovering causal networks from data that arrive in batches, continually over time. More information here.

Mian, O, Mameche, S & Vreeken, J Learning Causal Networks from Episodic Data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2024.

Identifying Confounding from Causal Mechanism Shifts

Coco can identify the existence of (shared) hidden confounders by detecting causal mechanism shifts in data collected across multiple environments. More information here.

Mameche, S, Vreeken, J & Kaltenpoth, D Identifying Confounding from Causal Mechanism Shifts. In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2024.

Learning Causal Models under Independent Changes

LINC discovers fully oriented causal graphs from data over multiple environments, non-parametrically identifying which environments share the same mechanism and which behave differently, e.g. because of an intervention. More information here.

Mameche, S, Kaltenpoth, D & Vreeken, J Learning Causal Models under Independent Changes. In: Proceedings of Neural Information Processing Systems (NeurIPS), PMLR, 2023.

Causal Discovery with Hidden Confounders

With cdhc, we can discover causal networks over observed variables X and hidden confounder variables Z. More information here.

Kaltenpoth, D & Vreeken, J Causal Discovery with Hidden Confounders using the Algorithmic Markov Condition. In: Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), AUAI, 2023.

Nonlinear Causal Discovery with Latent Confounders

With NoCaDiLaC, we can discover causal networks over observed variables X and hidden confounder variables Z. More information here.

Kaltenpoth, D & Vreeken, J Nonlinear Causal Discovery with Latent Confounders. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2023.

Identifying Selection Bias from Observational Data

We show under which conditions and with what methods we can identify whether two continuous variables are subject to selection bias. More information here.

Kaltenpoth, D & Vreeken, J Identifying Selection Bias from Observational Data. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp 8177-8185, AAAI, 2023.

Causal Discovery and Intervention Detection over Multiple Environments

Given data from multiple environments, Orion discovers the fully directed overall causal network and tells which environments are subject to which interventions. More information here.

Mian, O, Kamp, M & Vreeken, J Information-Theoretic Causal Discovery and Intervention Detection over Multiple Environments. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp 9171-9179, AAAI, 2023.

Discovering Invariant and Changing Mechanisms from Data

Vario can discover which environments share the same mechanism, as well as those that behave differently, e.g. because of an intervention. More information here.

Mameche, S, Kaltenpoth, D & Vreeken, J Discovering Invariant and Changing Mechanisms from Data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 1242-1252, ACM, 2022.

Causal Inference in the Presence of Heteroscedastic Noise

Heci infers, with very high accuracy, the most likely direction of causation between two numeric univariate variables even if noise is heteroscedastic. More information here.

Xu, S, Mian, O, Marx, A & Vreeken, J Inferring Cause and Effect in the Presence of Heteroscedastic Noise. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2022.

Discovering Reliable Causal Rules

With Dice, we can efficiently mine reliable causal rules from observational data. More information here.

Budhathoki, K, Boley, M & Vreeken, J Discovering Reliable Causal Rules. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2021.

Discovering Fully Oriented Causal Networks

Based on the Algorithmic Markov Condition, Globe discovers fully oriented causal networks from observational data. More information here.

Mian, O, Marx, A & Vreeken, J Discovering Fully Oriented Causal Networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2021.

Identifiability of Cause and Effect using Regularized Regression

We show under which conditions regularized regression can be used to identify cause from effect between pairs of univariate continuous-valued random variables. More information here.

Marx, A & Vreeken, J Identifiability of Cause and Effect using Regularized Regression. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2019.

Telling Causal From Confounded

With CoCa, we can tell whether two continuous variables are causally related, or jointly caused by a hidden confounder. More information here.

Kaltenpoth, D & Vreeken, J We Are Not Your Real Parents: Telling Causal From Confounded by MDL. In: SIAM International Conference on Data Mining (SDM), SIAM, 2019.

Accurate Causal Inference on Discrete Data

With Acid, we can robustly infer the correct causal direction between two univariate discrete variables using stochastic complexity. More information here.

Budhathoki, K & Vreeken, J Accurate Causal Inference on Discrete Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'18), IEEE, 2018.

Telling Cause from Effect using Local and Global Regression

Slope infers, with very high accuracy, the most likely direction of causation between two numeric univariate variables based on local and global regression. More information here.

Marx, A & Vreeken, J Telling Cause from Effect by Local and Global Regression. Knowledge and Information Systems vol.60(3), pp 1277-1305, Springer, 2019.

Causal Inference on Multivariate Mixed-Type Data

We propose the Crack algorithm for identifying the most likely direction of causation between two univariate or multivariate variables of single or mixed-type data. More information here.

Marx, A & Vreeken, J Causal Inference on Multivariate and Mixed Type Data. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2018.

Causal Inference on Event Sequences

With CuTe, we can robustly infer the correct causal direction between two event sequences using sequential normalized maximum likelihood. More information here.

Budhathoki, K & Vreeken, J Causal Inference on Event Sequences. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 55-63, SIAM, 2018.
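
To make "sequential normalized maximum likelihood" concrete, here is a minimal sketch of the sequential NML code length of a single binary event sequence under a Bernoulli model: each next symbol is scored by the maximum-likelihood probability of the prefix extended with that symbol, normalized over both possible symbols. This is only a building block under that assumption; how CuTe conditions one sequence on the past of the other to decide the causal direction is not reproduced here, and the function names are hypothetical.

```python
import math


def log_ml(ones, length):
    """Natural-log maximum-likelihood probability of a binary sequence with `ones` ones."""
    return sum(c * math.log(c / length) for c in (ones, length - ones) if c > 0)


def snml_codelength(seq):
    """Sequential NML code length, in bits, of a 0/1 sequence under a Bernoulli model."""
    bits, ones = 0.0, 0
    for t, x in enumerate(seq):
        # ML probability of the prefix extended by a 1, and by a 0.
        log_one = log_ml(ones + 1, t + 1)
        log_zero = log_ml(ones, t + 1)
        seen, other = (log_one, log_zero) if x == 1 else (log_zero, log_one)
        # -log2(seen / (seen + other)), computed in log space to avoid underflow.
        bits += math.log2(1.0 + math.exp(other - seen))
        ones += x
    return bits


# A regular sequence is cheaper to encode than an alternating one under this model.
print(snml_codelength([1] * 20), snml_codelength([1, 0] * 10))
```

The first symbol always costs exactly one bit, since before any observation both continuations are equally likely under the ML criterion.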

MDL for Causal Inference on Discrete Data

With CiSC, we can robustly infer the correct causal direction between two univariate discrete variables using stochastic complexity. More information here.

Budhathoki, K & Vreeken, J MDL for Causal Inference on Discrete Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'17), pp 751-756, IEEE, 2017.
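
Stochastic complexity here is the normalized maximum likelihood (NML) code length of a multinomial. A minimal sketch, assuming a CiSC-style score of SC(X) plus the sum over values x of SC(Y | X = x), with every conditional encoded over the full domain of Y, and preferring the direction with the shorter total code length. The parametric complexity uses the linear-time recurrence of Kontkanen and Myllymäki; the helper names are hypothetical, and the original implementation may differ in details such as tie handling.

```python
import math
from collections import Counter, defaultdict


def _log_term(n, h):
    """log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), using 0^0 = 1."""
    value = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
    for c in (h, n - h):
        if c > 0:
            value += c * math.log(c / n)
    return value


def regret(L, n):
    """Parametric complexity C(L, n) of a multinomial with L categories and n samples."""
    if n == 0 or L <= 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    c_cur = sum(math.exp(_log_term(n, h)) for h in range(n + 1))  # C(2, n)
    # Recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n)
    for k in range(1, L - 1):
        c_prev, c_cur = c_cur, c_cur + n / k * c_prev
    return c_cur


def stochastic_complexity(values, L):
    """Bits needed to encode `values` with the NML code of an L-ary multinomial."""
    n = len(values)
    neg_log_ml = -sum(c * math.log2(c / n) for c in Counter(values).values())
    return neg_log_ml + math.log2(regret(L, n))


def conditional_sc(x, y):
    """Sum over the values of x of the stochastic complexity of y restricted to that value."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    L_y = len(set(y))
    return sum(stochastic_complexity(g, L_y) for g in groups.values())


def infer_direction(x, y):
    """Prefer the direction with the smaller total code length (assumed CiSC-style score)."""
    x_to_y = stochastic_complexity(x, len(set(x))) + conditional_sc(x, y)
    y_to_x = stochastic_complexity(y, len(set(y))) + conditional_sc(y, x)
    if x_to_y < y_to_x:
        return "X -> Y"
    if y_to_x < x_to_y:
        return "Y -> X"
    return "undecided"


print(infer_direction([0, 0, 1, 1, 2, 2, 0, 1], [0, 0, 1, 1, 1, 1, 0, 1]))
```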

Causal Inference by Compression

We propose the Origo algorithm for identifying the most likely direction of causation between two univariate or multivariate discrete nominal or binary variables. More information here.

Budhathoki, K & Vreeken, J Origo: Causal Inference by Compression. Knowledge and Information Systems vol.56(2), pp 285-307, Springer, 2018.

Pattern Mining

What Are the Rules? Discovering Constraints from Data

We propose UrPiLs to discover constraints for optimization problems and AI planning from exemplary solutions. More information here.

Wiegand, B, Klakow, D & Vreeken, J What are the Rules? Discovering Constraints from Data. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024.

Finding Interpretable Class-Specific Patterns through Efficient Neural Search

In this paper we propose DiffNaps, a differentiable rather than a combinatorial approach to discovering differential pattern sets. DiffNaps scales extremely well in both n and m, naturally handles noise, and copes equally well with sparse and dense data. More information here.

Walter, NP, Fischer, J & Vreeken, J Finding Interpretable Class-Specific Patterns through Efficient Neural Search. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024.

Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent

elBMF is a highly scalable and very accurate approach to Boolean matrix factorization. More information here.

Dalleiger, S & Vreeken, J Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent. In: Proceedings of Neural Information Processing Systems (NeurIPS), PMLR, 2022.
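
For readers new to the problem: a Boolean matrix factorization approximates a binary matrix A by the Boolean product of two binary factors U and V, in which 1 + 1 = 1, and is typically scored by the number of cells it gets wrong. The sketch below only illustrates that standard objective; it does not implement elBMF's proximal gradient optimization, and the function names are hypothetical.

```python
def boolean_product(U, V):
    """Boolean matrix product: (U o V)[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    return [
        [int(any(U[i][k] and V[k][j] for k in range(len(V)))) for j in range(len(V[0]))]
        for i in range(len(U))
    ]


def reconstruction_error(A, U, V):
    """Number of cells where the Boolean product disagrees with A (Hamming distance)."""
    B = boolean_product(U, V)
    return sum(a != b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
U = [[1, 0], [1, 1], [0, 1]]
V = [[1, 1, 0], [0, 1, 1]]
print(reconstruction_error(A, U, V))  # -> 0
```

Note that cell (1, 1) is covered by both factors, yet the Boolean product still yields 1 rather than 2; this overlap behaviour is what distinguishes Boolean from ordinary matrix factorization.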

Discovering Significant Patterns under Sequential False Discovery Control

Given binary data from one or multiple environments, we show how to discover a succinct and non-redundant set of significant patterns under sequential FWER or FDR. More information here.

Dalleiger, S & Vreeken, J Discovering Significant Patterns under Sequential False Discovery Control. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 263-272, ACM, 2022.

Label-Descriptive Patterns and Their Application to Characterizing Classification Errors

Premise provides actionable insight into when your classifier makes structural errors. More information here.

Hedderich, M, Fischer, J, Klakow, D & Vreeken, J Label-Descriptive Patterns and their Application to Characterizing Classification Errors. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2022.

Differentiable Pattern Set Mining

In this paper we propose BinaPs, a differentiable rather than a combinatorial approach to pattern set mining that scales extremely well in both n and m, naturally handles noise, and copes equally well with sparse and dense data. More information here.

Fischer, J & Vreeken, J Differentiable Pattern Set Mining. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 383-392, ACM, 2021.

What's in the Box? Explaining Neural Networks with Robust Rules

With ExplaiNN, we can find robust rules that explain how deep neural networks perceive the world. More information here.

Fischer, J, Oláh, A & Vreeken, J What's in the Box? Explaining Neural Networks with Robust Rules. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021.

The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery

With Reaper, we can efficiently discover high-quality pattern sets. More information here.

Dalleiger, S & Vreeken, J The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'20), IEEE, 2020.

Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity

With Mexican, we can efficiently discover pattern sets expressing co-occurrence and mutual exclusivity from discrete data. More information here.

Fischer, J & Vreeken, J Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2020.

Explainable Data Decompositions

With Disc, we can efficiently discover the pattern composition of a binary dataset. More information here.

Dalleiger, S & Vreeken, J Explainable Data Decompositions. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI'20), AAAI, 2020.

Sets of Robust Rules, and How to Find Them

Grab discovers succinct, non-redundant and highly characteristic sets of rules and patterns from binary data. More information here.

Fischer, J & Vreeken, J Sets of Robust Rules, and How to Find Them. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2019.

The Difference and the Norm

Suppose we are given a set of databases, such as sales records over different branches. How can we characterise the differences and the norm between these datasets? What are the patterns that characterise the overall distribution, and what are those that are important for the individual datasets? That is exactly what the DiffNorm algorithm reveals. More information here.

Budhathoki, K & Vreeken, J The Difference and the Norm – Characterising Similarities and Differences between Databases. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pp 206-223, Springer, 2015.

Efficient Discovery of the Most Interesting Associations

Self-sufficient itemsets are an effective way to summarise the key associations in data. However, their computation appears demanding, as assessing whether an itemset is self-sufficient requires consideration of all pairwise partitions of an itemset, as well as all its supersets. We propose a branch-and-bound algorithm that employs two powerful pruning techniques to extract them efficiently. More information here.

Webb, G & Vreeken, J Efficient Discovery of the Most Interesting Associations. Transactions on Knowledge Discovery from Data vol.8(3), pp 1-31, ACM, 2014.

Subgroup Discovery

Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence

We propose Syflow, an end-to-end optimizable approach to discover exceptional subpopulations from data in which we leverage normalizing flows to model arbitrary target distributions, and introduce a novel neural layer that results in easily interpretable subgroup descriptions. More information here.

Xu, S, Walter, NP, Kalofolias, J & Vreeken, J Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2024.

Naming the most Anomalous Cluster in Hilbert Space

We consider the problem of finding subsets from the data that accept a simple description, but also exhibit anomalous behaviour, as seen by a positive definite kernel. This enables us to put a name on subsets of entities that stand out, each of which can have arbitrary structure, like being a graph, image, time-series, chemical, etc. More information here.

Kalofolias, J & Vreeken, J Naming the most anomalous cluster in Hilbert Space for structures with attribute information. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2022.

Discovering Robustly Connected Subgraphs with Simple Descriptions

We consider the problem of discovering robustly connected subgraphs that have simple descriptions. Our aim is, hence, to discover vertex sets which not only a) induce a subgraph that is difficult to fragment into disconnected components, but also b) can be selected from the entire graph using just a simple conjunctive query on their vertex attributes. More information here.

Kalofolias, J, Boley, M & Vreeken, J Discovering Robustly Connected Subgraphs with Simple Descriptions. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), IEEE, 2019.

Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

We argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable: when the distribution of this control variable is the same, or almost the same, as over the whole data. We give an efficient algorithm to find such subgroups in the case of a numeric target and binary control variable. More information here.

Kalofolias, J, Boley, M & Vreeken, J Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'17), IEEE, 2017.

Flexibly Mining Better Subgroups

In subgroup discovery, finding high-quality one-dimensional subgroups as well as high-quality refinements is a crucial task. For nominal attributes this is easy, but for numerical attributes it is more challenging. We propose to use optimal binning to find high-quality binary features for numeric and ordinal attributes. More information here.

Nguyen, H-V & Vreeken, J Flexibly Mining Better Subgroups. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 585-593, SIAM, 2016.

Efficiently Discovering Unexpected Pattern Co-Occurrences

Anomalies are often characterised as the absence of patterns. We observe that the co-occurrence of patterns can also be anomalous – many people prefer Coca Cola, while others prefer Pepsi Cola, and hence anyone who buys both stands out. We formally introduce this new class of anomalies, and propose UpC, an efficient algorithm to discover these anomalies in transaction data. More information here.

Bertens, R, Vreeken, J & Siebes, A Efficiently Discovering Unexpected Pattern-Co-Occurrences. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 126-134, SIAM, 2017.
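
A deliberately naive illustration of the co-occurrence idea, not UpC itself: flag the transactions that contain both itemsets when the two appear together far less often than independence would predict. The independence baseline, the `max_ratio` threshold, and the function name are assumptions made for this sketch.

```python
def unexpected_cooccurrences(transactions, p, q, max_ratio=0.5):
    """Return indices of transactions containing both itemsets p and q, but only if the
    observed joint support is at most max_ratio times its expectation under independence."""
    n = len(transactions)
    has_p = [p <= t for t in transactions]  # subset test on sets
    has_q = [q <= t for t in transactions]
    both = [i for i in range(n) if has_p[i] and has_q[i]]
    expected = sum(has_p) * sum(has_q) / n
    if expected == 0 or len(both) > max_ratio * expected:
        return []
    return both


baskets = [{"coke", "chips"}] * 5 + [{"pepsi", "chips"}] * 4 + [{"coke", "pepsi"}]
print(unexpected_cooccurrences(baskets, {"coke"}, {"pepsi"}))  # -> [9]
```

Here independence would predict about three baskets with both colas, but only one occurs, so that basket stands out; coke and chips, by contrast, co-occur about as often as expected and nothing is flagged.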

Explainability

Succinct Interaction-Aware Explanations

With iShap, we partition the features into significantly interacting groups, and use these to compose succinct, interpretable, additive explanations of black box machine learning-based decisions. More information here.

Xu, S, Cueppers, J & Vreeken, J Succinct Interaction-Aware Explanations. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2025.

Discovering Functional Dependencies from Mixed-Type Data

Given a database and a target attribute, we want to tell whether the target depends functionally, or approximately functionally, on any set of other attributes in the data, regardless of whether these are nominal or continuous-valued, and to do so efficiently and reliably, without bias towards sample size or dimensionality. To this end we propose the MixDora algorithm. More information here.

Mandros, P, Kaltenpoth, D, Boley, M & Vreeken, J Discovering Functional Dependencies from Mixed-Type Data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2020.

Discovering Reliable Dependencies from Data

Given a database and a target attribute, we want to tell whether the target depends functionally, or approximately functionally, on any set of other attributes in the data, and to do so efficiently and reliably, without bias towards sample size or dimensionality. To this end we propose the Fedora algorithm. More information here.

Mandros, P, Boley, M & Vreeken, J Discovering Dependencies with Reliable Mutual Information. Knowledge and Information Systems vol.62, pp 4223-4253, Springer, 2020.

Discovering Reliable Correlations in Categorical Data

In this paper we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets of categorical variables. We also propose an efficient algorithm for discovering reliable correlations. More information here.

Mandros, P, Boley, M & Vreeken, J Discovering Reliable Correlations in Categorical Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'19), IEEE, 2019.

Discovering Reliable Approximate Functional Dependencies

Given a database and a target attribute, we want to tell whether the target depends functionally, or approximately functionally, on any set of other attributes in the data, and to do so efficiently and reliably, without bias towards sample size or dimensionality. To this end we propose the Dora algorithm. More information here.

Mandros, P, Boley, M & Vreeken, J Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'18), IEEE, 2018.
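
Why a correction for sample size and dimensionality is needed: the plug-in fraction of information F(X;Y) = I(X;Y)/H(Y) grows with the number of distinct values of X, so an ID-like attribute looks like a perfect functional dependency. A minimal sketch, assuming the reliable score takes the form F minus its expected value when Y is randomly permuted; here that expectation is approximated by sampling permutations rather than computed exactly, the search over attribute sets is not shown, and all names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy in bits."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def fraction_of_information(x, y):
    """Plug-in F(X;Y) = I(X;Y) / H(Y); each x[i] may be a tuple of attribute values."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0
    mutual_info = h_y + entropy(x) - entropy(list(zip(x, y)))
    return mutual_info / h_y


def reliable_fi(x, y, n_perm=200, seed=0):
    """Plug-in F minus its average under random permutations of y (Monte Carlo correction)."""
    rng = random.Random(seed)
    y_perm = list(y)
    baseline = 0.0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        baseline += fraction_of_information(x, y_perm)
    return fraction_of_information(x, y) - baseline / n_perm


ids, labels = list(range(20)), [0, 1] * 10
print(fraction_of_information(ids, labels), reliable_fi(ids, labels))
```

A unique row identifier "determines" any target, so its plug-in score is 1, yet every permutation of the target scores 1 as well, and the corrected score drops to roughly 0.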

Sequence Mining

TADAM: Learning Timed Automata from Noisy Observations

TADAM learns probabilistic timed automata from noisy event logs. More information here.

Cornanguer, L & Gimenez, P-F TADAM: Learning Timed Automata from Noisy Observations. In: SIAM International Conference on Data Mining (SDM), SIAM, 2025.

FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining

We study how to discover sequential patterns from network flow data, and how to use these to generate high quality synthetic network flow data. More information here.

Cueppers, J, Schoen, A, Blanc, G & Gimenez, P-F FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining. In: Proceedings of the ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT), ACM, 2024.

Data is Moody: Discovering Data Modification Rules from Process Event Logs

We propose Moody to find accurate, yet succinct and interpretable if-then rules describing how a business process modifies event data. More information here.

Schuster, MB, Wiegand, B & Vreeken, J Data is Moody: Discovering Data Modification Rules from Process Event Logs. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2024.

Hopper: Discovering Sequential Patterns with Predictable Inter-Event Delays

How can we discover patterns in sequential data that are reliable with respect to, and give insight into, the delay distributions between their events? With Hopper we can. More information here.

Cueppers, J, Krieger, P & Vreeken, J Discovering Sequential Patterns with Predictable Inter-Event Delays. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024.

Summarizing Event Sequences with Generalized Sequential Patterns

We study how to discover sequential patterns that may include both observed surface-level as well as generalized events. In particular, we show how to discover good pattern sets and generalizations without requiring prior knowledge. More information here.

Cueppers, J & Vreeken, J Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2023.

Why Are We Waiting? Interpretable Models for Predicting Waiting and Sojourn Times

We propose CueMin for discovering queueing models that explain and predict waiting and sojourn times. More information here.

Wiegand, B, Klakow, D & Vreeken, J Why Are We Waiting? Discovering Interpretable Models for Predicting Sojourn and Waiting Times. In: SIAM International Conference on Data Mining (SDM), SIAM, 2023.

Discovering Interpretable Data-to-Sequence Generators

We propose Consequence for discovering accurate, yet easily understandable models for predicting event sequences from meta-data. More information here.

Wiegand, B, Klakow, D & Vreeken, J Mining Interpretable Data-to-Sequence Generators. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2022.

Mining Easily Understandable Models from Complex Event Data

We propose Proseqo for discovering accurate, yet easily understandable models from complex event sequence data. More information here.

Wiegand, B, Klakow, D & Vreeken, J Mining Easily Understandable Models from Complex Event Data. In: SIAM International Conference on Data Mining (SDM), SIAM, 2021.

Omen: Mining Sequential Patterns with Reliable Prediction Delays

How can we discover patterns that are not just reliable in that they accurately predict that something of interest will happen, but also reliable in that they can tell us when this will happen? With Omen we can. More information here.

Cueppers, J, Kalofolias, J & Vreeken, J Omen: Discovering Sequential Patterns with Reliable Prediction Delays. Knowledge and Information Systems vol.64(4), pp 1013-1045, Springer, 2022.

Causal Inference on Event Sequences

With CuTe, we can robustly infer the correct causal direction between two event sequences using sequential normalized maximum likelihood. More information here.

Budhathoki, K & Vreeken, J Causal Inference on Event Sequences. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 55-63, SIAM, 2018.

Efficiently Summarising Event Sequences with Rich Interleaving Patterns

We consider mining informative serial episodes — subsequences allowing for gaps — from event sequence data. We formalize the problem by the Minimum Description Length principle, and give algorithms for selecting good pattern sets from candidate collections as well as for parameter free mining of such models directly from data. More information here.

Bhattacharyya, A & Vreeken, J Efficiently Summarising Event Sequences with Rich Interleaving Patterns. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 795-803, SIAM, 2017.

Keeping it Short and Simple

We study how to obtain concise descriptions of discrete multivariate sequential data in terms of rich multivariate sequential patterns. We introduce Ditto, and show it discovers succinct pattern sets that capture highly interesting associations within and between sequences. More information here.

Bertens, R, Vreeken, J & Siebes, A Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pp 735-744, ACM, 2016.

Linear-time Detection of Non-linear Changes

Detecting whether any important statistics over your time series changed is an important aspect of time series analysis. With Light, we tackle the problem of efficiently and effectively detecting non-linear changes over very high dimensional time series. More information here.

Nguyen, H-V & Vreeken, J Linear-time Detection of Non-Linear Changes in Massively High Dimensional Time Series. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 828-836, SIAM, 2016.

Graph Mining

From Your Block to Our Block: How to Find Shared Structure between Stochastic Block Models

sSBM models multiple graphs using a stochastic block model that lets it find any structure shared between them. More information here.

Kumpulainen, I, Dalleiger, S, Vreeken, J & Tatti, N From Your Block to Our Block: How to Find Shared Structure between Stochastic Block Models over Multiple Graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2025.

Differentially Describing Groups of Graphs

Given a set of graphs and a partition of these graphs into groups, we aim to discover what graphs in a group have in common, how they systematically differ from graphs in other groups, and how multiple groups of graphs are related. More information here.

Coupette, C, Dalleiger, S & Vreeken, J Differentially Describing Groups of Graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2022.

Graph Similarity Description: How Are These Graphs Similar?

We treat graph similarity assessment as a description problem, rather than as a measurement problem. Having formalized this problem as a model selection task using the Minimum Description Length principle, we propose Momo (Model of models), which solves the problem by breaking it into two parts and introducing efficient algorithms for each. More information here.

Coupette, C & Vreeken, J Graph Similarity Description: How Are These Graphs Similar? In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 185-195, ACM, 2021.

Susan: The Structural Similarity Random Walk Kernel

We propose Susan, an efficient-to-compute random walk graph kernel that picks up structural similarity. More information here.

Kalofolias, J, Welke, P & Vreeken, J SUSAN: The Structural Similarity Random Walk Kernel. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2021.
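
Random walk graph kernels compare two graphs by counting the walks they have in common, i.e. walks in their direct product graph. As a point of reference only, here is a sketch of the textbook geometric random walk kernel, truncated after `max_len` steps; it uses the fact that one step in the product graph maps a node-pair weight matrix W to A1 W A2^T. Susan's structural-similarity modification is not reproduced, and `lam`, `max_len`, and the function names are illustrative assumptions.

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def random_walk_kernel(A1, A2, lam=0.1, max_len=10):
    """Truncated geometric random walk kernel: sum over t of lam^t * (# common walks of length t)."""
    W = [[1.0] * len(A2) for _ in A1]  # one length-0 walk per node pair
    A2_T = transpose(A2)
    total = 0.0
    for t in range(max_len + 1):
        total += lam ** t * sum(map(sum, W))
        # One step in the direct product graph: W <- A1 W A2^T
        W = matmul(matmul(A1, W), A2_T)
    return total


edge = [[0, 1], [1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(random_walk_kernel(edge, edge), random_walk_kernel(triangle, path))
```

The decay factor `lam` keeps long walks from dominating; the full (untruncated) kernel converges only when `lam` is below the reciprocal of the product graph's largest eigenvalue.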

What is Normal, What is Strange, and What is Missing in a Knowledge Graph

We introduce a unified solution to knowledge graph characterization by formulating the problem as unsupervised summarization with a set of inductive, soft rules, which describe what is normal, and thus can be used to identify what is abnormal, whether it be strange or missing. More information here.

Belth, C, Zheng, X, Vreeken, J & Koutra, D What is Normal, What is Strange, and What is Missing in a Knowledge Graph. In: Proceedings of the Web Conference (WWW), ACM, 2020.

Reconstructing an Epidemic over Time

With CulT, we propose a method to reconstruct an epidemic over time or, more generally, the propagation of an activity in a network. More information here.

Rozenshtein, P, Gionis, A, Prakash, BA & Vreeken, J Reconstructing an Epidemic over Time. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp 1835-1844, ACM, 2016.

Facets: Adaptive Local Exploration of Large Graphs

We propose Facets, a new scalable approach that helps users adaptively explore large million-node graphs from a local perspective, guiding them to focus on nodes and neighborhoods that are most subjectively interesting to users. More information here.

Pienta, R, Kahng, M, Lin, Z, Vreeken, J, Talukdar, P, Abello, J, Parameswaran, G & Chau, DH Adaptive Local Exploration of Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 597-605, SIAM, 2017.

Summarizing and Understanding Large Graphs

With VoG, we summarize a large graph in terms of a small vocabulary of well-understood structures, such as cliques, stars, chains, and bipartite cores, selecting the summary by the Minimum Description Length principle. More information here.

Koutra, D, Kang, U, Vreeken, J & Faloutsos, C VoG: Summarizing and Understanding Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 91-99, SIAM, 2014.

Federated Learning

Federated Binary Matrix Factorization using Proximal Optimization

FedBMF makes it possible to learn high-quality binary matrix factorizations in a federated manner. More information here.

Dalleiger, S, Vreeken, J & Kamp, M Federated Binary Matrix Factorization using Proximal Optimization. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2025.

FedDC: Federated Learning from Small Datasets

FedDC makes it possible to learn high-quality models in a federated manner when every site has only very little training data available. More information here.

Kamp, M, Fischer, J & Vreeken, J Federated Learning from Small Datasets. In: Proceedings of the International Conference on Learning Representations (ICLR), OpenReview, 2023.

Regret-based Privacy-Preserving Federated Causal Discovery

Peri discovers causal networks from observational data in a privacy-preserving and federated manner exchanging nothing but regret values. More information here.

Mian, O, Kaltenpoth, D, Kamp, M & Vreeken, J Nothing but Regrets — Privacy-Preserving Federated Causal Discovery. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2023.