## Exploratory Data Analysis

Data is big. We dig it.

How can we discover novel and interesting things from data?
How can we obtain inherently interpretable models?
How can we draw reliable causal conclusions?

That's exactly what we develop theory and algorithms for.

### Sascha joins EDA as a PhD student

Warm welcome to Sascha Xu as a new PhD student in the EDA group! Sascha recently finished his MSc thesis at DFKI while working with us as a HiWi. He now joins EDA to pursue his PhD. Last year he already presented a paper at ICML. For his PhD he will probably continue his exploration of the wonderful world of causation and interpretability. Welcome, Sascha!

(1 Jan 2023)

### CueMin to be presented at SDM 2023

Boris will present his paper on learning interpretable models for predicting waiting and sojourn times at the 2023 SIAM International Conference on Data Mining (SDM). In this paper, he proposes the MDL-based CueMin algorithm that automatically determines the type and parameterization of the queueing behaviour in the observed data. Extensive experiments show that CueMin is not only more generally applicable, but also performs on par with specialized solutions that require much more knowledge, and generalizes and extrapolates better than the state of the art. Congratulations, Boris!

(26 Dec 2022)

### Janis Kalofolias is now a Doctor of Natural Sciences

On Thursday, December 8th, 2022, Janis Kalofolias successfully defended his Ph.D. thesis, titled 'Subgroup Discovery for Structured Targets'. The promotion committee, consisting of Profs. Raimund Seidel, Gerhard Weikum, Peter Flach, and Jilles Vreeken, was impressed with the thesis, presentation, and discussion, and decided that Janis passed the requirements for the degree of Doctor of Natural Sciences with the distinction Magna Cum Laude. Congratulations, Dr. rer. nat. Kalofolias!

(8 Dec 2022)

### Two papers at AAAI 2023

David and Osman had their papers accepted for presentation at the 2023 AAAI International Conference on Artificial Intelligence (AAAI). David will present theory and methods for identifying whether a dataset has been subject to selection bias which, if disregarded, could thwart our causal analysis. Osman will present Orion for identifying directed causal graphs, as well as interventions thereupon, from data drawn from multiple environments. Congratulations to both!

(1 Dec 2022)

### elBMF to be presented at NeurIPS 2022

Sebastian will present elBMF, a novel and highly scalable approach to Boolean matrix factorization, at NeurIPS 2022 in New Orleans. The secret ingredient of elBMF is that it uses continuous rather than combinatorial optimization. To ensure that the results are Boolean, Sebastian introduces an elastic-net-like regularizer, which has the benefit that no post-processing (Booleanification) of the results is necessary. You can find the paper and implementation here. Congratulations, Sebastian!

(15 Sep 2022)

### Jonas Fischer is now a Doctor of Natural Sciences

On Thursday, July 28th, 2022, Jonas Fischer successfully defended his Ph.D. thesis, titled 'More than the Sum of its Parts — Pattern Mining, Neural Networks, and How They Complement Each Other'. The promotion committee, consisting of Profs. Sven Rahmann, Gerhard Weikum, Srinivasan Parthasarathy, and Jilles Vreeken, was deeply impressed with the thesis, presentation, and discussion, and decided that Jonas therewith passed the requirements for the degree of Doctor of Natural Sciences with the distinction Summa Cum Laude. Congratulations, Dr. rer. nat. Fischer!

(28 Jul 2022)

### Two papers at KDD 2022

EDA will present two papers at the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Sarah and David will present Vario for discovering which environments share an invariant mechanism, for determining what those mechanisms and their causal parents are, and for using $$\pi$$-invariance for causal discovery over multiple environments. Sebastian will have a lot of fun presenting Spass, a novel approach to discovering patterns that are significantly associated with one or multiple class labels, while controlling for multiple-hypothesis testing using either FDR or FWER. Congratulations to all!

(19 May 2022)

### Two papers at ICML 2022

EDA will present two papers at the International Conference on Machine Learning (ICML). Sascha will present Heci, an effective method for determining cause from effect when the noise is heteroscedastic. Together with Osman and Alex, he proposes a causal model that permits non-stationary noise, determines the conditions under which it is identifiable, and gives an effective algorithm based on dynamic programming. Jonas and Michael will present Premise for characterizing, in easily interpretable terms, when an arbitrarily complex classifier goes wrong. As they show, the patterns they find are not only insightful but also actionable, allowing us to improve classifiers by targeted fine-tuning. Congratulations to all!

(15 May 2022)

### Three papers at AAAI 2022

Great success for the EDA group — three papers accepted for presentation at the 2022 AAAI International Conference on Artificial Intelligence (AAAI). Boris will present Consequence for mining interpretable data-to-sequence generators. Corinna and Sebastian will present Gragra for describing what is common and what is different between groups of graphs. Janis will present Nuts for kernelized subgroup discovery, or, more technically, naming the most anomalous cluster in Hilbert Space for structures with attribute information. Congratulations to all!

(1 Dec 2021)

### Alexander Marx is now a Doctor of Natural Sciences

On Tuesday, June 29th, 2021, Alexander Marx successfully defended his Ph.D. thesis, titled 'Information-Theoretic Causal Discovery'. The promotion committee, consisting of Profs. Isabel Valera, Gerhard Weikum, Thijs van Ommen, and Jilles Vreeken, was impressed with the thesis, presentation, and discussion, and decided that he passed the requirements for the degree of Doctor of Natural Sciences with the distinction Magna Cum Laude. Congratulations, Dr. rer. nat. Marx!

(29 Jun 2021)

### Four papers at KDD, ICML, and UAI 2021

Three conferences, four papers: at ICML, Jonas and Anna will present ExplaiNN for exploring how information is encoded within, and flows through, a deep convolutional neural network. At KDD, Jonas will present BinaPs for mining high-quality pattern sets from high dimensional data using a special binarized auto-encoder. Corinna will present Momo for describing the similarity between two (partially aligned) graphs in easily understandable terms. Alex will present his work with Joris Mooij and Arthur Gretton on the more realistic 2-adjacency faithfulness assumption at UAI. Congratulations to all!

(19 May 2021)

### Jana makes Causal Discovery more Realistic

It is impossible to draw causal conclusions from data alone; we also need to make assumptions about the data generating process. Faithfulness is the assumption that if there exists a dependency between two variables in the process, these two variables are also dependent in the data. Jana shows that XOR-like dependencies, which are of great interest in biological applications, are hence not detectable by any algorithm that assumes faithfulness! To save the day, she shows how we can discover Markov blankets and causal networks under the more realistic assumption of 2-adjacency faithfulness, which allows her to discover XOR-like dependencies in biological data that existing algorithms miss. Congratulations, Jana!
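To see why faithfulness rules out XOR-like mechanisms, consider a minimal toy computation (a hypothetical illustration, not code from the thesis): with $$Z = X \oplus Y$$ over two fair, independent coins, $$Z$$ is pairwise independent of each parent, so a marginal dependence test has nothing to latch onto, yet $$Z$$ is fully determined once both parents are considered jointly.

```python
from itertools import product

# Toy XOR mechanism: Z = X ^ Y with X, Y two fair, independent coins.
# All four (x, y) outcomes are equally likely.
outcomes = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]

def p(event):
    """Probability of an event under the uniform distribution on outcomes."""
    return sum(event(o) for o in outcomes) / len(outcomes)

# Marginally, Z is independent of X: P(Z=1) = P(Z=1|X=0) = P(Z=1|X=1) = 1/2,
# so any algorithm that assumes faithfulness sees no pairwise dependence.
p_z1 = p(lambda o: o[2] == 1)
p_z1_given_x0 = p(lambda o: o[0] == 0 and o[2] == 1) / p(lambda o: o[0] == 0)
p_z1_given_x1 = p(lambda o: o[0] == 1 and o[2] == 1) / p(lambda o: o[0] == 1)
assert p_z1 == p_z1_given_x0 == p_z1_given_x1 == 0.5

# Yet Z is a deterministic function of (X, Y): the dependency only shows up
# when both parents are conditioned on jointly.
assert all(z == x ^ y for x, y, z in outcomes)
```

The same argument holds by symmetry for $$Y$$, which is exactly why such dependencies are invisible to purely pairwise, faithfulness-based tests.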

(14 May 2021)

### Panagiotis Mandros is now a Doctor of Engineering

On Thursday, March 4th, Panagiotis Mandros successfully defended his Ph.D. thesis, titled 'Discovering Robust Dependencies from Data'. The promotion committee, consisting of Profs. Dietrich Klakow, Gerhard Weikum, Geoff Webb, and Jilles Vreeken, decided that he not only passed the requirements for the degree of Doctor of Engineering, but also awarded his thesis the distinction Summa Cum Laude. Congratulations, Dr.-Ing. Mandros!

(4 Mar 2021)

### Four papers accepted for presentation at SDM 2021

Great success for the EDA group: we got four papers accepted for presentation at the 2021 SIAM International Conference on Data Mining (SDM). Alex will present his joint work with Lincen Yang on estimating conditional mutual information for discrete-continuous mixtures. Boris will present ProSeqo for mining concise yet powerful models from event sequence data. Janis will present Susan, the structural similarity random walk kernel that he developed together with Pascal Welke. Last, but not least, Kailash will present Dice for mining reliable causal rules. Congratulations to all!

(22 Dec 2020)

### Globe to be presented at AAAI 2021

Osman and Alex will present their work on discovering fully oriented causal networks at next year's AAAI, the International Conference on Artificial Intelligence. In their work, they propose a score-based causal discovery algorithm that builds upon the algorithmic Markov condition to automatically orient all edges in the most likely causal direction. The Globe algorithm is remarkably robust and outperforms state-of-the-art score- and constraint-based solutions.

(2 Dec 2020)

### Two papers accepted for presentation at IEEE ICDM 2020

Joscha and Sebastian will present two papers at this year's IEEE International Conference on Data Mining. Joscha will present Omen for discovering patterns that not only predict that something of interest will happen, but are also reliable in telling when it will happen. Sebastian proposes Reaper, a new relaxed formulation of the Maximum Entropy distribution that, through dynamic factorizations, is as accurate yet orders of magnitude faster than traditional approaches – enabling principled discovery of highly informative patterns from much larger and much more complex data than ever before.

(21 Aug 2020)

### Anna explores the secret life of neural networks

With ExplaiNN, Anna Oláh proposes a highly scalable method that provides deep insight into the black box that neural networks are. In her Master thesis, Anna proposes to mine activation patterns between neurons in different layers in the form of robust rules. Not only does she propose an efficient and highly scalable algorithm, she also shows how we can use these rules to gain insight beyond the state of the art, both in what drives decisions for individual classes, as well as in the differences between them. Who knew that, in the eye of a neural network, Malamutes are essentially fluffy Huskies, and Huskies are essentially sharply drawn Malamutes! Congratulations, Anna!

(20 Aug 2020)

### Edith shows us what we didn't know yet

In her Master thesis, Edith Heiter studies the problem of how to factor out prior knowledge from low-dimensional embeddings. In other words, how can we visualise a high dimensional dataset, such that we reveal structure that goes beyond what we already knew? In her thesis, Edith proposes not one, but two methods to factor out arbitrary distance matrices. With Jedi she proposes to adapt the objective function of t-SNE in a well-founded manner, while with Confetti she proposes a method that allows us to factor out knowledge from arbitrary embedding algorithms. Through many experiments, she showed that both work well in practice, earning her the title Master of Science. Congratulations, Edith!

(23 Jul 2020)

### Kailash Budhathoki is now a Doctor of Natural Sciences

On Monday, July 3rd, Kailash Budhathoki successfully defended his Ph.D. thesis, titled 'Causal Inference on Discrete Data'. The promotion committee, consisting of Profs. Dietrich Klakow, Gerhard Weikum, Tom Heskes, and Jilles Vreeken, decided that he not only passed the requirements for the degree of Doctor of Natural Sciences (Dr. rer. nat.), but also awarded his thesis the distinction Summa Cum Laude. Congratulations, Dr. Budhathoki!

(3 Jul 2020)

### Three papers accepted at ACM SIGKDD 2020

EDA will present three papers at ACM SIGKDD 2020, the flagship conference in data mining. Jonas will present his work on discovering patterns of mutual exclusivity, in which he proposes the Mexican algorithm. Panagiotis will present his work together with Frederic Penerath on how to use smoothing to measure and mine reliable functional dependencies, as well as his work together with David on how to discover functional dependencies from mixed-type data.

(16 May 2020)

### Sandra lays an opinion-spam trap

How can we detect review spam campaigns, the colluding groups of spammers, as well as determine the spamicity of individual reviewers that actively try to hide their spamming behaviour? In her Master thesis, Sandra Sukarieh answers all three questions. The main premise is that a campaign requires multiple users and abnormal scores. Sprap identifies users that surprisingly often review products together with other users that surprisingly often score differently from the norm. Experiments show her method works remarkably well in practice, without even having to consider the content of the reviews. In other words, Sandra's Master thesis campaign was a great success. Congratulations!

(11 May 2020)

### Alex invited to the Heidelberg Laureate Forum

Alexander Marx has been invited to attend the Heidelberg Laureate Forum. While the actual event is postponed to next year due to the Corona pandemic, he will then get to meet laureates of the most prestigious awards in Mathematics and Computer Science, such as Turing Award winners Manuel Blum, Vinton Cerf, Richard Karp, and Judea Pearl, as well as 199 other highly talented young scientists.

(24 Apr 2020)

### Joscha joins EDA as a PhD student

Warm welcome to Joscha Cueppers as a PhD student in the EDA group! Joscha recently finished his MSc thesis with us, and now joins to pursue his PhD. He'll be working on statistically well-founded pattern discovery from structured data, such as sequences and graphs, to gain insight into the causal processes that generated this data. Welcome, Joscha!

(1 Apr 2020)

### Corinna joins EDA as a PhD student

We warmly welcome Corinna Coupette as a PhD student in the EDA group! Corinna already holds a PhD in Law, as well as a Master's degree in Informatics. She will be working on both the theory of, and methods for, meaningful analysis of complex graphs. She will work on the theory aspects with Christoph Lenzen of the Max Planck Institute for Informatics, and on methods for graph mining with Jilles Vreeken. Welcome, Corinna!

(1 Jan 2020)

### Joscha bakes a Cake

In his Master thesis, Joscha Cueppers considers the problem of discovering patterns that reliably predict future events. That is, he is interested in discovering sequential patterns from an event sequence $$X$$ that predict with high accuracy how long it will take until we see an interesting event happen in event sequence $$Y$$. He models the problem using MDL, and proposes the Cake algorithm to discover a small set of non-redundant patterns that together predict $$Y$$ as well as possible given $$X$$. The experiments show the results are very tasty. Congratulations, Joscha!

(1 Nov 2019)

### Osman joins EDA as a PhD student

Warm welcome to Osman Ali Mian as a PhD student in the EDA group! Osman recently finished his MSc thesis with us on the topic of discovering fully directed causal networks, and now joins to pursue his PhD. He'll be working on theory and methods for doing causal inference in realistic settings – e.g. methods that scale, can deal with data from multiple sources, can deal with missing data, and so on. Welcome, Osman!

(1 Sep 2019)

### Divyam summarizes temporal graphs with Mango

Suppose we are given multiple snapshots of a graph over time; how can we discover patterns of change and similarity between them? Divyam Saran proposed the MDL-based Mango algorithm to discover succinct and non-redundant summaries that give clear insight into what is happening between the graphs. In a nutshell, he discovers significant structure per graph, and then uses the structures from adjacent graphs to refine the overall temporal summary – identifying growing, shrinking, and changing structures such as cliques, stars, and bipartite subgraphs. Congratulations, Divyam!

(31 Jul 2019)

### Boris joins EDA as a PhD student

We warmly welcome Boris Wiegand as a PhD student in the EDA group! Boris is employed by the Dillinger steel works, and will work on topics related to extracting high-quality models from production logs – for example, to gain insight into patterns and bottlenecks, as well as to optimize both planning and production. He will be co-supervised by Jilles Vreeken and Dietrich Klakow. Welcome, Boris!

(1 Jul 2019)

### Osman trots the Globe

How can we discover fully oriented causal networks from observational data? In his Master thesis, Osman Ali Mian shows how we can use the Algorithmic Markov Condition to not only discover high-quality causal skeletons, but at the same time orient all edges from cause to effect. To find such networks from data, he proposes Globe, which instantiates this idea using MDL and non-parametric multivariate regression splines. The experiments show that his proposal outperforms state-of-the-art constraint-based as well as score-based methods. Congratulations, Osman!

(1 Jul 2019)

### Panagiotis invited to the Heidelberg Laureate Forum

Panagiotis has been invited to attend the Heidelberg Laureate Forum. During the third week of September he will get to meet laureates of the most prestigious awards in Mathematics and Computer Science, such as Turing Award winners Manuel Blum, Vinton Cerf, Richard Karp, and Yoshua Bengio, as well as 199 other highly talented young scientists.

(24 Apr 2019)

### Simina shows there is more to it than a single answer

While almost all data analysis methods produce a single model, reality is more complex than that. How can we discover not one, but multiple high-quality explanations for a dataset, each of which reveals significantly more detail than the last? This is exactly the question that Simina Ana Cotop answers in her Master thesis, in which she proposes the Grim algorithm, which instantiates Kolmogorov's structure function for pattern-based summarization. Through many experiments she shows that Grim indeed returns insightful high-level as well as detailed in-depth summaries. Congratulations, Simina!

(1 Mar 2019)

### Panos, Mario and Jilles win the IEEE ICDM 2018 Best Paper Award

Out of 948 submissions, the award committee of IEEE ICDM 2018 selected our paper 'Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms' by Panagiotis Mandros, Mario Boley, and Jilles Vreeken for the IEEE ICDM 2018 Best Paper Award! We will receive the award in Singapore on November 19th. Hurray!

(11 Nov 2018)

### Jilles wins the IEEE ICDM Tao Li Award

The IEEE ICDM Tao Li Award recognizes excellent early-career researchers for their research contributions, impact, and service within the first ten years of obtaining their PhD. This inaugural year, the award committee selected Jilles Vreeken for this honour — who is both deeply honoured, and uncharacteristically speechless.

(9 Nov 2018)

### Mario starts Tenure Track at Monash University

While we're very sad that Mario Boley will leave us, we are very happy that on October 1st 2018 he will make the next step in his career and join Monash University in Melbourne, Australia as Tenure Track faculty. We wish Mario all the best, and are looking forward to continuing to work together on topics such as subgroup and functional dependency discovery. Congratulations, Mario!

(1 Sep 2018)

### Two papers accepted for presentation at IEEE ICDM 2018

Kailash Budhathoki and Panagiotis Mandros will present two papers at IEEE ICDM 2018 in Singapore. Kailash will present his work on accurate causal inference on discrete data, in which he shows that by simply optimising the residual entropy we can accurately identify the most likely causal direction—with guarantees. Panagiotis will present his work on discovering reliable approximate functional dependencies, in which he shows that although this problem is NP-hard, using his optimistic estimator we can solve it exactly in reasonable time, as well as get extremely good solutions using a greedy strategy.

(18 Aug 2018)

### Iva gives guarantees and fast algorithms for mining patterns that overlap

Iva Farag was unhappy with the fact that Slim was restricted to using patterns without overlap, and looked into the theoretical details as well as the practical algorithmics for how to alleviate this. In her Master thesis, she shows that the problem is related to weighted set cover, and based on this proposes three cover algorithms that do allow overlap, two of which give guarantees on the quality of the solution. Experiments show that with GreCo we find more succinct, more insightful patterns that are less prone to fitting noise. Congratulations, Iva!

(17 Aug 2018)

### Maha smoothly smooths discrete data with Smoothie

With Smoothie, Maha Aburahma proposes a parameter-free algorithm for smoothing discrete data. In short, given a noisy transaction database, the algorithm makes local adjustments such that the overall MDL-complexity of the data and model is minimised. It does so step by step, providing a continuum of increasingly smoothed data. The MDL-optimum coincides with the optimally denoised data, which lends itself to pattern mining and knowledge discovery. Congratulations, Maha!

(16 Aug 2018)

### Yuliia proposes Grip for non-parametric dependency network reconstruction

For her Master thesis, Yuliia Brendel studied how we can recover the dependency network over a multivariate continuous-valued data set, without having to assume anything about the data distribution. She did so using the notion of cumulative entropy, and proposes the Grip algorithm to robustly estimate it for the multivariate case. Experiments show that Grip performs very well even for highly non-linear, highly noisy, and high-dimensional data and dependencies. Congratulations, Yuliia!

(29 Jun 2018)

### Boris predicts the wear and tear of rolling mills in a steel factory

During his studies, Boris Wiegand worked at the Dillinger steel plant, where, among other machinery, they use specialized rolling mills to turn chunks of red-hot steel into plates of specified thickness with high precision. The rolls in these mills undergo incredible temperature and pressure, and hence need to be replaced every so often. The question is, when? In his Master thesis, Boris proposed a data-driven model that outperforms the industry-standard physics-based model, and showed how we can use it to optimize the milling schedule. Congratulations, Boris!

(28 Jun 2018)

### Maike shows how to reverse engineer epidemics in weighted graphs

In her Master thesis, Maike Eissfeller considered the problem of how to identify which nodes were most likely responsible for starting an epidemic in a large, weighted graph. She built upon the NetSleuth algorithm, and showed how to extend the theory to weighted graphs, how to make it more robust against the non-convex score, and how to improve its results by local re-optimization. Congratulations, Maike!

(19 Jun 2018)

### Kailash explains how to be Cute at SDM

Given two discrete-valued time series, can we tell whether they are causally related? That is, can we tell whether $$x$$ causes $$y$$, or whether $$y$$ causes $$x$$? In the paper he presented on May 3rd at the SIAM Data Mining Conference, Kailash shows we can do so accurately, efficiently, and without having to make assumptions on the distribution of these time series, or about the lag of the causal effect. You can find the paper and implementation here.

(2 May 2018)

### Tatiana shows how to robustly discretize multivariate data

Tatiana Dembelova received her Master of Science degree for her thesis on how to discretize multivariate data such that we maintain the most important interactions between the attributes. In particular, she showed that existing work based on interaction distances performs less well than desired, and proposed a new approach based on footprint interactions that is highly robust against noise and the curse of dimensionality, both in theory and in practice. Congratulations, Tatiana!

(13 Mar 2018)

### Robin introduces the Fire approach to discover interesting patterns

Robin Burghartz received his Master of Science degree for his thesis on how to identify interesting non-redundant pattern sets through the use of adaptive codes. Loosely speaking, he showed that when describing a row of data, if we adaptively consider only those patterns we know we can possibly use, instead of all of them, we can identify those patterns that stand out strongly from those already selected, leading to much smaller and much less redundant pattern sets. Congratulations, Robin!

(14 Dec 2017)

### Henrik presents Explore to efficiently discover powerlaw communities

Henrik Jilke presented his Master thesis on the efficient discovery of powerlaw-distributed communities in large graphs. He proposed a lossless score based on the Minimum Description Length principle to identify whether a subgraph stands out sufficiently to be considered a community, and gave the efficient Explore algorithm to heuristically discover the best set of such communities. Experiments validate that his method is able to discover large, powerlaw-distributed communities that other methods miss. Congratulations, Henrik!

(7 Dec 2017)

### Benjamin proposes to automatically Refine ontologies for a specific corpus

Benjamin Hättasch finished his Master of Science by handing in his thesis on the automatic refinement of ontologies using compression-based learning. In a nutshell, Benjamin shows how we can efficiently describe a given text using an ontology. His main result is the Refine algorithm, which iteratively refines the ontology such that we maximize the compression. The resulting ontologies are a much better representation of the text distribution, and allow him to identify the key topics of the text without supervision. Congratulations, Benjamin!

(4 Dec 2017)

### Jonas receives IMPRS-CS PhD Fellowship

We are happy and proud to announce that Jonas Fischer got accepted as a PhD student in the International Max Planck Research School for Computer Science (IMPRS-CS) to pursue a PhD on the topic of algorithmic data analysis. He was already a student in the Saarbrücken Graduate School of Computer Science, and recently finished his Master thesis in Bioinformatics on the topic of highly efficient methylation calling.

(11 Oct 2017)

### David receives IMPRS-CS PhD Fellowship

We are excited to announce that David Kaltenpoth got accepted as a PhD student in the International Max Planck Research School for Computer Science (IMPRS-CS). He was already a member of the Saarbrücken Graduate School of Computer Science. He will work on the topic of information theoretic causal inference, in particular the theory and practice of determining whether potential causal dependencies are confounded.

(11 Oct 2017)

### Sebastian joins EDA as a PhD student

We warmly welcome Sebastian Dalleiger as a PhD student in the Exploratory Data Analysis group. Sebastian finished his Master's in Informatics at Saarland University in 2016, and will now join our group to work on information theoretic approaches to mining interpretable and useful structure from data.

(2 Sep 2017)

### Janis joins EDA as a PhD student

We warmly welcome Janis Kalofolias as a PhD student in the Exploratory Data Analysis group. Janis recently finished his Master's in Informatics at Saarland University, and will now join our group to work on the theoretical foundations of mining interesting patterns from data.

(7 Nov 2016)

### Alex receives IMPRS-CS PhD Fellowship

We are happy to announce that Alexander Marx got accepted as a PhD student in the International Max Planck Research School for Computer Science (IMPRS-CS) and the Saarbrücken Graduate School of Computer Science! He will work on the efficient discovery and interpretable description of interesting sub-populations in data, with the grand goal of discovering causal dependencies that lead to the discovery of novel materials.

(1 Nov 2016)

### Amir proposes BVCorr to discover non-linearly correlated segments

Amirhossein Baradaranshahroudi finished his Master of Science by handing in his thesis on the fast discovery of non-linearly correlated segments in multivariate time series. In his thesis, Amir shows that through fast Fourier transformation, convolution, and pre-computation we can bring down the computational complexity of computing the distance correlation between all pairwise windows to $$O(n^4 \log n)$$ from $$O(n^5 d)$$. For discovery in long time series, he proposes an effective and efficient heuristic that only takes $$O(nwd)$$ time. Congratulations, Amir!

(14 Oct 2016)

### Apratim shows how to Squish event sequences

Apratim Bhattacharyya finished his Master of Science by handing in his thesis 'Squish: Efficiently Summarising Event Sequences with Rich and Interleaving Patterns'. Squish improves over the state of the art by considering a much richer description language, allowing both nesting and interleaving of patterns, as well as both variances and partial occurrences of patterns. Moreover, Squish is not only orders of magnitude faster than the state of the art, experiments show it also discovers much better and more easily interpretable models. Congratulations, Apratim!

(30 Sep 2016)

### Beata untangles a pile of spaghetti

Beata Wójciak handed in her thesis 'Spaghetti: Finding Storylines in Large Collections of Documents' on the 29th of September, and so fulfilled the requirements to become a Master of Science in Informatics. In her thesis, Beata studied the problem of making sense of large, time-stamped collections of documents, and proposed the efficient Spaghetti algorithm to discover the storylines in a corpus. This allows us to draw a map showing which documents are connected, as well as to easily interpret the storylines. Congratulations, Beatka!

(29 Sep 2016)

### Magnus combines sketching and Slim into Skim

For his Bachelor thesis, Magnus Halbe studied whether sketching can speed up Slim. In particular, he investigated whether DHP and min-hashing can be used to reliably and efficiently identify co-occurring patterns. In his thesis, titled 'Skim: Alternative Candidate Selections for Slim through Sketching', Magnus shows that the answer is 'not really'. Whereas the sketches ably identify heavy hitters, they are less efficient in identifying more subtle patterns. He therefore proposes the Skim algorithm, which combines the best of both worlds. Congratulations, Magnus!

(28 Sep 2016)

### Roel presents Ditto at KDD

During the summer of 2014, Roel Bertens did an internship in our group. He presented the resulting paper, 'Keeping it Short and Simple', at ACM SIGKDD 2016. Together with Arno Siebes, we studied the problem of finding summaries of complex event sequences in terms of patterns that span multiple attributes and may include gaps. We propose the Ditto algorithm to reliably and efficiently discover succinct and non-redundant models from multivariate event sequences. We give a short explanation, without kittens, on YouTube.

(24 Jul 2016)

### Polina presents CulT at KDD

Last summer, Polina Rozenshtein did an internship in our group. She presented the resulting paper, 'Reconstructing an Epidemic over Time', at ACM SIGKDD 2016. Together with B. Aditya Prakash and Aris Gionis, we studied the problem of finding the seed nodes of an epidemic, given an interaction graph and a sparse and noisy sample of node states over time. We propose the CulT (Culprits in Time) algorithm, which reliably, efficiently, and without making any assumptions on the viral process, can recover both the number and location of the original seed nodes. We give a short explanation, with kittens, on YouTube.

(24 Jul 2016)

### Kailash invited to the Heidelberg Laureate Forum

Kailash Budhathoki has been invited to attend the Heidelberg Laureate Forum. Between the 18th and 23rd of September 2016, he will get to meet laureates of the most prestigious awards in Mathematics and Computer Science, such as Turing Award winners Manuel Blum, Vinton Cerf, Richard Karp, and John Hopcroft, as well as 199 other highly talented young scientists.

(24 Apr 2016)

### Panos and Jilles present Flexi, Light, and UdS at SIAM SDM

Panagiotis Mandros presented UdS, which allows for Universal Dependency Analysis. That is, it is a robust and efficient measure for non-linear and multivariate correlations which does not require any prior assumptions, yet allows for meaningful comparison, no matter the cardinality or distribution of the subspace. Jilles Vreeken presented Light, a linear-time method for detecting non-linear change points in massively high-dimensional time series, and Flexi, a highly flexible method for mining high-quality subgroups through optimal discretisation that works with virtually any quality measure.

(24 Apr 2016)