Topics in Algorithmic Data Analysis 2026



Course Information

Type: Advanced Lecture (6 ECTS)
Lecturer: Prof. Dr. Jilles Vreeken
Email: vreeken (at) cispa.de
Lectures: Thursdays, 10:00–12:00 (sharp) in room 0.05 (CISPA, E9.1)
Registration: Not necessary, see below
Summary: In this advanced course we'll be investigating hot topics in machine learning that the lecturer thinks are cool. This course is for those of you who are interested in Machine Learning, Data Science, Data Mining – or, as the lecturer prefers to call it – Algorithmic Data Analysis. We'll be looking into how to discover significant and useful patterns from data, how to gain insight into complex neural models, as well as how to learn inherently interpretable models from complex data.

Registration

There is no need to register for the course with the lecturer.

As usual, you will have to register for the exam via LSF. You can do so up until one week before the exam.

Schedule

Month Day Topic Slides Assignment Req. Reading Opt. Reading
Apr 9 Introduction and Practicalities PDF 1st assignment out [?]
16 Useful Patterns [1] [8,9,10]
23 Insightful Patterns deadline 1st, 2nd out [2] [11,12,13]
30 Actionable Patterns [3] [14,15,16]
May 7 Jilles travelling – no class
12* Causality deadline 2nd, 3rd out [4] Ch 1, Ch 6 [17,18,19]
21 Causal Discovery [4] Ch 2, Ch 7 [20,21,22]
28 Causal Insight [5] [23,24,25]
Jun 4 yay holiday – no class deadline 3rd, 4th out
11 Explainability [6] [26,27]
18 Interpretability [7] [28,29,30]
25 Surprise deadline 4th
30* Wrap-Up
Jul 22 oral exams
Oct 8 oral re-exams

* Lecture on a different day

All report deadlines are on the indicated day at 10:00.

Prerequisites

Students should have a basic working knowledge of machine learning, data mining, and/or statistics, e.g. from having successfully taken courses such as Machine Learning, Probabilistic Graphical Models, Probabilistic Machine Learning, or Elements of Machine Learning.

The skills you will benefit most from are critical thought and reading comprehension.

Lectures

TADA will be taught in-person in the CISPA lecture hall (room 0.05 of E9.1).

The credentials to access the course materials will be shared during the first lecture.

Assignments

Students will individually complete one assignment per topic – four in total. For every assignment, you will have to read one or more research papers and hand in a report that critically discusses this material and answers the assignment questions. Reports should summarise the key aspects but, more importantly, should include original and critical thought that shows you have acquired a meta-level understanding of the topic – plain summaries will not suffice. The expected length of a report is 4 pages, but there is no limit. All sources you have drawn from should be referenced. As the point of the course is to develop your critical reading and thinking skills, it is strictly prohibited to use LLMs to write (parts of) your report.

The deadlines for the reports are at 10:00 Saarbrücken standard time on the days indicated in the schedule. You are free to hand in earlier. In case of confusion, the deadline stated on the assignment page takes precedence.

You will find some well-graded example reports here.

Grading and Exam

The assignments will be graded on a scale of Fail, Pass, Very Good, and Excellent. Any assignment not handed in by the deadline automatically counts as a Fail. You are allowed to re-do one Failed assignment: you have to hand in the improved version within two weeks. If the improved assignment is not graded at least a Pass, you are no longer eligible to take the exam.

An Excellent score gives you one bonus point, as do two Very Goods. Each bonus point improves a passing exam grade by 1/3, up to a maximum improvement of a full mark. For example, if you gathered two bonus points and score a 2.0 in the final exam, your final grade will be 1.3. If, however, you fail the final exam, you fail the course, irrespective of bonus points. Provided you are eligible to sit the final exam, previously Failed assignments do not reduce your final grade.
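The bonus rule above boils down to simple step arithmetic on the German grade scale. A minimal sketch, assuming the standard passing grade steps 1.0, 1.3, 1.7, …, 4.0 (with 5.0 a fail); the function below is our illustration, not an official part of the course:

```python
# Sketch of the bonus-point rule, assuming standard German grade steps.
GRADES = [1.0, 1.3, 1.7, 2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0]  # passing grades

def final_grade(exam_grade: float, bonus_points: int) -> float:
    """Each bonus point improves a passing grade by one 1/3 step,
    capped at a full mark (three steps). A failed exam stays failed."""
    if exam_grade not in GRADES:          # e.g. 5.0: bonus points don't help
        return exam_grade
    steps = min(bonus_points, 3)          # a full mark = three 1/3 steps
    return GRADES[max(0, GRADES.index(exam_grade) - steps)]

print(final_grade(2.0, 2))  # the example from the text: two bonus points turn a 2.0 into 1.3
```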

If your report does not make it sufficiently clear whether you understood the topic at hand (e.g. because you used an LLM to write it), we will invite you for an in-depth, in-person discussion on the topic of your report. Failure to convince us of sufficient understanding will result in a Fail.

The final exams will be oral, and will cover all the material discussed in the lectures and the topics on which you did your assignments. The preliminary dates for the exams (subject to travel by the lecturer) are as follows. The main exam will most likely be on July 22nd, 23rd, and 24th. The final dates will be announced in the first weeks of the course. The re-exam will most likely be on October 8th, with October 7th as a back-up.

The exact time slot per student will be announced per email. Inform the lecturer of any potential clashes as soon as you know them.

Materials

All required and optional reading will be made available here. You will need a username and password that will be given out in the first lecture.

In case you do not have a strong enough background in data mining, machine learning, or statistics, these books [31,32,4,33,34] may help get you on your way. The university library kindly keeps hard copies of them available in a so-called Semesterapparat.

Required Reading

[1] van Leeuwen, M. & Vreeken, J. Mining and Using Sets of Patterns through Compression. In Frequent Pattern Mining, Aggarwal, C. & Han, J., pages 165-198, Springer, 2014.
[2] Fischer, J. & Vreeken, J. Sets of Robust Rules, and How to Find Them. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer, 2019.
[3] Atzmueller, M. Subgroup Discovery. WIREs Data Mining and Knowledge Discovery, 5:35-49, Wiley, 2015.
[4] Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference. MIT Press, 2017.
[5] Kaltenpoth, D. & Vreeken, J. Causal Discovery with Hidden Confounders. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2023.
[6] Lundberg, S.M. & Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4768-77, Curran, 2017.
[7] Xu, S., Walter, N.P. & Vreeken, J. Neural Rule Lists: Learning Discretizations, Rules, and Order in One Go. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), Curran, 2025.

Optional Reading

[8] Vreeken, J., van Leeuwen, M. & Siebes, A. Krimp: Mining Itemsets that Compress. Data Mining and Knowledge Discovery, 23(1):169-214, Springer, 2011.
[9] Budhathoki, K. & Vreeken, J. The Difference and the Norm -- Characterising Similarities and Differences between Databases. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer, 2015.
[10] Smets, K. & Vreeken, J. The Odd One Out: Identifying and Characterising Anomalies. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, pages 804-815, Society for Industrial and Applied Mathematics (SIAM), 2011.
[11] Fischer, J., Oláh, A. & Vreeken, J. What's in the Box? Exploring the Inner Life of Neural Networks with Robust Rules. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021.
[12] Fischer, J. & Vreeken, J. Differentiable Pattern Set Mining. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2021.
[13] Walter, N.P., Fischer, J. & Vreeken, J. Finding Interpretable Class-Specific Patterns through Efficient Neural Search. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024.
[14] Boley, M., Goldsmith, B.R., Ghiringhelli, L. & Vreeken, J. Uncovering Structure-Property Relationships of Materials by Subgroup Discovery. New Journal of Physics, 19, IOP Publishing Ltd and Deutsche Physikalische Gesellschaft, 2017.
[15] Xu, S., Walter, N.P., Kalofolias, J. & Vreeken, J. Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2024.
[16] Al Rahwanji, J., Xu, S., Walter, N.P. & Vreeken, J. Learning and Naming Subgroups with Exceptional Survival Characteristics. arXiv, 2026.
[17] Chickering, D.M. Optimal Structure Identification With Greedy Search. JMLR, 3:507-554, 2002.
[18] Colombo, D. & Maathuis, M. Order-independent Constraint-based Causal Structure Learning. Journal of Machine Learning Research, 15(1):3741-3782, 2014.
[19] Zheng, X., Aragam, B., Ravikumar, P. & Xing, E.P. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), PMLR, 2018.
[20] Marx, A. & Vreeken, J. Identifiability of Cause and Effect using Regularized Regression. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM, 2019.
[21] Mian, O., Marx, A. & Vreeken, J. Discovering Fully Directed Causal Networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2021.
[22] Xu, S., Mameche, S. & Vreeken, J. Information-Theoretic Causal Discovery in Topological Order. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2025.
[23] Kaltenpoth, D. & Vreeken, J. Identifying Selection Bias from Observational Data. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2023.
[24] Mameche, S., Kaltenpoth, D. & Vreeken, J. Learning Causal Models under Independent Changes. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), PMLR.
[25] Mameche, S., Kalofolias, J. & Vreeken, J. Causal Mixture Models: Characterization and Discovery. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), Curran, 2025.
[26] Xu, S., Cueppers, J. & Vreeken, J. Succinct Interaction-Aware Explanations. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM, 2025.
[27] Walter, N.P., Vreeken, J. & Fischer, J. Hidden in Plain Sight - Class Competition Focuses Attribution Maps. arXiv, 2025.
[28] Sutton, C., Boley, M., Ghiringhelli, L., Rupp, M., Vreeken, J. & Scheffler, M. Identifying Domains of Applicability of Machine Learning Models for Materials Science. Nature Communications, 11:1-9, Nature Research, 2020.
[29] Hedderich, M., Fischer, J., Klakow, D. & Vreeken, J. Label-Descriptive Patterns and their Application to Characterizing Classification Errors. In Proceedings of the International Conference on Machine Learning (ICML), PMLR.
[30] Wilms, M., Xu, S. & Vreeken, J. Explainable Mixture Models through Differentiable Rule Learning. In Proceedings of the International Conference on Learning Representations (ICLR), OpenReview, 2026.
[31] Wasserman, L. All of Statistics. Springer, 2005.
[32] Aggarwal, C.C. Data Mining - The Textbook. Springer, 2015.
[33] Pearl, J. & Mackenzie, D. The Book of Why. Basic Books, 2018.
[34] Hardt, M. & Recht, B. Patterns, Predictions, and Actions - A story about machine learning. Princeton University Press, 2022.