Topics in Algorithmic Data Analysis 2025

News

more ▾

Aug 12 — re-exam date October 6th, with October 7th as back-up
Jul 10 — slides and recording of twelfth lecture available
Jul 03 — slides and recording of eleventh lecture available
Jun 26 — slides and recording of tenth lecture available
Jun 20 — slides and recording of ninth lecture available , fourth assignment posted
Jun 12 — ninth lecture will be on Friday June 20th
Jun 12 — main exam dates August 6-8th, with July 17th as back-up
Jun 12 — slides and recording of eight lecture available
May 26 — slides and recording of seventh lecture available , third assignment posted
May 22 — seventh lecture will be on Monday May 26th in E9.1 0.01
May 22 — slides and recording of sixth lecture available
May 15 — slides and recording of fifth lecture available
May 08 — second assignment posted
Apr 29 — slides and recording of fourth lecture available
Apr 24 — fourth lecture will be on Tuesday April 29th
Apr 24 — slides and recording of third lecture available
Apr 17 — Zoom and Youtube links work again.
Apr 17 — slides and recording of second lecture available
Apr 10 — slides and recording of first lecture available available , first assignment posted
Mar 26 — preliminary schedule online
Dec 02 — website online

Course Information

Type	Advanced Lecture (6 ECTS)
Lecturer	Prof. Dr. Jilles Vreeken
Email	vreeken (at) cispa.de
Lectures	Thursdays, 10–12 o'clock (sharp) in 0.05 (CISPA, E9.1) and online via Zoom and YouTube.
Registration	Not necessary, see below
Summary	In this advanced course we'll be investigating hot topics in machine learning and data mining that the lecturer thinks are cool. This course is for those of you who are interested in Machine Learning, Data Mining, Data Science – or, as the lecturer prefers to call it – Algorithmic Data Analysis. We'll be looking into what causality is and how we can extract it from data, how to discover significant and useful patterns, how to gain insight into complex neural models, as well as how to learn inherently interpretable models from complex data.

Preliminary Schedule

Month	Day	Topic	Slides	Assignment	Req. Reading	Opt. Reading
Apr	10	Introduction and Practicalities	PDF	1st assignment out
	17	Useful Patterns	PDF		[1]	[10,11,12]
	24	Insightful Patterns	PDF	deadline 1st	[2]	[13,14,15]
	29*	Actionable Patterns	PDF		[3]	[16,17]
May	8	Jilles travelling – no class		2nd assignment out
	15	Causal Models	PDF		[4] Ch 1, Ch 6	[18,19,20]
	22	Causal Discovery	PDF		[4] Ch 2, Ch 7	[21,22,23]
	26*	Causal Inference	PDF	deadline 2nd, 3rd out	[5]	[24,25,26]
Jun	5	Jilles busy – no class
	12	Above and Beyond	PDF		[6]	[27,28,?]
	20*	Sequences	PDF	deadline 3rd, 4th out	[7]	[29,30,31]
	26	Graph Epidemics	PDF		[8]	[32,33,34,35]
Jul	3	Graph Understanding	PDF		[9]	[36,37,38]
	10	Wrap-Up	PDF	deadline 4th
Aug	7	oral exams
Oct	6	oral re-exams

* Lecture on a different day

All report deadlines are on the indicated day at 10:00.

Registration

There is no need to register for the course with the lecturer. The credentials to the Zoom meetings, YouTube stream, and necessary materials, will be shared in the first (publicly available) lecture.

As is usual, you will have to register for the exam via LSF. You can do so up to one week before the exam.

Prerequisites

Students should have basic working knowledge of machine learning, data mining, and/or statistics, e.g. by successfully having taken courses such as Machine Learning, Probabilistic Graphical Models, Probabilistic Machine Learning, Elements of Machine Learning, etc.

The skills you will benefit most are critical thought and reading comprehension. We will practice these in the lectures and assignments.

Lectures

TADA will be taught hybrid. You are encouraged to attend the lectures in-person in the CISPA lecture hall (room 0.05 of E9.1), we will additionally stream the lectures to Zoom and YouTube. The Zoom meetings, YouTube streams, and edited videos will be linked in the schedule.

The credentials to access the course materials will be shared during the first lecture.

Assignments

Students will individually do one assignment per topic – four in total. For every assignment, you will have to read one or more research papers and hand in a report that critically discusses this material and answers the assignment questions. Reports should summarise the key aspects, but more importantly, should include original and critical thought that show you have acquired a meta level understanding of the topic – plain summaries will not suffice. All sources you've drawn from should be referenced. The expected length of a report is 4 pages, but there is no limit.

The deadlines for the reports are on the day indicated in the schedule at 10:00 Saarbrücken standard-time. You are free to hand in earlier.

You will find some well-graded example reports here.

Grading and Exam

The assignments will be graded in scale of Fail, Pass, Very Good, and Excellent. Any assignment not handed in by the deadline is automatically considered Failed. You are allowed to re-do one Failed assignment: you have to hand in the improved assignment within two weeks. If the improved assignment is not at least a Pass, you are no longer eligible to take the exam.

Every Excellent gives you one bonus point, as do every two Very Good grades. Each bonus point improves a passing exam grade by 1/3, up to a maximum improvement of a full mark. For example, if you have two bonus points and you receive 2.0 from the final exam, your final grade will be 1.3. If you fail the final exam, you fail the course, irrespective of bonus points. Provided you are eligible to sit the final exam, previously Failed assignments do not reduce your final grade.

The final exams will be oral, and will cover all the material discussed in the lectures and the topics on which you did your assignments. The dates for the two exams are as follows. The main exam will be on August 7th, 8th, and 9th, with a handful of slots available on July 17th. The re-exam will be on October 6th, with October 7th as a back-up.

The exact time slot per student will be announced per email. Inform the lecturer of any potential clashes as soon as you know them.

Materials

All required and optional reading will be made available here. You will need a username and password that will be given out in the first lecture.

In case you do not have a strong enough background in data mining, machine learning, or statistics, these books [4,39,40,41] may help to get you on your way. The university library kindly keeps hard copies of these books available in a so-called Semesteraparat.

Required Reading

[1]	van Leeuwen, M. & Vreeken, J. Mining and Using Sets of Patterns through Compression. In Frequent Pattern Mining, Aggarwal, C. & Han, J., pages 165-198, Springer, 2014.
[2]	Fischer, J., Oláh, A. & Vreeken, J. What's in the Box? Exploring the Inner Life of Neural Networks with Robust Rules. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021.
[3]	Atzmueller, M. Subgroup Discovery. WIRE's Data Mining and Knowledge Discovery, 5:35-49, Wiley, 2015.
[4]	Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference. MIT Press, 2017.
[5]	Mian, O., Marx, A. & Vreeken, J. Discovering Fully Directed Causal Networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2021.
[6]	Mameche, S., Kaltenpoth, D. & Vreeken, J. Discovering Invariant and Changing Mechanisms from Data. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM
[7]	Bhattacharyya, A. & Vreeken, J. Efficiently Summarising Event Sequences with Rich Interleaving Patterns. In Proceedings of the SIAM International Conference on Data Mining (SDM'17), SIAM, 2017.
[8]	Prakash, B.A., Vreeken, J. & Faloutsos, C. Spotting Culprits in Epidemics: How many and Which ones?. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, IEEE, 2012.
[9]	Coupette, C. & Vreeken, J. Graph Similarity Description. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2021.

Optional Reading

[10]	Vreeken, J., van Leeuwen, M. & Siebes, A. Krimp: Mining Itemsets that Compress. Data Mining and Knowledge Discovery, 23(1):169-214, Springer, 2011.
[11]	Smets, K. & Vreeken, J. Slim: Directly Mining Descriptive Patterns. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, pages 236-247, Society for Industrial and Applied Mathematics (SIAM), 2012.
[12]	Budhathoki, K. & Vreeken, J. The Difference and the Norm -- Characterising Similarities and Differences between Databases. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer, 2015.
[13]	Fischer, J. & Vreeken, J. Sets of Robust Rules, and How to Find Them. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer, 2019.
[14]	Fischer, J. & Vreeken, J. Differentiable Pattern Set Mining. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2021.
[15]	Walter, N.P., Fischer, J. & Vreeken, J. Finding Interpretable Class-Specific Patterns through Efficient Neural Search. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2024.
[16]	Sutton, C., Boley, M., Ghiringhelli, L., Rupp, M., Vreeken, J. & Scheffler, M. Identifying Domains of Applicability of Machine Learning Models for Materials Science. Nature Communications, 11:1-9, Nature Research, 2020.
[17]	Xu, S., Walter, N.P., Kalofolias, J. & Vreeken, J. Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2024.
[18]	Pearl, J. Causality. Cambridge University Press, 2009.
[19]	Pearl, J. & Mackenzie, D. The Book of Why. Basic Books, 2018.
[20]	Budhathoki, K., Boley, M. & Vreeken, J. Rule Discovery for Exploratory Causal Reasoning. In Proceedings of the SIAM Conference on Data Mining (SDM), SIAM, 2021.
[21]	Chickering, D.M. Optimal Structure Identification With Greedy Search. JMLR, 3:507-554, 2002.
[22]	Colombo, D. & Maathuis, M. Order-independent Constraint-based Causal Structure Learning. Journal of Machine Learning Research, 15(1):3741-3782, 2014.
[23]	Zheng, X., Aragam, B., Ravikumar, P. & Xing, E.P. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), PMLR, 2018.
[24]	Marx, A. & Vreeken, J. Identifiability of Cause and Effect using Regularized Regression. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM, 2019.
[25]	Mian, O., Kamp, M. & Vreeken, J. Information-Theoretic Causal Discovery and Intervention Detection over Multiple Environments. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2023.
[26]	Xu, S., Mameche, S. & Vreeken, J. Information-Theoretic Causal Discovery in Topological Order. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2025.
[27]	Kaltenpoth, D. & Vreeken, J. Identifying Selection Bias from Observational Data. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2023.
[28]	Kaltenpoth, D. & Vreeken, J. Causal Discovery with Hidden Confounders. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2023.
[29]	Tatti, N. & Vreeken, J. The Long and the Short of It: Summarizing Event Sequences with Serial Episodes. In Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, ACM, 2012.
[30]	Cueppers, J., Kalofolias, J. & Vreeken, J. Omen: Discovering Sequential Patterns with Reliable Prediction Delays. Knowledge and Information Systems, Springer, 2022.
[31]	Cueppers, J. & Jilles, V. Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2023.
[32]	Lappas, T., Terzi, E., Gunopulos, D. & Mannila, H. Finding effectors in social networks. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC, pages 1059-1068, ACM, 2010.
[33]	Shah, D. & Zaman, T. Rumors in a Network: Who's the Culprit?. IEEE Transactions on Information Technology, 57(8):5163-5181, 2011.
[34]	Sundareisan, S., Vreeken, J. & Prakash, B.A. Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics. In Proceedings of the SIAM International Conference on Data Mining (SDM'15), SIAM, 2015.
[35]	Rozenshtein, P., Gionis, A., Prakash, B.A. & Vreeken, J. Reconstructing an Epidemic over Time. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pages 1835-1844, ACM, 2016.
[36]	Chakrabarti, D., Papadimitriou, S., Modha, D.S. & Faloutsos, C. Fully automatic cross-associations. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, pages 79-88, 2004.
[37]	Koutra, D., Kang, U., Vreeken, J. & Faloutsos, C. VoG: Summarizing Graphs using Rich Vocabularies. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA, pages 91-99, SIAM, 2014.
[38]	Goeble, S., Tonch, A., Böhm, C. & Plant, C. MeGS: Partitioning Meaningful Subgraph Structures Using Minimum Description Length. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 889-894, IEEE, 2016.
[39]	Wasserman, L. All of Statistics. Springer, 2005.
[40]	Aggarwal, C.C. Data Mining - The Textbook. Springer, 2015.
[41]	Hardt, M. & Recht, B. Patterns, Predictions, and Actions - A story about machine learning. Princeton University Press, 2022.