The Information Theory Seminar WS'16

News

more ▾

Mar 13 — program for Seminar day online.
Jan 17 — slides eleventh lecture available here.
Jan 10 — slides tenth lecture available here.
Dec 07 — seminar day will be on March 14th.
Dec 06 — slides ninth lecture available here.
Nov 30 — topic assignment announced.
Nov 24 — slides seventh lecture available here.
Nov 22 — slides sixth lecture available here.
Nov 17 — slides fifth lecture available here.
Nov 15 — slides fourth lecture available here.
Nov 03 — slides second lecture available here.
Oct 26 — slides first lecture available here.
Oct 24 — Updated schedule.
Oct 21 — Finally, lots more information.
Aug 10 — Website online, more details in September.

Course Information

Type	Seminar (7 ECTS)
Lecturer	Dr. Jilles Vreeken
Email	its-staff (at) mpi-inf.mpg.de
Meetings	Tuesdays and Thursdays, 10–12 o'clock in Room E1.7 323. (we will not use every time slot, detailed schedule will be filled out over time)
Max Capacity	12 students. See here how to apply.
Summary	In this seminar we'll be investigating the following questions: What is interesting and meaningful structure, and how can we identify this in data? What are good models for our data when we don't know about any decent priors, don't have clear expectations, or, don't even know what we're looking for? What is the ultimate model for our data and how can we approximate this model in practice? We'll be exploring these questions in light of Algorithmic Information Theory (AIT) and its practical variant, the Minimum Description Length (MDL) principle.

Schedule

Month	Day	Type	Topic	Slides	Req. Reading	Opt. Reading
Oct	25	L	Practicalities, Introduction	PDF	[1] Ch 1.1–1.10
Oct	27	–	yay – no lecture – Jilles travelling
Nov	1	–	yay – no lecture – holiday
Nov	3	L	Kolmogorov Complexity	PDF	[1] Ch 2.1–2.2, 2.8	[2,3]
Nov	8	D	Kolmogorov in Action		[4,5,6]
Nov	10	–	yay – no lecture – Jilles travelling
Nov	15	L	The Minimum Description Length principle	PDF	[7] Ch 1
Nov	17	L	Coding	PDF	[1] Ch 1.11 & [7] Ch 3
Nov	22	L	Practical Coding 1: Matrix Factorisation	PDF	[8,9]
Nov	24	L	Practical Coding 2: Sequences	PDF	[10]
Nov	29	D	Practical Coding 3: Graphs		[11]
Dec	1	–	yay – no lecture – Jilles travelling
Dec	6	L	MML — model selection using priors	PDF	[7] Ch 17.4, [12]
Dec	8	D	MDL and MML for Predictive Modeling		[13]
Dec	13	–	yay – no lecture – Jilles travelling
Dec	15	–	yay – no lecture – Jilles travelling
Dec	20	–	Winter break
Dec	22	–	Winter break
Dec	27	–	Winter break
Dec	29	–	Winter break
Jan	10	L	Structure Functions	PDF	[14], ([7] Ch 17.8)
Jan	12	D	Structure Functions		[15]
Jan	17	L	Information	PDF	[1] Ch 2.8 (1.11)
Mar	14	S	Student Presentations

Lecture type key:

L = lecture — sit, listen & ask questions, read homework material when you prefer
D = discuss — read homework material in advance, actively discuss the topics in the meeting
S = selected students give a presentation on the topic of their assignment, followed by a short discussion.

Student Presentations

Month	Day	When	Who	Topic	Reading
Mar	14	10:00	Frauke	MDL & Denoising	[16,17]
		10:30	Magnus	Switching and Coding	[18]
		11:00	Dominic	Less Loss, Better Slim?	[19,20]
		11:30	Maha	MDL & Classification	[7] Ch 15
		12:00		– pizza –
		13:00	Patrick	The Incompressibility Method	[1] Ch 6
		13:30	Boris	Solomonoff's Theory of Prediction	[1] Ch 5.2
		14:00	Yuliia	The Problem of Sophistication	[21] Ch 4
		14:30		– break –
		14:45	Tatiana	MDL & Linear Regression	[7] Ch 12
		15:15	Nithya	Power and Perils of MDL	[22,23]

Every student will have to give a presentation and write a report on an assigned topic. The presentation should be between 15 and 20 minutes long, and will be followed by a 10 minute discussion. The style of the presentation should be like a lecture; your fellow students can follow it with only the previous lectures of the course as background material, you should only discuss those details that are necessary, and make use of examples where possible. Topic will be assigned by the lecturer.

In addition to the presentation, students will have to hand in a report that discusses the assigned topic. Unlike the presentation, which has to be high-level, the report is where you are allowed to show off what you've learned. Summarize the material you read as clearly as you can in your own words, identifying the key contributions/most interesting or important aspects, relating the topic to any/all previous lectures in the course and/or papers read for the course, all the while correctly referring to the sources you've drawn from. The expected length of the report is 5-10 pages, but you are free to use as many pages as you like.

Students will be graded on the slides, the presentation, your answers in the discussion, and the report. Students are allowed to ask for feedback on their slides once a week, up till latest 1 week before the presentation. The provided reading material are provided as starting points. For some topics the provided materials will be (by far) (way) (more than) enough, for others you may want to gather more material. When in doubt, contact the lecturer.

The deadline for both the slides and report is right before the meeting (i.e. 10:00) in which you have to give your presentation.

Materials

All required reading will be made available here. You will need a username and password to access the papers outside the MPI network. Contact the lecturer if you don't know the username or password.

The university library kindly keeps hard copies of the two most important books for this course available in a so-called Semesteraparat.

[1]	Li, M. & Vitányi, P. An Introduction to Kolmogorov Complexity and its Applications. Springer, 1993.
[2]	Kolmogorov, A. On Tables of Random Numbers. The Indian Journal of Statistics, Series A, 25(4):369-376, 1963.
[3]	Kolmogorov, A. Three Approaches to the Quantitative Definition of Information. Problemy Peredachi Informatsii, 1(1):3-11, 1965.
[4]	Li, M., Chen, X., Li, X., Ma, B. & Vitanyi, P. The similarity metric. IEEE Transactions on Information Technology, 50(12):3250-3264, 2004.
[5]	Cilibrasi, R. & Vitányi, P. Clustering by Compression. IEEE Transactions on Information Technology, 51(4):1523-1545, 2005.
[6]	Faloutsos, C. & Megalooikonomou, V. On data mining, compression and Kolmogorov Complexity. Data Mining and Knowledge Discovery, 15(1):3-20, Springer-Verlag, 2007.
[7]	Grünwald, P. The Minimum Description Length Principle. MIT Press, 2007.
[8]	Miettinen, P. & Vreeken, J. mdl4bmf: Minimum Description Length for Boolean Matrix Factorization. ACM Transactions on Knowledge Discovery from Data, 2014.
[9]	Lucchese, C., Orlando, S. & Perego, R. Mining Top-K Patterns from Binary Datasets in presence of Noise. In Proceedings of the 10th SIAM International Conference on Data Mining (SDM), Columbus, OH, pages 165-176, 2010.
[10]	Tatti, N. & Vreeken, J. The Long and the Short of It: Summarizing Event Sequences with Serial Episodes. In Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, ACM, 2012.
[11]	Koutra, D., Kang, ., Vreeken, J. & Faloutsos, C. VoG: Summarizing Graphs using Rich Vocabularies. In Proceedings of the SIAM International Conference on Data Mining (SDM), Philadelphia, PA, SIAM, 2014.
[12]	Wallace, C.S. & Freeman, P.R. Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 49(3):240-265, Wiley, 1987.
[13]	Grünwald, P., Kontkanen, P., Myllymäki, P., Silander, T. & Tirri, H. Minimum Encoding Approaches for Predictive Modeling. In Proceedings of the 14th International Conference on Uncertainty in Artificial Intelligence (UAI), 1998.
[14]	Vereshchagin, N. & Vitányi, P. Kolmogorov's Structure functions and Model Selection. IEEE Transactions on Information Technology, 50(12):3265-3290, 2004.
[15]	Siebes, A. & Kersten, R. A Structure Function for Transaction Data. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, pages 558-569, SIAM, 2011.
[16]	Rissanen, J. MDL Denoising. IEEE Transactions on Information Technology, 46(7):2537-2543, IEEE, 2000.
[17]	Roos, T., Myllymäki, P. & Tirri, H. On the Behaviour of MDL Denoising. In Proceedings of AISTATS, 2005.
[18]	Erven, T.v., Grünwald, P. & De Rooij, S. Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC--BIC dilemma. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):361-417, Blackwell Publishing Ltd, 2012.
[19]	Vreeken, J., van Leeuwen, M. & Siebes, A. Krimp: Mining Itemsets that Compress. Data Mining and Knowledge Discovery, 23(1):169-214, Springer, 2011.
[20]	Smets, K. & Vreeken, J. Slim: Directly Mining Descriptive Patterns. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, pages 236-247, Society for Industrial and Applied Mathematics (SIAM), 2012.
[21]	Bloem, P. Single Sample Statistics. PhD Thesis, 2016.
[22]	Adriaans, P. & Vitányi, P. The Power and Perils of MDL. Proceedings of ISIT 2007, 2007.
[23]	Adriaans, P. & Vitányi, P. Approximation of the Two-Part MDL Code. IEEE Transactions on Information Theory, 55(1):444-457, 2009.

Structure and Content

In general terms, the course will consist of

introductory lectures,
reading, discussing and presenting scientific articles, and
a 'practical' assignment (optional)

At a high level, the topics we will cover will include

Algorithmic Information Theory

Induction by Compression
Optimal descriptions: Kolmogorov Complexity
Optimal prediction: Solomonoff's Universal Probability Distribution
Optimal generalization: Kolmogorov's Structure Functions

Information Theoretic Modeling in Practice

The Minimum Description Length Principle
Two-Part MDL and Refined MDL
Prefix codes, Universal Codes, Prequential coding, and NML
How to define a good encoding — optimality vs. practicality
Other model selection techniques — ML vs. MDL vs. BIC vs. AIC vs. MML

Loosely speaking, students will learn about the information theoretical optimal approach to learn from data, as well as learn about and how to apply the more practical approach of MDL for a variety of model selection problems.

Course format

The course has up to two meetings per week. The first part of the course will feature regular lectures covering the basic topics of the course and sessions in which we will discuss material covered in the lectures and scientific articles assigned by the lecturer. During the second part the students will write an essay based on scientific articles assigned to them by the lecturer and will prepare a presentation. All student presentations will be scheduled in a block-seminar style, that is in one or two days near the end of the semester, the exact date is to be announced.

There will be no weekly tutorial group meetings.

Prerequisites

Students should have basic working knowledge of data analysis and statistics, e.g. by successfully having taken courses related to data mining, machine learning, and/or statistics, such as Topics in Algorithmic Data Analysis, Machine Learning, Probabilistic Graphical Models, Statistical Learning, Information Retrieval and Data Mining, etc.

Signing Up

The seminar is fully booked. It — or a course much like it — will likely be offered again in Winter Semester 2017.