The Elements of Statistical Learning WS'19


Course Information

Type: Advanced Lecture (5 ECTS)
Lecturers: Prof. Dr. Jilles Vreeken and Prof. Dr. Tobias Marschall
Teaching Assistants: Jonas Fischer (lead), Hufsah Ashraf (deputy), and Osman Ali Mian
Tutors: Daniel Radke and Nancy Mekountchou Menoudjeu
Email: esl-ta (at) mpi-inf.mpg.de
Lectures: Thursdays, 10–12 o'clock in Campus E2.1 (CBI), Room 0.01
Tutorials: Mondays, 12–14 o'clock in E1.4 (MPII) Room 0.21, and in E2.1 (CBI) Room 0.01
           Wednesdays, 12–14 o'clock in E1.4 (MPII) Room 0.21, and in E2.1 (CBI) Room 0.07
Office Hours: Jilles Vreeken and Tobias Marschall, after each lecture
              Jonas Fischer, Hufsah Ashraf, Osman Ali Mian, by appointment

Summary

This course teaches you, given a data set, to choose an appropriate statistical method for analyzing it, to select appropriate parameters for the statistical model generated by that method, and to assess the quality of the resulting model. Both theoretical and practical aspects will be covered. The material is relevant for computer scientists in general as well as for other scientists involved in data analysis and modeling.
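
As a rough illustration of the three steps named above (choose a method, select its parameters, assess the resulting model), here is a minimal sketch in base R. The simulated data, the choice of polynomial regression, the 60/20/20 split, and all variable names are assumptions made for this example and are not taken from the course material.

```r
# Simulated toy data (an assumption for illustration only)
set.seed(42)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Random split into training (60%), validation (20%), and test (20%) sets
idx   <- sample(n)
train <- dat[idx[1:120], ]
valid <- dat[idx[121:160], ]
test  <- dat[idx[161:200], ]

# Mean squared prediction error of a fitted model on new data
mse <- function(fit, newdata) mean((newdata$y - predict(fit, newdata))^2)

# Method: polynomial regression. Parameter to select: the degree.
degrees <- 1:8
val_err <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  mse(fit, valid)
})
best_d <- degrees[which.min(val_err)]

# Refit with the selected degree and assess on data not used so far
final    <- lm(y ~ poly(x, best_d), data = rbind(train, valid))
test_err <- mse(final, test)

cat("selected degree:", best_d, "\n")
cat("test MSE:", test_err, "\n")
```

The test set is touched only once, after the degree has been fixed, so that the reported error is not biased by the selection step.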

Prerequisites

The course is aimed at advanced students in computer science, bioinformatics, math, and general science with a mathematical background. Students should know linear algebra and have good basic knowledge of statistics.

Schedule

| Month | Day | Topic | Slides | Assignment | Req. Reading | Opt. Reading |
| Oct | 17 | Introduction and Basics | PDF | 1st assignment out | ESL 1, 2, ISLR 1, 2 | |
| | 24 | Linear Regression I | PDF | | ESL 3, ISLR 3 | |
| | 31 | Linear Regression II | PDF | deadline 1st, 2nd out | ESL 3, ISLR 3 | |
| Nov | 7 | Classification I | PDF | | ESL 4, ISLR 4 | |
| | 14 | Classification II | PDF | deadline 2nd, 3rd out | ESL 4, ISLR 4 | |
| | 21 | Resampling Methods | PDF | | ESL 7, ISLR 5 | |
| | 28 | Model Selection and Regularization | PDF | deadline 3rd, 4th out | ESL 3, ISLR 6 | |
| Dec | 5 | Dimensionality Reduction | PDF | | ESL 3, ISLR 6 | |
| | 12 | Beyond Linear | PDF | deadline 4th, 5th out | ESL 5, 9, ISLR 7 | |
| | 19 | Trees and Forests | PDF | | ESL 9, ISLR 8 | |
| | 26 | yay holiday – no class | | | | |
| Jan | 2 | yay holiday – no class | | | | |
| | 9 | Support Vector Machines | PDF | deadline 5th, 6th out | ESL 12, ISLR 9 | |
| | 16 | Neural Networks | PDF | | ESL 11 | |
| | 23 | Advanced Visualization | PDF | deadline 6th, 7th out | ESL 14, ISLR 10 | [3] |
| | 30 | lecturer sick – no class | | | | |
| Feb | 6 | Clustering plus Wrap-up Q&A session | PDF | deadline 7th | ESL 14, ISLR 10 | |
| | 12 | exam registration deadline | | | | |
| | 19-21 | oral exams | | | | |
| March | 11-12 | re-exams | | | | |

Materials

The course will, by and large, follow the book "An Introduction to Statistical Learning with Applications in R" [1]. At times the course will take additional material from the book "The Elements of Statistical Learning" [2]. The former is the more introductory text, the latter is more advanced. Both books are available as free PDFs. We strongly encourage you, though, to acquire at least the first book in print. Further background literature is available in the library in the so-called Semesterapparat.

[1] James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
[2] Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer, 2009.

For selected lectures we will identify interesting optional reading, such as relevant recent research papers. These we will make available here.

[3] van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.

Assignments

Each problem set will cover theoretical proofs and programming exercises with roughly equal weight. In general, the deadlines are on the day indicated in the schedule at 10:00 Saarbrücken standard-time. You are free to hand in earlier. Further details will be announced in the first lecture.

As the programming language we will use R, a language for statistical computing. It is freely available for Windows, Linux and Mac. As a vectorized programming language, it is well suited for the problems we will encounter. There are also many freely available packages (or libraries) to perform a variety of classification and regression tasks, or to visualize the results of statistical analyses in a convenient way.
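
To give a feel for what "vectorized" means here, the following minimal sketch computes a residual sum of squares once with whole-vector arithmetic and once with an explicit loop. The toy numbers and the fixed slope of 2 are made up for illustration.

```r
# Toy data (made up for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Vectorized: arithmetic operators act element-wise on whole vectors
y_hat   <- 2 * x
rss_vec <- sum((y - y_hat)^2)

# Equivalent explicit loop, usually longer and slower in R
rss_loop <- 0
for (i in seq_along(x)) {
  rss_loop <- rss_loop + (y[i] - 2 * x[i])^2
}

# Both approaches agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(rss_vec, rss_loop)))
```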

You hand in your solution as follows. For the theoretical exercises, you may hand in your solutions in handwritten form before the lecture, or send one PDF file with all the answers by email to esl-ta (at) mpi-inf.mpg.de. For the programming exercises, send a single email with both your R code as a .R file (it should run with the command "Rscript YourCode.R") and a PDF answering the questions and showing the generated plots (if any).
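
A script that runs non-interactively under Rscript might be laid out roughly as follows. This is only a sketch: the built-in `cars` data set, the linear model, and the output file name `plots.pdf` are placeholders chosen for this example, not part of any assignment.

```r
# YourCode.R: run from the command line with
#   Rscript YourCode.R

# Fix the random seed so that results are reproducible
set.seed(1)

# Placeholder analysis on a data set that ships with R
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Write plots to a file explicitly; the file name is a placeholder
pdf("plots.pdf")
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(fit)
dev.off()
```

Opening the graphics device explicitly with `pdf()` and closing it with `dev.off()` makes it clear where the plots end up when no interactive display is available.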

| No. | Handout | Date due | Discussed on | Assignment Sheet | Additional Material |
| 1 | 17 Oct 2019 | 31 Oct 2019 | 5 and 7 Nov 2019 | Assignment 1 | data |
| 2 | 31 Oct 2019 | 14 Nov 2019 | 19 and 21 Nov 2019 | Assignment 2 | |
| 3 | 14 Nov 2019 | 28 Nov 2019 | 2 and 4 Dec 2019 | Assignment 3 | data |
| 4 | 28 Nov 2019 | 12 Dec 2019 | 16 and 18 Dec 2019 | Assignment 4 | data |
| 5 | 12 Dec 2019 | 9 Jan 2020 | 13 and 15 Jan 2020 | Assignment 5 | data |
| 6 | 9 Jan 2020 | 23 Jan 2020 | 27 and 29 Jan 2020 | Assignment 6 | data |
| 7 | 23 Jan 2020 | 6 Feb 2020 | 10 and 12 Feb 2020 | Assignment 7 | data |

Tutorials

There will be one tutorial per week. In the week after you submit an assignment, the solution will be presented in the tutorial sessions on Monday and Wednesday at 12:00, respectively. We will also help you with the current problem set. In the following week, we will return the corrected sheets to you on Monday or Wednesday, respectively. We will also recapitulate the lectures and leave some time for discussion.

| No. | Date | Slides |
| 0 | 28/30 Oct 2019 | Tutorial 0, R-code, and Math foundations |
| 1 | 12/14 Nov 2019 | |
| 2 | 25/27 Nov 2019 | |
| 3 | 09/11 Dec 2019 | |
| 4 | 13/15 Jan 2020 | |
| 5 | 28/30 Jan 2020 | |
| 6 | 11/13 Feb 2020 | |

R resources

R (version 3.2.3) is installed on the CIP pool computers and can be started by invoking R from the command line.

The official web site of the R project is r-project.org. You can download R for Windows, Linux and Mac from there. Additional packages, documentation and tutorials are also available for download from the official web site. Useful manuals and tutorials include:

The CRAN Contributed Documentation lists many other tutorials for R beginners and advanced programmers.

You can also check out RStudio, an open-source IDE for R.

Grading and Exam

You need a cumulative 50% of the points in the problem sets (in both theoretical and programming exercises) to be admitted to the exam.

To successfully participate, you need to register for the exam in the LSF/HISPOS system of Saarland University. This will be possible as soon as the exam date has been entered into the system, which usually happens a few weeks into the semester.

The final exams will most likely be oral; the final decision on this will be made three weeks into the course. The final exam will cover all the material discussed in the lectures and the required reading. The main exam will be on February 19th, 20th, and 21st. The re-exam will be on March 11th and 12th. The exact time slot per student will be announced by email. Inform the lecturers of any potential clashes as soon as you know of them.

Acknowledgements

This course was originally developed by Thomas Lengauer, and we thank him for kindly providing his lecture materials and experience.