Assignment 3: Interaction


The deadline is July 4th at 10:00 Saarbrücken standard-time. You are free to hand in earlier. You will have to choose one topic from the list below, read the articles, and hand in a report that critically discusses this material and answers the assignment questions. Reports should summarise the key aspects, but more importantly, should include original and critical thought that show you have acquired a meta level understanding of the topic – plain summaries will not suffice. All sources you use should be appropriately referenced, any text you quote should be clearly identified as such. The expected length of a report is between 3 to 5 pages, but there is no limit.

For the topic of your assignment, choose one of the following:

  1. Ulterior

    Discovering non-trivial interactions or correlations requires a powerful score. As we have seen, entropy has many likeable properties, especially that it can detect any type of interaction. Using entropy to define a good score for multivariate continuous-valued random variables is a bit tricky though. Nguyen et al. [1] showed with cmi that we can succesfully use cumulative entropy to this end. Soon after, they improved over cmi with mac [2] – loosely speaking, a multivariate extension of MIC, that uses cumulative entropy for maximal correlation analysis. Recently, they proposed UdS [3], which is universal. Each of these measures are shown to improve over each other, but one has to remain critical; your task is to carefully read these three papers, connect the dots on how they relate, identifying their respective strengths and weaknesses, identify which corners they cut, and reason about an even more ideal score that you could imagine building upon the main ideas of these papers.

  2. Independence (Hard)

    For a long time, practical causal inference depended on Reichenbach's principle of common cause, which allows us to identify causality between X, Y, and Z up to Markov equivalence classes via conditional independence tests. Modern approaches focus on identifying the most likely causal direction between pairs of variables X and Y. One approach, based on the notion of the Additive Noise Models (ANMs), is while rather restricted in theory, relatively easy to instantiate and quite succesful in practice. Another approach, based on the algorithmic Markov condition, is very powerful in theory, but much more difficult to instantiate directly.

    Read both [4] and [5] and besides paraphrasing the main ideas, such as why and how can we use ANMs resp. Kolmogorov Complexity to identify causal directions, most importantly, investigate how the two are related. Can the one be seen as an instance of the other? What are the implications? How can we instantiate a score based on the algorithmic Markov condition? Can we plug in any compressor, or does it have special requirements? How would you go about to instantiate it? Would Krimp [?], or Pack [6] do, and if so how? and so on.

  3. Fairly Causal

    Whenever we apply data-driven methods we have to be very careful that we do not introduce any algorithmic bias; we have to prevent our algorithms from making unfair or discriminative decisions, regardless of whether this is because of how what assumptions the method makes, or whether these biases might have been present in the data we trained the method on. Statistical parity, which assumes that all is fair if we make sure that the distribution of the sensitive attribute is the same between the groups we decide over, is a popular notion in fair machine learning, but has been attacked by Dwork et al. [7]. Bonchi et al. [8] do not mention parity at all in their paper, as they take a causality based approach. Read this paper carefully and critically. Do you find their approach convincing? Does it live up to the critiques that Dwork gives against statistical parity? Do you see potential other problems in this approach? How could it possibly go wrong? What are the main strengths, in your opinion, and what are the main weaknesses?

    (Bonus) In many applications it is not so much about making a (binary) decision, but rather about ranking elements. Say, a search engine that we can query to discover who are the smartest students on campus—and of course we want to be fair with regard to hair colour. How would you solve the fair-ranking problem, if your only tool is statistical parity? And, how would you solve the same problem if you would have the SBCN?

  4. Hype, hyper, hyperbolae

    A lot of research attention in graph mining is focused on the discovery of communities, groups of nodes that strongly interact. Many papers assume highly simplistic models, such as that communities strongly interact within, but only loosely to the outside, or, that communities do not overlap. Recently, different groups of researchers started considering that, just like graphs themselves, communities within these graphs may also show powerlaw-like edge distributions.

    Read [9], [10], and critically discuss these approaches. Example questions you can address include the following. What are their (hidden or not) assumptions, what are the strengths, and what are the weaknesses? Consider also the evaluation, what are these methods trying to reconstruct? Is that a meaningful goal, or is it a self-fulfilling prophecy? Does any of these papers convince you that real communities follow the assumptions the authors make? The first paper considers overlapping communities, whereas the other does not. That sounds like an important downside. Is it? Really? Why (not)? (Bonus, also consider [11])

Return the assignment by email to tada-staff@mpi-inf.mpg.de by 4 July, 1000 hours. The subject of the email must start with [TADA]. The assignment must be returned as a PDF and it must contain your name, matriculation number, and e-mail address together with the exact topic of the assignment.

Grading will take into account both Hardness of questions, as well as whether you answer the Bonus questions.



References

You will need a username and password to access the papers outside the MPI network. Contact the lecturer if you don't know the username or password.

[1] Nguyen, H.V., Müller, E., Vreeken, J., Keller, F. & Böhm, K. CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection. In Proceedings of the 13th SIAM International Conference on Data Mining (SDM), Austin, TX, pages 198-206, SIAM, 2013.
[2] Nguyen, H.V., Müller, E., Vreeken, J. & Böhm, K. Multivariate Maximal Correlation Analysis. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, pages 775-783, JMLR, 2014.
[3] Nguyen, H.V., Mandros, P. & Vreeken, J. Universal Dependency Analysis. In Proceedings of the 16th SIAM International Conference on Data Mining (SDM), Miami, FL, pages 791-800, SIAM, 2016.
[4] Peters, J., Janzing, D. & Schölkopf, B. Identifying Cause and Effect on Discrete Data using Additive Noise Models. , pages 597-604, 2010.
[5] Janzing, D. & Schölkopf, B. Causal Inference Using the Algorithmic Markov Condition. IEEE Transactions on Information Technology, 56(10):5168-5194, 2010.
[6] Tatti, N. & Vreeken, J. Finding Good Itemsets by Packing Data. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), Pisa, Italy, pages 588-597, 2008.
[7] Dwork, C., Hardt, M., Pitassi, T., Reingold, O. & Zemel, R. Fairness through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS), ACM, 2012.
[8] Bonchi, F., Hajian, S., Mushra, B. & Ramazotti, D. Exposing the probabilistic causal structure of discrimination. Data Science and Analytics, Springer, 2017.
[9] Yang, J. & Leskovec, J. Overlapping Communities Explain Core-Periphery Organization of Networks. Proceedings of the IEEE, 102(12), IEEE, 2014.
[10] Araujo, M., Günnemann, S., Mateos, G. & Faloutsos, C. Beyond Blocks: Hyperbolic Community Detection. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD'14), Springer, 2014.
[11] Metzler, S., Günnemann, S. & Miettinen, P. Hyperbolae are no Hyperbole: Modelling Communities that are not Cliques. In Proceedings of the IEEE International Conference on Data Mining (ICDM'16), IEEE, 2016.