Assignment 0: The Example


The deadline for this example assignment was long in the past. Students were free to hand in earlier. They had to choose one topic from the (truncated) list below, read the articles, and hand in a report that critically discussed this material and answers the assignment questions. Reports should have summarise the key aspects, but more importantly, should have included original and critical thought that showed the student acquired a meta-level understanding of the topic – plain summaries did not suffice. All sources used should have be appropriately referenced, any text quoted should have been clearly identified as such. The expected length of a report was between 3 to 5 pages, but there was no limit.

There were four topics for the assignment, but for exposition we only give the one for which we provide the sample report.

  1. Where did the candidates go? — Hard

    The standard approach to mine frequent itemsets is to

    1. generate a set of candidate itemsets,
    2. test which are frequent, and
    3. use those to generate new candidates,
    and iterate until done. Eclat [1], proposed in 1997, is an example of a simple yet very efficient algorithm for mining frequent itemsets that follows this principle in a depth-first search.

    The authors of [2] claim that their method can mine frequent itemsets without candidate generation. This raises the question: where did the candidates go? Discuss whether this claim is valid or not, and why.

    (Bonus) TreeProjection [3] was proposed before [2]. The authors of [2] almost aggressively discuss that FPGrowth is really different than TreeProjection. Is it really? Why (not)? Discuss, and if possible, give an example where they are (not) different. Hint: consider how they explore the search space.

Grading took into account both Hardness of questions, as well as whether the Bonus questions were answered.

The following example report is an actual report handed in by a student graded, and was graded as Excellent, as it showed creative and critical thought, was concise, and clearly written. It also (before redaction) included the name, email address and matriculation number of the student.



References

You will need a username and password to access the papers outside the MPI network. It will be given out during the first lecture.

[1] Zaki, M.J., Parthasarathy, S., Ogihara, M. & Li, W. New algorithms for fast discovery of association rules. In Proceedings of the 3rd ACM International Conference on Knowledge Discovery and Data Mining (KDD), Newport Beach, CA, 1997.
[2] Han, J., Pei, J. & Yin, Y. Mining frequent patterns without candidate generation. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas, TX, pages 1-12, ACM, 2000.
[3] Agarwal, R.C., Aggarwal, C.C. & Prasad, V. A tree projection algorithm for generation of frequent item sets. J. Parallel Distr. Com., 61(3):350-371, 2001.