Reconstructing an Epidemic over Time

Abstract. We consider the problem of reconstructing an epidemic over time, or, more general, reconstructing the propagation of an activity in a network. Our input consists of a temporal network, which contains information about when two nodes interacted, and a small sample of nodes that have been reported as infected. The goal is to recover the flow of the spread, including discovering the starting nodes, and identifying other likely-infected nodes that were not reported. This has multiple applications, from public health to social media and viral marketing purposes.

Previous work explicitly factor-in many unrealistic assumptions: (a) the underlying network does not change or that we see all interactions; (b) that we have access to perfect noise-free data; or (c) that we know the exact propagation model. In contrast, we avoid these simplifications, and take into account the temporal network, require only a small sample of reported infections, and do not make any restrictive assumptions on the propagation model.

We develop CulT, a scalable and effective algorithm to reconstruct epidemics that is also suited for an online setting. It works by formulating the problem as that of a temporal Steiner-tree computation, for which we design a fast algorithm leveraging the specific structure of our problem. We demonstrate the efficacy of CulT through extensive experiments on diverse datasets.

Implementation

the Python source code (February 2016) by Polina Rozenshtein.

Related Publications

	Rozenshtein, P, Gionis, A, Prakash, BA & Vreeken, J Reconstructing an Epidemic over Time. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp 1835-1844, ACM, 2016. (18.1% acceptance rate)
	Prakash, BA, Vreeken, J & Faloutsos, C Efficiently Spotting the Starting Points of an Epidemic in a Large Graph. Knowledge and Information Systems vol.38(1), pp 35-59, Springer, 2014. (IF 2.225)
	Prakash, BA, Vreeken, J & Faloutsos, C Spotting Culprits in Epidemics: How many and Which ones?. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), pp 11-20, IEEE, 2012. (full paper, 10.7% acceptance rate; overall 20%)