PAKDD2013 Tutorial Summary

November 13, 2012. Posted by KMD

We will present a Tutorial on Mining Multiple Threads of Streaming Data at PAKDD 2013, April 14-17, Gold Coast, Australia (exact time and place to be announced).


Stream mining is a mature area of research. However, several applications that require adaptive learning from evolving data do not seem to fit the conventional stream mining paradigm. For example, a bank grants loans to customers and uses their data for model learning, but the label (loan-paid-back YES or NO) arrives only some years later, by which time the market may have changed drastically. Is this a stream mining problem? How many streams are there? We can distinguish between the stream of customers and the stream of their labels, which arrive with a time lag of years.

As another example, a hospital monitors patients with chronic diseases who come (ir)regularly to the hospital and undergo different tests; the streams of medical recordings and of signals (EEG, fMRI) can be used for learning. The hospital wants to learn a model of how the patients' health evolves in response to the disease and to medications. This problem seems completely different from the previous one, although streams of data are present in both cases.

In this tutorial, we bring together research advances on model learning and adaptation for dynamic applications that collect and analyze different sources of dynamic data. In the introductory part of the tutorial, we present the classic stream mining paradigm and summarize the challenges being investigated in state-of-the-art research.

In Part 1 of the tutorial, we formulate the problem of (supervised and unsupervised) learning from multiple sources and distinguish between two scenarios. In scenario A, a model is learned on a set of predefined entities (e.g. customers, patients), and streams of data from multiple sources are exploited for learning and adaptation. In scenario B, a model is learned on non-predefined events that occur in streams (e.g. a bursty topic in the news, an emerging cyclone), and multiple streams are combined to gain better and faster insights into the events. We discuss advances on both scenarios.

In Part 2 of the tutorial, we frame the problem of verification latency in stream mining by distinguishing between the stream of unlabeled records and the stream of labels, and discuss advances on learning and adaptation when exploiting both streams under concept drift and shift. These advances include research on transfer learning and on active learning.


The target groups are: postgraduate students with a solid background in data mining; research scholars who work on conventional stream mining; scholars and practitioners who perform model learning on complex entities and need solutions that deal with their dynamics.


To attend the tutorial, please register at the PAKDD2013 conference web site.