PAKDD2013 Tutorial on Mining Multiple Threads of Streaming Data

PAKDD2013 Tutorial Outline

November 13, 2012. Posted by KMD

The tutorial has three components.

Part 0: Introduction to Stream Mining

In this part, we give a brief outline of stream mining. We discuss examples of stream mining applications and elaborate on the most important challenges studied in recent years, foremost among them the challenge of learning and adaptation under drift. We also point to examples that seem to fall outside the typical stream mining paradigm, yet still call for stream mining methods. For example, 'learning a model over a stream of customer transactions' and 'learning a model of the customers using the stream of their transactions' are two distinct stream learning problems.
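The difference between the two problems in the example above can be made concrete with a small sketch. It is only an illustration under assumed names: the toy `transactions` list, the `(customer_id, amount)` record layout and both helper functions are hypothetical and not taken from the tutorial material. In the first view every transaction is a learning instance; in the second view every customer is an instance whose description keeps changing as new transactions arrive.

```python
from collections import defaultdict

# Hypothetical toy stream of (customer_id, amount) transactions.
transactions = [("c1", 10.0), ("c2", 5.0), ("c1", 30.0), ("c3", 7.0), ("c2", 15.0)]


def transaction_instances(stream):
    """View A: every transaction is one learning instance."""
    for _customer_id, amount in stream:
        yield {"amount": amount}


def customer_instances(stream):
    """View B: every customer is one instance, re-described on each new transaction."""
    profiles = defaultdict(lambda: {"n": 0, "total": 0.0})
    for customer_id, amount in stream:
        profile = profiles[customer_id]
        profile["n"] += 1
        profile["total"] += amount
        # Emit the customer's current description (an evolving entity).
        yield customer_id, {
            "n": profile["n"],
            "mean_amount": profile["total"] / profile["n"],
        }


view_a = list(transaction_instances(transactions))
# dict() keeps only the most recent description per customer.
view_b = dict(customer_instances(transactions))
```

View A yields one instance per transaction, whereas view B yields one evolving instance per customer, and new customers can appear at any point of the stream.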

Sections:

What is stream mining?
Main tasks and challenges in stream mining
Most important research advances

Part 1: Learning on Evolving Entities from Multiple Streams

In this part, we discuss the problem of learning on evolving entities and distinguish two cases. In the first case, which we call 'scenario 1', the entities are predefined. For example, we want to learn and adapt a model of hospital patients or a model of customers, given the streams of records on them. For learning under this scenario, it is essential to keep in mind that the entities themselves constitute a stream: new entities arrive at any time. This scenario is studied mostly in the context of relational stream mining. We discuss research advances on relational stream classification, regression and clustering.

In the second case, which we call 'scenario 2', the entities are not predefined. Rather, the emphasis is on detecting events. For example, we want to learn and monitor bursty topics in news, considering news streams from different countries and/or in different languages. Another prominent example under this scenario is the detection of tropical storms from multiple streams (including streams of images and streams of sensor signals). This scenario is studied mostly in the context of learning from multiple sources. We also mention some research advances on transfer learning.

Sections:

How to learn a model on dynamic entities?
Scenario 1: Learning and Adaptation of a Model on Predefined Entities
Scenario 2: Learning Dynamic Events

Part 2: Learning from the Stream of Unlabeled Data and the Stream of Labels

The original stream classification paradigm contains many assumptions about the availability of feedback information, namely that the labels are immediately available, reliable and complete (all labels arrive). It is also assumed that the acquisition of a label incurs no additional cost. Learning and adapting a model of bank customers or hospital patients (scenario 1 of Part 1) violates all these assumptions. We explain this in a series of examples, which lead us to a reformulation of the stream classification problem as a multi-stream learning problem. We distinguish among the stream of the unlabeled data records (the stream of features), the stream of labels and the stream of label requests, and we discuss the interplay between concept drift and speed differences among these streams. We discuss advances on learning under verification latency, i.e. when the stream of labels is much slower than the stream of features. We then discuss advances on active learning, i.e. acquiring some labels actively, thereby taking labeling costs into account.
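To illustrate verification latency, the following minimal sketch separates the stream of features from the stream of labels. It rests on several assumptions that are not part of the tutorial: the label of a record seen at step t arrives at step t + `delay` (with `delay` of at least 1), the learner is a deliberately trivial majority-class model, and the names `MajorityClass` and `run_with_latency` are hypothetical. With `delay=1` the loop behaves like the classic setting in which each label is available before the next record is classified.

```python
from collections import deque


class MajorityClass:
    """Trivial incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # no label has arrived yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def run_with_latency(stream, delay):
    """Predict each record on arrival; learn from it only `delay` steps later."""
    model = MajorityClass()
    pending = deque()  # records whose labels have not arrived yet
    predictions = []
    for t, (x, y) in enumerate(stream):
        # Labels of records that arrived at least `delay` steps ago are now available.
        while pending and pending[0][0] + delay <= t:
            _, old_x, old_y = pending.popleft()
            model.learn(old_x, old_y)
        predictions.append(model.predict(x))
        pending.append((t, x, y))
    return predictions


stream = [(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "b")]
fast_labels = run_with_latency(stream, delay=1)
slow_labels = run_with_latency(stream, delay=3)
```

The larger the delay, the longer the model has to predict without any feedback, and the older the labels it eventually learns from; under concept drift those labels may already describe an outdated concept.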

Sections:

Problem specification and framework
Advances on model learning and adaptation under verification latency
Active learning in concept-drifting streams
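
The interplay between the stream of features and the stream of label requests can be sketched as follows. This is a simplified illustration and not a method presented in the tutorial: it assumes a one-dimensional feature, a nearest-centroid learner, a label request only when the model is uncertain (small margin between the two closest centroids) and only while the fraction of labels requested so far is below a budget. All names (`active_learning_run`, `oracle`, `margin_threshold`) and the toy data are hypothetical.

```python
def nearest_centroid_margin(x, centroids):
    """Return (predicted label, margin between the two closest centroids)."""
    dists = sorted((abs(x - c["mean"]), label) for label, c in centroids.items())
    if len(dists) < 2:
        # With fewer than two known classes the model is treated as fully uncertain.
        return (dists[0][1] if dists else None), 0.0
    return dists[0][1], dists[1][0] - dists[0][0]


def active_learning_run(features, oracle, budget, margin_threshold):
    """Request a label only when uncertain and while the spent fraction is below budget."""
    centroids = {}  # label -> {"n": count, "mean": running mean of x}
    requested = 0
    queried_at = []
    for t, x in enumerate(features, start=1):
        _, margin = nearest_centroid_margin(x, centroids)
        below_budget = requested < budget * t
        if below_budget and margin <= margin_threshold:
            y = oracle(x)  # one paid label request
            requested += 1
            queried_at.append(t)
            c = centroids.setdefault(y, {"n": 0, "mean": 0.0})
            c["n"] += 1
            c["mean"] += (x - c["mean"]) / c["n"]  # incremental mean update
    return queried_at, centroids


features = [0.0, 1.0, 0.1, 0.9, 0.5, 0.45, 0.05, 0.3]
queried_at, centroids = active_learning_run(
    features,
    oracle=lambda x: "hi" if x >= 0.5 else "lo",  # hypothetical labeling oracle
    budget=0.5,
    margin_threshold=0.3,
)
```

In this toy run some records are skipped because the budget is exhausted and one is skipped because the model is already confident, which is the trade-off that budgeted active learning on streams has to manage.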
