Probabilistic Active Learning (PAL) is an active learning approach for classifiers. It follows a smoothness assumption and models, for each labelling candidate, both the true posterior in the candidate's neighbourhood and the candidate's label as random variables. By computing each candidate's expected gain in classification performance over these two variables, PAL selects the candidate for labelling that is optimal in expectation. PAL achieves classification performance comparable to or better than expected error reduction and uncertainty sampling, has the same asymptotic linear time complexity as uncertainty sampling, and is faster than error reduction.
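The expected-gain idea above can be sketched for binary classification with accuracy as the performance measure. The sketch below is illustrative, not the authors' reference implementation: the function names (`pgain`, `_beta_pdf`) and the midpoint-rule integration are assumptions, and the neighbourhood of a candidate is summarised by its number of labels n and observed positive rate phat, with the true posterior modelled as Beta-distributed and the label outcome as Bernoulli.

```python
import math

def _beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, computed via log-gamma for stability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    log_pdf = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
               + (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p))
    return math.exp(log_pdf)

def pgain(n, phat, steps=2000):
    """Expected accuracy gain from acquiring one more label at a candidate
    whose neighbourhood holds n labels with observed positive rate phat.
    The true posterior p is treated as a random variable, here with a
    Beta(n*phat + 1, n*(1-phat) + 1) distribution (uniform prior), and the
    label outcome as Bernoulli(p); the integral over p uses the midpoint rule."""
    a, b = n * phat + 1.0, n * (1.0 - phat) + 1.0

    def acc(p, ph):
        # accuracy of predicting the majority label when the true posterior is p
        return p if ph >= 0.5 else 1.0 - p

    phat_pos = (n * phat + 1.0) / (n + 1.0)  # label statistics after a positive label
    phat_neg = (n * phat) / (n + 1.0)        # label statistics after a negative label

    gain = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        w = _beta_pdf(p, a, b) / steps
        # expected accuracy after the new label, over the Bernoulli(p) outcome
        after = p * acc(p, phat_pos) + (1.0 - p) * acc(p, phat_neg)
        gain += w * (after - acc(p, phat))
    return gain
```

Under these assumptions an unexplored region (n = 0) yields pgain(0, 0.5) = 1/6, while a region whose decision cannot be flipped by one more label (e.g. n = 4, phat = 0) yields a gain of exactly zero.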

Advantages

PAL combines several advantages in a single active learning approach:

Versatility
PAL is usable with any classification technology (e.g., decision trees, naive Bayes) and any point performance measure for classification (e.g., error rate, accuracy, misclassification loss).
Optimizing a performance measure
PAL selects labelling candidates such that a user-specified performance measure is directly optimized, similar to expected error reduction approaches.
Efficiency
PAL's asymptotic runtime is linear in the number of labelling candidates, the same as fast uncertainty sampling approaches.
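The linear asymptotic runtime follows from PAL needing only a single, constant-time gain evaluation per candidate in one pass over the pool. A minimal sketch of that selection loop, where the `gain` callback is a stand-in for PAL's probabilistic gain computation and is an assumption here:

```python
def select_candidate(candidates, gain):
    """One pass over the candidate pool with a single gain evaluation per
    candidate, so the overall runtime is linear in the number of candidates."""
    best, best_gain = None, float("-inf")
    for c in candidates:
        g = gain(c)  # assumed constant-time per candidate
        if g > best_gain:
            best, best_gain = c, g
    return best

# illustrative use with a stand-in gain function (not PAL's actual gain)
picked = select_candidate([0.1, 0.45, 0.9], lambda p: -abs(p - 0.5))
```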

Publications

Probabilistic Active Learning: A Short Proposition   BibTeX   PDF (author's manuscript)
Georg Krempl, Daniel Kottke, Myra Spiliopoulou
Proceedings of the 21st European Conf. on Artificial Intelligence (ECAI2014), August 18--22, 2014, Prague, Czech Republic. Published by IOS Press, available at http://www.iospress.nl/book/ecai-2014/

Probabilistic Active Learning: Towards Combining Versatility, Optimality and Efficiency   BibTeX   PDF (author's manuscript)
Georg Krempl, Daniel Kottke, Myra Spiliopoulou
Proceedings of the 17th Int. Conf. on Discovery Science (DS2014), October 8--10, 2014, Bled, Slovenia.
Published by Springer, the original publication is available at http://link.springer.com.

Optimised Probabilistic Active Learning (OPAL)   BibTeX   Supplemental Material
Georg Krempl, Daniel Kottke, Vincent Lemaire
Machine Learning, Volume 100, Issue 2--3, pp. 449--476.
Published by Springer, the original publication is available at http://link.springer.com.

Clustering-Based Optimised Probabilistic Active Learning (COPAL)   BibTeX
Georg Krempl, Tuan Cuong Ha, Myra Spiliopoulou
Proceedings of the 18th Int. Conf. on Discovery Science (DS2015), October 4--6, 2015, Banff, Canada.
Published by Springer, the original publication is available at http://link.springer.com.

How to Select Information That Matters: A Comparative Study on Active Learning Strategies for Classification   BibTeX   PDF (author's manuscript)
Christian Beyer, Georg Krempl, Vincent Lemaire
Proceedings of the 15th Int. Conf. on Knowledge Technologies and Data-Driven Business (i-KNOW 2015), October 21--22, 2015, Graz, Austria.
Published by ACM, the original publication is available at http://dl.acm.org.

Probabilistic Active Learning in Datastreams   BibTeX   Supplemental Material
Daniel Kottke, Georg Krempl, Myra Spiliopoulou
Proceedings of the 14th Int. Symp. on Intelligent Data Analysis (IDA 2015), October 22--24, 2015, Saint-Etienne, France.
Published by Springer, the original publication is available at http://link.springer.com.

Implementations

OPALgain for MATLAB/Octave
OPALgain for Python

People

PAL is developed by a team at the KMD lab, Otto-von-Guericke University (OvGU) Magdeburg, Germany.

Contributors (in chronological order) include:
Georg Krempl, Principal Investigator, KMD Lab, Germany
Myra Spiliopoulou, Head of KMD Lab, Germany
Daniel Kottke, KMD Lab, Germany
Vincent Lemaire, Orange Labs, France
Christian Beyer, OvGU, Magdeburg
Tuan Cuong Ha, OvGU, Magdeburg