Abstract

This work addresses active learning for multi-class classification. Active learning algorithms optimize classifier training by successively selecting, for labeling by an expert, those instances that improve the classifier's performance the most. In this work, we identify three influence factors that positively affect active learning: (1) an instance's impact, (2) its posterior, and (3) the reliability of this posterior. We contribute a new decision-theoretic approach called multi-class probabilistic active learning (McPAL). Building on a probabilistic active learning framework, our approach is non-myopic, fast, and directly optimizes a performance measure such as accuracy. Considering all three influence factors, McPAL determines the expected gain in performance to compare the usefulness of instances. For this purpose, it calculates the density-weighted expectation over the true posterior and over all possible labeling combinations in a closed-form solution. Thus, in contrast to other multi-class algorithms, it accounts for the posterior's reliability, which improves performance. In our experimental evaluation, we show that the selected influence factors are reasonable and that McPAL outperforms various other multi-class active learning algorithms on six datasets.

Code

  perfGain calculation function (Python)
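
For orientation, the expectation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' perfGain function. It rests on several assumptions that the abstract does not state: the true posterior is modeled as a Dirichlet distribution with a uniform prior over the observed label counts `k`, performance is accuracy, ties in the prediction go to the lowest class index, and the non-myopic gain is taken as the best average gain per label over up to `max_m` hypothetical labels. All function names are hypothetical.

```python
import itertools
import math


def log_multivariate_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def labelings(n_classes, m):
    """Yield every label-count vector of length n_classes that sums to m."""
    for combo in itertools.combinations_with_replacement(range(n_classes), m):
        counts = [0] * n_classes
        for c in combo:
            counts[c] += 1
        yield counts


def expected_accuracy(k, m):
    """Expected accuracy at a candidate after acquiring m more labels.

    Assumption: true posterior p ~ Dirichlet(k + 1). The expectation over p
    and over all labelings l is evaluated in closed form through ratios of
    multivariate Beta functions (Dirichlet-multinomial identity).
    """
    alpha = [ki + 1.0 for ki in k]
    log_b0 = log_multivariate_beta(alpha)
    total = 0.0
    for l in labelings(len(k), m):
        # Class the classifier would predict after seeing the labeling l.
        post = [ki + li for ki, li in zip(k, l)]
        y_hat = max(range(len(k)), key=lambda i: post[i])
        # Multinomial coefficient m! / prod(l_i!), in log space.
        log_coef = math.lgamma(m + 1) - sum(math.lgamma(li + 1) for li in l)
        # E_p[Mult(l | p) * p[y_hat]] = coef * B(alpha + l + e_y_hat) / B(alpha)
        alpha_new = [a + li for a, li in zip(alpha, l)]
        alpha_new[y_hat] += 1.0
        total += math.exp(log_coef + log_multivariate_beta(alpha_new) - log_b0)
    return total


def perf_gain(k, max_m=2):
    """Best average accuracy gain per acquired label for m = 1..max_m."""
    base = expected_accuracy(k, 0)
    return max((expected_accuracy(k, m) - base) / m for m in range(1, max_m + 1))


def density_weighted_gain(k, density, max_m=2):
    """Weight the gain by the candidate's density (its impact)."""
    return density * perf_gain(k, max_m)
```

Under these assumptions, `perf_gain([1, 1], max_m=1)` is larger than `perf_gain([10, 10], max_m=1)`: both candidates have the same estimated posterior, but the second estimate is backed by more labels and is therefore more reliable, so an additional label is expected to help less.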

Complete Plots and Tables

  Download PDF