Robust Risk Minimization under Label Noise

Kumar, Himanshu

Author

Kumar, Himanshu

Abstract

In supervised learning, one learns a classifier from training data consisting of patterns and their corresponding labels. Errors in the labels of training examples are referred to as label noise. In practice, label noise is unavoidable. For example, when patterns are labelled by human experts, label noise can arise from subjective biases and random human errors. Nowadays, in many applications, large data sets are labelled through crowd-sourcing, which also results in label noise, due both to human errors and to variations in the quality of the crowd-sourced labellers. Many studies have shown that label errors adversely affect standard classifier learning algorithms such as Support Vector Machines (SVMs), Logistic Regression and Neural Networks. Thus, robustness to label noise is a desirable property of classifier learning algorithms. This thesis investigates the robustness of risk minimization algorithms to label noise. Many approaches have been suggested in the literature for mitigating the adverse effects of label noise. One can use heuristics to detect examples with noisy labels and remove them from the training data. Using similar heuristics, modifications have been suggested to algorithms such as the perceptron and AdaBoost to mitigate the adverse effects of label noise. Another important approach is to treat the true labels as missing data and, using a probabilistic model of label corruption, estimate the posterior probability of the true labels using, e.g., the EM algorithm. In this thesis, we study the robustness of classifier learning algorithms that can be formulated as risk minimization methods. In the risk minimization framework, one learns a classifier by minimizing the expectation of a loss function with respect to the underlying unknown distribution.
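As a brief sketch of the risk minimization framework described above, the risk and its empirical counterpart can be written as follows. The symbols (classifier f, loss L, clean distribution D, noisy distribution D_eta, noisy label y-tilde, sample size N) are notation assumed here for illustration and are not taken from the abstract.

```latex
% Risk under the true (noise-free) distribution D, and its empirical estimate
R_L(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ L(f(x), y) \big],
\qquad
\hat{R}_L(f) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i), y_i)

% Risk under the noisy distribution D_eta, where the observed label
% \tilde{y} may differ from the true label y
R_L^{\eta}(f) = \mathbb{E}_{(x, \tilde{y}) \sim \mathcal{D}_{\eta}} \big[ L(f(x), \tilde{y}) \big]
```

In this notation, the question of robustness is whether minimizing the noisy risk yields a classifier as good as the one obtained by minimizing the noise-free risk.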
Many standard classifier learning algorithms (e.g., Naive Bayes, backpropagation for learning feedforward neural networks, SVMs) can be posed as risk minimization. One approach to robust risk minimization is called loss correction. Here, to minimize risk with loss L with respect to the true label distribution, one creates a new loss function L′ and minimizes risk with it under the corrupted labels. However, to find the proper L′ for a given L, one needs knowledge of the label corruption probabilities (which may be estimated from the data). Another approach to robust risk minimization is to seek loss functions that make risk minimization inherently robust. An advantage of this approach is that one need not differentiate between noisy and noise-free training data; the classifier learning algorithm remains the same. This is the approach investigated in this thesis. The robustness of risk minimization depends on the loss function used. In this thesis, we derive sufficient conditions on the loss function so that risk minimization under that loss function is robust to different types of label noise models. We call loss functions that satisfy these conditions robust losses. Our main theoretical results address the robustness of risk minimization under the symmetric and class-conditional label noise models. In symmetric label noise, the probability of mislabelling a sample as any other class is the same irrespective of the pattern. The symmetric label noise model is suitable for applications where errors in the labels are random. In class-conditional label noise, errors in labels depend on the underlying true class of a pattern. This model is suitable for applications where some pairs of classes are more likely to be confused than others. We also discuss our results on the most general noise model, called non-uniform label noise, where the probability of a labelling error also depends on the pattern vector.
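The symmetric and class-conditional noise models described above can be simulated with a label transition matrix. The following is a minimal sketch, not code from the thesis: the function names `symmetric_noise_matrix` and `corrupt_labels`, the noise rate 0.3, and the class-conditional matrix `cc` are all hypothetical choices made for illustration. Non-uniform noise, where the flip probability also depends on the pattern, is not covered by this sketch.

```python
import random

def symmetric_noise_matrix(num_classes, eta):
    """Row-stochastic matrix: keep the true label with probability 1 - eta,
    otherwise flip to each of the other classes with equal probability."""
    off_diagonal = eta / (num_classes - 1)
    return [[1.0 - eta if i == j else off_diagonal for j in range(num_classes)]
            for i in range(num_classes)]

def corrupt_labels(labels, transition, seed=0):
    """For each true label y, sample a noisy label from row transition[y]."""
    rng = random.Random(seed)
    classes = list(range(len(transition)))
    return [rng.choices(classes, weights=transition[y])[0] for y in labels]

# Symmetric noise: 3 classes, 30% of labels flipped on average (assumed rate).
sym = symmetric_noise_matrix(3, 0.3)

# Class-conditional noise (hypothetical rates): class 1 is often confused
# with class 0, class 0 is rarely confused with class 1, class 2 is clean.
cc = [[0.9, 0.1, 0.0],
      [0.4, 0.6, 0.0],
      [0.0, 0.0, 1.0]]

clean = [0, 1, 2] * 1000
noisy_sym = corrupt_labels(clean, sym, seed=1)
noisy_cc = corrupt_labels(clean, cc, seed=1)

flip_rate = sum(c != n for c, n in zip(clean, noisy_sym)) / len(clean)
print("observed symmetric flip rate:", flip_rate)
```

In both cases the flip probability depends at most on the true class and never on the pattern itself, which is what separates these two models from non-uniform noise.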
All our theoretical results are for the case of multi-class classification, and they generalize similar results known for binary classification. Our theoretical results concern minimization of risk, though in practice one can only minimize empirical risk. We provide one result on the consistency of empirical risk minimization under symmetric label noise. We also empirically demonstrate the utility of our theoretical results using neural network classifiers. We consider three loss functions commonly used with deep neural networks, namely, Categorical Cross Entropy (CCE), Mean Square Error (MSE) and Mean Absolute Error (MAE). Of these three, MAE satisfies the sufficient conditions for a robust loss while the other two do not. Through empirical investigation on synthetic and standard real data sets, we show the robustness of MAE loss compared to the others. While MAE loss is robust, it is difficult to minimize empirical risk under this loss, as our empirical results show. Optimizing MAE loss requires a very large number of epochs and a good initialization point compared to CCE and MSE, neither of which is robust. To alleviate this issue, we propose a novel robust loss called Robust Log Loss (RLL). This loss can be viewed as a modification of CCE that makes it robust. Empirical risk minimization under RLL is similar to that under CCE in terms of learning rate. However, RLL satisfies the sufficient condition for robustness, and we show empirically that RLL is superior to CCE in terms of robustness to label noise. Learning with RLL is also more efficient than learning with MAE. We further extend our concept of robust risk minimization under label noise to multi-label categorization problems. In multi-label problems, a pattern may belong to more than one class, unlike multi-class problems where only one label is associated with a pattern.
We first define a symmetric label noise model in the context of multi-label classification problems, which is a useful model for random errors in labelling. Next, we study robust learning of multi-label classifiers under risk minimization and propose sufficient conditions for a loss to be robust under symmetric label noise. These sufficient conditions are satisfied by the Hamming loss and its surrogate robust losses. In the case of multi-label problems also, we empirically demonstrate our theoretical results.
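The abstract states that MAE satisfies the sufficient conditions for a robust loss while CCE and MSE do not, but it does not spell the conditions out. The sketch below assumes a symmetry condition that appears in related literature on robust losses: the loss summed over all possible labels is the same constant for every prediction. Whether this matches the thesis's exact conditions is an assumption. The function names and example prediction vectors are hypothetical, and the errors are summed rather than averaged over classes, which only changes a constant factor.

```python
import math

def one_hot(k, num_classes):
    """One-hot encoding of class k."""
    return [1.0 if j == k else 0.0 for j in range(num_classes)]

def cce(u, k):
    """Categorical cross entropy of probability vector u against label k."""
    return -math.log(u[k])

def mse(u, k):
    """Squared error between u and the one-hot vector of label k."""
    return sum((a - b) ** 2 for a, b in zip(one_hot(k, len(u)), u))

def mae(u, k):
    """Absolute error between u and the one-hot vector of label k."""
    return sum(abs(a - b) for a, b in zip(one_hot(k, len(u)), u))

def loss_sum_over_labels(loss, u):
    """Sum of the loss over every possible label for a fixed prediction u."""
    return sum(loss(u, k) for k in range(len(u)))

# Hypothetical softmax-style outputs for a 3-class problem.
predictions = [[0.7, 0.2, 0.1], [0.34, 0.33, 0.33], [0.05, 0.05, 0.9]]

for u in predictions:
    sums = {f.__name__: round(loss_sum_over_labels(f, u), 4)
            for f in (cce, mse, mae)}
    print(u, sums)
```

For a probability vector u over K classes, the absolute error against label k equals 2 - 2u_k, so the MAE sum is 2(K - 1) regardless of u. The CCE and MSE sums change with u, which is consistent with the abstract's claim under the assumed condition.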

URI

https://etd.iisc.ac.in/handle/2005/4760

Collections

Electrical Engineering (EE) [361]