Design and Analysis of Consistent Algorithms for Multiclass Learning Problems

Harish, Guruprasad Ramaswami

dc.contributor.advisor	Agarwal, Shivani
dc.contributor.author	Harish, Guruprasad Ramaswami
dc.date.accessioned	2018-08-14T14:01:21Z
dc.date.accessioned	2018-08-28T09:15:02Z
dc.date.available	2018-08-14T14:01:21Z
dc.date.available	2018-08-28T09:15:02Z
dc.date.issued	2018-08-14
dc.date.submitted	2015
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3971
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/4858/G27613-Abs.pdf	en_US
dc.description.abstract	We consider the broad framework of supervised learning, where one gets examples of objects together with some labels (such as tissue samples labeled as cancerous or non-cancerous, or images of handwritten digits labeled with the correct digit in 0-9), and the goal is to learn a prediction model which, given a new object, makes an accurate prediction. The notion of accuracy depends on the learning problem under study and is measured by a performance measure of interest. A supervised learning algorithm is said to be 'statistically consistent' if it returns an 'optimal' prediction model with respect to the desired performance measure in the limit of infinite data. Statistical consistency is a fundamental notion in supervised machine learning, and therefore the design of consistent algorithms for various learning problems is an important question. While this has been well studied for simple binary classification problems and some other specific learning problems, the question of consistent algorithms for general multiclass learning problems remains open. We investigate several aspects of this question, as detailed below. First, we develop an understanding of consistency for multiclass performance measures defined by a general loss matrix, for which convex surrogate risk minimization algorithms are widely used. Consistency of such algorithms hinges on the notion of 'calibration' of the surrogate loss with respect to the target loss matrix; we start by developing a general understanding of this notion, and give both necessary conditions and sufficient conditions for a surrogate loss to be calibrated with respect to a target loss matrix. We then define a fundamental quantity associated with any loss matrix, which we term the 'convex calibration dimension' of the loss matrix; this gives one measure of the intrinsic difficulty of designing convex calibrated surrogates for a given loss matrix. 
We derive lower bounds on the convex calibration dimension, which lead to several new results on the non-existence of convex calibrated surrogates for various losses. For example, our results improve on recent results on the non-existence of low-dimensional convex calibrated surrogates for various subset ranking losses such as the pairwise disagreement (PD) and mean average precision (MAP) losses. We also upper bound the convex calibration dimension of a loss matrix by its rank, by constructing an explicit, generic, least-squares type convex calibrated surrogate whose dimension is at most the (linear algebraic) rank of the loss matrix. This yields low-dimensional convex calibrated surrogates, and therefore consistent learning algorithms, for a variety of structured prediction problems for which the associated loss is of low rank, including, for example, the precision@k and expected rank utility (ERU) losses used in subset ranking problems. For settings where achieving exact consistency is computationally difficult, as is the case with the PD and MAP losses in subset ranking, we also show how to extend these surrogates to give algorithms satisfying weaker notions of consistency, including both consistency over restricted sets of probability distributions and an approximate form of consistency over the full probability space. Second, we consider the practically important problem of hierarchical classification, where the labels to be predicted are organized in a tree hierarchy. We design a new family of convex calibrated surrogate losses for the associated tree-distance loss; these surrogates improve on the generic least-squares surrogate in that they are easier to optimize and their solutions are easier to represent, and some surrogates in the family also operate in a significantly lower-dimensional space than the rank of the tree-distance loss matrix. 
These surrogates, which we term the 'cascade' family of surrogates, rely crucially on a new understanding we develop of the problem of multiclass classification with an abstain option, for which we construct new convex calibrated surrogates that are of independent interest. The resulting hierarchical classification algorithms outperform the current state of the art in terms of both accuracy and running time. Finally, we go beyond loss-based multiclass performance measures and consider multiclass learning problems with more complex performance measures that are nonlinear functions of the confusion matrix and cannot be expressed using loss matrices; these include, for example, the multiclass G-mean measure used in class imbalance settings and the micro F1 measure often used in information retrieval applications. We take an optimization viewpoint for such settings and give a Frank-Wolfe type algorithm that is provably consistent for any complex performance measure that is a convex function of the entries of the confusion matrix (this includes the G-mean, but not the micro F1). The resulting algorithms outperform the state-of-the-art SVMPerf algorithm in terms of both accuracy and running time. In conclusion, this thesis develops a deeper understanding of, and fundamental results in, the theory of supervised multiclass learning. These insights have allowed us to develop computationally efficient and statistically consistent algorithms for a variety of multiclass learning problems of practical interest, in many cases significantly outperforming the state-of-the-art algorithms for these problems.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G27613	en_US
dc.subject	Multiclass Learning Problems	en_US
dc.subject	Machine Learning	en_US
dc.subject	Multiclass Evaluation Metrics	en_US
dc.subject	Experience Mapping based Prediction Controller (EMPC)	en_US
dc.subject	Convex Calibration Dimension	en_US
dc.subject	Hierarchical Classification	en_US
dc.subject	Learning Algorithms	en_US
dc.subject	Consistent Multiclass Algorithms	en_US
dc.subject	Classification Calibration Dimension	en_US
dc.subject	General Multiclass Losses	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Design and Analysis of Consistent Algorithms for Multiclass Learning Problems	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.discipline	Faculty of Engineering	en_US
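
The abstract states that the convex calibration dimension of a loss matrix is bounded by its rank, via an explicit least-squares type surrogate. The Python sketch below illustrates that idea under stated assumptions: the loss matrix L (rows indexed by true labels y, columns by predictions t) is factored by a thin SVD as L = A B^T, the surrogate regresses a score u in R^r onto the row A[y], and a score is decoded to the prediction t minimising <u, B[t]>. The SVD factorization, the function names and the random loss matrix are illustrative assumptions; the construction in the thesis may differ in its details (for example, by allowing an additive constant).

```python
import numpy as np

def factor_loss(L, tol=1e-10):
    """Factor an n x k loss matrix (rows: true labels y, columns: predictions t)
    as L = A @ B.T, where the inner dimension r is the numerical rank of L."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > tol * s.max()))
    A = U[:, :r] * s[:r]  # row A[y] is the regression target for label y
    B = Vt[:r].T          # row B[t] is the decoding vector for prediction t
    return A, B

def surrogate_loss(u, y, A):
    """Least-squares surrogate on R^r: squared distance from the score u to A[y]."""
    return float(np.sum((u - A[y]) ** 2))

def decode(u, B):
    """Map a surrogate score u in R^r to the prediction t minimising <u, B[t]>."""
    return int(np.argmin(B @ u))

def population_minimizer(p, A):
    """argmin_u E_{Y~p}[surrogate_loss(u, Y, A)] is the mean target p @ A."""
    return p @ A

# Usage on a hypothetical rank-2 loss matrix over 6 labels and 5 predictions.
rng = np.random.default_rng(0)
L = rng.random((6, 2)) @ rng.random((2, 5))
A, B = factor_loss(L)
p = rng.dirichlet(np.ones(6))  # a conditional label distribution p(y | x)
u_star = population_minimizer(p, A)
# B @ u_star equals p @ L, so decoding recovers the expected-loss minimiser.
print(A.shape[1], decode(u_star, B), int(np.argmin(p @ L)))
```

Since B @ (p @ A) equals p @ L, the decoded prediction minimises the expected loss under p while the surrogate lives in only r = rank(L) dimensions; in practice u would come from a regressor trained on the surrogate rather than from the true p.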
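
For performance measures that are convex functions of the confusion matrix, the abstract describes a Frank-Wolfe type algorithm. The sketch below is one plausible reading of that idea, not the thesis's implementation: it assumes class-probability estimates eta[i, y] (approximating P(y | x_i)) are already available on a finite sample, uses the plug-in confusion matrix C[y, t], and writes the G-mean as the convex loss 1 - G-mean with class priors held fixed. Each Frank-Wolfe step solves a cost-sensitive problem whose loss matrix is the gradient at the current C, and the output is a randomized mixture of the classifiers found. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def confusion(eta, preds, n):
    """Plug-in confusion matrix C[y, t] = mean_i eta[i, y] * 1[preds[i] == t]."""
    m = eta.shape[0]
    C = np.zeros((n, n))
    for t in range(n):
        C[:, t] = eta[preds == t].sum(axis=0) / m
    return C

def gmean_loss(C):
    """1 - geometric mean of per-class recalls; convex in C for fixed class priors."""
    recalls = np.diag(C) / C.sum(axis=1)
    return 1.0 - np.prod(recalls) ** (1.0 / len(recalls))

def gmean_grad(C):
    """Gradient of gmean_loss w.r.t. C with the priors C.sum(axis=1) held fixed."""
    n = C.shape[0]
    gm = 1.0 - gmean_loss(C)
    return np.diag(-gm / (n * np.diag(C)))

def frank_wolfe(eta, iters=50):
    """Return the final confusion matrix and a randomized classifier (weights, predictions)."""
    m, n = eta.shape
    priors = eta.mean(axis=0)
    C = np.outer(priors, np.ones(n) / n)  # start from the uniformly random classifier
    weights, classifiers = [1.0], [None]  # None stands for that random start
    for k in range(iters):
        G = gmean_grad(C)
        # Linear minimization step: per-instance cost-sensitive rule with loss matrix G.
        preds = np.argmin(eta @ G, axis=1)
        C_new = confusion(eta, preds, n)
        gamma = 2.0 / (k + 3)
        C = (1 - gamma) * C + gamma * C_new
        weights = [w * (1 - gamma) for w in weights] + [gamma]
        classifiers.append(preds)
    return C, weights, classifiers

# Usage on hypothetical, class-imbalanced probability estimates for 3 classes.
rng = np.random.default_rng(0)
eta = rng.dirichlet([4.0, 1.0, 0.5], size=2000)
C, weights, classifiers = frank_wolfe(eta, iters=50)
argmax_C = confusion(eta, np.argmax(eta, axis=1), eta.shape[1])
print(f"argmax: {gmean_loss(argmax_C):.3f}  frank-wolfe: {gmean_loss(C):.3f}")
```

Mixing with step size 2/(k+3) keeps every diagonal entry of C strictly positive, so the G-mean gradient stays finite; the returned weights say how often each stored classifier should be used at prediction time.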

Files in this item

Name: G27613.pdf
Size: 1.291 MB
Format: PDF

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [542]

Related items

Sparse Multiclass And Multi-Label Classifier Design For Faster Inference

Learning with Complex Performance Measures : Theory, Algorithms and Applications

Multi-label Classification with Multiple Label Correlation Orders And Structures