
dc.contributor.advisorBhattacharyya, Chiranjib
dc.contributor.advisorBach, Francis
dc.contributor.authorSankaran, Raman
dc.date.accessioned2021-10-21T07:15:05Z
dc.date.available2021-10-21T07:15:05Z
dc.date.submitted2018
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/5456
dc.description.abstractMotivation. Empirical risk minimization (ERM) is a popular framework for learning predictive models from data, and it has been used in domains such as computer vision, text processing, bioinformatics, neurobiology, and temporal point processes, to name a few. We consider settings where one has a priori information regarding the model structure, the simplest being sparsity of the model. The desired sparsity structure can be imposed on ERM problems through a regularizer, which is typically a norm on the model vector. Popular choices of regularizers include the ℓ1 norm (LASSO), which encourages sparse solutions, and the block-ℓ1 norm, which encourages block-level sparsity, among many others that induce more complicated structures. To encode the structural prior, many recent studies have taken the regularizer to be a combinatorial function of the model's support. This, however, leads to NP-hard problems, which motivates us to consider tractable convex relaxations of the corresponding combinatorial functions.

Existing work and research gaps. Convex relaxations of combinatorial functions have been studied recently in the context of structured sparsity, but they still lead to inefficient computational procedures in general: even when the combinatorial function is submodular, whose convex relaxations are well studied and easier to work with than general ones, the resulting problem is computationally expensive (the proximal operator takes O(d^6) time for d variables). This high cost has limited research interest in these regularizers, even though the submodular functions that generate them are expressive enough to encourage many of the structures mentioned above. It therefore remains open to design efficient optimization procedures for submodular penalties, and for combinatorial penalties in general. It is also desirable that such optimization algorithms be applicable across the loss functions arising in various applications. We identify four such problems from these research gaps and address them through the contributions listed below; the list of publications related to this thesis follows this abstract.
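As a rough illustration of this setting (the notation below is an editorial sketch, not taken from the thesis), the problems just described can be written as regularized ERM with a combinatorial penalty F on the support of the model w, together with its tractable convex surrogate Ω; here ℓ is the loss, (x_i, y_i) are the training data, and λ > 0 trades off data fit against structure:

    \begin{align*}
      \min_{w \in \mathbb{R}^d}\; & \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, \langle w, x_i \rangle\big) + \lambda\, F(\operatorname{supp}(w))
        && \text{(combinatorial penalty; NP-hard in general)} \\
      \min_{w \in \mathbb{R}^d}\; & \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, \langle w, x_i \rangle\big) + \lambda\, \Omega(w)
        && \text{(convex relaxation; e.g.\ } \Omega = \|\cdot\|_1 \text{ when } F \text{ is the cardinality)}
    \end{align*}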
Contributions. First, we propose a novel kernel learning technique termed Variable Sparsity Kernel Learning (VSKL) for support vector machines (SVMs), applicable when there is a priori information regarding the grouping of the kernels. For such cases we propose a novel mixed-norm regularizer, which encourages sparse selection of the kernels within a group while selecting all groups. This regularizer can also be viewed as the convex relaxation of a specific discrete penalty on the model's support. The resulting non-smooth optimization problem is difficult, and off-the-shelf techniques are not applicable. We propose a mirror-descent-based optimization algorithm to solve it, with a guaranteed convergence rate of O(1/√T) over T iterations. We demonstrate the efficiency of the proposed algorithm in various scaling experiments, and the applicability of the regularizer in an object recognition task.

Second, we introduce a family of regularizers termed Smoothed Ordered Weighted L1 (SOWL) norms, derived as the convex relaxation of non-decreasing cardinality-based submodular penalties, which form an important special class of the general discrete penalties. In linear regression, where the presence of correlated predictors causes traditional regularizers such as the Lasso to fail to recover the true support, SOWL has the property of selecting correlated predictors as groups. While Ordered Weighted ℓ1 (OWL) norms address similar use cases, they are biased towards undesirable piecewise-constant solutions, a bias SOWL does not share. SOWL is also shown to be equivalent to the group Lasso, but without specifying the groups explicitly. We develop efficient proximal operators for SOWL and illustrate its computational and theoretical benefits through various simulations.

Third, we discuss Hawkes-OWL, an application of OWL regularizers to multidimensional Hawkes processes. A Hawkes process is a multidimensional point process (a collection of event streams) with self- and mutual influences between the event streams. While the popular ℓ1 regularizer fails to recover the true models in the presence of strongly correlated event streams, OWL regularization addresses this issue and groups the correlated predictors. This is the first instance in the literature where OWL norms, which have predominantly been studied with simple loss functions such as the squared loss, are extended to Hawkes processes with similar theoretical and computational guarantees.

Fourth, we discuss generic first-order algorithms for learning with subquadratic norms, a special sub-family of the convex relaxations of submodular penalties. We consider subquadratic-norm regularization in a very general setting, covering general loss functions, and propose different reformulations of the original problem. These reformulations enable us to propose two primal-dual algorithms, CP- and ADMM-based, both with a guaranteed convergence rate of O(1/T). This study thus provides the first algorithms with efficient convergence rates for learning with subquadratic norms.en_US
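As a companion sketch for the proximal computations referred to above, the snippet below implements the standard proximal operator of the (unsmoothed) Ordered Weighted ℓ1 norm via the usual sort, shift-by-ordered-weights, and isotonic-projection steps. This is an illustrative NumPy sketch only, with placeholder function names; it does not reproduce the SOWL proximal operator developed in the thesis.

    import numpy as np

    def _project_nonincreasing(v):
        """Euclidean projection of v onto the set of nonincreasing sequences,
        computed with the pool-adjacent-violators (PAV) algorithm."""
        sums, counts = [], []
        for x in v:
            sums.append(float(x))
            counts.append(1)
            # Merge adjacent pools while the nonincreasing constraint is violated.
            while len(sums) > 1 and sums[-1] / counts[-1] > sums[-2] / counts[-2]:
                s, c = sums.pop(), counts.pop()
                sums[-1] += s
                counts[-1] += c
        # Expand pool means back to a full-length vector.
        return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

    def prox_owl(x, weights, step=1.0):
        """Proximal operator of step * Omega(x), where
        Omega(x) = sum_i weights[i] * |x|_(i) is the OWL norm and
        `weights` is nonnegative and sorted in nonincreasing order."""
        x = np.asarray(x, dtype=float)
        lam = step * np.asarray(weights, dtype=float)
        signs = np.sign(x)
        order = np.argsort(np.abs(x))[::-1]     # sort |x| in decreasing order
        v = np.abs(x)[order] - lam              # shift by the ordered weights
        u = np.clip(_project_nonincreasing(v), 0.0, None)
        out = np.empty_like(u)
        out[order] = u                          # undo the sort
        return signs * out

    if __name__ == "__main__":
        x = np.array([0.3, -2.0, 1.9, 0.1])
        weights = np.array([1.0, 0.7, 0.4, 0.1])  # nonincreasing OWL weights
        print(prox_owl(x, weights, step=0.5))

With such an operator available, OWL-regularized objectives can be handled by standard proximal-gradient or primal-dual schemes of the kind discussed above.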
dc.language.isoen_USen_US
dc.relation.ispartofseries;G29285
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.en_US
dc.subjectEmpirical risk minimizationen_US
dc.subjectconvex relaxationsen_US
dc.subjectVariable sparsity kernel learningen_US
dc.subjectsupport vector machinesen_US
dc.subjectOrdered Weighted L1en_US
dc.subjectSmoothed Ordered Weighted L1en_US
dc.subjectHawkes-OWLen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleStructured Regularization Through Convex Relaxations Of Discrete Penaltiesen_US
dc.typeThesisen_US
dc.degree.namePhDen_US
dc.degree.levelDoctoralen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US

