Show simple item record

dc.contributor.advisor: Gopalan, Aditya
dc.contributor.author: Zaki, Mohammadi
dc.date.accessioned: 2023-05-02T05:01:50Z
dc.date.available: 2023-05-02T05:01:50Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6080
dc.description.abstract: Online learning deals with the study of making decisions sequentially using information gathered along the way. Typical goals of an online learning agent are to maximize the reward gained during learning, or to identify the action with the maximum (expected) reward. We study this problem in settings where the environment has some inbuilt structure that the learning agent can exploit while making decisions, accelerating the process of learning from data. We study a number of such problems in this dissertation.

We begin with regret minimization for multi-user online recommendation, where the expected user-item reward matrix has low rank (much smaller than the number of users or items). We address the cold-start problem in recommendation systems, where the agent initially has no information about the reward matrix and gathers information only gradually through interactions (i.e., recommending items) with arriving users. We use results from low-rank matrix estimation to design an efficient online algorithm that exploits the low rank of the underlying reward matrix. We analyze this algorithm and show that it enjoys better regret than algorithms that do not take the low-rank structure into account.

We then study the problem of pure exploration (or best arm identification) in linear bandits. In this problem, each time the learner chooses an action vector (arm), she receives a noisy realization of a reward whose mean depends linearly on an unknown parameter vector and the chosen action. The aim is to identify the arm that yields the maximum expected reward as quickly as possible. We are specifically interested in the situation where the ambient dimension of the unknown parameter vector is very small compared to the number of arms. We show that by using this inherent problem structure, one can design provably optimal and efficient algorithms to identify the best arm quickly, and we study how exploiting the intrinsic geometry of the problem leads to statistically and computationally efficient algorithms for best arm identification.

We finally formulate and study the problem of improper reinforcement learning, where, for a given (unknown) Markov Decision Process (MDP), we are given a bag of (pre-designed) controllers or policies for the MDP. We study how the agent can leverage (combine) these pre-trained base controllers, instead of just the rudimentary actions of the MDP, to accelerate learning. This can be useful in tuning across ensembles of controllers, learning in mismatched or simulated environments, etc., to obtain a good controller for a target environment with relatively few trials. This differs from the usual reinforcement learning setup, where the learner observes the current state of the environment and chooses an action to play; in contrast, the improper learner chooses a base controller and plays whichever action that controller recommends. This indirect selection of actions via the base controllers helps carry desirable properties (e.g., interpretability, principled design, safety) into the learned policy. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;ET00094
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Online learning [en_US]
dc.subject: Linear bandits [en_US]
dc.subject: Improper reinforcement learning [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Telecommunication [en_US]
dc.title: Algorithms for Online Learning in Structured Environments [en_US]
dc.type: Thesis [en_US]
dc.degree.name: PhD [en_US]
dc.degree.level: Doctoral [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]