Show simple item record

dc.contributor.advisor: Gopalan, Aditya
dc.contributor.author: Zaki, Mohammadi
dc.date.accessioned: 2023-05-02T05:01:50Z
dc.date.available: 2023-05-02T05:01:50Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6080
dc.description.abstract: Online learning deals with the study of making decisions sequentially using information gathered along the way. Typical goals of an online learning agent are to maximize the reward gained during learning, or to identify the action with the maximum (expected) reward. We study this problem in settings where the environment has some inbuilt structure that the learning agent can exploit while making decisions, accelerating the process of learning from data. We study a number of such problems in this dissertation.

We begin with regret minimization for multi-user online recommendation, where the expected user-item reward matrix has low rank (much smaller than the number of users or items). We address the cold-start problem in recommendation systems, where the agent initially has no information about the reward matrix and gathers information only gradually through interactions (i.e., recommending items) with arriving users. We use results from low-rank matrix estimation to design an efficient online algorithm that exploits the low rank of the underlying reward matrix. We analyze this algorithm and show that it enjoys better regret than algorithms that do not take the low-rank structure into account.

We then study the problem of pure exploration (or best arm identification) in linear bandits. In this problem, each time the learner chooses an action vector (arm), she receives a noisy realization of a reward whose mean depends linearly on an unknown parameter vector and the chosen action. The aim is to identify the arm that yields the maximum expected reward as quickly as possible. We are specifically interested in the situation where the ambient dimension of the unknown parameter vector is very small compared to the number of arms. We show that by using this inherent problem structure, one can design provably optimal and efficient algorithms to identify the best arm quickly, and we study how exploiting the intrinsic geometry of the problem leads to statistically and computationally efficient algorithms for best arm identification.

We finally formulate and study the problem of improper reinforcement learning, where, for a given (unknown) Markov Decision Process (MDP), we are given a bag of (pre-designed) controllers or policies for the MDP. We study how the agent can leverage (combine) these pre-trained base controllers, instead of just the rudimentary actions of the MDP, to accelerate learning. This can be useful in tuning across ensembles of controllers, learning in mismatched or simulated environments, etc., to obtain a good controller for a target environment with relatively few trials. This differs from the usual reinforcement learning setup, where the learner observes the current state of the environment and chooses an action to play; in contrast, the improper learner chooses a base controller and plays whichever action that controller recommends. This indirect selection of actions via the base controllers helps carry desirable properties (e.g., interpretability, principled design, safety) into the learned policy. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;ET00094
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Online learning [en_US]
dc.subject: Linear bandits [en_US]
dc.subject: Improper reinforcement learning [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Telecommunication [en_US]
dc.title: Algorithms for Online Learning in Structured Environments [en_US]
dc.type: Thesis [en_US]
dc.degree.name: PhD [en_US]
dc.degree.level: Doctoral [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]