Show simple item record

dc.contributor.advisor    Saladi, Rahul
dc.contributor.author    Ramakrishnan, K
dc.date.accessioned    2023-03-17T04:35:02Z
dc.date.available    2023-03-17T04:35:02Z
dc.date.submitted    2022
dc.identifier.uri    https://etd.iisc.ac.in/handle/2005/6043
dc.description.abstract    Multi-Armed Bandits (MAB) is a popular framework for modelling sequential decision-making problems under uncertainty. This thesis is a compilation of two independent works on MABs.

1. In the first work, we study a multi-armed bandit (MAB) version of the range-searching problem. In its basic form, range searching takes as input a set of points (on the real line) and a collection of (real) intervals. Each point has an associated weight, and the objective is to find a maximum-weight point within every given interval. The current work addresses range searching with stochastic weights: each point corresponds to an arm (that admits sample access), and the point's weight is the (unknown) mean of the underlying distribution. In this MAB setup, we develop sample-efficient algorithms that find, with high probability, near-optimal arms within the given intervals, i.e., we obtain PAC (probably approximately correct) guarantees. We also provide an algorithm for a generalization wherein the weight of each point is a multi-dimensional vector. The sample complexities of our algorithms depend, in particular, on the size of the optimal hitting set of the given intervals. Finally, we establish lower bounds proving that the obtained sample complexities are essentially tight. Our results highlight the significance of geometric constructs, specifically hitting sets, in our MAB setting.

2. In the second work, we consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected rewards of any arm at any two consecutive time steps is at most a drift limit δ > 0. For this setting, which has received relatively little attention, we give a new algorithm that naturally extends the well-known Successive Elimination algorithm to the non-stationary bandit setting. We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits. The analysis in turn relies on a novel characterization of the instance as a detectable gap profile that depends on the expected arm reward differences. We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is essentially minimax optimal. Moreover, the lower bound we obtain matches that of the more general total-variation-budgeted bandits problem, establishing that the seemingly easier slowly varying problem is, in the minimax sense, at least as hard as the more general one. We complement our theoretical results with experimental illustrations.    en_US
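To make the role of hitting sets in the first work concrete, below is a minimal, illustrative Python sketch of the classic greedy procedure that computes a minimum-size hitting set for intervals on the real line; the thesis states its sample complexities in terms of the size of such an optimal hitting set. The function name and interval representation are illustrative assumptions, not taken from the thesis.

    def min_hitting_set(intervals):
        """Greedy minimum hitting set for intervals on the real line.

        Sort intervals by right endpoint; whenever an interval is not yet hit,
        stab it at its right endpoint. Returns the chosen points.
        """
        points = []
        last = float("-inf")
        for left, right in sorted(intervals, key=lambda iv: iv[1]):
            if left > last:           # current interval not hit by any chosen point
                last = right          # stab it at its right endpoint
                points.append(right)
        return points

    # Example: two points suffice to hit these four intervals.
    print(min_hitting_set([(0, 2), (1, 3), (5, 7), (6, 9)]))  # -> [2, 7]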
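The second work's algorithm extends Successive Elimination to the slowly varying setting; as a point of reference, the following is a minimal Python sketch of the standard stationary-bandit version only, not the thesis's extension. The pull interface and the particular Hoeffding-style confidence radius are illustrative assumptions.

    import math

    def successive_elimination(pull, n_arms, horizon, delta=0.05):
        """Standard Successive Elimination for stationary stochastic bandits.

        Each round pulls every surviving arm once; an arm is eliminated when its
        upper confidence bound falls below the best lower confidence bound.
        """
        active = list(range(n_arms))
        means = [0.0] * n_arms
        counts = [0] * n_arms
        t = 0
        while len(active) > 1 and t + len(active) <= horizon:
            for a in active:
                r = pull(a)                      # stochastic reward, assumed in [0, 1]
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
                t += 1
            # Hoeffding-style radius with a crude union bound over arms and rounds
            rad = {a: math.sqrt(math.log(4.0 * n_arms * counts[a] ** 2 / delta)
                                / (2.0 * counts[a]))
                   for a in active}
            best_lcb = max(means[a] - rad[a] for a in active)
            active = [a for a in active if means[a] + rad[a] >= best_lcb]
        return max(active, key=lambda a: means[a])

    # Illustrative usage with Bernoulli arms (the means below are made up):
    import random
    true_means = [0.3, 0.5, 0.8]
    best = successive_elimination(lambda a: float(random.random() < true_means[a]),
                                  n_arms=3, horizon=10_000)
    print("identified arm:", best)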
dc.language.iso    en_US    en_US
dc.relation.ispartofseries    ;ET00058
dc.rights    I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.    en_US
dc.subject    Multi-Armed Bandits    en_US
dc.subject    Range Searching    en_US
dc.subject    Stochastic Weights    en_US
dc.subject.classification    Research Subject Categories::MATHEMATICS::Applied mathematics::Theoretical computer science    en_US
dc.title    Multi-Armed Bandits – On Range Searching and On Slowly-varying Non-stationarity    en_US
dc.type    Thesis    en_US
dc.degree.name    MTech (Res)    en_US
dc.degree.level    Masters    en_US
dc.degree.grantor    Indian Institute of Science    en_US
dc.degree.discipline    Engineering    en_US

