Show simple item record

dc.contributor.advisor    Saladi, Rahul
dc.contributor.author    Ramakrishnan, K
dc.date.accessioned    2023-03-17T04:35:02Z
dc.date.available    2023-03-17T04:35:02Z
dc.date.submitted    2022
dc.identifier.uri    https://etd.iisc.ac.in/handle/2005/6043
dc.description.abstract    Multi-Armed Bandits (MAB) is a popular framework for modelling sequential decision-making problems under uncertainty. This thesis is a compilation of two independent works on MABs.

1. In the first work, we study a multi-armed bandit (MAB) version of the range-searching problem. In its basic form, range searching takes as input a set of points (on the real line) and a collection of (real) intervals. Each point has an associated weight, and the objective is to find a maximum-weight point within every given interval. The current work addresses range searching with stochastic weights: each point corresponds to an arm (that admits sample access), and the point's weight is the (unknown) mean of the underlying distribution. In this MAB setup, we develop sample-efficient algorithms that find, with high probability, near-optimal arms within the given intervals, i.e., we obtain PAC (probably approximately correct) guarantees. We also provide an algorithm for a generalization wherein the weight of each point is a multi-dimensional vector. The sample complexities of our algorithms depend, in particular, on the size of the optimal hitting set of the given intervals. Finally, we establish lower bounds proving that the obtained sample complexities are essentially tight. Our results highlight the significance of geometric constructs, specifically hitting sets, in our MAB setting.

2. In the second work, we consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected rewards of any arm at any two consecutive time steps is at most a drift limit δ > 0. For this setting, which has received relatively little attention, we give a new algorithm that naturally extends the well-known Successive Elimination algorithm to the non-stationary bandit setting. We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits. The analysis in turn relies on a novel characterization of the instance as a detectable gap profile that depends on the expected arm reward differences. We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is essentially minimax optimal. Moreover, the lower bound we obtain matches that of the more general total-variation-budgeted bandits problem, establishing that the seemingly easier slowly varying problem is, in the minimax sense, at least as hard as the more general one. We complement our theoretical results with experimental illustrations.    en_US
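To make the role of hitting sets in the first work concrete, below is a minimal, illustrative Python sketch of the classic greedy procedure that computes a minimum-size hitting set for intervals on the real line; the thesis states its sample complexities in terms of the size of such an optimal hitting set. The function name and interval representation are illustrative assumptions, not taken from the thesis.

    def min_hitting_set(intervals):
        """Greedy minimum hitting set for intervals on the real line.

        Sort intervals by right endpoint; whenever an interval is not yet hit,
        stab it at its right endpoint. Returns the chosen points.
        """
        points = []
        last = float("-inf")
        for left, right in sorted(intervals, key=lambda iv: iv[1]):
            if left > last:           # current interval not hit by any chosen point
                last = right          # stab it at its right endpoint
                points.append(right)
        return points

    # Example: two points suffice to hit these four intervals.
    print(min_hitting_set([(0, 2), (1, 3), (5, 7), (6, 9)]))  # -> [2, 7]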
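The second work's algorithm extends Successive Elimination to the slowly varying setting; as a point of reference, the following is a minimal Python sketch of the standard stationary-bandit version only, not the thesis's extension. The pull interface and the particular Hoeffding-style confidence radius are illustrative assumptions.

    import math

    def successive_elimination(pull, n_arms, horizon, delta=0.05):
        """Standard Successive Elimination for stationary stochastic bandits.

        Each round pulls every surviving arm once; an arm is eliminated when its
        upper confidence bound falls below the best lower confidence bound.
        """
        active = list(range(n_arms))
        means = [0.0] * n_arms
        counts = [0] * n_arms
        t = 0
        while len(active) > 1 and t + len(active) <= horizon:
            for a in active:
                r = pull(a)                      # stochastic reward, assumed in [0, 1]
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
                t += 1
            # Hoeffding-style radius with a crude union bound over arms and rounds
            rad = {a: math.sqrt(math.log(4.0 * n_arms * counts[a] ** 2 / delta)
                                / (2.0 * counts[a]))
                   for a in active}
            best_lcb = max(means[a] - rad[a] for a in active)
            active = [a for a in active if means[a] + rad[a] >= best_lcb]
        return max(active, key=lambda a: means[a])

    # Illustrative usage with Bernoulli arms (the means below are made up):
    import random
    true_means = [0.3, 0.5, 0.8]
    best = successive_elimination(lambda a: float(random.random() < true_means[a]),
                                  n_arms=3, horizon=10_000)
    print("identified arm:", best)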
dc.language.iso    en_US    en_US
dc.relation.ispartofseries    ;ET00058
dc.rights    I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.    en_US
dc.subject    Multi-Armed Bandits    en_US
dc.subject    Range Searching    en_US
dc.subject    Stochastic Weights    en_US
dc.subject.classification    Research Subject Categories::MATHEMATICS::Applied mathematics::Theoretical computer science    en_US
dc.title    Multi-Armed Bandits – On Range Searching and On Slowly-varying Non-stationarity    en_US
dc.type    Thesis    en_US
dc.degree.name    MTech (Res)    en_US
dc.degree.level    Masters    en_US
dc.degree.grantor    Indian Institute of Science    en_US
dc.degree.discipline    Engineering    en_US

