
dc.contributor.advisor: Narahari, Y
dc.contributor.author: Patil, Vishakha
dc.date.accessioned: 2021-05-20T07:10:06Z
dc.date.available: 2021-05-20T07:10:06Z
dc.date.submitted: 2019
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5129
dc.description.abstract: The classical Stochastic Multi-armed Bandit (MAB) problem provides an abstraction for many real-world decision-making problems such as sponsored-search auctions, crowd-sourcing, and wireless communication. In this work, we study FAIR-MAB, a variant of the MAB problem where, in addition to maximizing the sum of expected rewards, the algorithm must ensure that each arm is pulled for at least a given fraction of the total number of rounds, which imposes an additional fairness constraint on the algorithm. The non-trivial aspect of FAIR-MAB arises when the time horizon T is unknown to the algorithm. The literature on fairness in the MAB setting centers on procedural fairness, where the fairness guarantee concerns the decision-making process followed by the algorithm rather than the outcomes of the decisions it makes. In contrast, we consider an outcome-based fairness notion that can be validated from the decisions of the algorithm. Our primary contribution is characterizing a class of algorithms for the FAIR-MAB problem by two parameters: the unfairness tolerance and the learning algorithm used as a black box. We define an appropriate notion of fairness and show that our algorithm guarantees fairness independent of the choice of the learning algorithm. We define the notion of fairness-aware regret, which naturally extends the conventional notion of regret, and show that the fairness-aware regret of our algorithm matches, in order, the regret of the black-box learning algorithm in the absence of fairness constraints. Finally, we show via detailed simulations that our algorithm outperforms the best known algorithm for the FAIR-MAB problem in terms of the fairness guarantee it provides, while performing comparably in terms of fairness-aware regret. We also evaluate the cost of fairness in the MAB setting in terms of the conventional notion of regret.
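The abstract describes a wrapper that guarantees each arm a minimum fraction of pulls (up to an unfairness tolerance) while delegating the remaining rounds to a black-box learner. The following is a minimal illustrative sketch of that idea, not the thesis's exact algorithm: the class name, the deficit-based trigger rule, and the choice of UCB1 as the black box are all assumptions made for the example.

```python
import math
import random

class FairBanditWrapper:
    """Illustrative fairness-constrained wrapper around a black-box learner.

    Each arm i must receive at least a fraction r[i] of the rounds so far,
    up to an unfairness tolerance alpha (names and rule are illustrative).
    If any arm's pull count falls below r[i] * t - alpha, the most
    deficient arm is pulled; otherwise the black box (UCB1 here) chooses.
    """

    def __init__(self, n_arms, r, alpha):
        assert sum(r) <= 1.0, "fairness quotas must be feasible"
        self.n_arms = n_arms
        self.r = r            # per-arm minimum pull fractions
        self.alpha = alpha    # unfairness tolerance
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Fairness phase: pull the most quota-deficient arm, if any.
        deficits = [self.r[i] * self.t - self.alpha - self.counts[i]
                    for i in range(self.n_arms)]
        worst = max(range(self.n_arms), key=lambda i: deficits[i])
        if deficits[worst] > 0:
            return worst
        # Exploitation phase: defer to the black-box learner (UCB1).
        for i in range(self.n_arms):
            if self.counts[i] == 0:
                return i  # pull each arm once before computing UCB indices
        ucb = [self.sums[i] / self.counts[i]
               + math.sqrt(2 * math.log(self.t) / self.counts[i])
               for i in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda i: ucb[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Usage: two Bernoulli arms with means 0.9 and 0.2, each guaranteed
# at least ~30% of the pulls regardless of its reward.
random.seed(0)
means = [0.9, 0.2]
bandit = FairBanditWrapper(n_arms=2, r=[0.3, 0.3], alpha=2.0)
for _ in range(10_000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)
fractions = [c / sum(bandit.counts) for c in bandit.counts]
```

Note that the wrapper never consults the horizon, matching the abstract's point that fairness must hold even when T is unknown: the quota check depends only on the current round t.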
dc.language.iso: en_US
dc.relation.ispartofseries: G29880
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Stochastic Multi-armed Bandit
dc.subject: FAIR-MAB problem
dc.subject: black-box learning algorithm
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology
dc.title: Achieving Fairness in the Stochastic Multi-Armed Bandit Problem
dc.type: Thesis
dc.degree.name: MTech (Res)
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Faculty of Engineering

