Browsing Division of Electrical, Electronics, and Computer Science (EECS) by Subject "Upper Confidence Bound"

Learning Tournament Solutions from Preference-based Multi-Armed Bandits

Siddartha, Y R

We consider the dueling bandits problem, a sequential decision task where the goal is to learn to pick 'good' arms out of an available pool by actively querying for and observing relative preferences between selected pairs ...