Browsing Division of Electrical, Electronics, and Computer Science (EECS) by Subject "Upper Confidence Bound"
Learning Tournament Solutions from Preference-based Multi-Armed Bandits
We consider the dueling bandits problem, a sequential decision task where the goal is to learn to pick "good" arms out of an available pool by actively querying for and observing relative preferences between selected pairs ...