Browsing Computer Science and Automation (CSA) by Subject "Multi-armed Bandits"
Learning Tournament Solutions from Preference-based Multi-Armed Bandits
We consider the dueling bandits problem, a sequential decision task where the goal is to learn to pick 'good' arms out of an available pool by actively querying for and observing relative preferences between selected pairs ...
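
As a rough illustration of the interaction described above, the following Python sketch simulates a learner that repeatedly picks a pair of arms, observes which one is preferred, and then reports the arms that beat the most opponents (the Copeland winners, one standard tournament solution). This is a minimal sketch under stated assumptions and not the method of the listed work: the preference matrix `PREF`, the uniform pair-sampling strategy, and all function names are hypothetical and chosen only for illustration.

```python
import random


def duel(i, j, pref, rng):
    """Simulate one noisy comparison; True means arm i was preferred to arm j."""
    return rng.random() < pref[i][j]


def explore_uniformly(pref, rounds, seed=0):
    """Query uniformly random pairs and count pairwise wins.

    Uniform sampling is an assumed baseline, used only to keep the sketch short.
    """
    rng = random.Random(seed)
    n_arms = len(pref)
    wins = [[0] * n_arms for _ in range(n_arms)]
    for _ in range(rounds):
        i, j = rng.sample(range(n_arms), 2)  # two distinct arms
        if duel(i, j, pref, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
    return wins


def copeland_winners(wins, n_arms):
    """Return the arms that beat the largest number of other arms
    under the empirical majority relation."""
    scores = []
    for i in range(n_arms):
        score = sum(
            1 for j in range(n_arms) if i != j and wins[i][j] > wins[j][i]
        )
        scores.append(score)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


# Hypothetical ground truth: PREF[i][j] is the probability that arm i beats arm j.
PREF = [
    [0.5, 0.9, 0.9],
    [0.1, 0.5, 0.9],
    [0.1, 0.1, 0.5],
]

wins = explore_uniformly(PREF, rounds=3000)
print(copeland_winners(wins, len(PREF)))
```

Because the learner only ever sees the outcome of pairwise duels, the quality of an arm has to be inferred from the estimated preference relation as a whole, which is why the final step is phrased as a computation over the win counts rather than over per-arm rewards.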