Exploration and Misspecification in Reinforcement Learning
Abstract
Among the fundamental challenges confronting reinforcement learning are exploration, the need
to search effectively over large and complex state-action spaces, and misspecification, which
arises when function approximation is used to mitigate the curse of dimensionality inherent in
these spaces. In this thesis, we study three central problems, each motivated by observations
pertaining to these aspects of reinforcement learning.
First, we examine exploration in linear bandits whose actions lie on smooth, curved manifolds.
We prove that any algorithm achieving sublinear regret necessarily performs sufficient
exploration, in stark contrast to the behavior observed and theoretically justified in standard
multi-armed bandits.
Next, we investigate model misspecification in greater depth. We characterize a class of
problems as robust and show that, despite arbitrary model error, these problems can be learned
efficiently using standard, vanilla algorithms. These results extend the existing literature,
which has primarily focused on worst-case analyses.
Finally, we study the effect of noise in reward models trained on preference datasets. We show
that alignment procedures for large language models (LLMs), when based on such noisy reward
estimates, can suffer from performance degradation. To address this, we propose variance-aware
policy updates, which we prove to be less susceptible to degradation and support with empirical
evidence.
Together, these studies illustrate distinct aspects of exploration and of misspecification,
whether arising from model approximation error or observation noise, and provide a theoretical
foundation that explains the causes of phenomena observed in prior work, along with mitigation
strategies where applicable.

