
    Exploration and Misspecification in Reinforcement Learning

    Author
    Banerjee, Debangshu
    Abstract
    Two of the basic challenges confronting reinforcement learning are exploration – the need to search effectively over large and complex state-action spaces – and misspecification, which arises from using function approximation to mitigate the curse of dimensionality inherent in these spaces. In this thesis, we study three central problems, each motivated by these aspects of reinforcement learning. First, we examine exploration in linear bandits whose actions lie on smooth, curved manifolds. We prove that any algorithm achieving sublinear regret must inherently perform sufficient exploration. This stands in stark contrast to what is observed and theoretically justified in standard multi-armed bandits. Next, we investigate model misspecification in greater depth. We characterize a class of problems as robust and show that, despite arbitrary model error, these problems can be learned efficiently using standard, vanilla algorithms. These results extend existing literature, which has primarily focused on worst-case analyses. Finally, we study the effect of noise in reward models trained on preference datasets. We find that alignment procedures for large language models (LLMs) based on such noisy reward estimates can suffer performance degradation. To address this, we propose variance-aware policy updates, prove that they are less susceptible to this degradation, and support the result with empirical evidence. Together, these studies illustrate distinct aspects of exploration and misspecification, arising from either model approximation error or observation noise. They provide a theoretical foundation that explains the causes of phenomena already observed in prior work, along with mitigation strategies wherever applicable.
    URI
    https://etd.iisc.ac.in/handle/2005/7380
    Collections
    • Electrical Communication Engineering (ECE) [456]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     
