
    Barrier Function Inspired Reward Shaping in Reinforcement Learning

    View/Open
    Thesis full text (20.64 MB)
    Author
    Ranjan, Abhishek
    Abstract
    Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. During the initial iterations of training, most RL algorithms have the agent take a significant number of random exploratory steps, which limits their practicality in the real world since such exploration can lead to dangerous behaviour. Safe exploration is therefore a critical issue when applying RL algorithms to real-world problems. Although RL excels at solving these challenging problems, the time required for convergence during training remains a significant limitation. Various techniques have been proposed to mitigate this issue, and reward shaping has emerged as a popular solution. However, most existing reward-shaping methods rely on value functions, which can pose scalability challenges as the environment's complexity grows. Our research proposes a novel framework for reward shaping inspired by barrier functions that is safety-oriented, intuitive, and easy to implement for any environment or task. To evaluate the effectiveness of the proposed reward formulations, we present results on a challenging safe-reinforcement-learning benchmark, the OpenAI Safety Gym, and conduct experiments on various environments, including CartPole, Half-Cheetah, Ant, and Humanoid. Our results demonstrate that our method converges 1.4-2.8 times faster and requires as little as 50-60% of the actuation effort of the vanilla reward. Moreover, our formulation has a theoretical basis for safety, which is crucial for real-world applications. In a sim-to-real experiment with the Go1 robot, our reward framework demonstrated better control and dynamics of the robot.
    URI
    https://etd.iisc.ac.in/handle/2005/6558
    Collections
    • Computer Science and Automation (CSA) [392]
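
    A minimal sketch may help make the idea in the abstract above concrete: shaping the reward with a barrier-function-style penalty that grows as the agent approaches a safety boundary. The Python sketch below assumes a scalar constraint margin h(s) >= 0 that is positive inside the safe set; the helper names (log_barrier, shaped_reward, angle_margin), the coefficient alpha, and the CartPole margin are illustrative assumptions, not the thesis's actual formulation, which is not given on this page.

        import numpy as np

        def log_barrier(h, eps=1e-6):
            # Log-barrier penalty for a constraint margin h(s) >= 0: it
            # decreases as the state moves deeper into the safe set and
            # grows without bound as h -> 0 (the constraint boundary).
            return -np.log(np.clip(h, eps, None))

        def shaped_reward(r, h_s, h_s_next, gamma=0.99, alpha=0.1):
            # Potential-based shaping with potential Phi(s) = -alpha * B(h(s)).
            # Shaping terms of the form gamma * Phi(s') - Phi(s) leave the
            # optimal policy unchanged (Ng et al., 1999) while penalising
            # transitions that move the agent toward the constraint boundary.
            phi_s = -alpha * log_barrier(h_s)
            phi_s_next = -alpha * log_barrier(h_s_next)
            return r + gamma * phi_s_next - phi_s

        # Hypothetical CartPole margin: distance of the pole angle from the
        # termination threshold of roughly 12 degrees (0.2095 rad).
        THETA_MAX = 0.2095
        def angle_margin(obs):
            return THETA_MAX - abs(obs[2])  # obs[2] is the pole angle

    Under this shaping, transitions that drift toward the angle limit are penalised increasingly steeply, which is consistent with the abstract's claims of faster convergence and reduced actuation effort; whether the thesis uses a log barrier, a potential-based form, or a different barrier shape cannot be determined from the abstract alone.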
