
    On Policy Gradients, Momentum, and Learning with Adversaries: Algorithms and Convergence Analysis

    Thesis full text (5.425Mb)
    Author
    Ganesh, Swetha
    Abstract
    This thesis comprises five works, organized into three parts: the first focuses on average-reward Reinforcement Learning (RL), the second on distributed learning under adversaries in heterogeneous and asynchronous setups, and the third on momentum in stochastic optimization.

    Reinforcement Learning (RL), a subfield of machine learning, trains agents to develop optimal strategies for interacting with their environments. Policy gradient (PG) methods have a long history in RL and remain an attractive class of algorithms due to their flexibility. They are applicable to any differentiable policy parameterization, support function approximation, and can easily incorporate structured state and action spaces. Additionally, these methods are straightforward to implement in a simulation-based, model-free manner. However, PG methods present several open challenges, and this thesis addresses some of them:

    1. Variance-Reduced Policy Gradient for Average-Reward RL: We study policy gradient methods with general policy parameterization for average-reward infinite-horizon Markov Decision Processes (MDPs). Our results demonstrate that variance-reduced policy gradient techniques achieve an expected regret of Õ(√T), a significant improvement over the prior state-of-the-art bound of Õ(T^(3/4)).

    2. Natural Actor-Critic with Multi-Level Monte Carlo: We propose a natural actor-critic algorithm with a linear critic and a general parameterized policy that achieves the optimal global convergence rate of O(1/√T). The method applies to large or even infinite state–action spaces and, by leveraging Multi-Level Monte Carlo (MLMC) to reduce bias in Markovian estimates, removes the need to partition samples according to mixing or hitting times.

    3. Federated Reinforcement Learning Under Adversaries: We establish the first global sample complexity result for federated reinforcement learning with general policy parameterization in the presence of adversaries, achieving the optimal-order sample complexity Õ((1/(Nε²))(1 + f²/N)) for finding an ε-optimal policy, where N is the total number of agents and f < N/2 is the number of adversarial agents.

    Distributed systems are vulnerable to challenges such as random failures and adversarial attacks, which may arise from hardware malfunctions, inaccurate training data, or intentional malicious actions. Adversarial attacks in particular, where attackers may possess significant system knowledge or collaborate, can severely disrupt learning processes. There is limited analysis of such attacks in the heterogeneous and asynchronous settings that are typical in practice, and this thesis explores the following problem:

    4. Distributed Asynchronous and Heterogeneous Learning in Adversarial Settings: We study a distributed linear problem under adversarial settings, with applications to network tomography. Despite the challenges posed by heterogeneity and asynchrony, we develop an algorithm that ensures almost sure convergence to the true solution.

    Stochastic optimization focuses on maximizing or minimizing functions using noisy gradient estimates and has broad applications across domains. Improving its sample complexity is a central challenge. Momentum-based methods, such as Heavy Ball (HB) and Nesterov's Accelerated Gradient (NAG), are known to accelerate exact gradient descent for quadratic objectives; however, their impact on convergence in stochastic settings remains unclear. This thesis investigates the following question:

    5. Momentum in Stochastic Optimization: We show that Stochastic Heavy Ball (SHB) and Nesterov's Accelerated Stochastic Gradient (ASG) cannot outperform standard Stochastic Gradient Descent (SGD) in terms of sample complexity, even in simple stochastic settings. This is demonstrated by deriving a lower bound on the sample complexity of SHB and ASG and showing that this bound is achieved by vanilla SGD.
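
    The following is a minimal illustrative sketch, not taken from the thesis: it contrasts the Stochastic Heavy Ball update mentioned in item 5 with vanilla SGD on a noisy one-dimensional quadratic. The function names, step size, momentum parameter, and noise model are all assumptions made for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_grad(x, sigma=1.0):
        # Stochastic gradient of f(x) = 0.5 * x^2, corrupted by Gaussian noise.
        return x + sigma * rng.normal()

    def sgd(x0, lr=0.05, steps=5000):
        # Vanilla SGD: x_{t+1} = x_t - lr * g_t
        x = x0
        for _ in range(steps):
            x -= lr * noisy_grad(x)
        return x

    def shb(x0, lr=0.05, beta=0.9, steps=5000):
        # Stochastic Heavy Ball: x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1})
        x_prev, x = x0, x0
        for _ in range(steps):
            x_next = x - lr * noisy_grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    print("SGD final iterate:", sgd(10.0))
    print("SHB final iterate:", shb(10.0))

    Under these assumptions, both iterates stall at a noise floor set by the gradient noise and step size; the thesis' contribution on this question is a formal lower bound showing that SHB and ASG cannot improve on SGD's sample complexity in such settings.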
    URI
    https://etd.iisc.ac.in/handle/2005/8492
    Collections
    • Computer Science and Automation (CSA) [542]
