On Policy Gradients, Momentum, and Learning with Adversaries: Algorithms and Convergence Analysis
Abstract
This thesis comprises five works, organized into three parts: the first focuses on average-reward
Reinforcement Learning (RL), the second on distributed learning under adversaries in heterogeneous
and asynchronous setups, and the third on momentum in stochastic optimization.
Reinforcement Learning, a subfield of machine learning, trains agents to develop optimal
strategies for interacting with their environments. Policy gradient (PG) methods have a long
history in RL and remain an attractive class of algorithms due to their flexibility: they are applicable
to any differentiable policy parameterization, support function approximation, and can
easily incorporate structured state and action spaces. Additionally, these methods are straightforward
to implement in a simulation-based, model-free manner. However, PG methods still present
several open challenges, some of which this thesis addresses:
1. Variance-Reduced Policy Gradient for Average-Reward RL: We study policy gradient
methods with general policy parameterization for average-reward infinite-horizon Markov Decision
Processes (MDPs). Our results demonstrate that variance-reduced policy gradient techniques
achieve an expected regret of $\tilde{O}(\sqrt{T})$, a significant improvement over the prior
state-of-the-art bound of $\tilde{O}(T^{3/4})$ (the underlying objective and regret are sketched after this list).
2. Natural Actor-Critic with Multi-Level Monte Carlo: We propose a natural actor-critic
algorithm with a linear critic and general parameterized policy that achieves the optimal global
convergence rate of $O(1/\sqrt{T})$. The method applies to large or even infinite state–action spaces
and, by leveraging Multi-Level Monte Carlo (MLMC) to reduce bias in Markovian estimates,
removes the need to partition samples according to mixing or hitting times.
3. Federated Reinforcement Learning Under Adversaries: We establish the first global sample
complexity result for federated reinforcement learning with general policy parameterization
in the presence of adversaries, achieving an optimal-order sample complexity of
\[
\tilde{O}\!\left(\frac{1}{N\epsilon^{2}}\left(1 + \frac{f^{2}}{N}\right)\right)
\]
for finding an $\epsilon$-optimal policy, where $N$ is the total number of agents and $f < N/2$ is the number of
adversarial agents.
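For reference, a minimal sketch of the average-reward objective and the regret notion underlying items 1 and 2, in notation introduced here rather than taken from the abstract:

\[
J(\theta) \;=\; \lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}_{\pi_\theta}\!\Big[\sum_{t=0}^{T-1} r(s_t,a_t)\Big],
\qquad
\mathrm{Reg}(T) \;=\; \sum_{t=0}^{T-1}\big(J^{*}-r(s_t,a_t)\big),
\quad J^{*}=\max_{\theta} J(\theta).
\]

Under this convention, a regret of $\tilde{O}(\sqrt{T})$ corresponds to an average optimality gap of order $\tilde{O}(1/\sqrt{T})$, matching the global convergence rate stated in item 2.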
Distributed systems are vulnerable to challenges such as random failures and adversarial
attacks, which may arise from hardware malfunctions, inaccurate training data, or intentional
malicious actions. Adversarial attacks in particular, where attackers may possess significant
system knowledge or collaborate, can severely disrupt the learning process. Analysis of such
attacks in the heterogeneous and asynchronous settings that are typical in practice remains
limited, and this thesis explores the following problem:
4. Distributed Asynchronous and Heterogeneous Learning in Adversarial Settings: We
study a distributed linear problem in an adversarial setting, with applications to network
tomography. Despite the challenges posed by heterogeneity and asynchrony, we develop
an algorithm that ensures almost sure convergence to the true solution.
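As a concrete, hypothetical instance of the setup in item 4 (the precise model is given in the thesis, not in this abstract): each honest agent $i$ holds a local, possibly underdetermined linear system, the stacked system has a unique solution $x^{*}$, updates arrive asynchronously, and a subset of adversarial agents may report arbitrary values. The guarantee is almost sure convergence of every honest agent's iterate to $x^{*}$:

\[
A_i\,x^{*} = b_i \ \ \text{for every honest agent } i,
\qquad
x^{(i)}_t \xrightarrow{\ \mathrm{a.s.}\ } x^{*} \ \ \text{as } t\to\infty .
\]

In network tomography, the rows of $A_i$ would encode which links a measured path traverses, and $b_i$ the corresponding path-level measurements.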
Stochastic optimization focuses on maximizing or minimizing functions using noisy gradient
estimates and has broad applications across domains. Improving its sample complexity is a
central challenge. Momentum-based methods, such as the Heavy Ball (HB) and Nesterov’s
Accelerated Gradient (NAG), are known to accelerate exact gradient descent for quadratic
objectives. However, their impact on convergence in stochastic settings remains unclear. This
thesis investigates this question:
5. Momentum in Stochastic Optimization: We show that Stochastic Heavy Ball (SHB)
and Nesterov’s Accelerated Stochastic Gradient (ASG) cannot outperform standard Stochastic
Gradient Descent (SGD) in terms of sample complexity, even in simple stochastic settings. This
is demonstrated by deriving a lower bound on the sample complexity of SHB and ASG and
proving that this bound is achievable by vanilla SGD.
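For completeness, the standard update rules of the two momentum methods compared in item 5, written with a stochastic gradient $g(\cdot)$, step size $\alpha$, and momentum parameter $\beta$ (notation chosen here, not fixed by the abstract):

\begin{align*}
\text{SHB:}\quad & x_{t+1} = x_t - \alpha\, g(x_t) + \beta\,(x_t - x_{t-1}),\\
\text{ASG:}\quad & y_t = x_t + \beta\,(x_t - x_{t-1}), \qquad x_{t+1} = y_t - \alpha\, g(y_t).
\end{align*}

Setting $\beta = 0$ in either rule recovers vanilla SGD; the result in item 5 shows that, in terms of sample complexity, the extra momentum term cannot improve on this special case.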

