dc.description.abstract | Numerous engineering fields, such as transportation systems, manufacturing, communication networks, healthcare, and finance, frequently encounter problems requiring optimization in the presence of uncertainty. Because such problems involve a large number of input variables and the system is typically available only through a simulation model rather than a closed-form description, exact analytical solutions are rarely tractable, making simulation-based optimization a practical alternative. Smoothed functional (SF) algorithms
belong to the class of simultaneous perturbation methods that have proven useful for stochastic optimization problems, particularly in high-dimensional parameter spaces. SF methods estimate the gradient of the objective using function measurements at parameter values that are perturbed simultaneously along all component directions, with the perturbations drawn from a smoothing density. The SF gradient procedure was originally developed by \cite{katkul}. The resulting gradient estimate corresponds to a convolution of the objective function with the smoothing density, and the smoothing induced by this convolution can help the algorithm converge to a global minimum or to a point close to it.
First, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the noisy samples are observed for any given parameter value. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the $\delta$-sphere. We analyze the bias and variance of the proposed gradient estimator. The algorithm is particularly useful when the objective function is non-convex and the parameter dimension is high. Through an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we derive its asymptotic convergence rate. We also show that the algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm; in particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate through numerical simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin on several non-convex settings, and we further validate its performance on noisy convex objectives.
Next, we consider the problem of control in the reinforcement learning (RL) setting, where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are typically shown to converge to a stationary point of the value function. We propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function from sample trajectories. The first algorithm requires an exact solution of the cubic-regularized problem in each iteration, while the second employs an efficient gradient descent-based approximation to the cubic-regularized problem. We establish convergence of both proposed algorithms to a second-order stationary point (SOSP) of the value function, thereby avoiding traps in the form of saddle points. In particular, the sample complexity of our algorithms for finding an $\epsilon$-SOSP is $\mathcal{O}(\epsilon^{-3.5})$, a significant improvement over the previous state-of-the-art sample complexity of $\mathcal{O}(\epsilon^{-4.5})$. | en_US
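As an illustration of the cubic-regularized subproblem referred to above, the sketch below approximately minimizes the standard cubic model $g^\top s + \tfrac{1}{2}\, s^\top H s + \tfrac{M}{6}\|s\|^3$ by gradient descent; the regularization constant M, step size, and iteration count are illustrative choices, and the synthetic g and H are stand-ins for the likelihood-ratio estimates formed from sample trajectories, not values prescribed by this work.

```python
import numpy as np

def cubic_newton_step(g, H, M, iters=200, lr=0.01):
    """Approximately minimize the cubic-regularized model
        m(s) = g^T s + 0.5 * s^T H s + (M / 6) * ||s||^3
    by gradient descent on m, in the spirit of the approximate variant."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

# Toy usage with synthetic gradient/Hessian estimates standing in for the
# likelihood-ratio estimates built from sampled trajectories.
rng = np.random.default_rng(1)
d = 10
g_hat = rng.standard_normal(d)
A = rng.standard_normal((d, d))
H_hat = 0.5 * (A + A.T)              # symmetric, possibly indefinite
theta = rng.standard_normal(d)
theta = theta + cubic_newton_step(g_hat, H_hat, M=10.0)
```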