Show simple item record

dc.contributor.advisorBhatnagar, Shalabh
dc.contributor.authorMondal, Akash
dc.date.accessioned2023-05-04T05:00:13Z
dc.date.available2023-05-04T05:00:13Z
dc.date.submitted2022
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/6086
dc.description.abstractNumerous engineering fields, such as transportation systems, manufacturing, communication networks, healthcare, and finance, frequently encounter optimization problems in the presence of uncertainty. Because such problems typically involve a large number of input variables and lack a tractable analytical model of the system, simulation-based optimization is a practical alternative to exact analytical solutions. Smoothed functional (SF) algorithms belong to the class of simultaneous perturbation methods that have proven useful for stochastic optimization problems, particularly in high-dimensional parameter spaces. SF methods estimate the gradient of the objective from function measurements at parameters that are perturbed simultaneously along all component directions. The SF gradient procedure was originally developed by Katkovnik and Kulchitsky. Because the gradient estimate arises from a convolution of the objective with a smoothing density function, the objective function is effectively smoothed, and this smoothing can help the algorithm converge to a global minimum or to a point close to it.

First, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the noisy samples are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the $\delta$-sphere. We analyze the bias and variance of the proposed gradient estimator. The algorithm is found to be particularly useful when the objective function is non-convex and the parameter dimension is high. Through an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we obtain the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm; in particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin on a few non-convex settings, and we further validate its performance on convex (noisy) objectives.

Next, we consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. We propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function from sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second employs an efficient gradient descent-based approximation to it. We establish convergence of both proposed algorithms to a second-order stationary point (SOSP) of the value function, thereby avoiding traps in the form of saddle points.
In particular, the sample complexity of our algorithms for finding an $\epsilon$-SOSP is $\mathcal{O}(\epsilon^{-3.5})$, a significant improvement over the previous state-of-the-art sample complexity of $\mathcal{O}(\epsilon^{-4.5})$. (Illustrative sketches of a perturbation-based gradient estimator and of the cubic-regularized Newton step appear after the record fields below.)en_US
dc.language.isoen_USen_US
dc.relation.ispartofseries;ET00100
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.en_US
dc.subjectStochastic Approximationen_US
dc.subjectSmoothed functional algorithmsen_US
dc.subjectReinforcement Learningen_US
dc.subjectNewton algorithmsen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleStochastic Optimization And Its Application In Reinforcement Learningen_US
dc.typeThesisen_US
dc.degree.nameMTech (Res)en_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US
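As a rough illustration of the gradient estimation scheme described in the abstract, a generic two-point random-perturbation estimator and the corresponding stochastic gradient update take the form below. This is a sketch, not necessarily the exact estimator analyzed in the thesis; in particular, the normalizing constant $c_d$ is an assumption whose value depends on the perturbation distribution.

\[
\widehat{\nabla} J(\theta_k) \;=\; \frac{c_d}{2\delta}\, p_k \,\bigl( F(\theta_k + \delta p_k, \xi_k^{+}) - F(\theta_k - \delta p_k, \xi_k^{-}) \bigr),
\qquad
\theta_{k+1} \;=\; \theta_k - a_k\, \widehat{\nabla} J(\theta_k),
\]

where $F(\cdot,\xi)$ denotes a noisy cost sample, $p_k$ is the random perturbation (in the thesis, formed using a truncated Cauchy distribution from the $\delta$-sphere), $\delta > 0$ is the perturbation radius, and $a_k$ is a diminishing step size.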
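Similarly, a sketch of the cubic-regularized policy Newton step, written in the standard Nesterov-Polyak minimization form (sign conventions may differ in the thesis, where the value function is typically maximized), is

\[
s_k \;\in\; \arg\min_{s}\; \langle \widehat{g}_k, s\rangle \;+\; \tfrac{1}{2}\, s^{\top} \widehat{H}_k\, s \;+\; \tfrac{M}{6}\,\|s\|^{3},
\qquad
\theta_{k+1} \;=\; \theta_k + s_k,
\]

where $\widehat{g}_k$ and $\widehat{H}_k$ are likelihood-ratio estimates of the gradient and Hessian of the value function formed from sampled trajectories, and $M > 0$ is the cubic regularization parameter. The first algorithm described in the abstract solves this subproblem exactly in each iteration, while the second solves it approximately via gradient descent.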

