dc.description.abstract | Numerous engineering fields, such as transportation systems, manufacturing, communication networks, healthcare, and finance, frequently encounter problems requiring optimization in the presence of uncertainty. Because such problems involve a large number of input variables and the system is typically available only through a simulation model rather than a closed-form description, exact analytical solutions are rarely tractable, making simulation-based optimization a practical alternative. Smoothed functional (SF) algorithms
belong to the class of simultaneous perturbation methods that have proven useful for stochastic optimization problems, particularly in high-dimensional parameter spaces. SF methods estimate the gradient of the objective using function measurements at parameter values that are perturbed simultaneously along all component directions, with the perturbations drawn from a smoothing density. The SF gradient procedure was originally developed by \cite{katkul}. The resulting gradient estimate corresponds to a convolution of the objective function with the smoothing density, and the smoothing induced by this convolution can help the algorithm converge to a global minimum or to a point close to it.
First, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the noisy samples are observed for any given parameter value. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the $\delta$-sphere. We analyze the bias and variance of the proposed gradient estimator. The algorithm is particularly useful when the objective function is non-convex and the parameter dimension is high. Through an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we derive its asymptotic convergence rate. We also show that the algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm; in particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate through numerical simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin on several non-convex settings, and we further validate its performance on noisy convex objectives.
Next, we consider the problem of control in the reinforcement learning (RL) setting, where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are typically shown to converge to a stationary point of the value function. We propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function from sample trajectories. The first algorithm requires an exact solution of the cubic-regularized problem in each iteration, while the second employs an efficient gradient descent-based approximation to the cubic-regularized problem. We establish convergence of both proposed algorithms to a second-order stationary point (SOSP) of the value function, thereby avoiding traps in the form of saddle points. In particular, the sample complexity of our algorithms for finding an $\epsilon$-SOSP is $\mathcal{O}(\epsilon^{-3.5})$, a significant improvement over the previous state-of-the-art sample complexity of $\mathcal{O}(\epsilon^{-4.5})$. | en_US
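As an illustration of the cubic-regularized subproblem referred to above, the sketch below approximately minimizes the standard cubic model $g^\top s + \tfrac{1}{2}\, s^\top H s + \tfrac{M}{6}\|s\|^3$ by gradient descent; the regularization constant M, step size, and iteration count are illustrative choices, and the synthetic g and H are stand-ins for the likelihood-ratio estimates formed from sample trajectories, not values prescribed by this work.

```python
import numpy as np

def cubic_newton_step(g, H, M, iters=200, lr=0.01):
    """Approximately minimize the cubic-regularized model
        m(s) = g^T s + 0.5 * s^T H s + (M / 6) * ||s||^3
    by gradient descent on m, in the spirit of the approximate variant."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

# Toy usage with synthetic gradient/Hessian estimates standing in for the
# likelihood-ratio estimates built from sampled trajectories.
rng = np.random.default_rng(1)
d = 10
g_hat = rng.standard_normal(d)
A = rng.standard_normal((d, d))
H_hat = 0.5 * (A + A.T)              # symmetric, possibly indefinite
theta = rng.standard_normal(d)
theta = theta + cubic_newton_step(g_hat, H_hat, M=10.0)
```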