Optimization Algorithms for Deterministic, Stochastic and Reinforcement Learning Settings

Joseph, Ajin George

dc.contributor.advisor	Bhatnagar, Shalabh
dc.contributor.author	Joseph, Ajin George
dc.date.accessioned	2018-05-30T14:57:47Z
dc.date.accessioned	2018-07-31T04:40:31Z
dc.date.available	2018-05-30T14:57:47Z
dc.date.available	2018-07-31T04:40:31Z
dc.date.issued	2018-05-30
dc.date.submitted	2017
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3645
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/4515/G28470-Abs.pdf	en_US
dc.description.abstract	Optimization is an important field with diverse applications in the physical, social and biological sciences and in various areas of engineering. It appears widely in machine learning, information retrieval, regression, estimation, operations research and a wide variety of computing domains. The subject is studied deeply, both theoretically and experimentally, and several algorithms are available in the literature. These algorithms, which can be executed (sequentially or concurrently) on a computing machine, explore the space of input parameters to seek high-quality solutions to the optimization problem, with the search mostly guided by certain structural properties of the objective function. In certain situations, the setting might additionally demand the “absolute optimum” or solutions close to it, which makes the task even more challenging. In this thesis, we propose an optimization algorithm which is “gradient-free”, i.e., it does not employ any knowledge of the gradient or higher-order derivatives of the objective function, but rather uses the objective function values themselves to steer the search. The proposed algorithm is particularly effective in a black-box setting, where a closed-form expression of the objective function is unavailable and the gradient or higher-order derivatives are hard to compute or estimate. Our algorithm is inspired by the well-known cross entropy (CE) method. The CE method is a model-based search method for solving continuous/discrete multi-extremal optimization problems, where the objective function has minimal structure. The proposed method searches the statistical manifold of the parameters which identify the probability distribution/model defined over the input space, in order to find the degenerate distribution concentrated on the global optima (assumed to be finite in number).
In the early part of the thesis, we propose a novel stochastic approximation version of the CE method for the unconstrained optimization problem, where the objective function is real-valued and deterministic. The basis of the algorithm is a stochastic process of model parameters which depends probabilistically on the past history, where we reuse all the samples obtained up to the current instant through discounted averaging. This approach can reduce the overall computational and storage cost. Our algorithm is incremental in nature and possesses attractive features such as stability, computational and storage efficiency and better accuracy. We further investigate, both theoretically and empirically, the asymptotic behaviour of the algorithm and find that it converges to the global optimum for a particular class of objective functions. Further, we extend the algorithm to solve the simulation/stochastic optimization problem. In stochastic optimization, the objective function is stochastic in nature, and the underlying probability distribution is in most cases hard to comprehend and quantify. This gives rise to a more challenging optimization problem, where the difficulty arises primarily because the objective function values for the various input parameters cannot be computed with absolute certainty. In this case, one can only hope to obtain noise-corrupted objective function values for the various input parameters. Settings of this kind arise when the objective function is evaluated using a continuously evolving dynamical system or through a simulation. We propose a multi-timescale stochastic approximation algorithm, where we integrate an additional timescale to accommodate the noisy measurements and to diminish the effects of the noise asymptotically. We found that if the objective function and the noise involved in the measurements are well behaved and the timescales are compatible, then our algorithm can generate high-quality solutions.
In the later part of the thesis, we propose algorithms for reinforcement learning/Markov decision processes (MDPs) using the optimization techniques developed in the early part. An MDP can be considered a generalized framework for modelling planning under uncertainty. We provide a novel algorithm for the problem of prediction in reinforcement learning, i.e., estimating the value function of a given stationary policy of a model-free MDP (with large state and action spaces) using the linear function approximation architecture. Here, the value function is defined as the long-run average of the discounted transition costs. The resource requirement of the proposed method, in terms of computational and storage cost, scales quadratically in the size of the feature set. The algorithm is an adaptation of the multi-timescale variant of the CE method proposed in the earlier part of the thesis for simulation optimization. We also provide both theoretical and empirical evidence to corroborate the credibility and effectiveness of the approach.
In the final part of the thesis, we consider a modified version of the control problem in a model-free MDP with large state and action spaces. The control problem most commonly addressed in the literature is to find an optimal policy which maximizes the value function, i.e., the long-run average of the discounted transition payoffs. Contemporary methods also presume access to a generative model/simulator of the MDP, with the hidden premise that observations of the system behaviour in the form of sample trajectories can be obtained with ease from the model. In this thesis, we consider a modified version, where the cost function to be optimized is a real-valued performance function (possibly non-convex) of the value function. Additionally, one has to seek the optimal policy without presuming access to the generative model. We propose a stochastic approximation algorithm for this particular control problem. The only information we presuppose to be available to the algorithm is a sample trajectory generated using an a priori chosen behaviour policy. The algorithm is data (sample trajectory) efficient, stable and robust, as well as computationally and storage efficient. We provide a proof of convergence of our algorithm to a high-performing policy relative to the behaviour policy.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G28470	en_US
dc.subject	Optimization Algorithms	en_US
dc.subject	Reinforcement Learning	en_US
dc.subject	Machine Learning	en_US
dc.subject	Markov Decision Process	en_US
dc.subject	Stochastic Approximation Algorithm	en_US
dc.subject	Stochastic Optimization	en_US
dc.subject	Cross Entropy Method	en_US
dc.subject	Stochastic Global Optimization	en_US
dc.subject	Cross Entropy Optimization Method	en_US
dc.subject	Quantile Estimation	en_US
dc.subject	Continuous Optimization	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Optimization Algorithms for Deterministic, Stochastic and Reinforcement Learning Settings	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.discipline	Faculty of Engineering	en_US
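
For readers unfamiliar with the cross entropy (CE) method that the abstract names as its starting point, the following is a minimal sketch of the basic, textbook CE method in one dimension. It is not the stochastic approximation variant proposed in the thesis: it redraws a fresh batch of samples at every iteration instead of reusing past samples through discounted averaging. The function names, the Gaussian sampling model, the test function and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def cross_entropy_maximize(objective, mean, std, n_samples=200, elite_frac=0.1,
                           smoothing=0.7, n_iters=60, seed=0):
    """Maximize `objective` over the reals with a basic Gaussian CE method.

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample candidate inputs from the current model N(mean, std^2).
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the samples with the highest objective values, i.e. roughly
        # those above the empirical (1 - elite_frac)-quantile.
        samples.sort(key=objective, reverse=True)
        elite = samples[:n_elite]
        # Fit the model to the elite set, then smooth the parameter update.
        elite_mean = sum(elite) / n_elite
        elite_var = sum((x - elite_mean) ** 2 for x in elite) / n_elite
        mean = smoothing * elite_mean + (1.0 - smoothing) * mean
        std = smoothing * math.sqrt(elite_var) + (1.0 - smoothing) * std
    return mean, std


def multimodal(x):
    """Hypothetical test function: global maximum near 2, local maximum near -2."""
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2)


best_mean, best_std = cross_entropy_maximize(multimodal, mean=0.0, std=5.0)
print(f"model mean = {best_mean:.3f}, model std = {best_std:.3g}")
```

Only objective function values are used to steer the search, which is the "gradient-free" property the abstract refers to. As the iterations proceed, the standard deviation shrinks and the sampling model approaches a degenerate distribution concentrated near a maximizer.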

Files in this item

Name: G28470.pdf
Size: 7.407Mb
Format: PDF

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [561]

Related items

Swarm Intelligence And Evolutionary Computation For Single And Multiobjective Optimization In Water Resource Systems

Automatic Optimization of Geometric Multigrid Methods using a DSL Approach

Multireservoir Systems Optimization : A New Approach
