Show simple item record

dc.contributor.advisor  Bhatnagar, Shalabh
dc.contributor.author  Lakshmanan, K
dc.date.accessioned  2018-03-07T17:51:08Z
dc.date.accessioned  2018-07-31T04:38:57Z
dc.date.available  2018-03-07T17:51:08Z
dc.date.available  2018-07-31T04:38:57Z
dc.date.issued  2018-03-07
dc.date.submitted  2012
dc.identifier.uri  https://etd.iisc.ac.in/handle/2005/3245
dc.identifier.abstract  http://etd.iisc.ac.in/static/etd/abstracts/4106/G25592-Abs.pdf  en_US
dc.description.abstract  In many optimization problems, the relationship between the objective function and the parameters is not known. The objective function itself may be stochastic, such as a long-run average over random cost samples. In such cases, finding the gradient of the objective is not possible, and it is in this setting that stochastic approximation algorithms are used. These algorithms rely on estimates of the gradient and are stochastic in nature. Among gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Smoothed Functional (SF) scheme are widely used. In this thesis we propose a novel multi-timescale quasi-Newton based smoothed functional (QN-SF) algorithm for both unconstrained and constrained optimization. The algorithm uses the smoothed functional scheme for estimating the gradient and the quasi-Newton method to solve the optimization problem (the SF gradient estimate is sketched below). The algorithm is shown to converge with probability one. We also provide experimental results on the problem of optimal routing in a multi-stage network of queues. Policies such as Join the Shortest Queue or Least Work Left assume knowledge of queue length values, which can change rapidly or be hard to estimate. If the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. The QN-SF based probabilistic routing algorithm uses only the total end-to-end delay for tuning the routing probabilities. We observe from the experiments that the QN-SF algorithm performs better than the gradient and Jacobi versions of the Newton-based smoothed functional algorithms. Next we consider constrained routing in a similar queueing network and extend the QN-SF algorithm to this case. We study the convergence behavior of the algorithm and observe that the constraints are satisfied at the point of convergence. We provide experimental results for the constrained routing setup as well.

Next we study reinforcement learning algorithms, which are useful for solving Markov Decision Processes (MDPs) when precise information on the transition probabilities is not available. When the state and action sets are very large, it is not possible to store all the state-action tuples; in such cases, function approximators like neural networks have been used. The popular Q-learning algorithm is known to diverge when used with linear function approximation due to the 'off-policy' problem. Hence, developing learning algorithms that remain stable under function approximation is an important problem. We present in this thesis a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. The Q-value parameters for a given policy are updated on the slower timescale, while the policy parameters themselves are updated on the faster timescale. We perform a gradient search in the space of policy parameters. Since the objective function, and hence its gradient, is not known analytically, we employ efficient one-simulation simultaneous perturbation stochastic approximation (SPSA) gradient estimates that use Hadamard matrix based deterministic perturbations (illustrated below). Our algorithm has the advantage that, unlike Q-learning, it does not suffer from high oscillations due to the off-policy problem when using function approximators. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm, which is on-policy, is convergent. Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust than Q-learning. Future work would be to compare it with other policy-based reinforcement learning algorithms.

Finally, we develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints (sketched below). We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.  en_US
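Note: the following is a minimal sketch of the Gaussian smoothed functional (SF) gradient estimate that the QN-SF algorithm described in the abstract builds on. The objective y, the smoothing parameter delta and the sample budget are illustrative placeholders rather than values from the thesis; the thesis additionally estimates second-order (Hessian) information from perturbed samples on a separate timescale to form the quasi-Newton step, which is not reproduced here.

import numpy as np

def sf_gradient(y, theta, delta, rng):
    # Two-sided SF estimate: a Gaussian perturbation eta smooths the objective,
    # and eta * (y(theta + delta*eta) - y(theta - delta*eta)) / (2*delta)
    # estimates the gradient of the smoothed objective.
    eta = rng.standard_normal(theta.shape)
    return eta * (y(theta + delta * eta) - y(theta - delta * eta)) / (2.0 * delta)

rng = np.random.default_rng(0)
theta = np.array([0.5, -1.0, 2.0])
y = lambda w: float(np.sum((w - 1.0) ** 2))   # toy stand-in for a simulated cost
# Averaging many single-sample estimates approximately recovers the true gradient.
g_hat = np.mean([sf_gradient(y, theta, 0.1, rng) for _ in range(20000)], axis=0)
print(g_hat)                                  # close to the true gradient 2*(theta - 1)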
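Note: the next sketch illustrates the one-simulation SPSA gradient estimate with Hadamard matrix based deterministic perturbations mentioned in the abstract. It shows only the perturbation construction and the estimator on a toy noiseless quadratic; the two-timescale coupling with the Q-value parameter updates is not reproduced, and all names and constants are illustrative assumptions rather than the thesis's exact recursion.

import numpy as np

def hadamard(order):
    # Sylvester construction; `order` must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def perturbation_cycle(dim):
    # Deterministic +/-1 perturbations: take `dim` rows of a Hadamard matrix of
    # order >= dim + 1 (skipping the all-ones first row) and cycle over its columns.
    order = 1
    while order < dim + 1:
        order *= 2
    H = hadamard(order)
    return [H[1:dim + 1, k] for k in range(order)]

def one_sim_spsa(y, theta, delta, Delta):
    # One-measurement SPSA estimate: a single perturbed evaluation per step.
    return y(theta + delta * Delta) / (delta * Delta)

theta = np.array([0.5, -1.0, 2.0])
y = lambda w: float(np.sum((w - 1.0) ** 2))   # noiseless quadratic for illustration
cycle = perturbation_cycle(theta.size)
# Averaged over one full Hadamard cycle, the bias terms cancel and the estimate
# matches the true gradient 2*(theta - 1) for this quadratic.
g_hat = np.mean([one_sim_spsa(y, theta, 0.1, D) for D in cycle], axis=0)
print(g_hat)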
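Note: finally, a schematic sketch of handling an inequality constraint with a Lagrange multiplier on a slower timescale, in the spirit of the constrained actor-critic algorithm described in the abstract. The functions f and g and their gradients are simple analytic stand-ins chosen for illustration; in the thesis both are policy-dependent long-run average costs estimated from simulated sample paths by the critic, and gradients come from stochastic estimates rather than closed forms.

import numpy as np

# Toy constrained problem: minimize f(theta) subject to g(theta) <= 0.
f      = lambda th: float(np.sum(th ** 2))
grad_f = lambda th: 2.0 * th
g      = lambda th: 1.0 - th[0] - th[1]        # constraint: th[0] + th[1] >= 1
grad_g = lambda th: np.array([-1.0, -1.0])

theta, lam = np.zeros(2), 0.0
for n in range(1, 200001):
    a_n = 1.0 / n ** 0.6                       # faster timescale: parameter update
    c_n = 1.0 / n                              # slower timescale: multiplier update
    theta = theta - a_n * (grad_f(theta) + lam * grad_g(theta))   # descend the Lagrangian
    lam = max(0.0, lam + c_n * g(theta))       # projected ascent keeps lam >= 0
print(theta, lam)   # approaches the constrained optimum [0.5, 0.5] with lam near 1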
dc.language.iso  en_US  en_US
dc.relation.ispartofseries  G25592  en_US
dc.subject  Stochastic Approximation Algorithms  en_US
dc.subject  Stochastic Optimization  en_US
dc.subject  Markov Decision Process  en_US
dc.subject  Reinforcement Learning Algorithm  en_US
dc.subject  Queueing Networks  en_US
dc.subject  Queuing Theory  en_US
dc.subject  Quasi-Newton Stochastic Approximation Algorithm  en_US
dc.subject  Online Q-Learning Algorithm  en_US
dc.subject  Online Actor-Critic Algorithm  en_US
dc.subject  Markov Decision Processes  en_US
dc.subject  Q-learning Algorithm  en_US
dc.subject  Linear Function Approximation  en_US
dc.subject  Quasi-Newton Smoothed Functional Algorithms  en_US
dc.subject.classification  Computer Science  en_US
dc.title  Online Learning and Simulation Based Algorithms for Stochastic Optimization  en_US
dc.type  Thesis  en_US
dc.degree.name  PhD  en_US
dc.degree.level  Doctoral  en_US
dc.degree.discipline  Faculty of Engineering  en_US

