Show simple item record

dc.contributor.advisor: Bhatnagar, Shalabh
dc.contributor.author: Abdulla, Mohammed Shahid
dc.date.accessioned: 2010-08-06T07:20:35Z
dc.date.accessioned: 2018-07-31T04:39:51Z
dc.date.available: 2010-08-06T07:20:35Z
dc.date.available: 2018-07-31T04:39:51Z
dc.date.issued: 2010-08-06
dc.date.submitted: 2008
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/812
dc.description.abstract:
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for solving infinite-horizon Markov Decision Processes (MDPs) with finite state space under the average-cost criterion. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and averaged for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, a memory-efficient implementation using a feature-vector representation of the state space and TD(0) learning along the faster timescale is discussed. A three-timescale simulation-based algorithm for solving infinite-horizon discounted-cost MDPs via the Value Iteration approach is also proposed; here, an approximation of the Dynamic Programming operator T is applied to the value-function iterates. A sketch of convergence explaining the dynamics of the algorithm via the associated ODEs is presented. Numerical experiments on rate-based flow control at a bottleneck node, using a continuous-time queueing model, are presented for the proposed algorithms.

In Chapter 3, we develop three simulation-based algorithms for finite-horizon MDPs (FH-MDPs). The first algorithm is developed for finite state and compact action spaces, while the other two are for finite state and finite action spaces. Convergence analysis is briefly sketched. We then concentrate on methods to mitigate the curse of dimensionality, which affects FH-MDPs severely since there is one probability transition matrix per stage. Two parametrized actor-critic algorithms for FH-MDPs with compact action sets are proposed, the 'critic' in both algorithms learning the policy gradient. We show convergence w.p. 1 to a set satisfying the necessary conditions for constrained optima.
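The two-timescale idea above — a fast recursion tracking a quantity that a slow recursion then consumes — can be illustrated with a minimal generic stochastic-approximation sketch. This is not the thesis's actor-critic algorithm; the quadratic objective, noise model, and step-size schedules are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

x = 5.0   # slow iterate ("actor"): seeks the minimiser of f(x) = x^2
y = 0.0   # fast iterate ("critic"): tracks the gradient f'(x) = 2x

for k in range(20000):
    a_k = 1.0 / (k + 100)          # slow step size
    b_k = 1.0 / (k + 100) ** 0.6   # fast step size; a_k / b_k -> 0
    noise = rng.normal(scale=0.1)
    # Fast timescale: track the noisily observed gradient 2x, which looks
    # quasi-static from this timescale because x moves much more slowly.
    y += b_k * (2.0 * x + noise - y)
    # Slow timescale: gradient descent using the critic's current estimate.
    x -= a_k * y

# x drifts toward the minimiser 0 while y keeps tracking 2x.
```

The separation a_k / b_k → 0 is what lets the convergence analysis treat the fast iterate as having already equilibrated when studying the slow one, mirroring the critic/actor split described above.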
Further, a third algorithm for stochastic control of stopping-time processes is presented. Numerical experiments with the proposed finite-horizon algorithms are shown for a problem of flow control in communication networks.

Towards stochastic optimization, in Chapter 4 we propose five algorithms that are variants of SPSA. The original one-measurement SPSA uses an estimate of the gradient of the objective function L that contains an additional bias term not seen in two-measurement SPSA. We propose a one-measurement algorithm that eliminates this bias and has asymptotic convergence properties that make for easier comparison with two-measurement SPSA. Under certain conditions, the algorithm outperforms both forms of SPSA, with the only overhead being the storage of a single measurement. We also propose a similar algorithm that uses perturbations obtained from normalized Hadamard matrices. Convergence w.p. 1 of both algorithms is established. We extend measurement reuse to design three second-order SPSA algorithms, sketch their convergence analysis, and present simulation results on an illustrative minimization problem.

Finally, we propose several stochastic-approximation implementations for related algorithms in flow control of communication networks, beginning with a discrete-time implementation of Kelly's primal flow-control algorithm. Convergence with probability 1 is shown, even in the presence of communication delays and stochastic effects in link congestion indications. Two relevant enhancements are then pursued: a) an implementation of the primal algorithm using second-order information, and b) an implementation where edge routers rectify misbehaving flows. Discrete-time implementations of Kelly's dual algorithm and primal-dual algorithm are also proposed. Simulation results a) verifying the proposed algorithms and b) comparing stability properties with an algorithm from the literature are presented.
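The two-measurement versus one-measurement SPSA contrast discussed above can be sketched on a toy problem. This shows the standard textbook estimators, not the bias-corrected variant the thesis proposes; the objective, gain sequences, and perturbation schedule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def L(theta):
    """Toy objective: L(theta) = ||theta||^2, minimised at the origin."""
    return float(theta @ theta)

theta = np.ones(3)
for k in range(5000):
    a_k = 0.5 / (k + 10)            # gain sequence
    c_k = 0.1 / (k + 1) ** 0.101    # perturbation size
    # One common Rademacher perturbation gives all coordinates of the
    # gradient estimate from just two evaluations of L.
    delta = rng.choice([-1.0, 1.0], size=3)
    g_hat = (L(theta + c_k * delta) - L(theta - c_k * delta)) / (2.0 * c_k) * (1.0 / delta)
    theta -= a_k * g_hat

# The one-measurement form needs a single evaluation per step,
#   g_hat = L(theta + c_k * delta) / c_k * (1.0 / delta),
# but carries the additional L(theta)/(c_k * delta_i) term the abstract
# refers to, which the thesis's proposed variant eliminates.
```

Because each step perturbs every coordinate simultaneously, the per-iteration measurement cost is independent of the dimension of theta — the property that makes SPSA attractive for simulation-based optimization.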
dc.language.iso: en_US
dc.relation.ispartofseries: G22213
dc.subject: Markov Processes - Data Processing
dc.subject: Algorithms
dc.subject: Simulation
dc.subject: Markov Decision Processes (MDPs)
dc.subject: Infinite Horizon Markov Decision Processes
dc.subject: Finite Horizon Markov Decision Processes
dc.subject: Stochastic Approximation - Algorithms
dc.subject: Simultaneous Perturbation Stochastic Approximation (SPSA)
dc.subject: Network Flow-Control
dc.subject: FH-MDP Algorithms
dc.subject: Stochastic Optimization
dc.subject: Reinforcement Learning Algorithms
dc.subject.classification: Computational Mathematics
dc.title: Simulation Based Algorithms For Markov Decision Process And Stochastic Optimization
dc.type: Thesis
dc.degree.name: PhD
dc.degree.level: Doctoral
dc.degree.discipline: Faculty of Engineering

