Stochastic Approximation with Markov Noise: Analysis and Applications in Reinforcement Learning
Abstract
Stochastic approximation algorithms are sequential non-parametric methods for finding a zero
or minimum of a function in situations where only noisy observations of the function
values are available. Two time-scale stochastic approximation algorithms consist of two coupled
recursions that are updated with different step sizes (one considerably smaller than the other),
which in turn facilitates convergence of such algorithms.
We present, for the first time, an asymptotic convergence analysis of two time-scale stochastic
approximation driven by 'controlled' Markov noise. In particular, the faster and slower
recursions have non-additive controlled Markov noise components in addition to martingale
difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting
differential inclusions in both time scales that are defined in terms of the ergodic occupation
measures associated with the controlled Markov processes.
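A minimal sketch of such coupled recursions, in illustrative notation (the driving maps $h$, $g$, the controlled Markov processes $Z^{(1)}_n$, $Z^{(2)}_n$, the martingale difference sequences $M^{(1)}_{n+1}$, $M^{(2)}_{n+1}$, and the step sizes $a(n)$, $b(n)$ are placeholders, not fixed by this abstract):
\[
x_{n+1} = x_n + a(n)\big[h(x_n, y_n, Z^{(1)}_n) + M^{(1)}_{n+1}\big], \qquad
y_{n+1} = y_n + b(n)\big[g(x_n, y_n, Z^{(2)}_n) + M^{(2)}_{n+1}\big],
\]
where $b(n)/a(n) \to 0$, so that the $x$-recursion evolves on the faster time scale and the $y$-recursion on the slower one.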
Using a special case of our results, we present a solution to the off-policy convergence
problem for temporal-difference learning with linear function approximation.
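As a concrete illustration of this setting (a standard off-policy TD(0) update with importance sampling; the feature map $\phi$, discount factor $\gamma$, and target and behavior policies $\pi$ and $\mu$ are generic placeholders, not the exact algorithm analyzed here):
\[
\delta_n = r_{n+1} + \gamma\,\theta_n^\top \phi(s_{n+1}) - \theta_n^\top \phi(s_n), \qquad
\theta_{n+1} = \theta_n + a(n)\,\rho_n\,\delta_n\,\phi(s_n),
\]
where $\rho_n = \pi(a_n \mid s_n)/\mu(a_n \mid s_n)$ is the importance sampling ratio that corrects for transitions being sampled under the behavior policy $\mu$.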
One of the important assumptions in the earlier analysis is the point-wise boundedness (also
called 'stability') of the iterates. However, finding sufficient verifiable conditions for this is
very hard when the noise is Markov, as well as when there are multiple timescales. We compile
several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent
noise when the iterates are not known to be stable beforehand. We achieve this
by extending the lock-in probability framework (i.e., the probability of convergence to a specific attractor
of the limiting o.d.e. given that the iterates are in its domain of attraction after a sufficiently
large number of iterations, say $n_0$) to such recursions. Specifically, under the more
restrictive assumption of Markov iterate-dependent noise supported on a bounded subset of
Euclidean space, we give a lower bound for the lock-in probability. We use these results to prove
almost sure convergence of the iterates to the specified attractor when the iterates satisfy an
'asymptotic tightness' condition. This, in turn, is shown to be useful in analyzing the tracking
ability of general 'adaptive' algorithms. Additionally, we show that our results can be used to
derive a sample complexity estimate for such recursions, which can then be used for step-size
selection.
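In symbols, with $A$ an attractor of the limiting o.d.e., $B$ an open set contained in its domain of attraction, and $x_n$ the iterates (notation ours, for illustration only), the lock-in probability is
\[
P\big(x_n \to A \text{ as } n \to \infty \;\big|\; x_{n_0} \in B\big),
\]
and almost sure convergence follows when such lower bounds approach $1$ as $n_0 \to \infty$, in combination with the asymptotic tightness condition.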
Finally, we obtain the first informative error bounds on function approximation for the
policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive
cost represented using exponential utility. We also give examples where all our bounds achieve
the 'actual error', whereas the earlier bound given by Basu et al. is much weaker in comparison.
We show that this happens due to the absence of a difference term in the earlier bound, a term that is
always present in all our bounds when the state space is large. Additionally, we discuss how all our bounds compare with each other.
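For reference, the risk-sensitive cost under exponential utility referred to above is typically of the form (notation illustrative; $c$ denotes the per-stage cost and $X_m$ the state process):
\[
\limsup_{n \to \infty} \frac{1}{n} \log E\Big[\exp\Big(\sum_{m=0}^{n-1} c(X_m)\Big)\Big].
\]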