dc.contributor.advisor Bhatnagar, Shalabh
dc.contributor.author Karmakar, Prasenjit
dc.date.accessioned 2021-03-30T06:57:01Z
dc.date.available 2021-03-30T06:57:01Z
dc.date.submitted 2018
dc.identifier.uri https://etd.iisc.ac.in/handle/2005/5026
dc.description.abstract Stochastic approximation algorithms are sequential non-parametric methods for finding a zero or minimum of a function when only noisy observations of the function values are available. Two time-scale stochastic approximation algorithms consist of two coupled recursions that are updated with different step sizes (one considerably smaller than the other), which in turn facilitates convergence of such algorithms. We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Using a special case of our results, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation. One of the important assumptions in the earlier analysis is the point-wise boundedness (also called the 'stability') of the iterates. However, finding sufficient verifiable conditions for this is very hard when the noise is Markov as well as when there are multiple timescales. We compile several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent noise when the iterates are not known to be stable beforehand. We achieve this by extending the lock-in probability framework to such recursions; the lock-in probability is the probability of convergence to a specific attractor of the limiting o.d.e., given that the iterates are in its domain of attraction after a sufficiently large number of iterations, say n0.
Specifically, with the more restrictive assumption of Markov iterate-dependent noise supported on a bounded subset of the Euclidean space, we give a lower bound for the lock-in probability. We use these results to prove almost sure convergence of the iterates to the specified attractor when the iterates satisfy an 'asymptotic tightness' condition. This, in turn, is shown to be useful in analyzing the tracking ability of general 'adaptive' algorithms. Additionally, we show that our results can be used to derive a sample complexity estimate of such recursions, which can then be used for step-size selection. Finally, we obtain the first informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive cost represented using exponential utility. We also give examples where all our bounds achieve the "actual error" whereas the earlier bound given by Basu et al. is much weaker in comparison. We show that this happens due to the absence of a difference term in the earlier bound, which is always present in all our bounds when the state space is large. Additionally, we discuss how all our bounds compare with each other. en_US
dc.language.iso en_US en_US
dc.relation.ispartofseries ;G29822
dc.rights I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation en_US
dc.subject Stochastic approximation algorithms en_US
dc.subject Markov noise en_US
dc.subject.classification Research Subject Categories::TECHNOLOGY::Information technology::Computer science en_US
dc.title Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning en_US
dc.type Thesis en_US
dc.degree.name PhD en_US
dc.degree.level Doctoral en_US
dc.degree.grantor Indian Institute of Science en_US
dc.degree.discipline Engineering en_US
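
The abstract describes two time-scale stochastic approximation as two coupled recursions updated with different step sizes. A minimal sketch of that idea follows. It is a toy problem and not the thesis's algorithm: the function name `two_timescale_sa`, the drift terms, the step-size schedules, and the use of i.i.d. Gaussian noise (rather than the controlled Markov noise the thesis analyzes) are all assumptions made purely for illustration.

```python
import random


def two_timescale_sa(num_iters=200_000, seed=0):
    """Toy two time-scale stochastic approximation (illustrative only).

    The fast iterate w tracks 2 * theta; the slow iterate theta is driven
    toward the point where w equals 1, so the pair should approach
    (0.5, 1.0). Noise here is i.i.d. Gaussian, a simplifying assumption.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for n in range(1, num_iters + 1):
        a_n = 1.0 / n          # slow step size
        b_n = 1.0 / n ** 0.6   # fast step size; a_n / b_n -> 0
        # Both updates read the iterates from the previous step.
        w_next = w + b_n * (2.0 * theta - w + rng.gauss(0.0, 1.0))
        theta_next = theta + a_n * (1.0 - w + rng.gauss(0.0, 1.0))
        theta, w = theta_next, w_next
    return theta, w
```

Under these assumptions, calling `two_timescale_sa()` should return a pair close to (0.5, 1.0). The faster recursion sees the slower one as nearly constant, which is the separation of time scales the abstract refers to.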

