• Login
    View Item 
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning

    View/Open
    Thesis full text (1.214Mb)
    Author
    Karmakar, Prasenjit
    Metadata
    Show full item record
    Abstract
    Stochastic approximation algorithms are sequential non-parametric methods for finding a zero or minimum of a function in the situation where only the noisy observations of the function values are available. Two time-scale stochastic approximation algorithms consist of two coupled recursions which are updated with different (one is considerably smaller than the other) step sizes which in turn facilitate convergence for such algorithms. We present for the first time an asymptotic convergence analysis of two time- scale stochastic approximation driven by 'controlled' Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are de fined in terms of the ergodic occupation measures associated with the controlled Markov processes. Using a special case of our results, we present a solution to the o -policy convergence problem for temporal-difference learning with linear function approximation. One of the important assumption in the earlier analysis is the point-wise boundedness (also called the 'stability') of the iterates. However, finding sufficient veri able conditions for this is very hard when the noise is Markov as well as when there are multiple timescales. We compile several aspects of the dynamics of stochastic approximation algorithms with Markov iterate dependent noise when the iterates are not known to be stable beforehand. We achieve the same by extending the lock-in probability (i.e. the probability of convergence to a specific attractor of the limiting o.d.e. given that the iterates are in its domain of attraction after a sufficiently large number of iterations (say) n0) framework to such recursions. Specifically, with the more restrictive assumption of Markov iterate-dependent noise supported on a bounded subset of the Euclidean space we give a lower bound for the lock- in probability. We use these results to prove almost sure convergence of the iterates to the specified attractor when the iterates satisfy an `asymptotic tightness' condition. This, in turn, is shown to be useful in analyzing the tracking ability of general 'adaptive' algorithms. Additionally, we show that our results can be used to derive a sample complexity estimate of such recursions, which then can be used for step-size selection. Finally, we obtain the first informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to nd the risk-sensitive cost represented using exponential utility. We also give examples where all our bounds achieve the \actual error" whereas the earlier bound given by Basu et al. is much weaker in comparison. We show that this happens due to the absence of difference term in the earlier bound which is always present in all our bounds when the state space is large. Additionally, we discuss how all our bounds compare with each other
    URI
    https://etd.iisc.ac.in/handle/2005/5026
    Collections
    • Computer Science and Automation (CSA) [392]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV