dc.contributor.advisor: Borkar, Vivek S
dc.contributor.author: Ghosh, Mrinal Kanti
dc.date.accessioned: 2025-11-06T07:20:31Z
dc.date.available: 2025-11-06T07:20:31Z
dc.date.submitted: 1988
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/7348
dc.description.abstract: We address the problem of controlling two classes of Markov processes, viz., Markov chains and diffusion processes. Our main goal is to minimize a long-run average cost criterion (or ergodic cost criterion) associated with these processes.

We begin with controlled Markov chains. Let {X_n} be a controlled Markov chain with state space S = {0, 1, 2, ...} and action space A, a compact metric space. A control strategy, or policy, is any rule for choosing actions based on past observations. A stationary policy is a function f: S → A; if the chain is ergodic under a stationary policy, the policy is called a stable stationary policy. If X_n denotes the state of the process at time n and u_n the action chosen at time n under some policy, we write
\[ p(i, j, u) = P(X_{n+1} = j \mid X_n = i,\ u_n = u). \]
Let k: S × A → ℝ be the cost function. Our objective is to almost surely minimize
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} k(X_n, u_n) \]
over all policies; a policy attaining the minimum is called optimal. We show that, under a certain condition on the cost function penalizing unstable behavior or under a blanket stability condition, optimal stable stationary controls can be characterized via dynamic programming; in other words, we derive a necessary and sufficient condition for optimality. We use the conventional "vanishing discount" approach: we first treat the related discounted cost criterion, viz., minimize over all policies
\[ \mathbb{E}\left[ \sum_{n=0}^{\infty} \beta^n k(X_n, u_n) \right], \quad 0 < \beta < 1, \]
and then study the limiting case as β ↑ 1 to derive the results for the average cost criterion.

We next consider the non-Bayesian adaptive control of Markov chains. The transition probabilities p(i, j, u; θ) now depend on an unknown parameter θ ∈ D, where D is a compact metric space, and the 'true' value of θ is θ₀. The cost criterion is again the long-run average cost. A direct approach to minimization is not possible, since the structure of the system is not completely known; the model set is given in parametric form. We follow the well-known self-tuning approach and implement an ad hoc separation between estimation and control: at each time, the control used is the one that would be optimal if the current parameter estimate, obtained by a maximum likelihood scheme, were the true parameter. We establish the asymptotic optimality of this scheme under certain conditions.

We then turn our attention to controlled nondegenerate diffusion processes. We consider a process X(t) governed by a controlled stochastic differential equation of Itô type:
\[ dX(t) = m(X(t), u(t))\,dt + a(X(t))\,dW(t), \quad X(0) = x_0, \]
where m: ℝ^d × U → ℝ^d and a: ℝ^d → ℝ^{d×d} are respectively the drift vector and the diffusion matrix, W(t) is a standard d-dimensional Wiener process, and u(t) is a non-anticipative control process taking values in a compact metric space U. If u(t) = v(X(t)) for a measurable v: ℝ^d → U, we call it a Markov control; if X(t) is ergodic under a Markov control, the control is called a stable Markov control. Let k: ℝ^d × U → ℝ be the cost function. We want to almost surely minimize
\[ \lim_{T \to \infty} \frac{1}{T} \int_0^T k(X(t), u(t))\,dt \]
over all admissible controls; a control u(·) attaining the minimum is called optimal. As in the case of controlled Markov chains, we show that under some condition on the cost function penalizing unstable behavior, or under some stability condition, optimal (stable Markov) controls can be characterized via dynamic programming. Again we use the conventional vanishing discount approach: for α > 0 we first minimize
\[ \mathbb{E}\left[ \int_0^\infty e^{-\alpha t}\, k(X(t), u(t))\,dt \right] \]
over all admissible controls and then study the limit as α ↓ 0. The limiting argument relies heavily on the theory of quasilinear uniformly elliptic equations.

We next treat the adaptive control of diffusion processes, where the drift depends on an unknown parameter. As in the case of controlled Markov chains, we develop a self-tuning regulator for this situation and establish its asymptotic optimality. Finally, we briefly discuss the situation when the diffusion is degenerate and offer some partial answers in this case.
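For reference, the dynamic programming characterization invoked for the Markov chain problem is usually expressed through an average cost optimality equation of the following standard textbook form; the symbols ρ (optimal average cost) and V (relative value function) are our notation for this generic sketch and are not taken from the thesis itself.
\[ \rho + V(i) \;=\; \min_{u \in A} \Big[ k(i, u) + \sum_{j \in S} p(i, j, u)\, V(j) \Big], \qquad i \in S. \]
A stationary policy choosing, at each state i, an action attaining the minimum on the right-hand side is then the natural candidate for an optimal stable stationary policy under the stability or penalization conditions mentioned in the abstract.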
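The vanishing discount idea can also be illustrated numerically on a small finite chain: if V_β denotes the optimal β-discounted value function, then (1 − β)V_β typically approaches the optimal average cost and V_β(·) − V_β(0) a relative value function as β ↑ 1. The following minimal sketch (not part of the thesis; the transition and cost data are made up for illustration) shows this with plain value iteration.

```python
import numpy as np

# Toy controlled Markov chain: 3 states, 2 actions (made-up data, illustration only).
# P[a, i, j] is the probability of moving from i to j under action a; k[i, a] is the cost.
P = np.array([
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]],   # action 0
    [[0.5, 0.4, 0.1], [0.1, 0.1, 0.8], [0.6, 0.2, 0.2]],   # action 1
])
k = np.array([[1.0, 2.0], [0.5, 1.5], [3.0, 0.2]])

def discounted_value(beta, iters=20000):
    """Value iteration for the beta-discounted cost minimization problem."""
    V = np.zeros(3)
    for _ in range(iters):
        Q = k + beta * np.einsum("aij,j->ia", P, V)   # Q[i, a]
        V = Q.min(axis=1)
    return V

for beta in (0.9, 0.99, 0.999):
    V = discounted_value(beta)
    # (1 - beta) * V_beta approximates the optimal average cost rho;
    # V_beta - V_beta[0] approximates a relative value function.
    print(beta, (1 - beta) * V, V - V[0])
```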
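The self-tuning (certainty equivalence) scheme for the adaptive problem separates estimation from control at every step. The sketch below is only meant to show that separation and is not the thesis's algorithm; `step`, `p` and `policies` are hypothetical inputs (a simulator of the true chain, the known parametric transition model, and precomputed optimal stationary policies for each candidate parameter in a finite set D).

```python
import numpy as np

def self_tuning_control(step, p, D, policies, x0, horizon):
    """Certainty-equivalence (self-tuning) control sketch.

    step(x, u)      -- samples the next state from the true (unknown) chain.
    p(i, j, u, th)  -- known parametric transition model, th in the finite set D.
    policies[th]    -- hypothetical precomputed average-cost-optimal stationary
                       policy (state -> action) for the model with parameter th.
    """
    log_lik = {th: 0.0 for th in D}            # running log-likelihood of each parameter
    x, trajectory = x0, []
    for _ in range(horizon):
        # Estimation step: maximum likelihood estimate given the data so far.
        th_hat = max(D, key=lambda th: log_lik[th])
        # Control step: act as if th_hat were the true parameter.
        u = policies[th_hat][x]
        y = step(x, u)                          # observe the next state
        # Update every candidate parameter's likelihood with this transition.
        for th in D:
            log_lik[th] += np.log(max(p(x, y, u, th), 1e-12))
        trajectory.append((x, u))
        x = y
    return trajectory
```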
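For the nondegenerate diffusion problem, the dynamic programming characterization takes the form of an ergodic Hamilton-Jacobi-Bellman equation; the generic form below again uses our notation (ρ for the optimal average cost, V for a relative value function) and is only a sketch of the standard equation, not a statement lifted from the thesis.
\[ \min_{u \in U} \Big[ \tfrac{1}{2}\,\mathrm{tr}\!\big( a(x) a(x)^{T} \nabla^{2} V(x) \big) + m(x, u) \cdot \nabla V(x) + k(x, u) \Big] \;=\; \rho, \qquad x \in \mathbb{R}^{d}. \]
The vanishing discount limit α ↓ 0 connects this equation to the discounted HJB equations, which is where the theory of quasilinear uniformly elliptic equations mentioned above enters.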
dc.language.iso: en_US
dc.relation.ispartofseries: T02661
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Vanishing Discount Approach
dc.subject: Stochastic Differential Equations
dc.subject: Ergodic Control
dc.title: Ergodic and adaptive control of Markov processes.
dc.degree.name: PhD
dc.degree.level: Doctoral
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Science

