Ergodic and adaptive control of Markov processes.
Abstract
We address the problem of controlling two classes of Markov processes, viz., Markov chains and diffusion processes. Our main goal is to minimize a long-run average cost criterion (or ergodic cost criterion) associated with the processes.
We begin with controlled Markov chains.
Let {X_n} be a controlled Markov chain with state space S = {0, 1, 2, ...} and action space A, a compact metric space. A control strategy or policy is any rule for choosing actions based on past observations. A stationary policy is a function f: S → A. If the chain is ergodic under a stationary policy, it is called a stable stationary policy. If X_n denotes the state of the process at time n and u_n the action chosen at time n under some policy, we write:
p(i, j, u) = P(X_{n+1} = j | X_n = i, u_n = u)
Let k: S × A → ℝ be the cost function. Our objective is to almost surely minimize:
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} k(X_n, u_n)$$
over all policies. If the minimum is attained by some policy, that policy is called optimal. We show that, either under a certain condition requiring the cost function to penalize unstable behavior or under a blanket stability condition, optimal stable stationary policies can be characterized via dynamic programming; in other words, we derive a necessary and sufficient condition for optimality.
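For orientation, the dynamic programming characterization takes the form of an average cost optimality equation. A typical form (an illustrative sketch of the standard equation, not the precise statement proved here) is

$$\rho + V(i) = \min_{u \in A}\Big[\, k(i, u) + \sum_{j \in S} p(i, j, u)\, V(j) \Big], \qquad i \in S,$$

where ρ is the optimal average cost and V a suitable relative value function; a stable stationary policy attaining the pointwise minimum is then optimal.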
We use the conventional "vanishing discount" approach, i.e., first treat the related discounted cost criterion, viz., minimize over all policies:
$$\mathbb{E}\left[ \sum_{n=0}^{\infty} \beta^n k(X_n, u_n) \right], \qquad 0 < \beta < 1,$$
and then study the limiting case as β ↑ 1 to derive the results for the average cost criterion.
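As a sketch of the mechanics (standard material, stated informally), the discounted problem has a value function V_β satisfying the discounted dynamic programming equation

$$V_\beta(i) = \min_{u \in A}\Big[\, k(i, u) + \beta \sum_{j \in S} p(i, j, u)\, V_\beta(j) \Big],$$

and the vanishing discount argument examines the limits of (1 − β)V_β(i₀) and V_β(i) − V_β(i₀), for a fixed reference state i₀, as β ↑ 1, which formally yield the constant ρ and the function V in the average cost optimality equation above.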
We next consider the non-Bayesian adaptive control of Markov chains. The transition probabilities p(i, j, u; θ) now depend on an unknown parameter θ ∈ D, where D is a compact metric space; the 'true' value of θ is denoted θ₀. The cost criterion is again the long-run average cost. A direct approach to minimization is not possible, since the structure of the system is not completely known: the model is specified only in parametric form. We follow the well-known self-tuning approach to treat this situation. We implement an ad hoc separation between the estimation and control aspects: at each time, the control used is the one which would have been optimal if the current parameter estimate, obtained by a maximum likelihood scheme, were the true parameter. We establish the asymptotic optimality of this scheme under certain conditions.
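A minimal sketch of such a certainty-equivalence loop is given below. The two-state chain, cost, and one-parameter transition family are hypothetical toy choices (not the model treated here), and the sketch deliberately ignores the identifiability and optimality issues that the analysis must address; it only illustrates the ad hoc separation between maximum likelihood estimation and control.

```python
# Self-tuning (certainty-equivalence) control of a toy two-state controlled
# Markov chain: at each step, act as if the current maximum likelihood
# estimate of the parameter were the true parameter.
import numpy as np

rng = np.random.default_rng(0)
S, A = [0, 1], [0, 1]                 # toy state and action spaces
D = np.linspace(0.1, 0.9, 17)         # parameter grid (a compact D)
theta_true = 0.7                      # unknown "true" parameter theta_0

def p_next1(i, u, theta):
    """P(X_{n+1} = 1 | X_n = i, u_n = u; theta) for the toy model."""
    return theta * (0.8 if u == 0 else 0.3) + 0.1 * i

def k(i, u):
    """Running cost: penalize state 1 and, mildly, the costly action."""
    return float(i) + 0.1 * u

def avg_cost(policy, theta):
    """Long-run average cost of a stationary policy under parameter theta."""
    P = np.zeros((2, 2))
    for i in S:
        q = p_next1(i, policy[i], theta)
        P[i] = [1.0 - q, q]
    # stationary distribution: solve pi P = pi with sum(pi) = 1
    A_mat = np.vstack([P.T - np.eye(2), np.ones((1, 2))])
    b = np.array([0.0, 0.0, 1.0])
    pi = np.linalg.lstsq(A_mat, b, rcond=None)[0]
    return sum(pi[i] * k(i, policy[i]) for i in S)

def optimal_policy(theta):
    """Brute-force search over the four stationary policies."""
    policies = [(a0, a1) for a0 in A for a1 in A]
    return min(policies, key=lambda f: avg_cost(f, theta))

def ml_estimate(counts):
    """Maximum likelihood estimate of theta over the grid D from transition counts."""
    def loglik(theta):
        ll = 0.0
        for i in S:
            for u in A:
                q = p_next1(i, u, theta)
                ll += counts[i, u, 1] * np.log(q) + counts[i, u, 0] * np.log(1.0 - q)
        return ll
    return D[int(np.argmax([loglik(t) for t in D]))]

counts = np.zeros((2, 2, 2))          # counts[i, u, j] of observed transitions
x, total_cost, N = 0, 0.0, 2000
for n in range(N):
    theta_hat = ml_estimate(counts)               # estimate
    u = optimal_policy(theta_hat)[x]              # act as if theta_hat were true
    total_cost += k(x, u)
    x_next = int(rng.random() < p_next1(x, u, theta_true))
    counts[x, u, x_next] += 1.0
    x = x_next

print("theta estimate:", ml_estimate(counts), " running average cost:", total_cost / N)
```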
We then turn our attention to controlled nondegenerate diffusion processes. We consider a process X(t) governed by a controlled stochastic differential equation of Itô type:
$$dX(t) = m(X(t), u(t))\,dt + a(X(t))\,dW(t), \qquad X(0) = x_0,$$
where m: ℝ^d × U → ℝ^d and a: ℝ^d → ℝ^{d×d} are respectively the drift vector and the diffusion matrix, W(t) is a standard d-dimensional Wiener process, and u(t) is a non-anticipative control process taking values in a compact metric space U. If u(t) = v(X(t)) for a measurable v: ℝ^d → U, we call it a Markov control. If the process X(t) is ergodic under a Markov control, it is called a stable Markov control.
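For concreteness, a minimal Euler–Maruyama simulation sketch of such a controlled diffusion under a Markov control follows; the drift, diffusion coefficient, and control below are hypothetical toy choices, used only to illustrate how a Markov control u(t) = v(X(t)) enters the dynamics, and the printed time average is the kind of long-run functional the ergodic criterion below concerns.

```python
# Euler-Maruyama simulation of a controlled one-dimensional diffusion under a
# Markov control u(t) = v(X(t)); all model ingredients here are toy examples.
import numpy as np

rng = np.random.default_rng(0)

def m(x, u):   # drift: the control modulates the pull toward the origin
    return -u * x

def a(x):      # nondegenerate diffusion coefficient (bounded away from zero)
    return 1.0

def v(x):      # a Markov control taking values in the compact set U = [0.5, 2.0]
    return 2.0 if abs(x) > 1.0 else 0.5

dt, T = 1e-3, 100.0
x, time_avg = 0.0, 0.0
for _ in range(int(T / dt)):
    u = v(x)
    time_avg += x * x * dt / T          # long-run empirical average of X(t)^2
    x += m(x, u) * dt + a(x) * np.sqrt(dt) * rng.standard_normal()

print("empirical time average of X(t)^2 over [0, T]:", time_avg)
```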
Let k: ℝ^d × U → ℝ be the cost function. We want to almost surely minimize:
$$\lim_{T \to \infty} \frac{1}{T} \int_0^T k(X(t), u(t))\, dt$$
over all admissible controls. If the minimum is attained by some control u(·), it is called an optimal control. As in the case of controlled Markov chains, we show that, either under a condition on the cost function penalizing unstable behavior or under a stability condition, optimal (stable Markov) controls can be characterized via dynamic programming.
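For orientation, the dynamic programming characterization is now a Hamilton–Jacobi–Bellman equation for the ergodic problem. A typical form (again an illustrative sketch rather than the precise statement proved here) is

$$\rho = \min_{u \in U}\Big[\, k(x, u) + m(x, u) \cdot \nabla V(x) \Big] + \tfrac{1}{2}\,\mathrm{tr}\!\left( a(x)\, a(x)^{\mathsf{T}} \nabla^2 V(x) \right),$$

where ρ is the optimal average cost and V a suitable relative value function; a stable Markov control attaining the pointwise minimum is then optimal.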
Again, we use the conventional vanishing discount approach. For a discount rate α > 0, we first minimize:
$$\mathbb{E}\left[ \int_0^\infty e^{-\alpha t} k(X(t), u(t))\, dt \right]$$
over all admissible controls and then study the situation as α ↓ 0. The limiting argument relies heavily on the theory of quasilinear uniformly elliptic equations.
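As an informal sketch, the discounted value function V_α formally satisfies

$$\alpha V_\alpha(x) = \min_{u \in U}\Big[\, k(x, u) + m(x, u) \cdot \nabla V_\alpha(x) \Big] + \tfrac{1}{2}\,\mathrm{tr}\!\left( a(x)\, a(x)^{\mathsf{T}} \nabla^2 V_\alpha(x) \right),$$

a uniformly elliptic equation, and the vanishing discount argument studies α V_α(x₀) and V_α(x) − V_α(x₀), for a fixed reference point x₀, as α ↓ 0 to recover the ergodic equation above.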
We next treat the adaptive control of diffusion processes. Here the drift depends on an unknown parameter. As in the case of controlled Markov chains, we develop a self-tuning regulator to deal with the situation and establish its asymptotic optimality.
Finally, we briefly discuss the situation in which the diffusion is degenerate, and offer some partial answers in this case.

