Optimal control and differential games in a Hilbert space
Abstract
In this thesis, we study some aspects of optimal control and differential game problems in an infinite-dimensional Hilbert space. We mainly discuss the dynamic programming and viscosity solution approach to these problems. There are five chapters in the thesis.
Chapter 1: We give a brief introduction to optimal control and differential game problems associated with semilinear evolution equations in a Hilbert space. The relevant notions of viscosity solutions of Hamilton–Jacobi equations (with unbounded linear terms) in a Hilbert space are also introduced in this chapter.
Chapter 2: We study optimal control of infinite-dimensional systems governed by continuous, switching, and impulse controls.
Let the Hilbert space $E$ be the state space, the compact metric space $U$ the continuous control set, $A = \{1, 2, \dots, m\}$ the switching control set, and the compact set $K \subset E$ the impulse control set. The continuous control space $\mathcal{U}$, the switching control spaces $\mathcal{A}^d$, and the impulse control space $\mathcal{K}$ are defined as follows:
• $\mathcal{U} = \{u(\cdot) \mid u(\cdot) : [0,\infty) \to U \ \text{measurable}\}$
• $\mathcal{A}^d = \{d(\cdot) \mid d(\cdot) : [0,\infty) \to A \ \text{piecewise constant},\ d(0) = d,\ \text{with switching instants } (\theta_i) \subset [0,\infty),\ \theta_i \uparrow \infty\}$
• $\mathcal{K} = \{\eta(\cdot) = \sum_{i \ge 1} \mathbf{1}_{[T_i,\infty)}(\cdot)\, h_i \mid (h_i) \subset K,\ (T_i) \subset [0,\infty),\ T_i \uparrow \infty\}$
Here $\mathbf{1}_{[T_i,\infty)}$ denotes the characteristic function of the interval $[T_i,\infty)$. The full switching control space is $\mathcal{A} = \bigcup_{d \in A} \mathcal{A}^d$.
Let $-A$ be the generator of a contraction semigroup $S(t)$ on $E$, and let $f : E \times U \times A \to E$. The state equation is:
\[
x(t) = S(t)x + \int_0^t S(t-s)\, f\big(x(s), u(s), d(s)\big)\, ds + \eta(t).
\]
A mild solution of the above equation is denoted by $\varphi(\cdot)$.
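As a purely illustrative instance of this abstract setup (an assumption for orientation only, not an example taken from the thesis), one may keep in mind a controlled semilinear heat equation, with each impulse adding a profile $h_i \in K$ to the state at time $T_i$:
\[
\begin{aligned}
&E = L^2(\Omega), \qquad A = -\Delta \ \text{with Dirichlet boundary conditions on a bounded domain } \Omega, \qquad S(t) = e^{t\Delta},\\
&\partial_t x(t,\xi) = \Delta x(t,\xi) + f\big(x(t,\xi), u(t), d(t)\big) \ \ \text{between impulse times}, \qquad x(T_i, \cdot) = x(T_i^-, \cdot) + h_i.
\end{aligned}
\]
Here $-A = \Delta$ generates the heat semigroup, a contraction semigroup on $L^2(\Omega)$, so the mild formulation above applies.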
The cost functional is defined as:
\[
J = \int_0^\infty e^{-\lambda s}\, c\big(x(s), u(s), d(s)\big)\, ds \;+\; \sum_{i \ge 1} e^{-\lambda \theta_i}\, k(d_{i-1}, d_i) \;+\; \sum_{i \ge 1} e^{-\lambda T_i}\, \ell(h_i),
\]
where $\lambda > 0$ is the discount factor, $c$ is the running cost, $k$ is the switching cost, and $\ell$ is the impulse cost.
The value function is $\mathbf{W} = (W^1, \dots, W^m)$, where for each $d \in A$:
\[
W^d(x) = \inf_{u(\cdot) \in \mathcal{U},\ d(\cdot) \in \mathcal{A}^d,\ \eta(\cdot) \in \mathcal{K}} J(x; u, d, \eta).
\]
If $\mathbf{W}$ is smooth enough, then it satisfies the following system of quasi-variational inequalities (QVIs) in the classical sense:
\[
\max\Big\{\lambda W^d(x) + \big\langle D W^d(x), Ax \big\rangle + H^d\big(x, D W^d(x)\big),\ W^d(x) - \mathcal{M}^d\mathbf{W}(x),\ W^d(x) - \mathcal{N}W^d(x)\Big\} = 0, \qquad d = 1, 2, \dots, m,
\]
where
\[
H^d(x,p) = \sup_{u \in U}\big[-\langle p, f(x,u,d)\rangle - c(x,u,d)\big], \qquad
\mathcal{M}^d\mathbf{W}(x) = \min_{d'}\big[k(d,d') + W^{d'}(x)\big], \qquad
\mathcal{N}W^d(x) = \min_{h \in K}\big[\ell(h) + W^d(x+h)\big].
\]
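Heuristically (this reading of such systems is standard and is recorded here only for orientation), the three terms describe the optimal decision at each state $x$ in mode $d$:
\[
\begin{cases}
\lambda W^d(x) + \langle D W^d(x), Ax\rangle + H^d\big(x, D W^d(x)\big) = 0 & \text{where continuing with the control } u(\cdot) \text{ is optimal},\\
W^d(x) = \mathcal{M}^d\mathbf{W}(x) & \text{where an immediate switch from mode } d \text{ is optimal},\\
W^d(x) = \mathcal{N}W^d(x) & \text{where an immediate impulse is optimal},
\end{cases}
\]
while all three quantities inside the max remain $\le 0$ everywhere.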
Let $B = (I + AA^*)^{-1}$, and define $|x|_b = \sqrt{\langle Bx, x \rangle}$.
Let $\mathcal{F}$ be the set of all bounded functions $\mathbf{W} : E \to \mathbb{R}^m$ such that there exists a local modulus $\alpha(\cdot)$ satisfying:
\[
|W^d(x) - W^d(\bar{x})| \le \alpha(|x - \bar{x}|_b) \qquad \forall\, x, \bar{x} \in E,\ d = 1, \dots, m.
\]
Theorem (Main result of Chapter 2):
The value function $\mathbf{W}$ is the unique viscosity solution of the above QVI system in the class $\mathcal{F}$.
Chapter 3: We study two-person zero-sum differential games of fixed duration associated with semilinear evolution equations in a Hilbert space, taking strategies and payoffs in the sense of Berkovitz.
Let $T > 0$ be the duration of the game. The state space $E$ and the operator $A$ are as in Chapter 2. Let $U$ and $V$ be the control sets for Players I and II, respectively. For $0 \le s \le t \le T$, define:
\[
\mathcal{U}[s,t] = \{u(\cdot) : [s,t] \to U \ \text{measurable}\}, \qquad
\mathcal{V}[s,t] = \{v(\cdot) : [s,t] \to V \ \text{measurable}\}.
\]
The state $x(\cdot)$ is governed by:
\[
x'(t) + A x(t) = f\big(t, x(t), u(t), v(t)\big), \quad t_0 < t < T, \qquad x(t_0) = x_0.
\]
A mild solution is denoted by $\varphi(\cdot\,; t_0, x_0, u(\cdot), v(\cdot))$.
Without loss of generality, we assume there is no running payoff; let $g : E \to \mathbb{R}$ be the terminal payoff.
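The phrase "without loss of generality" refers to the usual state-augmentation reduction; the following sketch (the running payoff $r$ and the augmented data $\tilde{A}$, $\tilde{f}$, $\tilde{g}$ are notation introduced here for illustration, not taken from the thesis) indicates how a running payoff can be absorbed into the terminal one:
\[
\begin{aligned}
&z = (x, y) \in E \times \mathbb{R}, \qquad \tilde{A}(x, y) = (Ax, 0), \qquad
\tilde{f}(t, z, u, v) = \big(f(t, x, u, v),\, r(t, x, u, v)\big), \qquad y(t_0) = 0,\\
&\tilde{g}\big(z(T)\big) := g\big(x(T)\big) + y(T) = g\big(x(T)\big) + \int_{t_0}^{T} r\big(s, x(s), u(s), v(s)\big)\, ds.
\end{aligned}
\]
Since $-\tilde{A}$ generates the contraction semigroup $S(t) \oplus I$ on $E \times \mathbb{R}$, the augmented game has the same semilinear form with a purely terminal payoff.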
Strategies for the players are defined as sequences of partitions and mappings (Berkovitz strategies); each stage strategy determines a control function on the subintervals of the corresponding partition.
We suppress the dependence of a strategy on its sequence of partitions and, by an abuse of notation, denote a strategy simply by $\mathcal{T}$ or $\mathcal{A}$. In what follows, $\mathcal{T}$ stands for a strategy for Player I and $\mathcal{A}$ for a strategy for Player II.
The pair $(\mathcal{T}_n, \mathcal{A}_n)$ of $n$-th stage strategies uniquely determines a pair $(u_n(\cdot), v_n(\cdot)) \in \mathcal{U}[t_0, T] \times \mathcal{V}[t_0, T]$. The pair $(u_n(\cdot), v_n(\cdot))$ determined in this way is called the $n$-th stage outcome of the pair of strategies $(\mathcal{T}, \mathcal{A})$.
Now let $\{x_{0n}\}$ be a sequence converging to $x_0$. For each $n$, we have the $n$-th stage trajectory
\[
\varphi(\cdot\,; t_0, x_{0n}, u_n(\cdot), v_n(\cdot)),
\]
which is the trajectory corresponding to the $n$-th stage outcome $(u_n(\cdot), v_n(\cdot))$ with $\varphi(t_0) = x_{0n}$.
Under suitable assumptions, the sequence of $n$-th stage trajectories is relatively compact in $C([t_0, T]; E)$. A uniform limit of a subsequence of this sequence is called a motion and is denoted by
\[
\varphi(\cdot\,; t_0, x_0, \mathcal{T}, \mathcal{A}).
\]
The set of all motions corresponding to $(\mathcal{T}, \mathcal{A})$ is denoted by $\Phi[t_0, x_0, \mathcal{T}, \mathcal{A}]$. By $\Phi[t_0, x_0]$ we mean the set of all motions $\varphi(\cdot\,; t_0, x_0, \mathcal{T}, \mathcal{A})$ as $(\mathcal{T}, \mathcal{A})$ runs over all pairs of strategies.
The payoff functional is set-valued and is defined as
\[
P(t_0, x_0, \mathcal{T}, \mathcal{A}) = \big\{\, g\big(\varphi(T; t_0, x_0, \mathcal{T}, \mathcal{A})\big) : \varphi(\cdot\,; t_0, x_0, \mathcal{T}, \mathcal{A}) \in \Phi[t_0, x_0, \mathcal{T}, \mathcal{A}] \,\big\},
\]
where $g : E \to \mathbb{R}$ is the terminal payoff function.
Player I tries to choose $\mathcal{T}$ so as to maximize all elements of $P(t_0, x_0, \mathcal{T}, \mathcal{A})$, whereas Player II tries to choose $\mathcal{A}$ so as to minimize all elements of $P(t_0, x_0, \mathcal{T}, \mathcal{A})$.
The upper value function and the lower value function are defined as follows:
\[
W^+(t_0, x_0) = \inf_{\mathcal{A}} \sup_{\mathcal{T}} P(t_0, x_0, \mathcal{T}, \mathcal{A}), \qquad
W^-(t_0, x_0) = \sup_{\mathcal{T}} \inf_{\mathcal{A}} P(t_0, x_0, \mathcal{T}, \mathcal{A}).
\]
Clearly, $W^+ \ge W^-$. If $W^+ = W^- = W$, then we say that the differential game has a value, and $W$ is referred to as the value function.
A pair $(\mathcal{T}^*, \mathcal{A}^*)$ is said to constitute a saddle point for the game with initial point $(t_0, x_0)$ if, for all $\mathcal{T}$ and $\mathcal{A}$,
\[
P(t_0, x_0, \mathcal{T}, \mathcal{A}^*) \le P(t_0, x_0, \mathcal{T}^*, \mathcal{A}^*) \le P(t_0, x_0, \mathcal{T}^*, \mathcal{A}).
\]
The main result of this chapter is the following pair of dynamic programming inequalities in Berkovitz's setting:
Theorem: For $(t_0, x_0) \in [0, T) \times E$ and $t_0 < t_1 < T$,
\[
W^-(t_0, x_0) \ge \sup_{\mathcal{T}} \inf_{\mathcal{A}} W^-\big(t_1, \varphi(t_1; t_0, x_0, \mathcal{T}, \mathcal{A})\big),
\]
\[
W^+(t_0, x_0) \le \inf_{\mathcal{A}} \sup_{\mathcal{T}} W^+\big(t_1, \varphi(t_1; t_0, x_0, \mathcal{T}, \mathcal{A})\big).
\]
Here $\mathcal{T}$ and $\mathcal{A}$ denote constant strategies for Players I and II, respectively. Using these inequalities, we show, under the Isaacs min–max condition, that the differential game has a value and that the value function is the unique viscosity solution of the associated Hamilton–Jacobi–Isaacs (HJI) equation.
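For orientation, the HJI equation and the Isaacs condition in this setting have the following schematic form (the sign conventions and the symbols $H^{\pm}$ below follow the standard literature and are not necessarily those of the thesis):
\[
\begin{aligned}
&W_t(t, x) - \langle Ax, D W(t, x)\rangle + H\big(t, x, D W(t, x)\big) = 0 \ \ \text{on } (0, T) \times E, \qquad W(T, x) = g(x),\\
&\text{Isaacs condition:} \quad H^+(t, x, p) = H^-(t, x, p) \ \ \text{for all } (t, x, p), \quad \text{where}\\
&H^-(t, x, p) = \sup_{u \in U}\, \inf_{v \in V}\, \langle f(t, x, u, v), p\rangle, \qquad
H^+(t, x, p) = \inf_{v \in V}\, \sup_{u \in U}\, \langle f(t, x, u, v), p\rangle.
\end{aligned}
\]
The unbounded term $\langle Ax, DW\rangle$ is exactly what requires the infinite-dimensional viscosity solution framework mentioned in Chapter 1.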
As in the finite-dimensional case, we also construct a saddle point for the game using extremal strategies.
Chapter 4: Motivated by the developments in the previous chapter, we define strategies and payoff for infinite horizon discounted problems whose state is governed by a controlled semilinear evolution equation in a Hilbert space. In this setup, we first prove dynamic programming inequalities. Using these inequalities, we show the existence of value and then characterize it as the unique viscosity solution of the associated HJI equation. A saddle point for the game is also constructed by properly modifying the ideas in the previous chapter.
Chapter 5: We introduce a new method for constructing optimal strategies for differential games in Berkovitz's setting. We regularize the value function by inf/sup convolutions and use the regularized functions to construct a saddle point in feedback form. This method is shown to be suitable for fixed-duration differential games, infinite horizon differential games with discounted payoff, and differential games with ergodic payoff.
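For reference, the sup and inf convolutions of a bounded function $W$ on $E$ are the standard regularizations (recalled here only as a reminder; the thesis may use variants adapted to the norm $|\cdot|_b$ or to the time variable):
\[
W^{\varepsilon}(x) = \sup_{y \in E}\Big[W(y) - \tfrac{1}{2\varepsilon}\,|x - y|^2\Big], \qquad
W_{\varepsilon}(x) = \inf_{y \in E}\Big[W(y) + \tfrac{1}{2\varepsilon}\,|x - y|^2\Big], \qquad \varepsilon > 0.
\]
These are Lipschitz approximations of $W$ from above and below, respectively, and are semiconvex and semiconcave, which is the kind of regularity typically exploited when defining feedback strategies.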

