Saturday, 16 January 2021

Stochastic Optimal Control: Demonstrating The Dynamic Programming Principle

 

Setting up the problem

Fix a time horizon \(T > 0\), which in principle could be infinite, and consider the state equation $$dX(s) = b(s , X(s) , a(s))ds + \sigma(s , X(s) , a(s))dW_s$$ with initial condition \(X(t) = x\). We assume that \(b : [0 , T] \times \mathbb{R}^n \times \Gamma \rightarrow \mathbb{R}^n\) and \(\sigma : [0 , T] \times \mathbb{R}^n \times \Gamma \rightarrow \mathbb{R}^{n \times n} \) are uniformly continuous and bounded on bounded subsets of \([0,T] \times \mathbb{R}^n\), uniformly for \(a \in \Gamma\). We make some conditions explicit: $$|b(s , x , a) - b(s , y , a)| \leq C|x-y| \quad \forall s \in [0,T], \; x,y \in \mathbb{R}^n , \; a \in \Gamma$$ and $$||\sigma(s , x, a) - \sigma(s,y,a)|| \leq C|x-y|$$ together with the linear growth bound $$|b(s , x , a)| + ||\sigma(s , x , a)|| \le C( 1 + |x|),$$ where \(\Gamma\) is a Polish space, hence complete and separable, in which the control process \(a(\cdot)\) takes its values. We define the cost functional to be $$J(t , x ; a(\cdot)) = \mathbb{E} \Big[\int^T_t e^{-\int^s_t c(X(r))dr}L(s , X(s) , a(s))ds + e^{-\int^T_t c(X(r))dr}g(X(T))\Big]$$ where \(c\) plays the role of a discount function, \(L\) is the running cost and \(g\) is the terminal cost.
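To make the state equation concrete, here is a minimal Euler–Maruyama sketch in Python. The coefficients \(b\), \(\sigma\), the control, and all parameter values are hypothetical choices consistent with the Lipschitz and growth bounds above; they are not part of the theory itself.

```python
import numpy as np

def simulate_state(b, sigma, control, t, T, x, n_steps, rng):
    """Euler-Maruyama discretization of dX = b ds + sigma dW, X(t) = x."""
    dt = (T - t) / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x
    for k in range(n_steps):
        s = t + k * dt
        a = control(s, X[k])               # control value a(s) in Gamma
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
        X[k + 1] = X[k] + b(s, X[k], a) * dt + sigma(s, X[k], a) * dW
    return X

# Hypothetical 1-d coefficients satisfying the Lipschitz / linear growth bounds:
b = lambda s, x, a: a - x      # drift pulling the state toward the control value
sigma = lambda s, x, a: 0.5    # constant volatility
control = lambda s, x: 1.0     # a constant (hence admissible) control

rng = np.random.default_rng(0)
path = simulate_state(b, sigma, control, t=0.0, T=1.0, x=0.0, n_steps=1000, rng=rng)
```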

We now work with the properties of a Generalized Reference Probability Space (GRPS) $$\mu = (\Omega , \mathcal{F} , \mathcal {F}_s^t , \mathbb{P} , W)$$ where \((\Omega , \mathcal{F} , \mathbb{P})\) is a complete probability space and \(\mathcal{F}^t_s\) is a right-continuous complete filtration; more explicitly, \(\mathcal{F}_{s_1}^t \subset \mathcal{F}_{s_2}^t \text{ for } s_1 \le s_2\), each \(\mathcal{F}^t_s\) contains all \(\mathbb{P}\)-null sets of \(\mathcal{F}\), and \(\mathcal{F}_s^t = \cap_{r > s}\mathcal{F}_r^t\). We also have a Wiener process \(W\) such that \(W(t_2) - W(t_1)\) is independent of \(\mathcal{F}_{t_1}^t\) for \(t_2 > t_1\), \(W(t_2) - W(t_1) \sim N(0 , (t_2 - t_1)I)\), and \(W\) has continuous trajectories \(\mathbb{P}\)-a.s. We define the natural filtration \(\mathcal{F}_s^{t,0} = \pmb{\sigma}(W(r) : t \le r \le s)\), where \(\pmb{\sigma}(A)\) denotes the sigma algebra generated by \(A\), and \(\mathcal{F}_s^t\) is the augmentation of \(\mathcal{F}_s^{t,0}\) by the \(\mathbb{P}\)-null sets \(N\), hence \(\pmb{\sigma}(\mathcal{F}_s^{t,0} , N)\). If \(\mathcal{F}_s^t\) is the (augmented) natural filtration generated by \(W\) and if in addition \(W(t) = 0\), then the GRPS \(\mu\) is called a Reference Probability Space (RPS).

Formulations of an Optimal Control Problem

We first define the set of admissible controls $$U_t^{\mu} = \{a(\cdot) : [t , T] \times \Omega \rightarrow \Gamma \text{ st } a(\cdot)\text{ is }\mathcal{F}_s^t-\text{progressively measurable}\}$$ where \(a(\cdot)\) is progressively measurable if for every \(s > t\) $$a(\cdot) : [t,s]\times \Omega \rightarrow \Gamma \text{ is } \mathcal{B}([t,s])\otimes \mathcal{F}^t_s \Big/ \mathcal{B}(\Gamma)-\text{measurable.}$$For the strong formulation we fix a GRPS \(\mu\) with the admissible controls above, and the goal is to minimize \(J(t,x;a(\cdot))\) over all \(a(\cdot) \in U_t^{\mu}\). For the weak formulation of optimal control we instead take as admissible controls \(\bigcup_{\mu}U_t^{\mu}\), the union over all GRPS \(\mu\), which is a larger set than the one for the strong formulation, and the goal is to minimize \(J(t , x ; a(\cdot))\) over all \(a(\cdot) \in U_t\). The dynamic programming principle is easier to prove in the weak formulation, hence we will consider the weak formulation with \(U_t = \bigcup_{\mu}U_t^{\mu}\), where the union is over all RPS \(\mu\). In this framework we will have an easier time representing the controls as functionals of the Wiener process across reference probability spaces.

We recall that \(X(\cdot)\) is a solution to the state equation if \(X\) is progressively measurable and for all \(s \ge t\) $$X(s) = x + \int_t^s b(r , X(r) , a(r))dr + \int_t^s\sigma(r , X(r) , a(r))dW(r)$$ where the second integral is in the sense of Itô. Take the following theorem: if \(\mu\) is a GRPS and \(a(\cdot) \in U_t^{\mu}\), then \(\forall t \in [0,T] , x\in \mathbb{R}^n\) the state equation has a unique solution \(X(s) = X(s ; t , x , a(\cdot))\), which means the solution at time \(s\) with initial condition \(x\) at time \(t\) under control \(a(\cdot)\). Moreover, writing \(Y(s) = X(s ; t , y , a(\cdot))\) for the solution started at \(y\), we have:

  • \(X(\cdot)\) has continuous trajectories \(\mathbb{P}\)-a.s.
  • \(\mathbb{E}[\max_{t \le s \le T}|X(s)|^p] \leq C_p(1+|x|^p)\) for \(p\ge 1\)
  • \(\mathbb{E}[\max_{t \le s \le T}|X(s)-Y(s)|^2] \le C|x-y|^2\)
  • \(\mathbb{E}[\max_{t \le r \le s}|X(r)-x|^2] \le C(s-t)\)

The solution can be obtained as the fixed point of the map $$K[Y](s) = x + \int_t^s b(r , Y(r) , a(r))dr + \int_t^s\sigma(r , Y(r) , a(r))dW(r)$$ in the space of continuous, progressively measurable processes with norm \(||Y|| = \mathbb{E}[\max_{t \le s \le T}|Y(s)|^p]^{1/p}\).
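As an illustration of the fixed-point construction, the sketch below iterates a discretized version of the map \(K\) with a frozen realization of the Wiener increments. The coefficients are the same hypothetical ones as in the earlier sketch, and the convergence is only meant to be suggestive of the Picard iteration, not a proof.

```python
import numpy as np

b = lambda s, x, a: a - x      # hypothetical coefficients as before
sigma = lambda s, x, a: 0.5
control = lambda s, x: 1.0

def K(Y, times, dW, x):
    """One application of K[Y](s) = x + int_t^s b dr + int_t^s sigma dW."""
    dt = times[1] - times[0]
    Z = np.empty_like(Y)
    Z[0] = x
    drift = noise = 0.0
    for k in range(len(times) - 1):
        a = control(times[k], Y[k])
        drift += b(times[k], Y[k], a) * dt         # left-endpoint Riemann sum
        noise += sigma(times[k], Y[k], a) * dW[k]  # Ito (non-anticipating) sum
        Z[k + 1] = x + drift + noise
    return Z

rng = np.random.default_rng(1)
times = np.linspace(0.0, 1.0, 1001)
dW = rng.normal(0.0, np.sqrt(times[1] - times[0]), size=len(times) - 1)

Y = np.zeros_like(times)   # initial guess Y_0 identically x = 0
for _ in range(20):        # iterate Y_{k+1} = K[Y_k] until numerically stable
    Y = K(Y, times, dW, x=0.0)
```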

Dynamic Programming and the Hamilton-Jacobi-Bellman Equation (HJB)

We define the value function to be $$V(t,x) = \inf_{a(\cdot) \in U_t}J(t , x , a(\cdot))$$ and now state the Dynamic Programming Principle: \(\forall h \text{ s.t. } t\le t+h \le T\), $$V(t,x) = \inf_{a(\cdot) \in U_t}\mathbb{E}\Big[\int_t^{t+h}e^{-\int_t^{s}c(X(r))dr}L(s,X(s) , a(s))ds + e^{-\int_t^{t+h}c(X(r))dr}V(t+h , X(t+h))\Big]$$ with \(X(r) = X(r ; t , x , a)\) as above, where we note this formulation has the semblance of a recursion.
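The recursive flavour of the DPP is easiest to see in a discrete toy model. The sketch below runs the backward recursion \(V(t,x) = \min_a \{L(t,x,a)h + \mathbb{E}[V(t+h , X(t+h))]\}\) on a grid, with \(c \equiv 0\), a finite action set standing in for \(\Gamma\), and entirely hypothetical dynamics and costs; the Gaussian expectation is approximated by three-point Gauss–Hermite quadrature.

```python
import numpy as np

T, n_t = 1.0, 50
h = T / n_t
xs = np.linspace(-2.0, 2.0, 81)         # state grid
actions = np.linspace(-1.0, 1.0, 21)    # finite stand-in for Gamma
sig = 0.3                               # constant sigma(t, x, a)

L = lambda t, x, a: x**2 + 0.1 * a**2   # hypothetical running cost
g = lambda x: x**2                      # hypothetical terminal cost

# 3-point Gauss-Hermite nodes/weights for E[phi(Z)], Z ~ N(0, 1)
nodes, weights = (-np.sqrt(3.0), 0.0, np.sqrt(3.0)), (1/6, 2/3, 1/6)

V = g(xs)                               # V(T, .) = g
for i in range(n_t - 1, -1, -1):
    t = i * h
    Vnew = np.empty_like(V)
    for j, x in enumerate(xs):
        costs = []
        for a in actions:
            mean = x + a * h            # Euler step with drift b(t,x,a) = a
            cont = sum(w * np.interp(mean + z * sig * np.sqrt(h), xs, V)
                       for z, w in zip(nodes, weights))
            costs.append(L(t, x, a) * h + cont)
        Vnew[j] = min(costs)            # the infimum in the DPP
    V = Vnew                            # after the loop, V approximates V(0, .)
```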

We define the non-linear two-parameter semigroup, also called the Nisio semigroup, $$T_{t,r}(\psi) = \inf_{a(\cdot)\in U_t}\mathbb{E}\Big[\int_t^re^{-\int_t^sc(X(\tau))d\tau}L(s,X(s),a(s))ds + e^{-\int_t^rc(X(\tau))d\tau}\psi(X(r))\Big]$$ which is the right-hand side of the dynamic programming principle with the value function replaced by the function \(\psi\). We note that the dynamic programming principle implies the semigroup property $$T_{t,T}(\psi)=T_{t,r}(T_{r,T}(\psi))$$ with \(t < r < T\), and the Hamilton-Jacobi-Bellman equation is the equation for the generator of this semigroup, which is given by $$u_t + \inf_{a \in \Gamma}\Big\{\frac{1}{2}Tr(\sigma(t,x,a)\sigma^T(t,x,a)D^2u) + \langle b(t,x,a),Du\rangle - c(x)u + L(t,x,a)\Big\} = 0$$ with terminal condition $$u(T,x) = g(x) \text{ in } \mathbb{R}^n$$ and \((t,x) \in [0 , T] \times \mathbb{R}^n\). We rewrite the HJB equation as $$u_t + F(t,x,u,Du,D^2u) = 0 \text{ , } u(T,x) = g(x)$$ with $$F:[0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n \times S(n) \rightarrow \mathbb{R}$$ where \(S(n)\) is the space of symmetric \(n \times n\) matrices. We call the expression inside the infimum the current value Hamiltonian, denoted \(F_{cv}(t,x,u,Du,D^2u,a)\). Note that to formulate the dynamic programming principle we need to know that \(V\) is Borel measurable in the second variable, so we need \(J\) to be uniformly continuous.
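When \(\Gamma\) is discretized, the nonlinearity \(F\) can be evaluated pointwise by brute force over the action set. A one-dimensional sketch, with hypothetical coefficients and the discount term included:

```python
import numpy as np

def F_cv(t, x, u, Du, D2u, a, b, sigma, c, L):
    """Current value Hamiltonian F_cv(t, x, u, Du, D2u, a) in dimension one."""
    return 0.5 * sigma(t, x, a)**2 * D2u + b(t, x, a) * Du - c(x) * u + L(t, x, a)

def F(t, x, u, Du, D2u, b, sigma, c, L, actions):
    """F = inf over a of F_cv, approximated over a finite subset of Gamma."""
    return min(F_cv(t, x, u, Du, D2u, a, b, sigma, c, L) for a in actions)

# Hypothetical data: u solves the HJB equation iff u_t + F(...) = 0.
b = lambda t, x, a: a - x
sigma = lambda t, x, a: 0.5
c = lambda x: 0.1
L = lambda t, x, a: x**2 + 0.1 * a**2
value = F(0.0, 0.3, 1.0, -0.2, 0.5, b, sigma, c, L, np.linspace(-1, 1, 21))
```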

Verification Theorem and conditions for optimality

We will assume that \(c\) is identically \(0\) in order to remove the discounting, given that we are working on a finite time horizon; we note that discounting is necessary if we do not have bounded time. We further make the conditions of our problem precise by requiring \(|L(t,x,a)| + |g(x)| \le C(1+|x|^N)\) for some \(N \ge 0\), which keeps our functions from growing faster than polynomially, and \(L(\cdot , \cdot , a)\) uniformly continuous on bounded subsets of \([0,T]\times \mathbb{R}^n\), uniformly in \(a \in \Gamma\). Let \(u \in C^{1,2}([0,T]\times \mathbb{R}^n)\) be a classical solution of the HJB equation such that \(|u_t(t,x)| , |Du(t,x)| , |D^2u(t,x)| , |u(t,x)| \le C(1+|x|^N)\) for some \(N\ge 0\); then:

  • \(u \le V\), the value function.
  • Let \((X^*(\cdot) , a^*(\cdot))\) be an admissible pair at \((t,x)\), meaning \(X^*\) is the solution corresponding to the control \(a^*\), such that $$a^*(s) \in argmin_{a\in \Gamma}F_{cv}(s , X^*(s) , Du(s,X^*(s)) , D^2u(s,X^*(s)), a)$$ for almost every \(s \in [t,T]\) and \(\mathbb{P}\)-almost surely. Then the pair \((X^*(\cdot) , a^*(\cdot))\) is optimal at \((t,x)\) and \(u(t,x) = V(t,x)\). This is a sufficient condition for an admissible control to be optimal.

Proof: If \(a(\cdot) \in U_t\), then by the Itô formula $$u(t,x) = \mathbb{E}\Big[u(T,X(T)) - \int_t^T\big[u_t(s,X(s))$$ $$+\langle b(s,X(s) , a(s)),Du(s,X(s))\rangle $$ $$+ \frac{1}{2}Tr(\sigma(s,X(s),a(s))\sigma^T(s,X(s),a(s))D^2u(s,X(s)))\big]ds\Big]$$ $$=\mathbb{E}\Big[g(X(T))-\int_t^T\big[u_t(s,X(s))-L(s,X(s),a(s))$$ $$+F_{cv}(s,X(s),Du(s,X(s)),D^2u(s,X(s)),a(s)) $$ $$-F( s,X(s),Du(s,X(s)),D^2u(s,X(s)) )+F( s,X(s),Du(s,X(s)),D^2u(s,X(s)) )\big]ds\Big]$$ where we note that $$ u_t(s,X(s)) + F( s,X(s),Du(s,X(s)),D^2u(s,X(s)) ) = 0$$ by virtue of \(u\) being a classical solution, and $$F_{cv}(s,X(s),Du(s,X(s)),D^2u(s,X(s)),a(s)) - F( s,X(s),Du(s,X(s)),D^2u(s,X(s)) ) \ge 0$$ given that \(F\) is the infimum over \(a\) of the current value Hamiltonian. Hence the expression equals \(\mathbb{E}[g(X(T)) + \int_t^T L(s,X(s),a(s))ds]\) minus a non-negative term, and we deduce that it is less than or equal to the cost functional with control \(a(\cdot)\): $$u(t,x)\le J(t,x;a(\cdot))$$ with equality if and only if \(F_{cv} = F\) along the trajectory. Therefore \(u \le V\), since the value function is the infimum of the cost functional. If \(a(\cdot) = a^*(\cdot)\) then the previous work gives \(u(t,x) = J(t,x , a^*(\cdot)) \ge V(t,x)\). Hence we conclude that $$u(t,x) = V(t,x) = J(t,x,a^*(\cdot))$$ and \(a^*(\cdot)\) is indeed an optimal control. This criterion derived from the HJB equation is a sufficient condition for optimality.

The verification theorem however also yields a necessary condition for optimality when the value function is smooth. Let \(u = V\), assuming that the value function is a smooth solution of the HJB equation. If \((X^*(\cdot) , a^*(\cdot))\) is an optimal pair at \((t,x)\), by which we mean the pair attains \(J(t,x;a^*(\cdot)) = V(t,x)\), then we must have $$a^*(s) \in argmin_{a\in \Gamma}F_{cv}(s,X^*(s) , DV(s,X^*(s)),D^2V(s,X^*(s)), a)$$ for almost every \(s\in[t,T]\), \(\mathbb{P}\)-almost surely.

Proof: From our previous work, \(u(t,x) \le J(t,x;a^*(\cdot))\) with equality if and only if \(F_{cv} = F\) along the trajectory; optimality forces equality, hence we must have \(F_{cv} = F\) along \(a^*(s)\).

Construction of Optimal Feedback Controls

To construct an optimal feedback control we will need to use our conditions for optimality as previously derived. We consider the set-valued map $$\phi : (0,T) \times \mathbb{R}^n \rightarrow P(\Gamma)$$ where \(P(\Gamma)\) denotes the power set of \(\Gamma\), $$\phi:(s,x) \mapsto argmin_{a\in \Gamma}F_{cv}(s,x,DV(s,x) , D^2V(s,x),a)$$ We then write the closed loop equation (CLE): $$dX(s) = b(s,X(s) , \phi(s,X(s)))ds + \sigma (s,X(s) , \phi (s,X(s)))dW(s)$$ with initial condition \(X(t) = x\).

Suppose that \(\phi\) admits a measurable selection $$\psi _t :(t,T) \times \mathbb{R}^n \rightarrow \Gamma$$ meaning that \(\psi_t(s,x)\) belongs to the set \(\phi (s,x)\), such that the CLE $$dX(s) = b(s,X(s) , \psi _t(s,X(s)))ds + \sigma (s,X(s) , \psi _t(s,X(s)))dW(s)$$ with \(X(t) = x\) has a solution \(X_{\psi_{t}}(\cdot)\) in some GRPS \(\mu\). Then the admissible pair \(( X_{\psi_{t}}(\cdot) , a_{\psi_{t}}(\cdot))\), where \(a_{\psi_{t}}(\cdot) = \psi_t(\cdot , X_{\psi_{t}}(\cdot))\), is optimal at \((t,x)\).
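A numerical counterpart of this construction: compute a brute-force selection from the argmin at each grid point and plug it into an Euler scheme for the CLE. In practice \(Du\) and \(D^2u\) would come from an actual solution of the HJB equation; below they are hypothetical stand-ins, as are all coefficients.

```python
import numpy as np

def psi(s, x, Du, D2u, b, sigma, L, actions):
    """A selection: argmin over a discretized Gamma of F_cv (here c = 0)."""
    vals = [0.5 * sigma(s, x, a)**2 * D2u(s, x) + b(s, x, a) * Du(s, x)
            + L(s, x, a) for a in actions]
    return actions[int(np.argmin(vals))]

def closed_loop(t, T, x, n_steps, rng, Du, D2u, b, sigma, L, actions):
    """Euler scheme for the CLE dX = b(s, X, psi(s, X)) ds + sigma(...) dW."""
    dt = (T - t) / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x
    for k in range(n_steps):
        s = t + k * dt
        a = psi(s, X[k], Du, D2u, b, sigma, L, actions)  # feedback control
        X[k + 1] = (X[k] + b(s, X[k], a) * dt
                    + sigma(s, X[k], a) * rng.normal(0.0, np.sqrt(dt)))
    return X

# Hypothetical stand-ins for Du, D2u and the model data:
Du = lambda s, x: 2.0 * x
D2u = lambda s, x: 2.0
b = lambda s, x, a: a - x
sigma = lambda s, x, a: 0.5
L = lambda s, x, a: x**2 + 0.1 * a**2
rng = np.random.default_rng(2)
X = closed_loop(0.0, 1.0, 0.5, 1000, rng, Du, D2u, b, sigma, L,
                np.linspace(-1.0, 1.0, 21))
```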

Uniqueness in Law

This section concerns the issue of uniqueness in law, which is essential for dynamic programming: we work with various probability spaces and we would like to know if an optimal pair in one setting is also valid in another. A key idea is that pathwise uniqueness implies uniqueness in law, where pathwise uniqueness means that, on a fixed RPS, if two solutions of the stochastic differential equation have initial conditions that agree almost surely, then their paths agree almost surely. If we have a cost functional \(\mathbb{E}\int_t^T L(s,X(s) , a(s))ds\) with two different controls, and perhaps two different solutions on different probability spaces, we need to know that the joint law of the process and control from one space is the same as that of the other. So we want to be able to say that if we have two GRPS, and the Wiener process and the control of one probability space have the same law as those of the other, then the two solution-control pairs under both probability spaces have the same joint laws.

We first have to recall that the solution to the CLE is continuous but the control is not necessarily. Take two stochastic processes on two probability spaces, where \(s \in (t,T)\), and let $$X_i(s) : (\Omega_i , \mathcal{F}_i , \mathbb{P}_i) \rightarrow (\Omega , \mathcal{F})$$ with no assumption on their continuity. We say that \(X_1(\cdot)\) and \(X_2(\cdot)\) have the same finite dimensional distributions on \((t,T)\) if there exists a set \(D\) of full measure in \((t,T)\) such that $$\forall t= t_1 < \dots < t_n = T \text{ , } t_i \in D \text{ and } A\in \mathcal{F} \otimes \dots \otimes \mathcal{F}$$ we have the following: $$\mathbb{P}_1(\omega_1 | (X_1(t_1) , \dots , X_1(t_n)) \in A) = \mathbb{P}_2(\omega_2 | (X_2(t_1) , \dots , X_2(t_n)) \in A)$$ We write this as $$\mathcal{L_{\mathbb{P}_1}}(X_1(\cdot)) = \mathcal{L_{\mathbb{P}_2}}(X_2(\cdot))$$ We now present two results. Take two GRPS $$\mu_1 = (\Omega_1 , \mathcal{F_1} , \mathcal{F}_s^{1,t} , \mathbb{P_1} , W_1)$$ $$\mu_2 = (\Omega_2 , \mathcal{F_2} , \mathcal{F}_s^{2,t} , \mathbb{P_2} , W_2)$$ and some measurable space \(( \widetilde{ \Omega} , \widetilde{ \mathcal{F}})\). We define two random variables $$\zeta_i : \Omega_i \rightarrow \widetilde{\Omega} \text{ , } i=1,2,$$ two stochastic processes $$f_i : [t,T] \times \Omega_i \rightarrow \mathbb{R}^n \text{ , } i = 1,2,$$ and two processes $$\Phi_i : [t,T] \times \Omega_i \rightarrow \mathbb{R}^{n \times m} \text{ , } i=1 , 2 , \text{ each } \mathcal{F}_s^{i,t}-\text{progressively measurable,}$$ and we assume the integrability conditions $$\mathbb{E}_i \int_t^T |f_i(s)|ds \text{ , } \mathbb{E}_i \int_t^T |\Phi_i(s)|^2ds < \infty \text { , } i=1,2.$$ With these assumptions we have two results:

  • $$\text{If } \mathcal{L_{\mathbb{P}_1}}(f_1(\cdot) , \zeta_1) = \mathcal{L_{\mathbb{P}_2}}(f_2(\cdot) , \zeta_2)$$ $$\text{then } \mathcal{L_{\mathbb{P}_1}}\Big(\int_t^{\cdot}f_1(s)ds , \zeta_1\Big) = \mathcal{L_{\mathbb{P}_2}}\Big(\int_t^{\cdot}f_2(s)ds , \zeta_2\Big) \text{ on }D=[t,T]$$
  • $$\text{If } \mathcal{L_{\mathbb{P}_1}}(\Phi_1(\cdot), W_1(\cdot) ,\zeta_1) = \mathcal{L_{\mathbb{P}_2}}( \Phi_2(\cdot), W_2(\cdot) , \zeta_2) $$ $$\text{then } \mathcal{L_{\mathbb{P}_1}}\Big(\int_t^{\cdot}\Phi_1(s)dW_1(s) ,\zeta_1\Big) = \mathcal{L_{\mathbb{P}_2}}\Big( \int_t^{\cdot}\Phi_2(s)dW_2(s) , \zeta_2\Big) \text{ on } D = (t,T)$$

These are results by Ondreját which we use to prove joint uniqueness in law of solutions to stochastic differential equations.

Theorem: Let \(\mu_1 , \mu_2\) be two GRPS, \(a_i(\cdot) \in U_t^{\mu_i}\), and \(\eta_i \in L^2(\Omega_i , \mathcal{F}_t^{i,t}, \mathbb{P}_i)\). Let \(X_i(\cdot)\) be the unique solution of the state equation with control \(a_i(\cdot)\) and initial condition \(X_i(t) = \eta_i\). If \(\mathcal{L}_{\mathbb{P}_1}(a_1(\cdot) , W_1(\cdot) , \eta_1) = \mathcal{L}_{\mathbb{P}_2}(a_2(\cdot) , W_2(\cdot) , \eta_2)\) on \([t,T]\) then \(\mathcal{L}_{\mathbb{P}_1}(X_1(\cdot) , a_1(\cdot)) = \mathcal{L}_{\mathbb{P}_2}(X_2(\cdot) , a_2(\cdot))\) on \([t,T]\). This is a powerful statement: if the laws of the inputs (the controls, the Wiener processes and the initial conditions) agree across probability spaces, then the joint laws of the solutions and the controls agree as well.

Proof Idea: \(X_i(\cdot)\) is obtained by taking limits of iterations of the map $$K_i[Z](s) = \eta_i + \int_t^sb(r,Z(r) , a_i(r))dr + \int_t^s\sigma(r,Z(r) , a_i(r))dW_i(r)$$ So we can take \(Z_0^i(s) = \eta_i \text{ , } Z^i_{k+1}(s) = K_i[Z_k^i](s)\), which converges to the solution on a small time interval. Recalling our previous results, we have $$\mathcal{L_{\mathbb{P}_1}}(Z^1_k(\cdot) , W_1(\cdot) , a_1(\cdot)) = \mathcal{L_{\mathbb{P}_2}}(Z^2_k(\cdot) , W_2(\cdot) , a_2(\cdot))$$ and the limit \(k \rightarrow \infty\) gives our result.

We recall that \(\mu = (\Omega , \mathcal{F} , \mathcal{F_s^t} , \mathbb{P} , W)\) is a reference probability space and we assume that \(W\) has everywhere continuous trajectories. We present the following lemma: if \(a(\cdot)\in U_t^{\mu}\) then there exists an \(\mathcal{F_s^{t,0}}\)-predictable process \( \widetilde{ a}(\cdot)\) such that \( \widetilde{a}(\cdot) = a(\cdot)\) \(dt \otimes \mathbb{P}\)-almost everywhere, that is, almost everywhere with respect to the product measure on \([t,T] \times \Omega\).

Definition: \(a(\cdot)\) is \(\mathcal{F}_s^t\)-predictable if it is measurable with respect to the \(\mathcal{F}_s^t\)-predictable \(\sigma\)-field, the \(\sigma\)-field generated by the sets \((s,r]\times A\) for \(t \le s < r\), \(A \in \mathcal{F}_s^t\), and \(\{t\}\times A\) for \(A \in \mathcal{F}_t^t\).

We see that the solution \(X( \cdot \text{ }; t , x , a(\cdot))\) is indistinguishable from the solution \(X(\cdot \text{ } ; t , x , \widetilde{a}(\cdot))\). Hence there exists \(\Omega_1 \subset \Omega\) with \(\mathbb{P}(\Omega_1) = 1\) such that \(X(\cdot \text{ } ; t,x,a(\cdot))(\omega) = X(\cdot \text{ } ; t,x,\widetilde{a}(\cdot))(\omega)\) on \([t , T]\) \(\forall \omega \in \Omega_1\). Without loss of generality we can therefore assume that all controls in \(U_t^{\mu}\) are \(\mathcal{F}_s^{t,0}\)-predictable.

Definition: the Canonical Reference Probability Space is \(\mu_{\mathbb{W}} = (\mathbb{W} , \mathcal{B}(\mathbb{W}) , \mathbb{P}_* , W)\), where \(\mathbb{W} = \{\omega \in C([t,T] , \mathbb{R}^n) : \omega(t) = 0\}\), \(\mathcal{B}(\mathbb{W})\) is the Borel \(\sigma\)-field, and \(\mathbb{P}_*\) is the unique probability measure (the Wiener measure) such that \(W(s)(\omega) = \omega(s)\) is a standard Wiener process in \(\mathbb{R}^n\). We denote the sigma algebra generated by the Wiener process by \(\mathcal{B}_s^{t,0} = \sigma(W(\tau) : t \le \tau \le s)\), and \(\mathcal{B}_s^t = \sigma(\mathcal{B}_s^{t,0} , \mathcal{N}_*)\) is its augmentation, where \(\mathcal{N}_*\) are the \(\mathbb{P}_*\)-null sets. We denote by \(\mathcal{P}^{\mathbb{W}}_{(t,T)}\) the \(\sigma\)-field of \(\mathcal{B}_s^{t,0}\)-predictable sets.

We now present the following idea, which allows us to represent any \(\mathcal{F}_s^{t,0}\)-predictable control on a given RPS as a function of the paths of the Brownian motion; this lets us pass from one RPS to another through the path of the Wiener process. Let \(\mu\) be a RPS and \(a(\cdot) \in U_t^{\mu}\) be \(\mathcal{F}_s^{t,0}\)-predictable. Then there exists a \(\mathcal{P}^{\mathbb{W}}_{(t,T)} \Big/ \mathcal{B}(\Gamma)\)-measurable function \(f : [t,T] \times \mathbb{W} \rightarrow \Gamma\) such that \(a(s,\omega) = f(s , W( \cdot , \omega))\), with \(W\) the Wiener process. This means that every control on a reference probability space can be represented as a function of the Wiener process paths. Now suppose \(\mu_1 = (\Omega_1 , \mathcal{F}_1 , \mathcal{F}_s^{1,t} , \mathbb{P}_1 , W_1)\) is some other RPS; then \(a_1(s,\omega) = f(s,W_1( \cdot , \omega))\) is \(\mathcal{F}_s^{1,t,0}\)-predictable and we have the equality of joint laws \(\mathcal{L}_{\mathbb{P}}(a(\cdot) , W(\cdot)) = \mathcal{L_{\mathbb{P}_1}}(a_1(\cdot) , W_1(\cdot))\). From here we can formulate a corollary: for every RPS \(\mu\) we have \(V^{\mu}(t,x) = \inf_{a(\cdot) \in U_t^{\mu}}J(t,x,a(\cdot)) = V(t,x)\), which we prove by noting that \(V^{\mu_1}(t,x) = V^{\mu_2}(t,x)\) for all RPS \(\mu_1 , \mu_2\).
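The path-functional representation is easy to mimic numerically: fix one non-anticipating functional \(f\) and apply it to Wiener paths coming from two different "spaces" (here: two independent simulations); the resulting controls then have the same law by construction. The functional below is a hypothetical example, chosen only because it depends on the path in a non-anticipating way.

```python
import numpy as np

def f(k, W_path):
    """A non-anticipating path functional: looks only at W up to index k
    (never ahead), here the running maximum of |W|."""
    return float(np.max(np.abs(W_path[:k + 1])))

def control_from_path(W_path):
    """The control a(s, omega) = f(s, W(., omega)) evaluated on the grid."""
    return np.array([f(k, W_path) for k in range(len(W_path))])

rng = np.random.default_rng(3)
dt, n = 1.0 / 500, 500
W1 = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
W2 = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
a1 = control_from_path(W1)   # same functional on two independent Wiener paths:
a2 = control_from_path(W2)   # (a1, W1) and (a2, W2) are equal in law
```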

Standard Reference Probability Spaces.

Definition: A measurable space \((\Omega^{'} , \mathcal{F}^{'})\) is standard if it is Borel isomorphic to one of:

  • \((\{1 , \dots , n\} , \mathcal{B}(\{1 , \dots , n\}))\)
  • \((\mathbb{N} , \mathcal{B}(\mathbb{N}))\)
  • \(((0,1)^{\mathbb{N}},\mathcal{B}((0,1)^{\mathbb{N}}))\)

Take for example \(S\) a Polish space, that is, a complete and separable metric space; then \((S , \mathcal{B}(S))\) is standard. The canonical RPS \(\mu_{\mathbb{W}}\) we have previously worked with is also standard, as we will see from the following definition.

Definition: A RPS \(\mu\) is standard if there exists a \(\sigma\)-field \(\mathcal{F}^{'}\) such that \(\mathcal{F}_T^{t,0} \subset \mathcal{F}^{'} \subset \mathcal{F}\), \(\mathcal{F}\) is the completion of \(\mathcal{F}^{'}\), and the measurable space \((\Omega , \mathcal{F}^{'})\) is standard according to the previous definition.

We have the following fact: if \(\mu\) is a standard reference probability space, then for every \(\sigma\)-field \(\mathcal{G} \subset \mathcal{F}^{'}\) there exists a regular conditional probability \(p : \Omega \times \mathcal{F}^{'} \rightarrow [0,1]\) given \(\mathcal{G}\). Hence \(p(\omega , \cdot)\) is a probability measure on \(\mathcal{F}^{'}\), \(p(\cdot , A)\) is measurable, and, for \(\mathcal{G} = \mathcal{F}_s^{t,0}\), $$p(\omega , A) = \mathbb{P}(A | \mathcal{F_s^{t,0}})(\omega) = \mathbb{E}[\mathbb{I}_A|\mathcal{F}_s^{t,0}](\omega) \text{ for }\mathbb{P}-\text{almost every }\omega.$$ If \(\mu\) is a standard RPS then there exists a regular conditional probability given \(\mathcal{F_s^{t,0}}\), which we denote by \(\mathbb{P}_{\omega_0} = \mathbb{P}(\cdot | \mathcal{F}_s^{t,0})(\omega_0)\).

Note that for every \(\mathcal{F}_s^{t,0} \big{/} \mathcal{B}(\mathbb{R}^n)\)-measurable random variable \(Y\), for \(\mathbb{P}\)-almost every \(\omega_0\) we have \(\mathbb{P}_{\omega_0}(Y(\omega) = Y(\omega_0)) = 1\), so under \(\mathbb{P}_{\omega_0}\) the random variable \(Y\) is deterministic relative to the conditioning.
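The "deterministic under the conditioned measure" phenomenon has a transparent finite analogue, sketched below with a two-step random walk standing in for the path space; this is a toy construction only, not the continuous-time object.

```python
# Omega = the four equally likely paths of two +/-1 increments; conditioning on
# sigma(first increment) gives measures P_{omega_0} under which the first
# increment is deterministic, as in the remark above.
omega = [(e1, e2) for e1 in (-1, 1) for e2 in (-1, 1)]

def P_omega0(omega0):
    """Regular conditional probability given sigma(first increment)."""
    return {w: (0.5 if w[0] == omega0[0] else 0.0) for w in omega}

p = P_omega0((1, -1))
# Under p, the first increment equals omega0's first increment with prob. 1:
assert sum(q for w, q in p.items() if w[0] == 1) == 1.0
```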

Conditional Reference Probability Space

Suppose that \(0 \le t \le \xi \le T\), let \(\mu = (\Omega , \mathcal{F} , \mathcal{F}_s^t , \mathbb{P} , W)\) be a standard RPS, and assume \(W\) has everywhere continuous trajectories. We set \(W_{\xi}(s) = W(s) - W(\xi)\), a shift of the Brownian motion. We now present the following lemma: for \(\mathbb{P}\)-almost every \(\omega_0\), \(W_{\xi}\) is a Wiener process in \( (\Omega, \mathcal{F}_{\omega_0} , {\mathcal{F}_{\omega_0}}_s^{\xi} , \mathbb{P}_{\omega_0})\), where \(\mathcal{F}_{\omega_0}\) is the augmentation of \(\mathcal{F}^{'}\) by the \(\mathbb{P}_{\omega_0}\)-null sets, \({\mathcal{F}_{\omega_0}}_s^{\xi}\) is the filtration generated by \(W_{\xi}\), and \(\mathcal{F}^{'}\) is the \(\sigma\)-field from the definition of a standard RPS. This is proved using the previous idea of a process becoming deterministic under the conditioned measure. Therefore for \(\mathbb{P}\)-almost every \(\omega_0\), \(\mu^{\omega_0} = (\Omega , \mathcal{F}_{\omega_0} , {\mathcal{F}_{\omega_0}}_s^{\xi} , \mathbb{P}_{\omega_0} , W_{\xi})\) is a RPS on \([\xi , T]\). So starting from our RPS, the conditioned measures allow us to create new RPS's.

Now take an admissible control with respect to this reference probability space: let \(\mu, \mu^{\omega_0}\) be as above and let \(a(\cdot) \in U_t^{\mu}\) be \(\mathcal{F}_s^{t,0}\)-predictable; then \(a\big{|}_{[\xi,T]}(\cdot) \in U_{\xi}^{\mu^{\omega_0}}\) for \(\mathbb{P}\)-almost every \(\omega_0\). We prove this by noting that \(\mu^{\omega_0}\) is a reference probability space for \(\mathbb{P}\)-almost every \(\omega_0\); hence we just need to show that \(a\big{|}_{[\xi,T]}(\cdot)\) is progressively measurable with respect to the filtration \( {\mathcal{F}_{\omega_0}}_s^{\xi} \), more explicitly, by showing that \(\mathcal{F_s^{t,0}} \subset {\mathcal{F}_{\omega_0}}_s^{\xi}\) for \(\xi \le s \le T\).

Proof of the Dynamic Programming Principle.

Denote by \(\widetilde{U}_t = \cup_{\mu}U_t^{\mu}\) the union over all standard RPS. We will show the DPP with \(\widetilde{U}_t\) instead of \(U_t\), as the value functions are independent of the reference probability space and this set is non-empty given that the canonical space is in it. From the uniqueness in law section, \(V^{\mu}\) is the same for all RPS \(\mu\), and we have joint uniqueness in law, so $$V(t,x) = \inf_{a(\cdot) \in \widetilde{U}_t}\mathbb{E}\Big\{\int_t^{t+h}L(s,X(s) , a(s))ds + V(t+h , X(t+h))\Big\}$$ with \(t \le t+h \le T\). We note that if \(a(\cdot)\) is \(\mathcal{F}_s^{t,0}\)-predictable then \(X(\cdot \, ; t , x , a(\cdot))\) is \(\mathcal{F}_s^t\)-progressively measurable. We can show that this \(X\) is indistinguishable from a process \(X_1(\cdot)\) which is \(\sigma(\mathcal{F}_s^{t,0} , \Omega_1)\)-progressively measurable for some \(\Omega_1\) with \(\mathbb{P}(\Omega_1) = 1\). This means we can replace our solution, which is \(\mathcal{F_s^t}\)-progressively measurable, with one which is progressively measurable with respect to the smaller filtration \(\sigma(\mathcal{F}_s^{t,0} , \Omega_1)\). We recall that \(\mathcal{F}_s^{t,0}\) is contained in \({\mathcal{F}_{\omega_0}}_s^{\xi}\); moreover \(\Omega_1\) is a set of full measure for \(\mathbb{P}\)-almost every \(\omega_0\), so its complement is a null set with respect to the conditioned measure \(\mathbb{P}_{\omega_0}\). Hence \(X_1(\cdot)\) is progressively measurable with respect to the \( {\mathcal{F}_{\omega_0}}_s^{\xi} \) filtration in the conditioned space.

Lemma: \(\forall R>0\) there exists a modulus \(\Upsilon_R\) such that $$|J(t,x , a(\cdot)) - J(t,y , a(\cdot))| \le \Upsilon_R(|x-y|) \quad \forall t \in [0,T], \; |x|,|y| \le R$$ and $$|J(t,x,a(\cdot))| \le C(1+|x|^N) \quad \forall (t,x) \in [0,T]\times \mathbb{R}^n.$$ \(\Upsilon_R\) is independent of the control, and the same estimates hold for the value function.

We rewrite the DPP as $$V(t,x) = \inf_{a(\cdot) \in \widetilde{U}_t}\mathbb{E}\Big\{\int_t^{\xi}L(s,X(s) , a(s))ds + V(\xi , X(\xi))\Big\} \text{ , where } \xi = t+h.$$

Take \(\mu\) a standard RPS and \(a(\cdot) \in U_t^{\mu}\) which is \(\mathcal{F}_s^{t,0}\)-predictable. We can assume that \(X(\cdot ; t,x,a(\cdot))\) is \(\sigma(\mathcal{F}_s^{t,0} , \Omega_1)\)-progressively measurable for some \(\Omega_1\) with \(\mathbb{P}(\Omega_1) = 1\). For \(\mathbb{P}\)-almost every \(\omega_0\), \(\mu^{\omega_0} = (\Omega , \mathcal{F}_{\omega_0} , {\mathcal{F}_{\omega_0}}_s^{\xi} , \mathbb{P}_{\omega_0} , W_{\xi})\) is a RPS on \([\xi , T]\). Therefore \(a \big{|}_{[\xi , T]}(\cdot)\) and \(X\big{|}_{[\xi , T]}\) are \({\mathcal{F}_{\omega_0}}_s^{\xi}\)-progressively measurable for \(\mathbb{P}\)-almost every \(\omega_0\). \(X\) is a solution of $$X(s) = x + \int_t^s b(r,X(r) , a(r))dr + \int_t^s \sigma(r , X(r) , a(r))dW(r)$$ $$= X(\xi) + \int_{\xi}^{s}b(r,X(r),a(r))dr + \int_{\xi}^{s} \sigma(r,X(r),a(r))dW_{\xi}(r),$$ and this is satisfied \(\mathbb{P}\)-almost everywhere.

To get that \(X(\cdot)\) is also a solution of the SDE in \(\mu^{\omega_0}\) for \(\mathbb{P}\)-almost every \(\omega_0\), we need to show that the stochastic integrals are \(\mathbb{P}_{\omega_0}\)-almost surely the same for \(\mathbb{P}\)-almost every \(\omega_0\). Then we can claim that the same process \(X\) that solved the SDE in the original space is also a solution of the SDE on the conditioned space for \(\mathbb{P}\)-almost every \(\omega_0\). Having done that, we have everything we need: the restriction of the control is admissible in each conditioned space, and our original \(X(\cdot)\), which was modified in an indistinguishable way, is still a solution of the SDE in the conditioned space.

Recall that for \(\mathbb{P}\)-almost every \(\omega_0\), $$\mathbb{P}_{\omega_0}(\omega | X(\xi , \omega) = X(\xi , \omega_0)) = 1.$$ Now $$J(t,x;a(\cdot)) = \mathbb{E}\Big[\int_t^{\xi}L( s,X(s) , a(s) )ds\Big] + \mathbb{E}\Big[\int_{\xi}^TL(s,X(s) , a(s))ds + g(X(T))\Big]$$ $$=\mathbb{E}\Big[\int_t^{\xi} L(s,X(s) , a(s))ds \Big] + \mathbb{E}\Big[\mathbb{E}\big[\int_{\xi}^TL(s,X(s) , a(s))ds + g(X(T))\big| \mathcal{F}_{\xi}^{t,0}\big]\Big]$$ $$= \mathbb{E}\Big[\int_t^{\xi} L(s,X(s) , a(s))ds \Big] + \mathbb{E}\Big[\mathbb{E}_{\omega_0}\big[\int_{\xi}^T L(s , X(s) , a(s))ds + g(X(T))\big]\Big]$$ $$= \mathbb{E}\Big[\int_t^{\xi} L(s , X(s) , a(s)) ds\Big] + \mathbb{E}\big[J^{\mu^{\omega_0}}(\xi , X(\xi , \omega_0) ; a(\cdot))\big]$$ $$\ge \mathbb{E}\Big[\int_t^{\xi}L(s , X(s) , a(s))ds\Big] + \mathbb{E}\big[V(\xi , X(\xi))\big]$$ for every \(a(\cdot) \in U_t^{\mu}\) with \(\mu\) standard. Taking the infimum over such \(a(\cdot)\) on both sides gives $$V(t,x) \ge \inf_{a(\cdot)}\mathbb{E}\Big[\int_t^{\xi}L(s , X(s) , a(s))ds + V(\xi , X(\xi))\Big],$$ which is the \(\ge\) inequality of the DPP.

Now we work on the \(\le\) inequality. Take any \(a(\cdot) \in U_t^{\mu}\) with \(\mu = (\Omega , \mathcal{F} , \mathcal{F}_s^t , \mathbb{P} , W)\) a standard RPS, and fix \(\epsilon > 0\): $$J(t , x ; a(\cdot)) = \mathbb{E}\Big[\int_t^{\xi} L( s,X(s) , a(s) ) ds\Big] + \mathbb{E}\Big[\int_{\xi}^T L( s,X(s) , a(s) ) ds + g(X(T))\Big]$$ Our goal is to modify the control \(a(\cdot)\) on \([\xi , T]\) so that the second expectation is bounded by the corresponding value function term up to some \(\epsilon\).

We take all of \(\mathbb{R}^n\) and, using the continuity of \(J\) and \(V\), we divide \(\mathbb{R}^n\) into Borel sets such that for any two elements of the same set the differences in both the cost and value functions are small. So using continuity of \(J\) and \(V\) in \(x\), we can find a partition of \(\mathbb{R}^n\) into disjoint Borel sets \(D_j\), \(j = 1 , 2 , \dots\), such that if \(x,y \in D_j\) and \(\widetilde{a}(\cdot) \in U_t\) is any admissible control, then $$|J(\xi , x , \widetilde{a}(\cdot)) - J(\xi , y , \widetilde{a}(\cdot))| + |V(\xi , x) - V(\xi , y)| < \epsilon;$$ such a partition exists thanks to uniform continuity on bounded subsets, uniformly in the control, and continuity for fixed \(\xi\). For each \(j\) we choose \(x_j \in D_j\) and \(a_j(\cdot) \in U_{\xi}^{\mu_j}\) for some \(\mu_j = (\Omega_j , \mathcal{F}_j , {\mathcal{F}_j}_s^{\xi} , \mathbb{P}_j , W_j)\) such that $$J(\xi , x_j , a_j(\cdot)) < V(\xi , x_j) + \epsilon.$$ Notice that we cannot paste the \(a_j\) together directly, so we need to create one single control on the original probability space which unifies the controls over the different Borel sets defined above: if at time \(\xi\) the state \(X(\xi)(\omega)\) is in \(D_j\), we need to use the control \(a_j\), but the \(a_j\) all live on different reference probability spaces. We unify the controls by representing each \(a_j\) as a functional of the path of the Brownian motion \(W_j\) and then replacing \(W_j\) with our shifted process \(W_{\xi}\). This gives a control on our original probability space which has the same law as the control \(a_j\).

WLOG we can assume that the \(a_j(\cdot)\) are \({\mathcal{F}_j}_s^{\xi , 0}\)-predictable. We let \(f_j: [\xi , T] \times C([\xi , T] , \mathbb{R}^n) \rightarrow \Gamma\) be functions such that \(a_j(s,\omega) = f_j(s , W_j(\cdot , \omega))\); then the processes \(\widetilde{a}_j(s,\omega) = f_j(s , W_{\xi}( \cdot , \omega))\) are \(\mathcal{F}_s^{t,0}\)-progressively measurable and, for \(\mathbb{P}\)-almost every \(\omega_0\), are \({\mathcal{F}_{\omega_0}}_s^{\xi}\)-progressively measurable in the RPS \(\mu^{\omega_0} = (\Omega , \mathcal{F}_{\omega_0} , {\mathcal{F}_{\omega_0}}_s^{\xi} , \mathbb{P}_{\omega_0} , W_{\xi})\). Moreover we have the equality in law $$\mathcal{L}_{\mathbb{P}_{\omega_0}}(\widetilde{a}_j(\cdot) , W_{\xi}(\cdot)) = \mathcal{L}_{\mathbb{P}_j}(a_j(\cdot) , W_j(\cdot)).$$ We define the control $$a^{\xi}(s,\omega) = a(s,\omega)\mathbb{I}(t \le s \le \xi) + \mathbb{I}(s > \xi)\sum_{j\in \mathbb{N}}\widetilde{a}_j(s, \omega)\mathbb{I}(X(\xi ; t , x , a(\cdot))\in D_j).$$ We recall that if our process is in \(D_j\) we use the \(D_j\)-specific control, and that with respect to the conditioned measures the process at time \(\xi\) is deterministic. Denote \(O_j = \{\omega : X(\xi ; t , x , a(\cdot)) \in D_j\}\), which is measurable with respect to the original filtration. We denote \(X(s) = X(s ; t , x , a^{\xi}(\cdot))\). Then \(X(s) = X(s ; t , x , a(\cdot))\) on \([t , \xi]\), and for \(\mathbb{P}\)-almost every \(\omega_0\) we have \(X(\xi , \omega) = X(\xi , \omega_0)\) \(\mathbb{P}_{\omega_0}\)-almost surely. If \(\omega_0 \in O_j\) then \(a^{\xi}(\cdot) = \widetilde{a}_j(\cdot)\) on \([\xi , T]\) \(\mathbb{P}_{\omega_0}\)-almost surely, and thus for \(\mathbb{P}\)-almost every \(\omega_0\), \(a^{\xi}\big{|}_{[\xi , T]}(\cdot) \in U_{\xi}^{\mu^{\omega_0}}\) and $$\mathcal{L}_{\mathbb{P}_{\omega_0}}(a^{\xi}(\cdot) , W_{\xi}(\cdot)) = \mathcal{L}_{\mathbb{P}_{j}}(a_j(\cdot) , W_{j}(\cdot)).$$ Moreover, arguing as in the first part of the proof, we can assume that \(X(\cdot)\), the solution of the SDE in the original probability space with the control \(a^{\xi}(\cdot)\), satisfies \(X(\cdot) = X^{\mu^{\omega_0}}(\cdot ; \xi , X(\xi) , a^{\xi}(\cdot))\) on \([\xi , T]\) \(\mathbb{P}_{\omega_0}\)-almost surely. Thus we obtain by uniqueness in law $$\mathcal{L}_{\mathbb{P}_{\omega_0}}(X(\cdot) , a^{\xi}(\cdot)) = \mathcal{L}_{\mathbb{P}_j}(X^{\mu_j}(\cdot) , a_j(\cdot))$$ where \(X^{\mu_j}(s) = X(s ; \xi , X(\xi ; t , x , a(\cdot))(\omega_0) , a_j(\cdot))\) are solutions of the same SDE with the same initial conditions and control \(a_j(\cdot)\) in the space \(\mu_j\); since the joint laws of the control and the Brownian motion are the same, joint uniqueness in law gives that the joint laws of the solution-control pairs are the same.

Therefore, if we take $$\mathbb{E}\Big[\int_{\xi}^T L(s , X(s) , a^{\xi}(s))ds + g(X(T))\Big]$$ $$= \mathbb{E}\Big[\mathbb{E}\big[ \int_{\xi}^T L(s , X(s) , a^{\xi}(s))ds + g(X(T)) \big| \mathcal{F}_{\xi}^{t,0}\big] \Big]$$ $$= \sum _{j=1}^{\infty} \int_{O_j}\mathbb{E}_{\omega_0}\Big[\int_{\xi}^T L(s , X(s) , a^{\xi}(s))ds + g(X(T)) \Big]d\mathbb{P}(\omega_0)$$ $$= \sum_{j=1}^{\infty}\int_{O_j} J_{\mathbb{P}_{\omega_0}}(\xi , X(\xi ; t , x , a(\cdot))(\omega_0) , a^{\xi}(\cdot))d\mathbb{P}(\omega_0)$$ and, using the joint uniqueness in law, $$= \sum_{j = 1}^{\infty} \int_{O_j} J_{\mathbb{P}_j}(\xi , X(\xi ; t , x , a(\cdot))(\omega_0) , a_j(\cdot))d \mathbb{P}(\omega_0).$$ We recall that for the partition \(D_j\) of \(\mathbb{R}^n\), if \(x , y \in D_j\) then we have the continuity estimates, valid on every reference probability space, \(|J(\xi , x , a(\cdot))- J(\xi , y , a(\cdot))| < \epsilon\) and \(|V(\xi , x) - V(\xi , y)| < \epsilon\), together with \(J_{\mathbb{P}_j}(\xi , x_j , a_j(\cdot)) < V(\xi , x_j) + \epsilon\). Then if \(\omega_0 \in O_j\), i.e. \( X(\xi ; t , x , a(\cdot))(\omega_0) \in D_j\), then $$ J_{\mathbb{P}_j}(\xi , X(\xi ; t , x , a(\cdot))(\omega_0) , a_j(\cdot)) < J_{\mathbb{P}_j}(\xi , x_j , a_j(\cdot)) + \epsilon$$ $$< V(\xi , x_j) + 2\epsilon < V(\xi , X(\xi ; t , x , a(\cdot))(\omega_0)) + 3 \epsilon.$$ Hence we obtain $$\mathbb{E}\Big[\int_{\xi}^T L(s , X(s) , a^{\xi}(s))ds + g(X(T))\Big] \le \mathbb{E}[V(\xi , X(\xi))]+3\epsilon$$ and therefore $$V(t,x) \le J(t, x , a^{\xi}(\cdot)) \le \mathbb{E}\Big[\int_t^{\xi} L(s,X(s) , a(s))ds + V(\xi , X(\xi))\Big] + 3\epsilon$$ where \(a\) is the original control in \(U_t^{\mu}\). Taking the infimum over \(a(\cdot) \in \widetilde{U}_t\), with \(\mu\) standard, and letting \(\epsilon \to 0\) finally proves the Dynamic Programming Principle.

Improving on the DPP with Stopping Times

Corollary: \(V\) is locally uniformly continuous in \(x\) and continuous in time; we can show that for every \(R\) there exists a modulus \(\rho_R\) such that \(|V(t,x) - V(s,x)| \le \rho_R(|t-s|)\) for \(|x| \le R\). We prove this statement by noting that for \(s > t\), $$|V(s,x)-V(t,x)| \le \sup_{a(\cdot)\in U_t}\mathbb{E}\Big{|}\int_t^s L(r , X(r) , a(r))dr + V(s,X(s)) - V(s,x) \Big{|}$$ $$\le \sup_{a(\cdot)\in U_t}\mathbb{E}\int_t^s C(1 + |X(r)|^N)dr + \sup_{a(\cdot) \in U_t}\mathbb{E}|V(s, X(s)) - V(s , x)|$$ $$\le C(1+|x|^N)(s-t) + \dots$$ recalling that \(\mathbb{E} \sup_{t \le r \le s}|X(r)-x|^2 \le C(s-t)\) and using estimates for the moments of \(|X|^N\) and a modulus of continuity argument for \(V(s , \cdot)\).

Dynamic Programming Principle with stopping times. Define \(\mathcal{V}_t\) to be the set of all pairs \((a(\cdot) , \tau_{a(\cdot)})\), where \(a(\cdot) \in U_t^{\mu}\) for some RPS \(\mu\) and \(\tau_{a(\cdot)}\) is a stopping time in \(\mu\); here \(\tau : (\Omega , \mathcal{F}) \rightarrow [t,\infty]\) is an \(\mathcal{F}_s^t\)-stopping time if for all \(s \ge t\), \(\{\tau \le s\} = \{\omega \in \Omega : \tau(\omega) \le s\} \in \mathcal{F}_s^t\). As an example of a stopping time take, for instance, an open subset \(A\) of \(\mathbb{R}^{n}\); then the exit time from \(A\) is a stopping time. We then have $$V(t,x) = \inf_{(a(\cdot) , \tau_{a(\cdot)})\in \mathcal{V}_t}\mathbb{E}\Big[\int_t^{\tau_{a(\cdot)}} L(s , X(s), a(s))ds + V(\tau_{a(\cdot)} , X(\tau_{a(\cdot)}))\Big]$$
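For a concrete picture of the exit-time example, the sketch below computes the first exit time of a simulated path from the hypothetical set \(A = (-1, 1)\); the point is that whether \(\tau \le s\) is decided by the path up to time \(s\) alone, which is what makes it a stopping time.

```python
import numpy as np

def first_exit(times, X, lo=-1.0, hi=1.0):
    """First grid time at which X leaves (lo, hi); returns T if it never does.
    The event {tau <= s} depends only on the path up to s: a stopping time."""
    outside = np.where((X <= lo) | (X >= hi))[0]
    return times[outside[0]] if outside.size else times[-1]

rng = np.random.default_rng(4)
times = np.linspace(0.0, 1.0, 1001)
dW = rng.normal(0.0, np.sqrt(times[1] - times[0]), size=1000)
X = np.concatenate([[0.0], 2.0 * np.cumsum(dW)])  # a scaled Brownian path
tau = first_exit(times, X)
```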

Theorem: The value function \(V\) is the unique viscosity solution of the HJB equation within the class of at most polynomially growing functions, under our standard assumptions.
