Article

Keywords:
piecewise deterministic Markov decision processes; risk probability criterion; optimal policy; the value iteration algorithm
Summary:
This paper studies the risk probability problem for infinite-horizon piecewise deterministic Markov decision processes (PDMDPs) with varying discount factors and unbounded transition rates. In contrast to the usual expected total reward criterion, we aim to minimize the risk probability that the total reward does not exceed a given target value. Under a non-explosiveness condition on the controlled state process that is slightly weaker than the corresponding conditions in the previous literature, we prove the existence and uniqueness of a solution to the optimality equation and, by means of the value iteration algorithm, the existence of a risk probability optimal policy. Finally, we provide two examples to illustrate our results: one explains and verifies our conditions, and the other presents computational results for the value function and the risk probability optimal policy.
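To make the risk probability criterion concrete, the following is a minimal sketch of value iteration in a much simpler setting than the one treated in the paper: a finite-state, discrete-time, undiscounted model with nonnegative integer rewards and a finite number of steps. It is not the paper's algorithm for PDMDPs. The function name `risk_value_iteration`, the arrays `P` and `r`, and the toy numbers are all hypothetical and chosen only for illustration. The state is augmented with the remaining target level, and each iteration minimizes the probability that the total reward stays at or below that level.

```python
import numpy as np

def risk_value_iteration(P, r, max_target, n_steps):
    """Simplified discrete-time analogue (an assumption, not the paper's model).

    P[a, x, y]: transition probability from state x to y under action a.
    r[x, a]:    nonnegative integer reward earned in state x under action a.
    Returns F[x, lam], the minimal probability that the n_steps-step total
    reward is <= lam, and the greedy action of the final iteration.
    """
    n_actions, n_states, _ = P.shape
    levels = max_target + 1
    # With zero steps left the total reward is 0, so P(0 <= lam) = 1 for lam >= 0.
    F = np.ones((n_states, levels))
    policy = np.zeros((n_states, levels), dtype=int)
    for _ in range(n_steps):
        Q = np.zeros((n_states, levels, n_actions))
        for x in range(n_states):
            for a in range(n_actions):
                for lam in range(levels):
                    rest = lam - r[x, a]  # target level left after this reward
                    if rest >= 0:
                        Q[x, lam, a] = P[a, x] @ F[:, rest]
                    # rest < 0: the target is already exceeded, probability stays 0
        F = Q.min(axis=2)
        policy = Q.argmin(axis=2)
    return F, policy

# Toy data (hypothetical): state 1 pays reward 1, state 0 pays nothing;
# action 1 moves to the paying state with higher probability than action 0.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],
              [[0.1, 0.9], [0.1, 0.9]]])
r = np.array([[0, 0], [1, 1]])
F, policy = risk_value_iteration(P, r, max_target=1, n_steps=2)
```

In this toy model the minimizing action at state 0 with target 0 is action 1, since it makes a zero total reward least likely. The infinite-horizon, continuous-time setting with varying discount factors described in the summary requires the conditions and optimality equation established in the paper itself.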