Article

Title: Risk-sensitive average optimality in Markov decision processes (English)
Author: Sladký, Karel
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 54
Issue: 6
Year: 2018
Pages: 1218-1230
Summary lang: English
Category: math
Summary: In this note attention is focused on finding policies optimizing risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the variance of the reward. For communicating Markov processes, and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies. (English)
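Note: a minimal sketch of the exponential-utility criterion referred to in the summary, in illustrative notation not taken from the paper itself: for a risk-sensitive coefficient $\lambda \neq 0$ the random total reward $X$ is evaluated through the utility $U_\lambda(x) = \operatorname{sgn}(\lambda)\, e^{\lambda x}$, whose certainty equivalent is
$$Z_\lambda(X) \;=\; \frac{1}{\lambda}\,\ln \mathbb{E}\!\left[e^{\lambda X}\right] \;\approx\; \mathbb{E}[X] \;+\; \frac{\lambda}{2}\,\operatorname{Var}(X) \qquad (\lambda \to 0),$$
so the first moment enters as the expected total reward, the second central moment as its variance, and $\lambda = 0$ recovers the risk-neutral criterion $\mathbb{E}[X]$.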
Keyword: controlled Markov processes
Keyword: finite state space
Keyword: asymptotic behavior
Keyword: risk-sensitive average optimality
MSC: 90C40
MSC: 93E20
idZBL: Zbl 07031770
idMR: MR3902630
DOI: 10.14736/kyb-2018-6-1218
Date available: 2019-02-18T14:51:30Z
Last updated: 2020-01-05
Stable URL: http://hdl.handle.net/10338.dmlcz/147606
Reference: [1] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., Marcus, S. I.: Discrete-time controlled Markov processes with average cost criterion: A survey..SIAM J. Control Optim. 31 (1993), 282-344. MR 1205981, 10.1137/0331018
Reference: [2] Bather, J.: Optimal decision procedures for finite Markov chains, Part II..Adv. Appl. Probab. 5 (1973), 328-339. MR 0368790, 10.2307/1426039
Reference: [3] Bielecki, T. R., Hernández-Hernández, D., Pliska, S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management..Math. Methods Oper. Res. 50 (1999), 167-188. MR 1732397, 10.1007/s001860050094
Reference: [4] Cavazos-Cadena, R.: Value iteration and approximately optimal stationary policies in finite-state average Markov chains..Math. Methods Oper. Res. 56 (2002), 181-196. MR 1938210, 10.1007/s001860200205
Reference: [5] Cavazos-Cadena, R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space..Math. Methods Oper. Res. 57 (2003), 2, 263-285. MR 1973378, 10.1007/s001860200256
Reference: [6] Cavazos-Cadena, R.: Solution of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria..Math. Methods Oper. Res. 70 (2009), 541-566. MR 2558431, 10.1007/s00186-008-0277-y
Reference: [7] Cavazos-Cadena, R., Fernández-Gaucherand, E.: Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions..Math. Methods Oper. Res. 43 (1999), 121-139. MR 1687362
Reference: [8] Cavazos-Cadena, R., Hernández-Hernández, D.: A characterization of exponential functionals in finite Markov chains..Math. Methods Oper. Res. 60 (2004), 399-414. MR 2106091, 10.1007/s001860400373
Reference: [9] Cavazos-Cadena, R., Hernández-Hernández, D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains..Ann. Appl. Probab. 15 (2005), 175-212. MR 2115041, 10.1214/105051604000000585
Reference: [10] Cavazos-Cadena, R., Hernández-Hernández, D.: Necessary and sufficient conditions for a solution to the risk-sensitive Poisson equation on a finite state space..System Control Lett. 58 (2009), 254-258. MR 2510639, 10.1016/j.sysconle.2008.11.001
Reference: [11] Cavazos-Cadena, R., Montes-de-Oca, R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space..Math. Oper. Res. 28 (2003), 752-756. MR 2015911, 10.1287/moor.28.4.752.20515
Reference: [12] Cavazos-Cadena, R., Montes-de-Oca, R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion..J. Appl. Probab. 42 (2005), 905-918. MR 2203811, 10.1017/s0021900200000991
Reference: [13] Cavazos-Cadena, R., Feinberg, E. A., Montes-de-Oca, R.: A note on the existence of optimal policies in total reward dynamic programs with compact action sets..Math. Oper. Res. 25 (2000), 657-666. MR 1855371, 10.1287/moor.25.4.657.12112
Reference: [14] Gantmakher, F. R.: The Theory of Matrices..Chelsea, New York 1959. MR 0107649
Reference: [15] Howard, R. A.: Dynamic Programming and Markov Processes..MIT Press, Cambridge, Mass. 1960. MR 0118514
Reference: [16] Howard, R. A., Matheson, J. E.: Risk-sensitive Markov decision processes..Manag. Sci. 18 (1972), 356-369. MR 0292497, 10.1287/mnsc.18.7.356
Reference: [17] Mandl, P.: On the variance in controlled Markov chains..Kybernetika 7 (1971), 1-12. Zbl 0215.25902, MR 0286178
Reference: [18] Mandl, P.: Estimation and control in Markov chains..Adv. Appl. Probab. 6 (1974), 40-60. MR 0339876, 10.2307/1426206
Reference: [19] Markowitz, H.: Portfolio selection..J. Finance 7 (1952), 77-92. MR 0103768, 10.1111/j.1540-6261.1952.tb01525.x
Reference: [20] Markowitz, H.: Portfolio Selection - Efficient Diversification of Investments..Wiley, New York 1959. MR 0103768
Reference: [21] Puterman, M. L.: Markov Decision Processes - Discrete Stochastic Dynamic Programming..Wiley, New York 1994. MR 1270015, 10.1002/9780470316887
Reference: [22] Ross, S. M.: Introduction to Stochastic Dynamic Programming..Academic Press, New York 1983. MR 0749232
Reference: [23] Sladký, K.: Necessary and sufficient optimality conditions for average reward of controlled Markov chains..Kybernetika 9 (1973), 124-137. MR 0363495
Reference: [24] Sladký, K.: On the set of optimal controls for Markov chains with rewards..Kybernetika 10 (1974), 526-547. MR 0378842
Reference: [25] Sladký, K.: Growth rates and average optimality in risk-sensitive Markov decision chains..Kybernetika 44 (2008), 205-226. MR 2428220
Reference: [26] Sladký, K.: Risk-sensitive and average optimality in Markov decision processes..In: Proc. 30th Int. Conf. Math. Meth. Economics 2012, Part II (J.Ramík and D.Stavárek, eds.), Silesian University, School of Business Administration, Karviná 2012, pp. 799-804. 10.1007/3-540-32539-5_125
Reference: [27] Sladký, K.: Risk-sensitive and mean variance optimality in Markov decision processes..Acta Oeconomica Pragensia 7 (2013), 146-161.
Reference: [28] Dijk, N. M. van, Sladký, K.: On the total reward variance for continuous-time Markov reward chains..J. Appl. Probab. 43 (2006), 1044-1052. MR 2274635, 10.1017/s0021900200002412

Files
Kybernetika_54-2018-6_9.pdf (518.0 KB, application/pdf)