Previous |  Up |  Next

Article

Keywords:
risk-sensitive Markov decision chains; average optimal policies; optimal growth rates
Summary:
In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
References:
[1] Bellman R.: On a quasi-linear equation. Canadian J. Math. 8 (1956), 198–202 MR 0077782 | Zbl 0071.34902
[2] Bellman R.: Dynamic Programming. Princenton Univ. Press, Princenton, N.J. 1957 MR 0090477 | Zbl 1205.90002
[3] Berman A., Plemmons R. J.: Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York 1979 MR 0544666 | Zbl 0815.15016
[4] Bielecki T. D., Hernández-Hernández, D., Pliska S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management. Math. Methods Oper. Res. 50 (1999), 167–188 MR 1732397
[5] Cavazos-Cadena R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math. Methods Oper. Res. 57 (2003), 253–285 MR 1973378 | Zbl 1023.90076
[6] Cavazos-Cadena R., Montes-de-Oca R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Math. Oper. Res. 28 (2003), 752–756 MR 2015911 | Zbl 1082.90125
[7] Cavazos-Cadena R., Montes-de-Oca R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion. J. Appl. Probab. 42 (2005), 905–918 MR 2203811 | Zbl 1105.90101
[8] Cavazos-Cadena R., Hernández-Hernández D.: Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach. Math. Methods Oper. Res. 56 (2002), 473–479 MR 1953028
[9] Cavazos-Cadena R., Hernández-Hernández D.: A characterization exponential functionals in finite Markov chains. Math. Methods Oper. Res. 60 (2004), 399–414 MR 2106091
[10] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Ann. Appl. Probab. 15 (2005), 175–212 MR 2115041 | Zbl 1076.93045
[11] Gantmakher F. R.: The Theory of Matrices. Chelsea, London 1959
[12] Howard R. A., Matheson J.: Risk-sensitive Markov decision processes. Manag. Sci. 23 (1972), 356–369 MR 0292497 | Zbl 0238.90007
[13] Jaquette S. A.: A utility criterion for Markov decision processes. Manag. Sci. 23 (1976), 43–49 MR 0439037 | Zbl 0337.90053
[14] Mandl P.: Decomposable non-negative matrices in a dynamic programming problem. Czechoslovak Math. J. 20 (1970), 504–510 MR 0264978 | Zbl 0209.22902
[15] Rothblum U. G., Whittle P.: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7 (1982), 582–601 MR 0686533 | Zbl 0498.90082
[16] Sladký K.: On dynamic programming recursions for multiplicative Markov decision chains. Math. Programming Study 6 (1976), 216–226 MR 0452725 | Zbl 0374.60089
[17] Sladký K.: Successive approximation methods for dynamic programming models. In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 Zbl 0496.90081
[18] Sladký K.: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980), 526–547 MR 0607292 | Zbl 0454.90085
[19] Whittle P.: Optimization Over Time – Dynamic Programming and Stochastic Control. Volume II, Chapter 35, Wiley, Chichester 1983 MR 0710833
[20] Zijm W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam 1983 MR 0723868 | Zbl 0526.90059
Partner of
EuDML logo