
Article

Keywords:
Markov games; empirical estimation; discounted and average criteria
Summary:
This work deals with a class of discrete-time zero-sum Markov games whose state process $\left\{ x_{t}\right\} $ evolves according to the equation $ x_{t+1}=F(x_{t},a_{t},b_{t},\xi _{t}),$ where $a_{t}$ and $b_{t}$ represent the actions of players 1 and 2, respectively, and $\left\{ \xi _{t}\right\} $ is a sequence of independent and identically distributed random variables with unknown distribution $\theta$. Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta$, we introduce approximation schemes for the value of the game as well as for optimal strategies under both discounted and average criteria.
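As a rough illustration of the model and the empirical-estimation idea (a hypothetical sketch, not the paper's algorithm), the state recursion $x_{t+1}=F(x_{t},a_{t},b_{t},\xi_{t})$ can be simulated while the unknown disturbance distribution $\theta$ is approximated by the empirical distribution of the observed $\xi_{0},\dots,\xi_{t-1}$; the dynamics `F`, the Gaussian disturbance, and the constant actions below are all illustrative assumptions:

```python
import random

# Hypothetical dynamics F(x, a, b, xi); in the paper F is given but the
# disturbance distribution theta is unknown to the players.
def F(x, a, b, xi):
    return 0.5 * x + a - b + xi  # illustrative linear dynamics

def sample_xi():
    return random.gauss(0.0, 1.0)  # stands in for the unknown theta

random.seed(0)
x, observed = 0.0, []
for t in range(1000):
    a, b = 1.0, 1.0        # placeholder actions of players 1 and 2
    xi = sample_xi()
    observed.append(xi)    # disturbance observed at stage t
    x = F(x, a, b, xi)

# Empirical distribution of theta: each observed xi_s gets mass 1/t.
# Any statistic of theta (here the mean) is estimated from the sample.
empirical_mean = sum(observed) / len(observed)
```

As the sample grows, the empirical distribution converges to $\theta$, which is what drives the approximation schemes for the game value and the optimal strategies.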
References:
[1] Chang, H. S.: Perfect information two-person zero-sum Markov games with imprecise transition probabilities. Math. Meth. Oper. Res. 64 (2006), 235-351. DOI 10.1007/s00186-006-0081-5 | MR 2264789
[2] Dudley, R. M.: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Stat. 40 (1969), 40-50. DOI 10.1214/aoms/1177697802 | MR 0236977
[3] Dynkin, E. B., Yushkevich, A. A.: Controlled Markov Processes. Springer-Verlag, New York 1979. DOI 10.1007/978-1-4615-6746-2 | MR 0554083
[4] Fernández-Gaucherand, E.: A note on the Ross-Taylor Theorem. Appl. Math. Comp. 64 (1994), 207-212. DOI 10.1016/0096-3003(94)90064-7 | MR 1298262
[5] Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer-Verlag, New York 1997. DOI 10.1007/978-1-4612-4054-9 | MR 1418636
[6] Ghosh, M. K., McDonald, D., Sinha, S.: Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (2004), 99-118. DOI 10.1023/b:jota.0000026133.56615.cf | MR 2062972
[7] Gordienko, E. I.: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518. DOI 10.1137/1129064 | MR 0761133
[8] Gordienko, E. I., Hernández-Lerma, O.: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199-218. MR 1341223 | Zbl 0829.93067
[9] Gordienko, E. I., Hernández-Lerma, O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237. MR 1341224
[10] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. DOI 10.1007/978-1-4612-0729-0 | MR 1363487 | Zbl 0840.93001
[11] Hilgert, N., Minjárez-Sosa, J. A.: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criterion. Math. Meth. Oper. Res. 63 (2006), 443-460. DOI 10.1007/s00186-005-0024-6 | MR 2264761
[12] Jaśkiewicz, A., Nowak, A.: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45 (2006), 773-789. DOI 10.1137/s0363012904443257 | MR 2247715
[13] Jaśkiewicz, A., Nowak, A.: Approximation of noncooperative semi-Markov games. J. Optim. Theory Appl. 131 (2006), 115-134. DOI 10.1007/s10957-006-9128-2 | MR 2278300
[14] Krausz, A., Rieder, U.: Markov games with incomplete information. Math. Meth. Oper. Res. 46 (1997), 263-279. DOI 10.1007/bf01217695 | MR 1481935
[15] Minjárez-Sosa, J. A.: Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion. Appl. Math. (Warsaw) 26 (1999), 267-280. DOI 10.4064/am-26-3-267-280 | MR 1725752
[16] Minjárez-Sosa, J. A., Vega-Amaya, O.: Asymptotically optimal strategies for adaptive zero-sum discounted Markov games. SIAM J. Control Optim. 48 (2009), 1405-1421. DOI 10.1137/060651458 | MR 2496982
[17] Minjárez-Sosa, J. A., Vega-Amaya, O.: Optimal strategies for adaptive zero-sum average Markov games. J. Math. Anal. Appl. 402 (2013), 44-56. DOI 10.1016/j.jmaa.2012.12.011 | MR 3023236
[18] Minjárez-Sosa, J. A., Luque-Vásquez, F.: Two person zero-sum semi-Markov games with unknown holding times distribution on one side: discounted payoff criterion. Appl. Math. Optim. 57 (2008), 289-305. DOI 10.1007/s00245-007-9016-7 | MR 2407314
[19] Neyman, A., Sorin, S.: Stochastic Games and Applications. Kluwer, 2003. DOI 10.1007/978-94-010-0189-2 | MR 2035554
[20] Prieto-Rumeau, T., Lorenzo, J. M.: Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23 (2015), 799-836. DOI 10.1007/s11750-014-0354-8 | MR 3407676
[21] Shimkin, N., Shwartz, A.: Asymptotically efficient adaptive strategies in repeated games. Part I: Certainty equivalence strategies. Math. Oper. Res. 20 (1995), 743-767. DOI 10.1287/moor.20.3.743 | MR 1354780
[22] Shimkin, N., Shwartz, A.: Asymptotically efficient adaptive strategies in repeated games. Part II: Asymptotic optimality. Math. Oper. Res. 21 (1996), 487-512. DOI 10.1287/moor.21.2.487 | MR 1397226
[23] Schäl, M.: Conditions for optimality and for the limit of $n$-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32 (1975), 179-196. DOI 10.1007/bf00532612 | MR 0378841
[24] Ranga Rao, R.: Relations between weak and uniform convergence of measures with applications. Ann. Math. Stat. 33 (1962), 659-680. DOI 10.1214/aoms/1177704588 | MR 0137809
[25] van Nunen, J. A. E. E., Wessels, J.: A note on dynamic programming with unbounded rewards. Manag. Sci. 24 (1978), 576-580. DOI 10.1287/mnsc.24.5.576 | MR 0521666