[1] Abounadi, J., Bertsekas, D., Borkar, V. S.:
Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40 (2001), 681-698.
DOI 10.1137/s0363012999361974 |
MR 1871450
[2] Aliprantis, C. D., Border, K. C.:
Infinite Dimensional Analysis: A Hitchhiker's Guide. Third edition. Springer-Verlag, Berlin 2006.
MR 2378491
[3] Almudevar, A.:
Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 46 (2008), 541-561.
DOI 10.1137/040614384 |
MR 2448464
[4] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., Marcus, S. I.:
Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 (1993), 282-344.
DOI 10.1137/0331018 |
MR 1205981
[5] Beutel, L., Gonska, H., Kacsó, D.:
On variation-diminishing Schoenberg operators: new quantitative statements. In: Multivariate Approximation and Interpolation with Applications (M. Gasca, ed.), Monografías de la Academia de Ciencias de Zaragoza No. 20, 2002, pp. 9-58.
MR 1966063
[6] Bertsekas, D. P.:
Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs NJ 1987.
MR 0896902 |
Zbl 0649.93001
[7] Bertsekas, D. P., Tsitsiklis, J. N.:
Neuro-Dynamic Programming. Athena Scientific, Belmont 1996.
MR 3444832 |
Zbl 0924.68163
[10] Chang, H. S., Hu, J., Fu, M. C., Marcus, S. I.:
Simulation-Based Algorithms for Markov Decision Processes. Second edition. Springer-Verlag, London 2013.
DOI 10.1007/978-1-4471-5022-0 |
MR 3052425
[12] DeVore, R. A.:
The Approximation of Continuous Functions by Positive Linear Operators. Lecture Notes in Mathematics 293. Springer-Verlag, Berlin, Heidelberg 1972.
DOI 10.1007/bfb0059493 |
MR 0420083
[13] Dorea, C. C. Y., Pereira, A. G. C.:
A note on a variation of Doeblin's condition for uniform ergodicity of Markov chains. Acta Math. Hungar. 110 (2006), 287-292.
DOI 10.1007/s10474-006-0023-y |
MR 2213230
[15] Dufour, F., Prieto-Rumeau, T.:
Stochastic approximations of constrained discounted Markov decision processes. J. Math. Anal. Appl. 413 (2014), 856-879.
DOI 10.1016/j.jmaa.2013.12.016 |
MR 3159809
[16] Dufour, F., Prieto-Rumeau, T.:
Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87 (2015), 273-307.
DOI 10.1080/17442508.2014.939979 |
MR 3316812
[17] Farias, D. P. de, Roy, B. van:
On the existence of fixed points for approximate value iteration and temporal difference learning. J. Optim. Theory Appl. 105 (2000), 589-608.
DOI 10.1023/a:1004641123405 |
MR 1783879
[18] Farias, D. P. de, Roy, B. van:
Approximate linear programming for average-cost dynamic programming. In: Advances in Neural Information Processing Systems 15 (S. Becker, S. Thrun and K. Obermayer, eds.), MIT Press, Cambridge MA 2002, pp. 1587-1594.
[19] Farias, D. P. de, Roy, B. van:
A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res. 31 (2006), 597-620.
DOI 10.1287/moor.1060.0208 |
MR 2254426
[20] Gordon, G. J.:
Stable function approximation in dynamic programming. In: Proc. Twelfth International Conference on Machine Learning (A. Prieditis and S. J. Russell, eds.), Tahoe City CA 1995, pp. 261-268.
DOI 10.1016/b978-1-55860-377-6.50040-2
[21] Gosavi, A.:
A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. Machine Learning 55 (2004), 5-29.
DOI 10.1023/b:mach.0000019802.64038.6c |
MR 2549123
[26] Hernández-Lerma, O., Montes-de-Oca, R., Cavazos-Cadena, R.:
Recurrence conditions for Markov decision processes with Borel spaces: a survey. Ann. Oper. Res. 29 (1991), 29-46.
DOI 10.1007/bf02055573 |
MR 1105165
[28] Jaskiewicz, A., Nowak, A. S.:
On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316 (2006), 495-509.
DOI 10.1016/j.jmaa.2005.04.065 |
MR 2206685
[29] Klein, E., Thompson, A. C.:
Theory of Correspondences. Wiley, New York 1984.
MR 0752692
[31] Lee, J. M., Lee, J. H.:
Approximate dynamic programming strategies and their applicability for process control: a review and future directions. Int. J. Control Automat. Systems 2 (2004), 263-278.
[33] Montes-de-Oca, R., Lemus-Rodríguez, E.:
An unbounded Berge's minimum theorem with applications to discounted Markov decision processes. Kybernetika 48 (2012), 268-286.
MR 2954325 |
Zbl 1275.90124
[36] Ortner, R.:
Pseudometrics for state aggregation in average reward Markov decision processes. In: Algorithmic Learning Theory, LNAI 4754 (M. Hutter, R. A. Servedio and E. Takimoto, eds.), Springer, Berlin, Heidelberg 2007, pp. 373-387.
DOI 10.1007/978-3-540-75225-7_30
[41] Powell, W. B., Ma, J.:
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications. J. Control Theory Appl. 9 (2011), 336-352.
DOI 10.1007/s11768-011-0313-y |
MR 2834000
[42] Robles-Alcaraz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. A.:
Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces. Risk Decision Anal. 6 (2017), 79-95.
DOI 10.3233/rda-160116
[43] Rust, J.:
Numerical dynamic programming in economics. In: Handbook of Computational Economics, Vol. 13 (H. Amman, D. Kendrick and J. Rust, eds.), North-Holland, Amsterdam 1996, pp. 619-728.
DOI 10.1016/s1574-0021(96)01016-7 |
MR 1416619
[44] Santos, M. S., Vigo-Aguiar, J.:
Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66 (1998), 409-426.
DOI 10.2307/2998564 |
MR 1612175
[46] Saldi, N., Yüksel, S., Linder, T.:
Asymptotic optimality of finite approximations to Markov decision processes with Borel spaces. Math. Oper. Res. 42 (2017), 945-978.
DOI 10.1287/moor.2016.0832 |
MR 3722422
[47] Stachurski, J.:
Continuous state dynamic programming via nonexpansive approximation. Comput. Economics 31 (2008), 141-160.
DOI 10.1007/s10614-007-9111-5
[50] Vega-Amaya, O.:
The average cost optimality equation: a fixed-point approach. Bol. Soc. Mat. Mexicana 9 (2003), 185-195.
MR 1988598
[52] Vega-Amaya, O.:
Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: The fixed-point approach revisited. J. Math. Anal. Appl. 464 (2018), 152-163.
DOI 10.1016/j.jmaa.2018.03.077 |
MR 3794081
[53] Vega-Amaya, O., López-Borbón, J.:
A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces. J. Dyn. Games 3 (2016), 261-278.
DOI 10.3934/jdg.2016014 |
MR 3562878
[54] Vega-Amaya, O., Montes-de-Oca, R.:
Application of average dynamic programming to inventory systems. Math. Methods Oper. Res. 47 (1998), 451-471.
DOI 10.1007/bf01198405 |
MR 1637569