References
- Abounadi , J. , Bertsekas , D. and Borkar , V. S. 1998 . Learning algorithms for Markov decision processes with average cost . Report LIDS-P-2434 , Cambridge, MA : Laboratory for Information and Decision Systems, MIT .
- Anupindi , R. , Bassok , Y. and Zemel , E. 2001 . A general framework for the study of decentralized distribution systems . Journal of Manufacturing and Service Operations Management , : 4
- Bellman , R. E. 1957 . Dynamic Programming , Princeton, NJ : Princeton University Press .
- Bertsekas , D. and Tsitsiklis , J. 1996 . Neuro-Dynamic Programming , Belmont, MA : Athena Scientific .
- Darken , C. , Chang , J. and Moody , J. 1992 . “ Learning rate schedules for faster stochastic gradient search ” . In Neural Networks for Signal Processing 2—Proceedings of the 1992 IEEE Workshop , Edited by: White , D. A. and Sofge , D. A. Piscataway, NJ : IEEE Press .
- Das , T. K. , Gosavi , A. , Mahadevan , S. and Marchalleck , N. 1999 . Solving semi-Markov decision problems using average reward reinforcement learning . Management Science , 45 ( 4 ) : 560 – 574 .
- Erev , I. and Roth , A. E. 1998 . Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria . The American Economic Review , 88 ( 4 ) : 848 – 881 .
- Filar , J. and Vrieze , K. 1997 . Competitive Markov Decision Processes , New York, NY : Springer-Verlag .
- Gosavi , A. 2004 . Reinforcement learning for long-run average cost . European Journal of Operational Research , to appear .
- Gosavi , A. , Bandla , N. and Das , T. K. 2002 . A reinforcement learning approach to airline seat allocation for multiple fare classes with overbooking . IIE Transactions , 34 ( 9 ) : 729 – 742 .
- Hu , J. and Wellman , M. P. 1998 . “ Multi-agent reinforcement learning: theoretical framework and an algorithm ” . In Proceedings of the 15th International Conference on Machine Learning 242 – 250 .
- Li , J. and Das , T. K. 2003 . Learning Nash equilibrium for average reward irreducible stochastic games , Tampa, FL : University of South Florida . Working paper, Department of Industrial and Management Systems Engineering
- Littman , M. L. 1994 . “ Markov games as a framework for multi-agent reinforcement learning ” . In Proceedings of the 11th International Conference on Machine Learning 157 – 163 .
- Nash , J. F. 1951 . Non-cooperative games . Annals of Mathematics , 54 : 286 – 295 .
- Owen , G. 1975 . On the core of linear production games . Mathematical Programming , 9 : 358 – 370 .
- Paternina , C. D. and Das , T. K. 2000 . Intelligent dynamic control policies for serial production lines . IIE Transactions , 33 ( 1 ) : 65 – 77 .
- Puterman , M. L. 1994 . Markov Decision Processes , New York, NY : Wiley .
- Ripley , B. D. 1996 . Pattern Recognition and Neural Networks , Cambridge, UK : Cambridge University Press .
- Robbins , H. and Monro , S. 1951 . A stochastic approximation method . Annals of Mathematical Statistics , 22 : 400 – 407 .
- Shapley , L. and Shubik , M. 1975 . Competitive outcomes in the core of market games . Technical report R-1692-NSF , The Rand Corporation .
- Sutton , R. S. and Barto , A. 1998 . Reinforcement Learning , Cambridge, MA : MIT Press .
- Van der Laan , G. , Talman , A. J. J. and Van der Heyden , L. 1987 . “ Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labeling ” . In Mathematics of Operations Research 377 – 397 .
- Van Roy , B. 1998 . Learning and value function approximation in complex decision processes , Cambridge, MA : Laboratory for Information and Decision Systems, MIT . Ph.D. thesis
- Watkins , C. J. C. H. 1989 . Learning from delayed rewards , Cambridge, UK : Cambridge University . Ph.D. thesis