References
- Abounadi , J , Bertsekas , D and Borkar , V . 2001 . Learning Algorithms for Markov Decision Processes with Average Cost . SIAM Journal of Control Optimization , 40 : 681 – 698 .
- Beightler , CS and Crisp Jr , RM . 1968 . A Discrete-time Queueing Analysis of Conveyor-serviced Production Stations . Operations Research , 16 : 986 – 1001 .
- Bertsekas , DP . 2001 . Dynamic Programming and Optimal Control , Belmont, MA : Athena Scientific .
- Bradtke , SJ and Duff , MO . 1995 . “ Reinforcement Learning Methods for Continuous-time Markov Decision Problems ” . In in Advances in Neural Information Processing Systems 7 , 393 – 400 . Cambridge, MA : MIT Press .
- Cao , XR . 2007 . Stochastic Learning and Optimisation: A Sensitivity-based View , New York : Springer .
- Das , TK , Gosavi , A , Mahadevan , S and Marchalleck , N . 1999 . Solving Semi-Markov Decision Problems using Average Reward Reinforcement Learning . Management Science , 45 : 560 – 574 .
- Fang , HT and Cao , XR . 2004 . Potential-based Online Policy Iteration Algorithms for Markov Decision Processes . IEEE Transactions on Automatic Control , 49 : 493 – 505 .
- Gosavi , A . 2003 . Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , Boston : Kluwer .
- Gosavi , A . 2004a . A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis . Machine Learning , 55 : 5 – 29 .
- Gosavi , A . 2004b . Reinforcement Learning for Long Run Average Cost . European Journal of Operational Research , 155 : 654 – 674 .
- Lagoudakis , MG and Parr , R . 2003 . Least-squares Policy Iteration . Journal of Machine Learning Research , 40 : 1107 – 1149 .
- Matsui , M . 1993 . A Generalised Model of Conveyor-serviced Production Station (CSPS) (in Japanese) . Journal of Japan Industrial Management Association , 44 : 25 – 32 .
- Matsui , M . 2005 . CSPS Model: Look-ahead Controls and Physics . International Journal of Production Research , 43 : 2001 – 2025 .
- Matsui , M and Shingu , T . 1978 . A Queueing Analysis of Conveyor-serviced Production Station and the Optimal Range Strategy . AIIE Transactions , 10 : 89 – 99 .
- Muth , EJ and White , JA . 1979 . Conveyor Theory: A Survey . AIIE Transactions , 11 : 270 – 277 .
- Nawijn , WM . 1981 . The Analysis of a Conveyor-serviced Production Station . European Journal of Operational Research , 6 : 67 – 74 .
- Nawijn , WM . 1985 . The Optimal Look-ahead Policy for Admission to a Single Server System . Operations Research , 33 : 626 – 643 .
- Puterman , ML . 1994 . Markov Decision Processes: Discrete Stochastic Dynamic Programming , New York : Wiley .
- Schwartz , A . 1993 . “ A Reinforcement Learning Method for Maximising Undiscounted Rewards ” . In Proceeding of the Tenth Annual Conference on Machine Learning , 298 – 305 . Amherst, MA : Morgan Kaufmann .
- Singh , S , Tadic , B and Doucet , A . 2007 . A Policy Gradient Method for Semi-Markov Decision Processes with Application to Call Admission Control . European Journal of Operational Research , 178 : 808 – 818 .
- Tang , H , Xi , HS and Yin , BQ . 2005a . The Optimal Robust Control Policy for Uncertain Semi-Markov Control Processes . International Journal of Systems Science , 36 : 791 – 800 .
- Tang , H , Yuan , JB , Lu , Y and Cheng , WJ . 2005b . Performance Potential-based Neuro-dynamic Programming for SMDPs . Acta Automatica Sinica , 31 : 642 – 645 .
- Tang , H , Xi , HS and Yin , BQ . 2007 . Error Bounds of Optimisation Algorithms for Semi-Markov Decision Processes . International Journal of Systems Science , 38 : 725 – 736 .
- Watkins , CJCH . 1989 . “ Learning from Delayed Rewards ” . In PhD thesis , Cambridge, , UK : Cambridge University .
- Yin , BQ , Xi , HS and Zhou , YP . 2004 . Queueing System Performance Analysis and Markov Control Processes , Hefei : Press of University of Science and Technology of China .