References
- Abounadi, J. (1998) Stochastic approximation for non-expansive maps: applications to Q-learning algorithms. Unpublished Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.
- Arkin, R.C. (1998) Behavior-based Robotics, 1st edn, The MIT Press, Cambridge, MA.
- Askin, R.G. and Standridge, C.R. (1993) Modeling and Analysis of Manufacturing Systems, 1st edn, John Wiley & Sons, New York, NY.
- Berkley, B.J. (1992) A review of the kanban production control research literature. Production and Operations Management, 1(4), 393–411.
- Bertsekas, D.P. and Tsitsiklis, J.N. (1996) Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
- Bonvik, A.M., Couch, C.E. and Gershwin, S.B. (1997) A comparison of production-line control mechanisms. International Journal of Production Research, 35(3), 789–804.
- Buzacott, J.A. and Shanthikumar, J.G. (1992) A general approach for coordinating production in multiple cell manufacturing systems. Production and Operations Management, 1(1), 34–52.
- Dallery, Y. and Liberopoulos, G. (2000) Extended kanban control system: combining kanban and base stock. IIE Transactions, 32(4), 369–386.
- Das, T.K., Gosavi, A., Mahadevan, S. and Marchalleck, N. (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4), 560–574.
- Das, T.K. and Sarkar, S. (1999) Optimal preventive maintenance in a production–inventory system. IIE Transactions, 31(6), 537–551.
- Frein, Y., Di Mascolo, M. and Dallery, Y. (2000) On the design of generalized kanban control systems. International Journal of Operations and Production Management (in press).
- Gershwin, S.B. (1994) Manufacturing Systems Engineering, Prentice Hall, Englewood Cliffs, NJ.
- Gosavi, A. (1999) An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results. Unpublished Ph.D. Thesis, Department of Industrial Engineering, University of South Florida, Tampa, FL 33620.
- Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996) Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285.
- Law, A.M. and Kelton, W.D. (1991) Simulation Modeling and Analysis, McGraw-Hill, Inc., New York, NY.
- Lutz, C.M., Davis, K.R. and Sun, M. (1998) Determining buffer location and size in production lines using tabu search. European Journal of Operational Research, 106(2/3), 301–316.
- Mahadevan, S. and Theocharous, G. (1998) Optimizing production manufacturing using reinforcement learning, in Proceedings of the Eleventh International FLAIRS Conference, AAAI Press, Menlo Park, CA, pp. 372–377.
- Muckstadt, J.A. and Tayur, S.R. (1995a) Comparison of alternative kanban control mechanisms. I. background and structural results. IIE Transactions, 27(2), 140–150.
- Muckstadt, J.A. and Tayur, S.R. (1995b) Comparison of alternative kanban control mechanisms. II. experimental results. IIE Transactions, 27(2), 151–161.
- Puterman, M.L. (1994) Markov Decision Processes, Wiley Interscience, New York, NY.
- Sethi, S. and Zhang, Q. (1994) Hierarchical Decision Making in Stochastic Manufacturing Systems, Birkhäuser, Boston, MA.
- Sethi, S., Zhang, H. and Zhang, Q. (1997) Hierarchical production control in a stochastic manufacturing system with long-run average cost. Journal of Mathematical Analysis and Applications, 214, 151–172.
- So, K.C. and Pinault, S.C. (1988) Allocating buffer storages in a pull system. International Journal of Production Research, 26(12), 1959–1980.
- Spearman, M.L., Woodruff, D.L. and Hopp, W.J. (1990) CONWIP: a pull alternative to kanban. International Journal of Production Research, 28(5), 879–894.
- Sugimori, Y., Kusunoki, K., Cho, F. and Uchikawa, S. (1977) Toyota production system and kanban system: materialization of just-in-time and respect-for-humans systems. International Journal of Production Research, 15(6), 553–564.
- Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
- Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
- Tabe, T., Muramatsu, R. and Tanaka, Y. (1980) Analysis of production ordering quantities and inventory variations in a multi-stage production ordering system. International Journal of Production Research, 18(2), 245–257.
- Van Ryzin, G., Lou, S.X. and Gershwin, S.B. (1993) Production control for a tandem two-machine system. IIE Transactions, 25(5), 5–20.
- Veatch, M.H. and Wein, L.M. (1994) Optimal control of a two-station tandem production/inventory system. Operations Research, 42(2), 337–350.
- Watkins, C.J. (1989) Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge, England.