International Journal of Control Volume 82, 2009 - Issue 10

Original Articles

Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration

Tang Hao Department of Computer Science and Technology, Hefei University of Technology, Tunxi Road No.193, Hefei, Anhui 230009, PR China (corresponding author)

Arai Tamio Department of Precision Engineering, School of Engineering, The University of Tokyo, Building Eng-14, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

Pages 1917-1928 | Received 10 Aug 2008, Accepted 15 Feb 2009, Published online: 06 Aug 2009

https://doi.org/10.1080/00207170902823006

References

Abounadi, J, Bertsekas, D and Borkar, V. 2001. Learning Algorithms for Markov Decision Processes with Average Cost. SIAM Journal on Control and Optimization, 40: 681–698.
Beightler, CS and Crisp Jr, RM. 1968. A Discrete-time Queueing Analysis of Conveyor-serviced Production Stations. Operations Research, 16: 986–1001.
Bertsekas, DP. 2001. Dynamic Programming and Optimal Control, Belmont, MA: Athena Scientific.
Bradtke, SJ and Duff, MO. 1995. “Reinforcement Learning Methods for Continuous-time Markov Decision Problems”. In Advances in Neural Information Processing Systems 7, 393–400. Cambridge, MA: MIT Press.
Cao, XR. 2007. Stochastic Learning and Optimisation: A Sensitivity-based View, New York: Springer.
Das, TK, Gosavi, A, Mahadevan, S and Marchalleck, N. 1999. Solving Semi-Markov Decision Problems using Average Reward Reinforcement Learning. Management Science, 45: 560–574.
Fang, HT and Cao, XR. 2004. Potential-based Online Policy Iteration Algorithms for Markov Decision Processes. IEEE Transactions on Automatic Control, 49: 493–505.
Gosavi, A. 2003. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, Boston: Kluwer.
Gosavi, A. 2004a. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis. Machine Learning, 55: 5–29.
Gosavi, A. 2004b. Reinforcement Learning for Long Run Average Cost. European Journal of Operational Research, 155: 654–674.
Lagoudakis, MG and Parr, R. 2003. Least-squares Policy Iteration. Journal of Machine Learning Research, 4: 1107–1149.
Matsui, M. 1993. A Generalised Model of Conveyor-serviced Production Station (CSPS) (in Japanese). Journal of Japan Industrial Management Association, 44: 25–32.
Matsui, M. 2005. CSPS Model: Look-ahead Controls and Physics. International Journal of Production Research, 43: 2001–2025.
Matsui, M and Shingu, T. 1978. A Queueing Analysis of Conveyor-serviced Production Station and the Optimal Range Strategy. AIIE Transactions, 10: 89–99.
Muth, EJ and White, JA. 1979. Conveyor Theory: A Survey. AIIE Transactions, 11: 270–277.
Nawijn, WM. 1981. The Analysis of a Conveyor-serviced Production Station. European Journal of Operational Research, 6: 67–74.
Nawijn, WM. 1985. The Optimal Look-ahead Policy for Admission to a Single Server System. Operations Research, 33: 626–643.
Puterman, ML. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming, New York: Wiley.
Schwartz, A. 1993. “A Reinforcement Learning Method for Maximising Undiscounted Rewards”. In Proceedings of the Tenth Annual Conference on Machine Learning, 298–305. Amherst, MA: Morgan Kaufmann.
Singh, S, Tadic, B and Doucet, A. 2007. A Policy Gradient Method for Semi-Markov Decision Processes with Application to Call Admission Control. European Journal of Operational Research, 178: 808–818.
Tang, H, Xi, HS and Yin, BQ. 2005a. The Optimal Robust Control Policy for Uncertain Semi-Markov Control Processes. International Journal of Systems Science, 36: 791–800.
Tang, H, Yuan, JB, Lu, Y and Cheng, WJ. 2005b. Performance Potential-based Neuro-dynamic Programming for SMDPs. Acta Automatica Sinica, 31: 642–645.
Tang, H, Xi, HS and Yin, BQ. 2007. Error Bounds of Optimisation Algorithms for Semi-Markov Decision Processes. International Journal of Systems Science, 38: 725–736.
Watkins, CJCH. 1989. “Learning from Delayed Rewards”. PhD thesis, Cambridge, UK: Cambridge University.
Yin, BQ, Xi, HS and Zhou, YP. 2004. Queueing System Performance Analysis and Markov Control Processes, Hefei: Press of University of Science and Technology of China.
