Abstract
We consider the look-ahead control of a conveyor-serviced production station (CSPS) modelled as a semi-Markov decision process (SMDP), with the goal of finding an optimal control policy under either the average-cost or the discounted-cost criterion. Policy iteration (PI), combined with the concept of performance potential, can provide a unified optimisation framework for both criteria. However, the exact solution scheme poses a major difficulty: it requires not only full knowledge of the model parameters, but also a considerable amount of work to obtain and process the necessary system and performance matrices. To overcome this difficulty, we propose a potential-based online PI algorithm in this article. During implementation, the potentials and state-action values are learned online, through an effective exploration scheme, by analysing and utilising the historical information from the past operation of a practical CSPS system. Finally, we illustrate the successful application of this learning-based technique to CSPS systems with an example.
Acknowledgements
The authors would like to thank Professor Matsui of The University of Electro-Communications, Tokyo, Japan, for his helpful advice on our study of CSPS problems. All errors are ours. This research was supported in part by the National Natural Science Foundation of China (60404009) and the Natural Science Foundation of Anhui Province, PR China (090412046, 070416242).