Approximate Q-Learning for Stacking Problems with Continuous Production and Retrieval

References

  • Abdulhai, B., R. Pringle, and G. Karakoulas. 2003. Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering 129 (3):278–85. doi:10.1061/(ASCE)0733-947X(2003)129:3(278).
  • Baird, L. 1995. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, 30–37. Tahoe City, California.
  • Balaji, P. G., X. German, and D. Srinivasan. 2010. Urban traffic signal control using reinforcement learning agents. IET Intelligent Transport Systems 4 (3):177–88. doi:10.1049/iet-its.2009.0096.
  • Beham, A., G. K. Kronberger, J. Karder, M. Kommenda, A. Scheibenpflug, S. Wagner, and M. Affenzeller. 2014. Integrated simulation and optimization in HeuristicLab. In Proceedings of the 26th European Modeling and Simulation Symposium EMSS, 418–23, Bordeaux, France.
  • Bertsekas, D. P., and S. Ioffe. 1996. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT.
  • Bertsekas, D. P., and J. N. Tsitsiklis. 1995. Neuro-dynamic programming: An overview. In Proceedings of the 34th Conference on Decision & Control, 560–64. New Orleans, LA.
  • Bertsekas, D. P., and J. N. Tsitsiklis. 1996. Neuro-dynamic programming. Belmont: Athena Scientific.
  • Bortfeldt, A., and F. Forster. 2012. A tree search procedure for the container premarshalling problem. European Journal of Operational Research 217:531–40. doi:10.1016/j.ejor.2011.10.005.
  • Boyan, J., and M. Littman. 1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems 6:671–78.
  • Boysen, N., and S. Emde. 2016. The parallel stack loading problem to minimize blockages. European Journal of Operational Research 249 (2):618–27. doi:10.1016/j.ejor.2015.09.033.
  • Busoniu, L., R. Babuska, B. De Schutter, and D. Ernst. 2010. Reinforcement learning and dynamic programming using function approximators. NY: CRC Press.
  • Caserta, M., S. Schwarze, and S. Voss. 2011a. Container rehandling at maritime container terminals. In Handbook of terminal planning, Operations Research/Computer Science Interfaces Series 49, ed. J. W. Böse, 247–69. NY: Springer.
  • Caserta, M., S. Voss, and M. Sniedovich. 2011b. Applying the corridor method to a blocks relocation problem. OR Spectrum 33:915–29. doi:10.1007/s00291-009-0176-5.
  • Crites, R. H., and A. G. Barto. 1996. Improving elevator performance using reinforcement learning. Advances in Neural Information Processing Systems 8:1017–23.
  • Gabillon, V., M. Ghavamzadeh, and B. Scherrer. 2013. Approximate dynamic programming finally performs well in the game of Tetris. Advances in Neural Information Processing Systems 26:1754–62.
  • Gharehgozli, A. H., Y. Yu, R. de Koster, and J. T. Udding. 2014. A decision-tree stacking heuristic minimising the expected number of reshuffles at a container terminal. International Journal of Production Research 52 (9):2592–611. doi:10.1080/00207543.2013.861618.
  • HEAL. 2015. HeuristicLab additional material for publications. Accessed December 23, 2015. http://dev.heuristiclab.com/AdditionalMaterial.
  • Hirashima, Y. 2008. An intelligent marshalling plan using a new reinforcement learning system for container yard terminals. In New developments in robotics, automation and control. Rijeka: INTECH Open Access Publisher.
  • Hirashima, Y. 2009. A Q-learning system for container marshalling with group-based learning model at container yard terminals. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. I.
  • Kefi, M., O. Korbaa, K. Ghedira, and P. Yim. 2009. Container handling using multi-agent architecture. International Journal of Intelligent Information and Database Systems 3 (3):338–60. doi:10.1504/IJIIDS.2009.027691.
  • Kim, B. I., J. Koo, and H. P. Sambhajirao. 2011. A simplified steel plate stacking problem. International Journal of Production Research 49 (17):5133–51. doi:10.1080/00207543.2010.518998.
  • Kim, K. H., and G. P. Hong. 2006. A heuristic rule for relocating blocks. Computers & Operations Research 33:940–54. doi:10.1016/j.cor.2004.08.005.
  • Lehnfeld, J., and S. Knust. 2014. Loading, unloading and premarshalling of stacks in storage areas: Survey and classification. European Journal of Operational Research 239 (2):297–312. doi:10.1016/j.ejor.2014.03.011.
  • McPartland, M., and M. Gallagher. 2011. Reinforcement learning in first person shooter games. IEEE Transactions on Computational Intelligence and AI in Games 3 (1):43–56. doi:10.1109/TCIAIG.2010.2100395.
  • Melo, F. S., S. P. Meyn, and M. I. Ribeiro. 2008. An analysis of reinforcement learning with function approximation. In Proceedings of the 25th International Conference on Machine Learning, 664–71.
  • Nishi, T., and M. Konishi. 2010. An optimisation model and its effective beam search heuristics for floor-storage warehousing systems. International Journal of Production Research 48:1947–66. doi:10.1080/00207540802603767.
  • Prashanth, L., and S. Bhatnagar. 2011. Reinforcement learning with function approximation for traffic signal control. IEEE Transactions on Intelligent Transportation Systems 12 (2):412–21. doi:10.1109/TITS.2010.2091408.
  • Rei, R. J., M. Kubo, and J. P. Pedroso. 2008. Simulation-based optimization for steel stacking. In Modelling, computation and optimization in information systems and management sciences, ed. H. A. Le Thi, P. Bouvry, and T. Pham Dinh, 254–63. Berlin, Heidelberg: Springer.
  • Rei, R. J., and J. P. Pedroso. 2012. Heuristic search for the stacking problem. International Transactions in Operational Research 19 (3):379–95. doi:10.1111/itor.2012.19.issue-3.
  • Salido, M. A., O. Sapena, and F. Barber. 2009. The container stacking problem: An artificial intelligence planning-based approach. In Proceedings of the International Conference on Harbor, Maritime & Multimodal Logistics, Modelling and Simulation 2009, Tenerife.
  • Shin, E. J., and K. H. Kim. 2015. Hierarchical remarshaling operations in block stacking storage systems considering duration of stay. Computers & Industrial Engineering 89:43–52. doi:10.1016/j.cie.2015.03.023.
  • Stone, P., R. S. Sutton, and G. Kuhlmann. 2005. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior 13 (3):165–88. doi:10.1177/105971230501300301.
  • Sutton, R. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9–44. doi:10.1007/BF00115009.
  • Sutton, R. S., and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
  • Szepesvári, C. 2010. Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 4 (1):1–103. doi:10.2200/S00268ED1V01Y201005AIM009.
  • Tang, L., R. Zhao, and J. Liu. 2012. Models and algorithms for shuffling problems in steel plants. Naval Research Logistics 59:502–24. doi:10.1002/nav.v59.7.
  • Tesauro, G. 1994. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6:215–19. doi:10.1162/neco.1994.6.2.215.
  • Tsitsiklis, J. N., and B. Van Roy. 1997. An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control 42 (5):674–90. doi:10.1109/9.580874.
  • Tsitsiklis, J. N., and B. Van Roy. 1996. Feature-based methods for large scale dynamic programming. Machine Learning 22:59–94. doi:10.1007/BF00114724.
  • Uther, W., and M. Veloso. 1998. Tree-based discretization for continuous state space reinforcement learning. In Proceedings of AAAI-98, 769–74.
  • Van Hasselt, H. 2012. Reinforcement learning in continuous state and action spaces. In Reinforcement learning: State-of-the-art, Adaptation, Learning, and Optimization 12, ed. M. Wiering and M. van Otterlo, 207–51. Berlin, Heidelberg: Springer.
  • Wagner, S., G. Kronberger, A. Beham, M. Kommenda, A. Scheibenpflug, E. Pitzer, S. Vonolfen, M. Kofler, S. Winkler, V. Dorfer, et al. 2014. Architecture and design of the HeuristicLab optimization environment. In Advanced methods and applications in computational intelligence, Topics in Intelligent Engineering and Informatics series, ed. R. Klempous, J. Nikodem, W. Jacak, and Z. Chaczko, 197–261. Switzerland: Springer International Publishing.
  • Wang, X., Y. Cheng, and J.-Q. Yi. 2007. A fuzzy actor-critic reinforcement learning network. Information Sciences 177 (18):3764–81. doi:10.1016/j.ins.2007.03.012.
  • Xu, X., L. Zuo, and Z. Huang. 2014. Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences 261:1–31. doi:10.1016/j.ins.2013.08.037.
  • Zäpfel, G., and M. Wasner. 2006. Warehouse sequencing in the steel supply chain as a generalized job shop model. International Journal of Production Economics 104:482–501. doi:10.1016/j.ijpe.2004.10.005.
  • Zhang, W., and T. G. Dietterich. 1995. A reinforcement learning approach to job-shop scheduling. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1114–20.
