Approximate Q-Learning for Stacking Problems with Continuous Production and Retrieval

References

  • Abdulhai, B., R. Pringle, and G. Karakoulas. 2003. Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering 129 (3):278–85. doi:10.1061/(ASCE)0733-947X(2003)129:3(278).
  • Baird, L. 1995. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, 30–37. Tahoe City, California.
  • Balaji, P. G., X. German, and D. Srinivasan. 2010. Urban traffic signal control using reinforcement learning agents. IET Intelligent Transport Systems 4 (3):177–88. doi:10.1049/iet-its.2009.0096.
  • Beham, A., G. K. Kronberger, J. Karder, M. Kommenda, A. Scheibenpflug, S. Wagner, and M. Affenzeller. 2014. Integrated simulation and optimization in HeuristicLab. In Proceedings of the 26th European Modeling and Simulation Symposium EMSS, 418–23, Bordeaux, France.
  • Bertsekas, D. P., and S. Ioffe. 1996. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT.
  • Bertsekas, D. P., and J. N. Tsitsiklis. 1995. Neuro-dynamic programming: An overview. In Proceedings of the 34th Conference on Decision & Control, 560–64. New Orleans, LA.
  • Bertsekas, D. P., and J. N. Tsitsiklis. 1996. Neuro-dynamic programming. Belmont: Athena Scientific.
  • Bortfeldt, A., and F. Forster. 2012. A tree search procedure for the container premarshalling problem. European Journal of Operational Research 217:531–40. doi:10.1016/j.ejor.2011.10.005.
  • Boyan, J., and M. Littman. 1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems 6:671–78.
  • Boysen, N., and S. Emde. 2016. The parallel stack loading problem to minimize blockages. European Journal of Operational Research 249 (2):618–27. doi:10.1016/j.ejor.2015.09.033.
  • Busoniu, L., R. Babuska, B. De Schutter, and D. Ernst. 2010. Reinforcement learning and dynamic programming using function approximators. NY: CRC Press.
  • Caserta, M., S. Schwarze, and S. Voss. 2011a. Container rehandling at maritime container terminals. In Handbook of terminal planning, Operations Research/Computer Science Interfaces Series 49, ed. J. W. Böse, 247–69. NY: Springer.
  • Caserta, M., S. Voss, and M. Sniedovich. 2011b. Applying the corridor method to a blocks relocation problem. OR Spectrum 33:915–29. doi:10.1007/s00291-009-0176-5.
  • Crites, R. H., and A. G. Barto. 1996. Improving elevator performance using reinforcement learning. Advances in Neural Information Processing Systems 8:1017–23.
  • Gabillon, V., M. Ghavamzadeh, and B. Scherrer. 2013. Approximate dynamic programming finally performs well in the game of Tetris. Advances in Neural Information Processing Systems 26:1754–62.
  • Gharehgozli, A. H., Y. Yu, R. de Koster, and J. T. Udding. 2014. A decision-tree stacking heuristic minimising the expected number of reshuffles at a container terminal. International Journal of Production Research 52 (9):2592–611. doi:10.1080/00207543.2013.861618.
  • HEAL. 2015. HeuristicLab additional material for publications. Accessed December 23, 2015. http://dev.heuristiclab.com/AdditionalMaterial.
  • Hirashima, Y. 2008. An intelligent marshalling plan using a new reinforcement learning system for container yard terminals. In New developments in robotics, automation and control. Rijeka: INTECH Open Access Publisher.
  • Hirashima, Y. 2009. A Q-learning system for container marshalling with group-based learning model at container yard terminals. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. I.
  • Kefi, M., O. Korbaa, K. Ghedira, and P. Yim. 2009. Container handling using multi-agent architecture. International Journal of Intelligent Information and Database Systems 3 (3):338–60. doi:10.1504/IJIIDS.2009.027691.
  • Kim, B. I., J. Koo, and H. P. Sambhajirao. 2011. A simplified steel plate stacking problem. International Journal of Production Research 49 (17):5133–51. doi:10.1080/00207543.2010.518998.
  • Kim, K. H., and G. P. Hong. 2006. A heuristic rule for relocating blocks. Computers & Operations Research 33:940–54. doi:10.1016/j.cor.2004.08.005.
  • Lehnfeld, J., and S. Knust. 2014. Loading, unloading and premarshalling of stacks in storage areas: Survey and classification. European Journal of Operational Research 239 (2):297–312. doi:10.1016/j.ejor.2014.03.011.
  • McPartland, M., and M. Gallagher. 2011. Reinforcement learning in first person shooter games. IEEE Transactions on Computational Intelligence and AI in Games 3 (1):43–56. doi:10.1109/TCIAIG.2010.2100395.
  • Melo, F. S., S. P. Meyn, and M. I. Ribeiro. 2008. An analysis of reinforcement learning with function approximation. In Proceedings of the 25th International Conference on Machine Learning, 664–71.
  • Nishi, T., and M. Konishi. 2010. An optimisation model and its effective beam search heuristics for floor-storage warehousing systems. International Journal of Production Research 48:1947–66. doi:10.1080/00207540802603767.
  • Prashanth, L., and S. Bhatnagar. 2011. Reinforcement learning with function approximation for traffic signal control. IEEE Transactions on Intelligent Transportation Systems 12 (2):412–21. doi:10.1109/TITS.2010.2091408.
  • Rei, R. J., M. Kubo, and J. P. Pedroso. 2008. Simulation-based optimization for steel stacking. In Modelling, computation and optimization in information systems and management sciences, ed. H. A. Le Thi, P. Bouvry, and T. Pham Dinh, 254–63. Berlin, Heidelberg: Springer.
  • Rei, R. J., and J. P. Pedroso. 2012. Heuristic search for the stacking problem. International Transactions in Operational Research 19 (3):379–95. doi:10.1111/itor.2012.19.issue-3.
  • Salido, M. A., O. Sapena, and F. Barber. 2009. The container stacking problem: An artificial intelligence planning-based approach. In Proceedings of the International Conference on Harbor, Maritime & Multimodal Logistics, Modelling and Simulation 2009, Tenerife.
  • Shin, E. J., and K. H. Kim. 2015. Hierarchical remarshaling operations in block stacking storage systems considering duration of stay. Computers & Industrial Engineering 89:43–52. doi:10.1016/j.cie.2015.03.023.
  • Stone, P., R. S. Sutton, and G. Kuhlmann. 2005. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior 13 (3):165–88. doi:10.1177/105971230501300301.
  • Sutton, R. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9–44. doi:10.1007/BF00115009.
  • Sutton, R. S., and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
  • Szepesvári, C. 2010. Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 4 (1):1–103. doi:10.2200/S00268ED1V01Y201005AIM009.
  • Tang, L., R. Zhao, and J. Liu. 2012. Models and algorithms for shuffling problems in steel plants. Naval Research Logistics 59:502–24. doi:10.1002/nav.v59.7.
  • Tesauro, G. 1994. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6:215–19. doi:10.1162/neco.1994.6.2.215.
  • Tsitsiklis, J. N., and B. Van Roy. 1997. An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control 42 (5):674–90. doi:10.1109/9.580874.
  • Tsitsiklis, J. N., and B. Van Roy. 1996. Feature-based methods for large scale dynamic programming. Machine Learning 22:59–94. doi:10.1007/BF00114724.
  • Uther, W., and M. Veloso. 1998. Tree-based discretization for continuous state space reinforcement learning. In Proceedings of AAAI-98, 769–74.
  • Van Hasselt, H. 2012. Reinforcement learning in continuous state and action spaces. In Reinforcement learning: State-of-the-art, Adaptation, Learning, and Optimization 12, ed. M. Wiering and M. van Otterlo, 207–51. Berlin, Heidelberg: Springer.
  • Wagner, S., G. Kronberger, A. Beham, M. Kommenda, A. Scheibenpflug, E. Pitzer, S. Vonolfen, M. Kofler, S. Winkler, V. Dorfer, et al. 2014. Architecture and design of the HeuristicLab optimization environment. In Advanced methods and applications in computational intelligence, Topics in Intelligent Engineering and Informatics series, ed. R. Klempous, J. Nikodem, W. Jacak, and Z. Chaczko, 197–261. Switzerland: Springer International Publishing.
  • Wang, X., Y. Cheng, and J.-Q. Yi. 2007. A fuzzy actor-critic reinforcement learning network. Information Sciences 177 (18):3764–81. doi:10.1016/j.ins.2007.03.012.
  • Xu, X., L. Zuo, and Z. Huang. 2014. Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences 261:1–31. doi:10.1016/j.ins.2013.08.037.
  • Zäpfel, G., and M. Wasner. 2006. Warehouse sequencing in the steel supply chain as a generalized job shop model. International Journal of Production Economics 104:482–501. doi:10.1016/j.ijpe.2004.10.005.
  • Zhang, W., and T. G. Dietterich. 1995. A reinforcement learning approach to job-shop scheduling. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1114–20.
