Research Article

Bellman's principle of optimality and deep reinforcement learning for time-varying tasks

Pages 2448-2459 | Received 04 Oct 2020, Accepted 01 Apr 2021, Published online: 16 Apr 2021

