Research Article

Bellman's principle of optimality and deep reinforcement learning for time-varying tasks

Pages 2448-2459 | Received 04 Oct 2020, Accepted 01 Apr 2021, Published online: 16 Apr 2021

