Predicting before acting: improving policy quality by taking a vision of consequence

Pages 608-629 | Received 14 Oct 2021, Accepted 30 Dec 2021, Published online: 28 Jan 2022

References

  • Abed-alguni, B. H., & Ottom, M. A. (2018). Double delayed Q-learning. International Journal of Artificial Intelligence, 16(2), 41–59.
  • Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6), 26–38. https://doi.org/10.1109/MSP.2017.2743240
  • Azar, M. G., Piot, B., Pires, B. A., Grill, J. B., Altché, F., & Munos, R. (2019). World discovery models (pp. 1–18). Preprint.
  • Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A. A. (2018). Large-scale study of curiosity-driven learning. In Seventh international conference on learning representations (pp. 1–17).
  • Ermolov, A., & Sebe, N. (2020). Latent world models for intrinsically motivated exploration. In Neural information processing systems (pp. 5565–5575). MIT Press.
  • Feng, M., & Xu, H. (2017). Deep reinforcement learning based optimal defense for cyber-physical system in presence of unknown cyber-attack. In IEEE symposium series on computational intelligence (SSCI) (pp. 1–8). The Institute of Electrical and Electronics Engineers (IEEE).
  • Frank, M., Leitner, J., Stollenga, M., Förster, A., & Schmidhuber, J. (2014). Curiosity driven reinforcement learning for motion planning on humanoids. Frontiers in Neurorobotics, 7(25), 1–15. https://doi.org/10.3389/fnbot.2013.00025
  • Fujimoto, S., Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor-critic methods. In International conference on machine learning (Vol. 80, pp. 1587–1596). ACM.
  • Gangwani, T., Lehman, J., Liu, Q., & Peng, J. (2019). Learning belief representations for imitation learning in POMDPs. In Conference on uncertainty in artificial intelligence (pp. 1061–1071). AUAI Press.
  • Gidaris, S., Singh, P., & Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. In International conference on learning representations (pp. 1–16).
  • Goyal, P., Niekum, S., & Mooney, R. J. (2019). Using natural language for reward shaping in reinforcement learning. In International joint conference on artificial intelligence (pp. 2385–2391). AAAI Press.
  • Gregor, K., Danihelka, I., Rezende, D., & Wierstra, D. (2015). DRAW: A recurrent neural network for image generation. In International conference on machine learning (pp. 1462–1471). Microtome Publishing.
  • Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1856–1865). Microtome Publishing.
  • Haber, N., Mrowca, D., Wang, S., Li, F. F., & Yamins, D. L. K. (2018). Learning to play with intrinsically-motivated self-aware agents. In Neural information processing systems (pp. 8388–8399). MIT Press.
  • Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., & Davidson, J. (2018). Learning latent dynamics for planning from pixels. In International conference on machine learning (pp. 2555–2565). Microtome Publishing.
  • Hasselt, H. V., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, no. 1, pp. 1–7). AAAI Press.
  • He, K., Zhang, X. Y., Ren, S. Q., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (pp. 770–778). The Institute of Electrical and Electronics Engineers (IEEE).
  • Henaff, M. (2019). Explicit explore-exploit algorithms in continuous state spaces. In Conference on neural information processing systems (pp. 9377–9387). MIT Press.
  • Higgins, I., Pal, A., Rusu, A., Matthey, L., Burgess, C., Pritzel, A., Botvinick, M., Blundell, C., & Lerchner, A. (2017). DARLA: Improving zero-shot transfer in reinforcement learning. In International conference on machine learning (pp. 1480–1490). Microtome Publishing.
  • Katt, S., Oliehoek, F., & Amato, C. (2019). Bayesian reinforcement learning in factored POMDPs. In International conference on autonomous agents and multiagent systems (pp. 7–15). International Foundation for Autonomous Agents and Multiagent Systems.
  • Ke, N. R., Singh, A., Touati, A., Goyal, A., Bengio, Y., Parikh, D., & Batra, D. (2020). Learning dynamics model in reinforcement learning by incorporating the long term future (pp. 1–14). Preprint.
  • Lee, D., Seo, H., & Jung, M. W. (2012). Neural basis of reinforcement learning and decision making. Annual Review of Neuroscience, 35(1), 287–308. https://doi.org/10.1146/neuro.2012.35.issue-1
  • Li, H. B., & Li, Z. S. (2018). A novel strategy of combining variable ordering heuristics for constraint satisfaction problems. IEEE Access, 6(1), 42750–42756. https://doi.org/10.1109/ACCESS.2018.2859618
  • Liang, Y. J., Xiao, M. Q., Wang, X. F., Tang, X. L., Zhu, H. Z., & Li, J. F. (2019). A POMDP-based optimization method for sequential diagnostic strategy with unreliable tests. IEEE Access, 7(1), 75389–75397. https://doi.org/10.1109/Access.6287639
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In International conference on learning representations (pp. 1–14).
  • Lim, M. H., Tomlin, C. J., & Sunberg, Z. N. (2020). Sparse tree search optimality guarantees in POMDPs with continuous observation spaces. In International joint conference on artificial intelligence (pp. 4135–4142). AAAI Press.
  • Marom, O., & Rosman, B. (2018). Belief reward shaping in reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence (pp. 3762–3769). AAAI Press.
  • Mnih, V., Badia, A. P., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937). Microtome Publishing.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. In Neural information processing systems deep learning workshop (pp. 1–9). MIT Press.
  • Moreno, P., Humplik, J., Papamakarios, G., Pires, B. Á., Buesing, L., Heess, N., & Weber, T. (2018). Neural belief states for partially observed domains. In NeurIPS 2018 workshop on reinforcement learning under partial observability (pp. 1–5). MIT Press.
  • Nair, A., Pong, V., Dalal, M., Bahl, S., Lin, S., & Levine, S. (2018). Visual reinforcement learning with imagined goals. In Advances in neural information processing systems (pp. 9191–9200). MIT Press.
  • Ostrovski, G., Bellemare, M. G., Oord, A., & Munos, R. (2017). Count-based exploration with neural density models. In International conference on machine learning (Vol. 70, pp. 2721–2730). Microtome Publishing.
  • Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2018). Curiosity-driven exploration by self-supervised prediction. In International conference on machine learning (pp. 2778–2787). Microtome Publishing.
  • Pathak, D., Gandhi, D., & Gupta, A. (2019). Self-supervised exploration via disagreement. In International conference on machine learning (pp. 5062–5071). Microtome Publishing.
  • Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In International conference on learning representations (pp. 1–21).
  • Sekar, R., Rybkin, O., Daniilidis, K., Abbeel, P., Hafner, D., & Pathak, D. (2020). Planning to explore via self-supervised world models. In International conference on machine learning (pp. 8583–8592). Microtome Publishing.
  • Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387–395). Microtome Publishing.
  • Wang, R. W., Xia, W., Yap, R. H., & Li, Z. H. (2016). Optimizing simple tabular reduction with a bitwise representation. In International joint conference on artificial intelligence (pp. 787–793). AAAI Press.
  • Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., & Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In International conference on machine learning (pp. 1995–2003). Microtome Publishing.
  • Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. https://doi.org/10.1007/BF00992698
  • Xu, X., Zuo, L., & Huang, Z. H. (2014). Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences, 261(5), 1–31. https://doi.org/10.1016/j.ins.2013.08.037
  • Zhang, Z., Zhang, Y., Liu, G. C., Tang, J. H., Yan, S. C., & Wang, M. (2020). Joint label prediction based semi-supervised adaptive concept factorization for robust data representation. IEEE Transactions on Knowledge and Data Engineering, 32(5), 952–970. https://doi.org/10.1109/TKDE.69
  • Zhang, Z. Z., Hsu, D., & Lee, W. S. (2014). Covering number for efficient heuristic-based POMDP planning. In International conference on machine learning (pp. 28–36). Microtome Publishing.
  • Zhang, Z. Z., Hsu, D., Lee, W. S., Lim, Z. W., & Bai, A. (2015). PLEASE: Palm leaf search for POMDPs with large observation spaces. In International conference on automated planning and scheduling (pp. 249–257). AAAI Press.