Research Article

A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation

Article: 2084473 | Received 22 Feb 2022, Accepted 25 May 2022, Published online: 06 Jul 2022

References

  • Mahmood, A. R., H. van Hasselt, and R. S. Sutton. 2014. Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems 27. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2014/hash/be53ee61104935234b174e62a07e53cf-Abstract.html
  • Bellman, R. 1957. A Markovian decision process. Indiana University Mathematics Journal 6 (4):679–84. doi:10.1512/iumj.1957.6.56038.
  • Erke, S., D. Bin, N. Yiming, Q. Zhu, X. Liang, and Z. Dawei. 2020. An improved A-star based path planning algorithm for autonomous land vehicles. International Journal of Advanced Robotic Systems 17 (5):1729881420962263. doi:10.1177/1729881420962263.
  • Falanga, D., K. Kleber, and D. Scaramuzza. 2020. Dynamic obstacle avoidance for quadrotors with event cameras. Science Robotics 5 (40):eaaz9712. doi:10.1126/scirobotics.aaz9712.
  • Fujimoto, S., H. van Hoof, and D. Meger. 2018. Addressing function approximation error in actor-critic methods. ArXiv.org. https://arxiv.org/abs/1802.09477
  • Hajdu, C., and Á. Ballagi. 2020. Proposal of a graph-based motion planner architecture. 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) Online on MaxWhere 3D Web. September 1, 2020. doi:10.1109/CogInfoCom50765.2020.9237891.
  • Hou, Y., and Y. Zhang. 2019. Improving DDPG via prioritized experience replay. https://cardwing.github.io/files/RL_course_report.pdf
  • Kam, H. R., S.-H. Lee, T. Park, and C.-H. Kim. 2015. RViz: A toolkit for real domain data visualization. Telecommunication Systems 60 (2):337–45. doi:10.1007/s11235-015-0034-5.
  • Kapturowski, S., G. Ostrovski, J. Quan, R. Munos, and W. Dabney. 2018. Recurrent experience replay in distributed reinforcement learning. Openreview.net. September 27, 2018. https://openreview.net/forum?id=r1lyTjAqYX
  • Koenig, N., and A. Howard. 2004. Design and use paradigms for Gazebo, an open-source multi-robot simulator. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Sendai, Japan. Accessed October 20, 2019. doi:10.1109/iros.2004.1389727.
  • Kolar, P., P. Benavidez, and M. Jamshidi. 2020. Survey of datafusion techniques for laser and vision based sensor integration for autonomous navigation. Sensors 20 (8):2180. doi:10.3390/s20082180.
  • Konda, V. R., and J. N. Tsitsiklis. 2003. On actor-critic algorithms. SIAM Journal on Control and Optimization 42 (4):1143–66. doi:10.1137/s0363012901385691.
  • Lillicrap, T. P., J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. 2015. Continuous control with deep reinforcement learning. ArXiv.org. https://arxiv.org/abs/1509.02971.
  • Mahmud, A. L. 2021. Development of collision resilient drone for flying in cluttered environment. Graduate Theses, Dissertations, and Problem Reports (January). doi:10.33915/etd.8022.
  • Mnih, V., K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. 2013. Playing Atari with deep reinforcement learning. ArXiv.org. https://arxiv.org/abs/1312.5602.
  • Nitta, K., K. Higuchi, Y. Tadokoro, and J. Rekimoto. 2015. Shepherd pass. Proceedings of the 12th International Conference on Advances in Computer Entertainment Technology, Iskandar, Malaysia, November 2015. doi:10.1145/2832932.2832950.
  • Schaul, T., J. Quan, I. Antonoglou, and D. Silver. 2015. Prioritized experience replay. ArXiv.org. https://arxiv.org/abs/1511.05952.
  • Silver, D., G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. 2014. Deterministic policy gradient algorithms. Proceedings.mlr.press. PMLR, Beijing, China. January 27, 2014. https://proceedings.mlr.press/v32/silver14.html.
  • Stanford Artificial Intelligence Laboratory et al. 2018. Robot Operating System (version ROS Melodic Morenia). https://www.ros.org.
  • Sutton, R. S., and A. G. Barto. 1999. Reinforcement learning. Journal of Cognitive Neuroscience 11 (1):126–34.
  • Tan, F., P. Yan, and X. Guan. 2017. Deep reinforcement learning: From Q-learning to deep Q-learning. Neural Information Processing, 475–83. doi:10.1007/978-3-319-70093-9_50.
  • Youn, W., H. Ko, H. Choi, I. Choi, J.-H. Baek, and H. Myung. 2020. Collision-free autonomous navigation of a small UAV using low-cost sensors in GPS-denied environments. International Journal of Control, Automation, and Systems 19 (2):953–68. doi:10.1007/s12555-019-0797-7.
  • Zhang, L., S. Han, Z. Zhang, L. Li, and S. Lü. 2020a. Deep recurrent deterministic policy gradient for physical control. Artificial Neural Networks and Machine Learning – ICANN 2020, 257–68. doi:10.1007/978-3-030-61616-8_21.
  • Zhang, Q., M. Zhu, L. Zou, M. Li, and Y. Zhang. 2020b. Learning reward function with matching network for mapless navigation. Sensors 20 (13):3664. doi:10.3390/s20133664.
  • Zijian, H., K. Wan, X. Gao, Y. Zhai, and Q. Wang. 2020. Deep reinforcement learning approach with multiple experience pools for UAV’s autonomous motion planning in complex unknown environments. Sensors 20 (7):1890. doi:10.3390/s20071890.