Full Papers

Density estimation based soft actor-critic: deep reinforcement learning for static output feedback control with measurement noise

Pages 398-409 | Received 25 Sep 2023, Accepted 16 Jan 2024, Published online: 07 Feb 2024

