References
- Matsuo Y, LeCun Y, Sahani M, et al. Deep learning, reinforcement learning, and world models. Neural Netw. 2022;152:267–275. doi: 10.1016/j.neunet.2022.03.037
- Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge (MA): MIT Press; 2018.
- Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. Preprint, arXiv:1509.02971. 2015.
- Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning; PMLR; 2018; p. 1582–1591.
- Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. Preprint, arXiv:1707.06347. 2017.
- Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning; PMLR; 2018; p. 1861–1870.
- Liu Q, Chung A, Szepesvári C, et al. When is partially observable reinforcement learning not scary? Conference on Learning Theory; PMLR; 2022; p. 5175–5220.
- Egorov M, Sunberg ZN, Balaban E, et al. POMDPs.jl: a framework for sequential decision making under uncertainty. J Mach Learn Res. 2017;18(26):1–5. Available from: http://jmlr.org/papers/v18/16-300.html.
- Sunberg Z, Kochenderfer M. Online algorithms for POMDPs with continuous state, action, and observation spaces. Proceedings of the International Conference on Automated Planning and Scheduling; 2018; Vol. 28. p. 259–263.
- Takakura S, Sato K. Structured output feedback control for linear quadratic regulator using policy gradient method. IEEE Trans Autom Control. 2023.
- Neto HC, Trindade MA. Control of drill string torsional vibrations using optimal static output feedback. Control Eng Pract. 2023;130:105366. doi: 10.1016/j.conengprac.2022.105366
- Chen C, Xie L, Xie K, et al. Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning. Automatica. 2022;146:110581. doi: 10.1016/j.automatica.2022.110581
- Fatkhullin I, Polyak B. Optimizing static linear feedback: Gradient method. SIAM J Control Optim. 2021;59(5):3887–3911. doi: 10.1137/20M1329858
- Veselý V. Static output feedback controller design. Kybernetika. 2001;37(2):205–221.
- Zhang H, Chen H, Xiao C, et al. Robust deep reinforcement learning against adversarial perturbations on state observations. Adv Neural Inf Process Syst. 2020;33:21024–21037.
- Vlassis N, Littman ML, Barber D. On the computational complexity of stochastic controller optimization in POMDPs. ACM Trans Comput Theory (TOCT). 2012;4(4):1–8. doi: 10.1145/2382559.2382563
- Pishro-Nik H. Introduction to probability, statistics and random processes. Cambridge (MA): Kappa Research, LLC; 2014.
- Wang Z, Scott DW. Nonparametric density estimation for high-dimensional data–algorithms and applications. Wiley Interdiscip Rev Comput Stat. 2019;11(4):e1461. doi: 10.1002/wics.1461
- Chen YC. A tutorial on kernel density estimation and recent advances. Biostat Epidemiol. 2017;1(1):161–187. doi: 10.1080/24709360.2017.1396742
- Papamakarios G, Pavlakou T, Murray I. Masked autoregressive flow for density estimation. Adv Neural Inf Process Syst. 2017;30:2338–2347.
- Germain M, Gregor K, Murray I, et al. MADE: masked autoencoder for distribution estimation. International Conference on Machine Learning; PMLR; 2015; p. 881–889.
- Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning; PMLR; 2015; p. 448–456.
- Sutton RS, McAllester D, Singh S, et al. Policy gradient methods for reinforcement learning with function approximation. In: Solla S, Leen T, Müller K, editors. Advances in Neural Information Processing Systems; Vol. 12. Cambridge (MA): MIT Press; 1999.
- Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. International Conference on Machine Learning; PMLR; 2014; p. 387–395.
- Haarnoja T, Zhou A, Hartikainen K, et al. Soft actor-critic algorithms and applications. Preprint, arXiv:1812.05905. 2018.
- Gu S, Holly E, Lillicrap TP, et al. Deep reinforcement learning for robotic manipulation. Preprint, arXiv:1610.00633. 2016.
- Rusu AA, Večerík M, Rothörl T, et al. Sim-to-real robot learning from pixels with progressive nets. Conference on Robot Learning; PMLR; 2017; p. 262–270.
- Andrychowicz OM, Baker B, Chociej M, et al. Learning dexterous in-hand manipulation. Int J Rob Res. 2020;39(1):3–20. doi: 10.1177/0278364919887447
- Towers M, Terry JK, Kwiatkowski A, et al. Gymnasium. Zenodo; 2023. doi: 10.5281/zenodo.8127026
- Coumans E, Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning; 2016–2021. Available from: http://pybullet.org.
- Zhong J, Gupta A, Power T. UM-ARM-Lab/pytorch_kinematics: v0.5.4; 2023.
- Raffin A, Hill A, Gleave A, et al. Stable-Baselines3: reliable reinforcement learning implementations. J Mach Learn Res. 2021;22(268):1–8. Available from: http://jmlr.org/papers/v22/20-1364.html.
- Raffin A, Kober J, Stulp F. Smooth exploration for robotic reinforcement learning. Conference on Robot Learning; PMLR; 2022; p. 1634–1644.