Full Papers

Density estimation based soft actor-critic: deep reinforcement learning for static output feedback control with measurement noise

Ran Wang, Ye Tian & Kenji Kashima
Pages 398-409 | Received 25 Sep 2023, Accepted 16 Jan 2024, Published online: 07 Feb 2024
 

ABSTRACT

State-of-the-art deep reinforcement learning (DRL) methods, including Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC), demonstrate significant capability in solving the optimal static state feedback control (SSFC) problem, which can be modeled as a fully observed Markov decision process (MDP). However, the optimal static output feedback control (SOFC) problem with measurement noise is a typical partially observable MDP (POMDP), which is difficult to solve, especially over high-dimensional continuous state-action-observation spaces. This paper proposes a two-stage framework to address this challenge. In the laboratory stage, both the states and the noisy outputs are observable; the SOFC policy is converted into a constrained stochastic SSFC policy, whose probability density function is generally not analytical. To this end, a density estimation based SAC algorithm is proposed to explore the optimal SOFC policy by learning the optimal constrained stochastic SSFC policy. Consequently, in the real-world stage, only the noisy outputs and the learned SOFC policy are required to solve the optimal SOFC problem. Numerical simulations and corresponding experiments with robotic arms illustrate the effectiveness of our method. The code is available at https://github.com/RanKyoto/DE-SAC.
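To make the key conversion concrete, here is a minimal sketch (a hypothetical one-dimensional example, not the paper's implementation) of why a deterministic output-feedback policy u = f(y), applied to a noisy output y = cx + v, induces a stochastic state-feedback policy whose density is generally not analytical:

```python
# Hypothetical 1-D illustration: a deterministic output-feedback policy
# u = f(y) with y = c*x + v (Gaussian measurement noise v) behaves, viewed
# from the state x, as a *stochastic* state-feedback policy pi(u | x).
import numpy as np

rng = np.random.default_rng(0)
c, noise_std = 1.0, 0.2

def f(y):
    # hypothetical nonlinear output-feedback policy (a saturated gain)
    return np.tanh(-2.0 * y)

x = 0.5                                               # fix a single state
y = c * x + noise_std * rng.standard_normal(10_000)   # noisy output samples
u = f(y)                                              # induced action samples

# From the state's point of view, u is a random variable: its density is the
# pushforward of N(c*x, noise_std^2) through f, with no simple closed form,
# which is what motivates estimating the density rather than assuming one.
print(f"mean action at x={x}: {u.mean():.3f}, std: {u.std():.3f}")
```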

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

2. dlqe() and dlqr() are the solvers for the linear-quadratic state estimator (LQE) and the linear-quadratic regulator (LQR) for discrete-time linear control systems, respectively. They can be found in MATLAB or the Python Control Systems Library.
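As a usage sketch (with assumed, illustrative system matrices, not those of the paper), the two solvers can be called from the Python Control Systems Library (the `control` package) as follows:

```python
import numpy as np
import control

# Hypothetical discrete-time linear system x[k+1] = A x[k] + B u[k] + G w[k],
# y[k] = C x[k] + v[k], with process noise w and measurement noise v.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)

Q = np.eye(2)           # LQR state cost weight
R = np.array([[1.0]])   # LQR input cost weight
QN = 0.01 * np.eye(2)   # process noise covariance (for the LQE)
RN = np.array([[0.1]])  # measurement noise covariance (for the LQE)

# dlqr returns the optimal state-feedback gain K (u = -K x),
# the Riccati solution S, and the closed-loop eigenvalues E.
K, S, E = control.dlqr(A, B, Q, R)

# dlqe returns the steady-state Kalman estimator gain L, the error
# covariance P, and the estimator eigenvalues.
L, P, E_est = control.dlqe(A, G, C, QN, RN)

print("LQR gain K:\n", K)
print("LQE gain L:\n", L)
```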

Additional information

Funding

This work was supported in part by JSPS KAKENHI under Grant Number JP21H04875.

Notes on contributors

Ran Wang

Ran Wang received the B.E. degree in Automation and the M.E. degree in Control Theory and Control Engineering from Dalian University of Technology, Dalian, China, in 2016 and 2019, respectively. He is currently pursuing the Ph.D. degree with Kyoto University, Kyoto, Japan. His research interests include reinforcement learning and self-triggered control.

Ye Tian

Ye Tian received his B.S. degree in applied mathematics from Ningxia University, Yinchuan, China, in 2014, and the Ph.D. degree in control theory and control engineering from Xidian University, Xi'an, China, in 2021. From 2017 to 2019, he was a visiting student at the Center for Control, Dynamical Systems, and Computation, University of California, Santa Barbara, CA, USA. He is currently a Postdoctoral Fellow with the Graduate School of Informatics, Kyoto University, Kyoto, Japan. His research interests include multi-agent systems, game theory, and social networks.

Kenji Kashima

Kenji Kashima received his Doctoral degree in Informatics from Kyoto University in 2005. He was with Tokyo Institute of Technology, Universität Stuttgart, and Osaka University before joining Kyoto University in 2013, where he is currently an Associate Professor. His research interests include control and learning theory for complex (large-scale, stochastic, networked) dynamical systems, as well as its interdisciplinary applications. He received the Humboldt Research Fellowship (Germany), the IEEE CSS Roberto Tempo Best CDC Paper Award, and the Pioneer Award of the SICE Control Division, among other honors. He is an Associate Editor of the IEEE Transactions on Automatic Control (2017–) and the Asian Journal of Control (2014–), and a member of the IEEE CSS Conference Editorial Board (2011–).
