Abstract
Solving the optimal mean-field control problem usually requires complete system information. In this paper, a Q-learning algorithm is proposed to solve the optimal control problem for unknown mean-field discrete-time stochastic systems. First, through a suitable transformation, the stochastic mean-field control problem is converted into a deterministic one. Second, the H matrix is obtained from the Q-function, and the control strategy depends only on the H matrix; solving for the H matrix is therefore equivalent to solving the mean-field optimal control problem. The proposed Q-learning method iteratively computes the H matrix and the gain matrix from measured system state information, without requiring knowledge of the system parameters. Next, it is proved that the sequence of control matrices produced by Q-learning converges to the optimal control, which establishes the theoretical feasibility of the method. Finally, two simulation cases verify the effectiveness of the Q-learning algorithm.
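The model-free iteration described in the abstract, estimating an H matrix from state data and reading the control gain off its blocks, can be sketched as follows. This is a minimal illustration for a standard linear quadratic problem without the mean-field term; the two-state system `A`, `B` and all dimensions are hypothetical and are used only to generate data, which the learning updates never read.

```python
import numpy as np

# Hypothetical stable 2-state, 1-input system used ONLY to generate data;
# the Q-learning updates below never access A or B.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.array([[1.0]])  # input cost weight

n, m = 2, 1
p = n + m

def phi(z):
    """Quadratic basis: upper-triangular entries of z z^T, so that
    phi(z) @ h == z^T H z for a symmetric H stored as vector h."""
    zz = np.outer(z, z)
    i, j = np.triu_indices(p)
    w = np.where(i == j, 1.0, 2.0)  # off-diagonals appear twice in z^T H z
    return w * zz[i, j]

rng = np.random.default_rng(0)
K = np.zeros((m, n))               # initial stabilising policy
for it in range(20):               # policy iteration on the Q-function
    Phi, y = [], []
    x = rng.standard_normal(n)
    for k in range(200):           # data collection with exploratory inputs
        u = K @ x + 0.5 * rng.standard_normal(m)
        x_next = A @ x + B @ u     # plant step; unknown to the learner
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, K @ x_next])
        # Bellman equation for Q: Q(z) - Q(z_next under K) = stage cost
        Phi.append(phi(z) - phi(z_next))
        y.append(x @ Qc @ x + u @ Rc @ u)
        x = x_next
    # Policy evaluation: least-squares fit of the H matrix from data
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((p, p))
    H[np.triu_indices(p)] = h
    H = H + np.triu(H, 1).T        # recover the full symmetric H
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = -np.linalg.solve(Huu, Hux)  # policy improvement uses H only

print(K)
```

For a deterministic plant the Bellman residual above is exact for any exploratory input, so with sufficiently exciting data the least-squares step recovers H and the gain iteration converges to the optimal feedback, mirroring the convergence claim in the abstract.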
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes on contributors
Yingying Ge
Yingying Ge is a doctoral student at Shandong University of Science and Technology. Her current research interests include reinforcement learning, Q-learning, optimal control and adaptive dynamic programming.
Xikui Liu
Xikui Liu received the M.S. degree from Shandong University of Science and Technology and the Ph.D. degree from Huazhong University of Science and Technology, China, in 2000 and 2004, respectively. He is a Professor at Shandong University of Science and Technology. His research interests include graph theory and linear and nonlinear stochastic optimal control.
Yan Li
Yan Li received the M.S. and Ph.D. degrees from Shandong University of Science and Technology, China, in 2006 and 2015, respectively. She is an Associate Professor at Shandong University of Science and Technology. Her research interests include linear and nonlinear stochastic control.