
Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control

Pages 1321-1333 | Received 11 Dec 2018, Accepted 20 Jul 2019, Published online: 11 Aug 2019

References

  • Abouheaf, M., Gueaieb, W., & Sharaf, A. (2018). Model-free adaptive learning control scheme for wind turbines with doubly fed induction generators. IET Renewable Power Generation, 12(14), 1675–1686. doi: 10.1049/iet-rpg.2018.5353
  • Abu-Khalaf, M., & Lewis, F. L. (2005). Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 41(5), 779–791. doi: 10.1016/j.automatica.2004.11.034
  • Al-Tamimi, A., Lewis, F. L., & Abu-Khalaf, M. (2008). Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(4), 943–949. doi: 10.1109/TSMCB.2008.926614
  • Beard, R. W., Saridis, G. N., & Wen, J. T. (1997). Galerkin approximations of the generalized Hamilton–Jacobi–Bellman equation. Automatica, 33(12), 2159–2177. doi: 10.1016/S0005-1098(97)00128-3
  • Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
  • Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K. G., Lewis, F. L., & Dixon, W. E. (2013). A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 49(1), 82–92. doi: 10.1016/j.automatica.2012.09.019
  • Feng, T., Zhang, H., Luo, Y., & Zhang, J. (2015). Stability analysis of heuristic dynamic programming algorithm for nonlinear systems. Neurocomputing, 149(Part C), 1461–1468. doi: 10.1016/j.neucom.2014.08.046
  • Finlayson, B. A. (1972). The method of weighted residuals and variational principles: With application in fluid mechanics, heat and mass transfer. New York, NY: Academic Press.
  • Heydari, A. (2014). Revisiting approximate dynamic programming and its convergence. IEEE Transactions on Cybernetics, 44(12), 2733–2743. doi: 10.1109/TCYB.2014.2314612
  • Heydari, A., & Balakrishnan, S. N. (2013). Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Transactions on Neural Networks and Learning Systems, 24(1), 145–157. doi: 10.1109/TNNLS.2012.2227339
  • Heydari, A., & Balakrishnan, S. N. (2014). An adaptive critic-based scheme for consensus control of nonlinear multi-agent systems. International Journal of Control, 87(12), 2463–2474. doi: 10.1080/00207179.2014.928947
  • Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366. doi: 10.1016/0893-6080(89)90020-8
  • Huang, Y., Wang, D., & Liu, D. (2017). Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing, 266, 128–140. doi: 10.1016/j.neucom.2017.05.030
  • Ioannou, P. A., & Sun, J. (1996). Robust adaptive control. Upper Saddle River, NJ: PTR Prentice-Hall.
  • Jiang, H., & He, H. (2018). Data-driven distributed output consensus control for partially observable multiagent systems. IEEE Transactions on Cybernetics, 49(3), 848–858. doi: 10.1109/TCYB.2017.2788819
  • Jiang, Y., & Jiang, Z.-P. (2014). Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 25(5), 882–893. doi: 10.1109/TNNLS.2013.2294968
  • Jiang, Y., & Jiang, Z.-P. (2015). Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Transactions on Automatic Control, 60(11), 2917–2929. doi: 10.1109/TAC.2015.2414811
  • Jiang, Z.-P., & Jiang, Y. (2013). Robust adaptive dynamic programming for linear and nonlinear systems: An overview. European Journal of Control, 19(5), 417–425. doi: 10.1016/j.ejcon.2013.05.017
  • Kiumarsi, B., & Lewis, F. L. (2015). Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 26(1), 140–151. doi: 10.1109/TNNLS.2014.2358227
  • Kiumarsi, B., Lewis, F. L., & Levine, D. S. (2015). Optimal control of nonlinear discrete time-varying systems using a new neural network approximation structure. Neurocomputing, 156, 157–165. doi: 10.1016/j.neucom.2014.12.067
  • Lewis, F. L., Jagannathan, S., & Yesildirek, A. (1999). Neural network control of robot manipulators and nonlinear systems. London: Taylor & Francis.
  • Li, J., Modares, H., Chai, T., Lewis, F. L., & Xie, L. (2017). Off-policy reinforcement learning for synchronization in multiagent graphical games. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2434–2445. doi: 10.1109/TNNLS.2016.2609500
  • Liu, D., Huang, Y., Wang, D., & Wei, Q. (2013a). Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming. International Journal of Control, 86(9), 1554–1566. doi: 10.1080/00207179.2013.790562
  • Liu, D., Wang, D., Wang, F.-Y., Li, H., & Yang, X. (2014). Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems. IEEE Transactions on Cybernetics, 44(12), 2834–2847. doi: 10.1109/TCYB.2014.2357896
  • Liu, D., Yang, X., & Li, H. (2013b). Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Computing and Applications, 23(7–8), 1843–1850. doi: 10.1007/s00521-012-1249-y
  • Lopez, V. G., Sanchez, E. N., Alanis, A. Y., & Rios, J. D. (2017). Real-time neural inverse optimal control for a linear induction motor. International Journal of Control, 90(4), 800–812. doi: 10.1080/00207179.2016.1213424
  • Luo, B., Liu, D., Huang, T., & Wang, D. (2016). Model-free optimal tracking control via critic-only Q-learning. IEEE Transactions on Neural Networks and Learning Systems, 27(10), 2134–2144. doi: 10.1109/TNNLS.2016.2585520
  • Luo, B., Wu, H.-N., & Huang, T. (2018). Optimal output regulation for model-free Quanser helicopter with multistep Q-Learning. IEEE Transactions on Industrial Electronics, 65(6), 4953–4961. doi: 10.1109/TIE.2017.2772162
  • Luo, B., Wu, H.-N., Huang, T., & Liu, D. (2014). Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica, 50(12), 3281–3290. doi: 10.1016/j.automatica.2014.10.056
  • Luo, B., Wu, H.-N., Huang, T., & Liu, D. (2015). Reinforcement learning solution for HJB equation arising in constrained optimal control problem. Neural Networks, 71, 150–158. doi: 10.1016/j.neunet.2015.08.007
  • Luy, N. T. (2018). Distributed cooperative H∞ optimal tracking control of MIMO nonlinear multi-agent systems in strict-feedback form via adaptive dynamic programming. International Journal of Control, 91(4), 952–968. doi: 10.1080/00207179.2017.1300685
  • Lv, Y., Na, J., & Ren, X. (2019). Online H∞ control for completely unknown nonlinear systems via an identifier–critic-based ADP structure. International Journal of Control, 92(1), 100–111. doi: 10.1080/00207179.2017.1381763
  • Lv, Y., Na, J., Yang, Q., Wu, X., & Guo, Y. (2016). Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. International Journal of Control, 89(1), 99–112. doi: 10.1080/00207179.2015.1060362
  • Modares, H., & Lewis, F. L. (2014). Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica, 50(7), 1780–1792. doi: 10.1016/j.automatica.2014.05.011
  • Modares, H., Lewis, F. L., & Naghibi-Sistani, M.-B. (2014). Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica, 50(1), 193–202. doi: 10.1016/j.automatica.2013.09.043
  • Modares, H., Lewis, F. L., & Naghibi-Sistani, M.-B. (2013). Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Transactions on Neural Networks and Learning Systems, 24(10), 1513–1525. doi: 10.1109/TNNLS.2013.2276571
  • Modares, H., Naghibi-Sistani, M.-B., & Lewis, F. L. (2013). A policy iteration approach to online optimal control of continuous-time constrained-input systems. ISA Transactions, 52(5), 611–621. doi: 10.1016/j.isatra.2013.04.004
  • Mu, C., Wang, D., & He, H. (2017). Novel iterative neural dynamic programming for data-based approximate optimal control design. Automatica, 81, 240–252. doi: 10.1016/j.automatica.2017.03.022
  • Mu, C., Wang, D., & He, H. (2018). Data-driven finite-horizon approximate optimal control for discrete-time nonlinear systems using iterative HDP approach. IEEE Transactions on Cybernetics, 48(10), 2948–2961. doi: 10.1109/TCYB.2017.2752845
  • Na, J., & Herrmann, G. (2014). Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems. IEEE/CAA Journal of Automatica Sinica, 1(4), 412–422. doi: 10.1109/JAS.2014.7004668
  • Radac, M.-B., Precup, R.-E., & Roman, R.-C. (2018). Data-driven model reference control of MIMO vertical tank systems with model-free VRFT and Q-Learning. ISA Transactions, 73, 227–238. doi: 10.1016/j.isatra.2018.01.014
  • Saridis, G. N., & Lee, C.-S. G. (1979). An approximation theory of optimal control for trainable manipulators. IEEE Transactions on Systems, Man, and Cybernetics, 9(3), 152–159. doi: 10.1109/TSMC.1979.4310171
  • Škach, J., Kiumarsi, B., Lewis, F. L., & Straka, O. (2018). Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems. IEEE Transactions on Cybernetics, 48(1), 29–40. doi: 10.1109/TCYB.2016.2618926
  • Song, R., Lewis, F. L., Wei, Q., & Zhang, H. (2016). Off-policy actor-critic structure for optimal control of unknown systems with disturbances. IEEE Transactions on Cybernetics, 46(5), 1041–1050. doi: 10.1109/TCYB.2015.2421338
  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
  • Vamvoudakis, K. G., & Lewis, F. L. (2010). Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5), 878–888. doi: 10.1016/j.automatica.2010.02.018
  • Vamvoudakis, K. G., & Lewis, F. L. (2012). Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. International Journal of Robust and Nonlinear Control, 22(13), 1460–1483. doi: 10.1002/rnc.1760
  • Vamvoudakis, K. G., Vrabie, D., & Lewis, F. L. (2014). Online adaptive algorithm for optimal control with integral reinforcement learning. International Journal of Robust and Nonlinear Control, 24(17), 2686–2710. doi: 10.1002/rnc.3018
  • Vrabie, D., & Lewis, F. (2009). Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks, 22(3), 237–246. doi: 10.1016/j.neunet.2009.03.008
  • Wang, Z., Behal, A., & Marzocca, P. (2011). Model-free control design for multi-input multi-output aeroelastic system subject to external disturbance. Journal of Guidance, Control, and Dynamics, 34(2), 446–458. doi: 10.2514/1.51403
  • Wang, D., He, H., & Liu, D. (2017a). Adaptive critic nonlinear robust control: A survey. IEEE Transactions on Cybernetics, 47(10), 3429–3451. doi: 10.1109/TCYB.2017.2712188
  • Wang, D., He, H., & Liu, D. (2017b). Improving the critic learning for event-based nonlinear H∞ control design. IEEE Transactions on Cybernetics, 47(10), 3417–3428. doi: 10.1109/TCYB.2017.2653800
  • Wang, D., & Liu, D. (2013). Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique. Neurocomputing, 121, 218–225. doi: 10.1016/j.neucom.2013.04.006
  • Wang, D., Liu, D., Li, H., & Ma, H. (2014). Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Information Sciences, 282, 167–179. doi: 10.1016/j.ins.2014.05.050
  • Wang, D., Liu, D., Wei, Q., Zhao, D., & Jin, N. (2012). Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 48(8), 1825–1832. doi: 10.1016/j.automatica.2012.05.049
  • Wang, D., Liu, D., Zhang, Q., & Zhao, D. (2016). Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(11), 1544–1555. doi: 10.1109/TSMC.2015.2492941
  • Wang, D., Mu, C., Yang, X., & Liu, D. (2017c). Event-based constrained robust control of affine systems incorporating an adaptive critic mechanism. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(7), 1602–1612. doi: 10.1109/TSMC.2016.2642118
  • Wang, F.-Y., Zhang, H., & Liu, D. (2009). Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine, 4(2), 39–47. doi: 10.1109/MCI.2009.932261
  • Wei, Q., Lewis, F. L., Sun, Q., Yan, P., & Song, R. (2017). Discrete-time deterministic Q-learning: A novel convergence analysis. IEEE Transactions on Cybernetics, 47(5), 1224–1237. doi: 10.1109/TCYB.2016.2542923
  • Wei, Q., & Liu, D. (2014). A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems. IEEE Transactions on Automation Science and Engineering, 11(4), 1176–1190. doi: 10.1109/TASE.2013.2280974
  • Wei, Q., Liu, D., & Lin, H. (2016). Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Transactions on Cybernetics, 46(3), 840–853. doi: 10.1109/TCYB.2015.2492242
  • Wei, Q., Wang, F.-Y., Liu, D., & Yang, X. (2014). Finite-approximation-error-based discrete-time iterative adaptive dynamic programming. IEEE Transactions on Cybernetics, 44(12), 2820–2833. doi: 10.1109/TCYB.2014.2354377
  • Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. (Ph.D. Thesis), Harvard University.
  • Yang, X., Liu, D., & Wang, D. (2014). Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. International Journal of Control, 87(3), 553–566. doi: 10.1080/00207179.2013.848292
  • Zhang, H., Cui, L., & Luo, Y. (2013). Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Transactions on Cybernetics, 43(1), 206–216. doi: 10.1109/TSMCB.2012.2203336
  • Zhao, D., Xia, Z., & Wang, D. (2015). Model-free optimal control for affine nonlinear systems with convergence analysis. IEEE Transactions on Automation Science and Engineering, 12(4), 1461–1468. doi: 10.1109/TASE.2014.2348991
  • Zhong, X., He, H., Wang, D., & Ni, Z. (2018). Model-free adaptive control for unknown nonlinear zero-sum differential game. IEEE Transactions on Cybernetics, 48(5), 1633–1646. doi: 10.1109/TCYB.2017.2712617
