976
Views
0
CrossRef citations to date
0
Altmetric
Articles

Safe reinforcement learning for dynamical systems using barrier certificates

ORCID Icon, ORCID Icon &
Pages 2822-2844 | Received 29 May 2022, Accepted 19 Nov 2022, Published online: 12 Dec 2022

References

  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI}16) (pp. 265–283).
  • Akametalu, A. K., Fisac, J. F., Gillula, J. H., Kaynama, S., Zeilinger, M. N., & C. J. Tomlin (2014). Reachability-based safe learning with Gaussian processes. In 53rd IEEE conference on decision and control (pp. 1424–1431).
  • Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., & Topcu, U. (2018). Safe reinforcement learning via shielding. In Thirty-second AAAI conference on artificial intelligence.
  • Anderson, G., Verma, A., Dillig, I., & Chaudhuri, S. (2020). Neurosymbolic reinforcement learning with formally verified exploration. Advances in Neural Information Processing Systems, 33, 6172–6183. https://dl.acm.org/doi/abs/10.5555/3495724.3496242
  • Bastani, O. (2019). Safe reinforcement learning via online shielding. Preprint arXiv:1905.10691, 288, 289.
  • Berkenkamp, F., Turchetta, M., Schoellig, A. P., & Krause, A. (2017). Safe model-based reinforcement learning with stability guarantees. Preprint arXiv:1705.08551.
  • Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). Openai gym. Preprint arXiv:1606.01540.
  • Cheng, R., Orosz, G., Murray, R. M., & Burdick, J. W. (2019). End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, pp. 3387–3395).
  • Clavière, A., Asselin, E., Garion, C., & Pagetti, C. (2020). Safety verification of neural network controlled systems. Preprint arXiv:2011.05174.
  • Cohen, M. H., & Belta, C. (2020). Approximate optimal control for safety-critical systems with control barrier functions. In 2020 59th IEEE conference on decision and control (CDC) (pp. 2062–2067).
  • De Moura, L., & Bjørner, N. (2008). Z3: An efficient SMT solver. In International conference on tools and algorithms for the construction and analysis of systems (pp. 337–340).
  • Deshmukh, J. V., Kapinski, J. P., Yamaguchi, T., & Prokhorov, D. (2019). Learning deep neural network controllers for dynamical systems with safety guarantees. In 2019 IEEE/ACM international conference on computer-aided design (ICCAD) (pp. 1–7).
  • Dong, K., Luo, Y., Yu, T., Finn, C., & Ma, T. (2020). On the expressivity of neural networks for deep reinforcement learning. In International conference on machine learning (pp. 2627–2637).
  • Dutta, S., Jha, S., Sankaranarayanan, S., & Tiwari, A. (2018a). Learning and verification of feedback control systems using feedforward neural networks. IFAC-PapersOnLine, 51(16), 151–156. https://doi.org/10.1016/j.ifacol.2018.08.026
  • Dutta, S., Jha, S., Sankaranarayanan, S., & Tiwari, A. (2018b). Output range analysis for deep feedforward neural networks. In Nasa formal methods symposium (pp. 121–138).
  • Fulton, N., & Platzer, A. (2018). Safe reinforcement learning via formal methods: Toward safe control through proof and learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32).
  • Gao, S., Kong, S., & Clarke, E. M. (2013). dReal: An SMT solver for nonlinear theories over the reals. In International conference on automated deduction (pp. 208–214).
  • Garcıa, J., & Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1), 1437–1480. https://dl.acm.org/doi/10.5555/2789272.2886795
  • Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., Yang, Y., & Knoll, A. (2022). A review of safe reinforcement learning: Methods, theory and applications. Preprint arXiv:2205.10330.
  • Gurobi Optimization, I. (2022). Gurobi optimizer reference manual. Retrieved from https://www.gurobi.com
  • Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic algorithms and applications. Preprint arXiv:1812.05905.
  • Hu, H., Fazlyab, M., Morari, M., & Pappas, G. J. (2020). Reach-sdp: Reachability analysis of closed-loop systems with neural network controllers via semidefinite programming. In 2020 59th IEEE conference on decision and control (CDC) (pp. 5929–5934).
  • Huang, C., Fan, J., Li, W., Chen, X., & Zhu, Q. (2019). Reachnn: Reachability analysis of neural-network controlled systems. ACM Transactions on Embedded Computing Systems (TECS), 18(5s), 1–22. https://doi.org/10.1145/3358228
  • Ivanov, R., Carpenter, T. J., Weimer, J., Alur, R., Pappas, G. J., & Lee, I. (2020). Verifying the safety of autonomous systems with neural network controllers. ACM Transactions on Embedded Computing Systems (TECS), 20(1), 1–26. https://doi.org/10.1145/3419742
  • Jin, W., Wang, Z., Yang, Z., & Mou, S. (2020). Neural certificates for safe control policies. Preprint arXiv:2006.08465.
  • Julian, K. D., & Kochenderfer, M. J. (2019). A reachability method for verifying dynamical systems with deep neural network controllers. Preprint arXiv:1903.00520.
  • Katz, G., Barrett, C., Dill, D. L., Julian, K., & Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In International conference on computer aided verification (pp. 97–117).
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In ICLR (poster).
  • Lin, L., Gong, S., Li, T., & Peeta, S. (2018). Deep learning-based human-driven vehicle trajectory prediction and its application for platoon control of connected and autonomous vehicles. In The autonomous vehicles symposium (Vol. 2018).
  • Luo, Y., & Ma, T. (2021). Learning barrier certificates: Towards safe reinforcement learning with zero training-time violations. In Thirty-fifth conference on neural information processing systems.
  • Ma, Y. J., Shen, A., Bastani, O., & Dinesh, J. (2022). Conservative and adaptive penalty for model-based safe reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, pp. 5404–5412).
  • Ohnishi, M., Wang, L., Notomista, G., & Egerstedt, M. (2019). Barrier-certified adaptive reinforcement learning with applications to brushbot navigation. IEEE Transactions on Robotics, 35(5), 1186–1205. https://doi.org/10.1109/TRO.8860
  • Peruffo, A., Ahmed, D., & Abate, A. (2021). Automated and formal synthesis of neural barrier certificates for dynamical models. In International conference on tools and algorithms for the construction and analysis of systems (pp. 370–388).
  • Prajna, S., & Jadbabaie, A. (2004). Safety verification of hybrid systems using barrier certificates. In Proceedings of the 7th international workshop on hybrid systems: Computation and control (HSCC) (pp. 477–492). Springer.
  • Radac, M. B., & Precup, R. E. (2019). Data-driven model-free tracking reinforcement learning control with VRFT-based adaptive actor-critic. Applied Sciences, 9(9), 1807. https://doi.org/10.3390/app9091807
  • Ravanbakhsh, H., & Sankaranarayanan, S. (2019). Learning control Lyapunov functions from counterexamples and demonstrations. Autonomous Robots, 43(2), 275–307. https://doi.org/10.1007/s10514-018-9791-9
  • Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387–395).
  • Singh, G., Gehr, T., Püschel, M., & Vechev, M. (2019). An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL), 1–30. https://doi.org/10.1145/3290354
  • Srinivasan, K., Eysenbach, B., Ha, S., Tan, J., & Finn, C. (2020). Learning to be safe: Deep rl with a safety critic. Preprint arXiv:2010.14603.
  • Stooke, A., Achiam, J., & Abbeel, P. (2020). Responsive safety in reinforcement learning by PID Lagrangian methods. In International conference on machine learning (pp. 9133–9143).
  • Sun, X., Khedr, H., & Shoukry, Y. (2019). Formal verification of neural network controlled autonomous systems. In Proceedings of the 22nd ACM international conference on hybrid systems: Computation and control (pp. 147–156).
  • Sutton, R. S., Szepesvári, C., & Maei, H. R. (2008). A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In Nips.
  • Tamar, A., Chow, Y., Ghavamzadeh, M., & Mannor, S. (2016). Sequential decision making with coherent risk. IEEE Transactions on Automatic Control, 62(7), 3323–3338. https://doi.org/10.1109/TAC.2016.2644871
  • Taylor, A., Singletary, A., Yue, Y., & Ames, A. (2020). Learning for safety-critical control with control barrier functions. In Learning for dynamics and control (pp. 708–717).
  • Thananjeyan, B., Balakrishna, A., Nair, S., Luo, M., Srinivasan, K., Hwang, M., Gonzalez, J. E., Ibarz, J., Finn, C., & Goldberg, K. (2021). Recovery rl: Safe reinforcement learning with learned recovery zones. IEEE Robotics and Automation Letters, 6(3), 4915–4922. https://doi.org/10.1109/LRA.2021.3070252
  • Thomas, P. S. (Doctoral Dissertations, 2015). Safe reinforcement learning. https://doi.org/10.7275/7529913.0
  • Wang, S., Pei, K., Whitehouse, J., Yang, J., & Jana, S. (2018). Formal security analysis of neural networks using symbolic intervals. In 27th {USENIX} security symposium ({USENIX} security 18) (pp. 1599–1614).
  • Wang, T., Qin, R., Chen, Y., Snoussi, H., & Choi, C. (2019). A reinforcement learning approach for UAV target searching and tracking. Multimedia Tools and Applications, 78(4), 4347–4364. https://doi.org/10.1007/s11042-018-5739-5
  • Wei, T., Wang, Y., & Zhu, Q. (2017). Deep reinforcement learning for building HVAC control. In Proceedings of the 54th annual design automation conference 2017 (pp. 1–6).
  • Winterer, F. (2017). iSAT3. Retrieved from https://projects.informatik.uni-freiburg.de/projects/isat3
  • Wong, E., & Kolter, Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In International conference on machine learning (pp. 5286–5295).
  • Xiang, W., Tran, H. D., Yang, X., & Johnson, T. T. (2020). Reachable set estimation for neural network control systems: A simulation-guided approach. IEEE Transactions on Neural Networks and Learning Systems, 32(5), 1821–1830. https://doi.org/10.1109/TNNLS.2020.2991090
  • Xu, J., Hou, Z., Wang, W., Xu, B., Zhang, K., & Chen, K. (2018). Feedback deep deterministic policy gradient with fuzzy reward for robotic multiple peg-in-hole assembly tasks. IEEE Transactions on Industrial Informatics, 15(3), 1658–1667. https://doi.org/10.1109/TII.2018.2868859
  • Yang, T. Y., Rosca, J., Narasimhan, K., & Ramadge, P. J. (2020). Accelerating safe reinforcement learning with constraint-mismatched policies. Preprint arXiv:2006.11645.
  • Yang, Y., Ding, D. W., Xiong, H., Yin, Y., & Wunsch, D. C. (2020). Online barrier-actor-critic learning for H∞ control with full-state constraints and input saturation. Journal of the Franklin Institute, 357(6), 3316–3344. https://doi.org/10.1016/j.jfranklin.2019.12.017
  • Yang, Y., Gao, W., Modares, H., & Xu, C. Z. (2021). Robust actor-critic learning for continuous-time nonlinear systems with unmodeled dynamics. IEEE Transactions on Fuzzy Systems, 30(6), 2101–2112. https://doi.org/10.1109/TFUZZ.2021.3075501
  • Yang, Y., Vamvoudakis, K. G., Modares, H., Yin, Y., & Wunsch, D. C. (2020). Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Transactions on Neural Networks and Learning Systems, 31(12), 5441–5455. https://doi.org/10.1109/TNNLS.5962385
  • Zhao, H., Zeng, X., Chen, T., & Liu, Z. (2020). Synthesizing barrier certificates using neural networks. In Proceedings of the 23rd international conference on hybrid systems: Computation and control (pp. 1–11).
  • Zhao, Q., Chen, X., Zhao, Z., Zhang, Y., Tang, E., & Li, X. (2022). Verifying neural network controlled systems using neural networks. In 25th ACM international conference on hybrid systems: Computation and control (pp. 1–11).