Safe reinforcement learning for dynamical systems using barrier certificates

Pages 2822–2844 | Received 29 May 2022, Accepted 19 Nov 2022, Published online: 12 Dec 2022
