Research Article

Comprehensive survey on the effectiveness of sharpness aware minimization and its progressive variants

Received 16 Apr 2024, Accepted 10 Jun 2024, Published online: 04 Aug 2024

References

  • Anand, D., R. Patil, U. Agrawal, V. R. H. Ravishankar, and P. Sudhakar. 2022. “Towards Generalization of Medical Imaging AI Models: Sharpness-Aware Minimizers and Beyond.” 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022: 1–5. Washington, DC: IEEE. doi:10.1109/ISBI52829.2022.9761677.
  • Andriushchenko, M., and N. Flammarion. 2022. “Towards Understanding Sharpness-Aware Minimization.” Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 17–23 July 2022: Vol. 162, 639–668. Westminster, UK: PMLR.
  • Bahri, D., H. Mobahi, and Y. Tay. 2022. “Sharpness-Aware Minimization Improves Language Model Generalization.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022: Vol. 1, 7360–7371. Stroudsburg, Pennsylvania, US: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.508.
  • Behdin, K., and R. Mazumder. 2023. “On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees.” arXiv preprint arXiv:2302.11836. doi:10.48550/arXiv.2302.11836.
  • Behdin, K., Q. Song, A. Gupta, S. Keerthi, A. Acharya, B. Ocejo, G. Dexter, R. Khanna, D. Durfee, and R. Mazumder. 2023. “mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization.” arXiv preprint arXiv:2302.09693. doi:10.48550/arXiv.2302.09693.
  • Brock, A., S. De, S. L. Smith, and K. Simonyan. 2021. “High-Performance Large-Scale Image Recognition without Normalization.” In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021: Vol. 139, 1059–1071. Westminster, UK: PMLR.
  • Chaudhari, P., A. Choromańska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina. 2019. “Entropy-SGD: Biasing Gradient Descent into Wide Valleys.” Journal of Statistical Mechanics: Theory and Experiment 2019 (12): 124018. doi:10.1088/1742-5468/ab39d9.
  • Chen, X., C. J. Hsieh, and B. Gong. 2022. “When Vision Transformers Outperform ResNets without Pre-Training or Strong Data Augmentations.” In Proceedings of the 10th International Conference on Learning Representations, Virtual, 25–29 April 2022: 1–20. Appleton, Wisconsin: ICLR.
  • Dinh, L., R. Pascanu, S. Bengio, and Y. Bengio. 2017. “Sharp Minima Can Generalize for Deep Nets.” In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017: Vol. 70, 1019–1028. Westminster, UK: PMLR.
  • Du, J., H. Yan, J. Feng, J. T. Zhou, L. Zhen, R. S. M. Goh, and V. Tan. 2022. “Efficient Sharpness-Aware Minimization for Improved Training of Neural Networks.” In Proceedings of the 10th International Conference on Learning Representations, Virtual, 25–29 April 2022: 1–18. Appleton, Wisconsin: ICLR.
  • Du, J., D. Zhou, J. Feng, V. Y. F. Tan, and J. T. Zhou. 2022. “Sharpness-Aware Training for Free.” In Proceedings of the 36th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 28 November – 9 December 2022: 23439–23451. NY, US: Curran Associates Inc.
  • Dziugaite, G. K., and D. M. Roy. 2017. “Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters Than Training Data.” Conference on Uncertainty in Artificial Intelligence, Sydney, Australia, 11–15 August 2017. Corvallis, Oregon: AUAI Press.
  • Foret, P., A. Kleiner, H. Mobahi, and B. Neyshabur. 2021. “Sharpness-Aware Minimization for Efficiently Improving Generalization.” In Proceedings of the 9th International Conference on Learning Representations, Virtual, 3–7 May 2021: 1–19. Appleton, Wisconsin: ICLR.
  • Garipov, T., P. Izmailov, D. Podoprikhin, D. Vetrov, and A. G. Wilson. 2018. “Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, edited by S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, and N. Cesa-Bianchi, Montréal, Canada, 2–8 December 2018: 8803–8812. Curran Associates Inc.
  • Gulati, A., J. Qin, C. C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, et al. 2020. “Conformer: Convolution-Augmented Transformer for Speech Recognition.” Interspeech 2020, Shanghai, China, 25–29 October 2020: 5036–5040. Grenoble, France: ISCA. doi:10.21437/Interspeech.2020-3015.
  • Hochreiter, S., and J. Schmidhuber. 1994. “Simplifying Neural Nets by Discovering Flat Minima.” Advances in Neural Information Processing Systems 7 (NIPS 1994), edited by T. K. Leen, G. Tesauro, and D. S. Touretzky, Denver, Colorado, USA, 28 November 1994: Vol. 7, 529–536. Cambridge, Massachusetts: The MIT Press.
  • Ioffe, S., and C. Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” Proceedings of the 32nd International Conference on Machine Learning, edited by F. Bach and D. Blei, Lille, France, 6–11 July 2015: Vol. 37, 448–456. NY, US: JMLR.
  • Izmailov, P., D. Podoprikhin, T. Garipov, D. P. Vetrov, and A. G. Wilson. 2018. “Averaging Weights Leads to Wider Optima and Better Generalization.” Conference on Uncertainty in Artificial Intelligence, Monterey, California, USA, 6–10 August 2018: 876–885. Corvallis, Oregon: AUAI Press.
  • Jiang, L., Z. Zhou, T. Leung, L. J. Li, and L. Fei-Fei. 2018. “MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels.” Proceedings of the 35th International Conference on Machine Learning, edited by J. Dy and A. Krause, Stockholm, Sweden, 10–15 July 2018: Vol. 80, 2304–2313. Westminster, UK: PMLR.
  • Jiang, W., H. Yang, Y. Zhang, and J. Kwok. 2023. “An Adaptive Policy to Employ Sharpness-Aware Minimization.” Paper presented at the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. Appleton, Wisconsin: ICLR.
  • Jiang, Y., B. Neyshabur, H. Mobahi, D. Krishnan, and S. Bengio. 2019. “Fantastic Generalization Measures and Where to Find Them.” International Conference on Learning Representations 2020, Addis Ababa, Ethiopia, 30 April 2020: 1–26. Appleton, Wisconsin: ICLR.
  • Keskar, N. S., D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. 2017. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” The Fifth International Conference on Learning Representations, Toulon, France, 24–26 April 2017: 1–16. Appleton, Wisconsin: ICLR.
  • Kim, M., D. Li, S. X. Hu, and T. M. Hospedales. 2022. “Fisher SAM: Information Geometry and Sharpness Aware Minimisation.” Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 17–23 July 2022: Vol. 162, 11148–11161. Westminster, UK: PMLR.
  • Kingma, D. P., and J. Ba. 2015. “Adam: A Method for Stochastic Optimization.” The Third International Conference on Learning Representations, San Diego, California, USA, 7–9 May 2015. Appleton, Wisconsin: ICLR.
  • Kwon, J., J. Kim, H. Park, and I. K. Choi. 2021. “ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks.” Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021: Vol. 139, 5905–5914. Westminster, UK: PMLR.
  • Li, B., and G. B. Giannakis. 2023. “Enhancing Sharpness-Aware Optimization Through Variance Suppression.” 37th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 10–16 December 2023. San Diego, CA: NeurIPS.
  • Liao, D., T. Jiang, F. Wang, L. Li, and Q. Hong. 2023. “Towards a Unified Conformer Structure: From ASR to ASV Task.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023: 1–5. Washington, DC: IEEE. doi:10.1109/ICASSP49357.2023.10095433.
  • Liu, Y., S. Mai, X. Chen, C. J. Hsieh, and Y. You. 2022. “Towards Efficient and Scalable Sharpness-Aware Minimization.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, 18–24 June 2022: 12350–12360. Washington, DC: IEEE. doi:10.1109/CVPR52688.2022.01204.
  • Liu, Y., S. Mai, M. Cheng, X. Chen, C. J. Hsieh, and Y. You. 2022. “Random Sharpness-Aware Minimization.” The 36th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 28 November 2022. San Diego, CA: NeurIPS.
  • Loshchilov, I., and F. Hutter. 2019. “Decoupled Weight Decay Regularization.” The Seventh International Conference on Learning Representations (ICLR 2019), New Orleans, Louisiana, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Madry, A., A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. 2019. “Towards Deep Learning Models Resistant to Adversarial Attacks.” The Seventh International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Mi, P., L. Shen, T. Ren, Y. Zhou, X. Sun, R. Ji, and D. Tao. 2022. “Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.” 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, Louisiana, USA, 28 November 2022. San Diego, CA: NeurIPS.
  • Mobahi, H. 2016. “Training Recurrent Neural Networks by Diffusion.” arXiv preprint arXiv:1601.04114v2. doi:10.48550/arXiv.1601.04114.
  • Mueller, M., and M. Hein. 2022. “Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization.” Has it Trained Yet? NeurIPS 2022 Workshop, New Orleans, Louisiana, USA, 2 December 2022. San Diego, CA: NeurIPS.
  • Nakkiran, P., G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003. doi:10.1088/1742-5468/ac3a74.
  • Nesterov, Y. 1983. “A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k^2).” Proceedings of the USSR Academy of Sciences 269:543–547.
  • Neyshabur, B., S. Bhojanapalli, D. McAllester, and N. Srebro. 2017. “Exploring Generalization in Deep Learning.” NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA, 4–9 December 2017: 5949–5958. New York: ACM Digital Library.
  • Neyshabur, B., R. Tomioka, and N. Srebro. 2015. “In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning.” The Third International Conference on Learning Representations, San Diego, California, USA, 7–9 May 2015. Appleton, Wisconsin: ICLR.
  • Ni, R., P. Y. Chiang, J. Geiping, M. Goldblum, A. G. Wilson, and T. Goldstein. 2022. “K-SAM: Sharpness-Aware Minimization at the Speed of SGD.” The Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023. Appleton, Wisconsin: ICLR.
  • Norton, M. D., and J. O. Royset. 2021. “Diametrical Risk Minimization: Theory and Computations.” Machine Learning 112 (8): 2933–2951. doi:10.1007/s10994-021-06036-0.
  • Rice, L., E. Wong, and J. Z. Kolter. 2020. “Overfitting in Adversarially Robust Deep Learning.” ICML’20: Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020: 8093–8104. Westminster, UK: PMLR.
  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15:1929–1958. doi:10.5555/2627435.2670313.
  • Tan, M., and Q. Le. 2019. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA, 9–15 June 2019: 6105–6114. Westminster, UK: PMLR.
  • Tang, C., B. Li, J. Sun, S. H. Wang, and Y. D. Zhang. 2023. “GAM-Spcanet: Gradient Awareness Minimization-Based Spinal Convolution Attention Network for Brain Tumor Classification.” Journal of King Saud University - Computer and Information Sciences 35 (2): 560–575. doi:10.1016/j.jksuci.2023.01.002.
  • Tsipras, D., S. Santurkar, L. Engstrom, A. Turner, and A. Madry. 2019. “Robustness May be at Odds with Accuracy.” The Seventh International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Wang, P., Z. Zhang, Z. Lei, and L. Zhang. 2023. “Sharpness-Aware Gradient Matching for Domain Generalization.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023: 3769–3778. Washington, DC: IEEE. doi:10.1109/CVPR52729.2023.00367.
  • Wei, Z., J. Zhu, and Y. Zhang. 2023. “Sharpness-Aware Minimization Alone Can Improve Adversarial Robustness.” The Second Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers 2023), Honolulu, Hawaii, USA, 28 July 2023. Westminster, UK: PMLR.
  • Wu, D., S. Xia, and Y. Wang. 2020. “Adversarial Weight Perturbation Helps Robust Generalization.” NIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020: 2958–2969. San Diego, CA: NeurIPS.
  • Wu, T., T. Luo, and D. C. Wunsch. 2024. “CR-SAM: Curvature Regularized Sharpness-Aware Minimization.” Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 20–27 February 2024: Vol. 38, 6144–6152. Washington, DC, USA: AAAI Press.
  • Xu, Z., A. M. Dai, J. Kemp, and L. Metz. 2019. “Learning an Adaptive Learning Rate Schedule.” arXiv preprint arXiv:1909.09712. doi:10.48550/arXiv.1909.09712.
  • Yue, Y., J. Jiang, Z. Ye, N. Gao, Y. Liu, and K. Zhang. 2023. “Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term.” Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023: 3185–3194. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3580305.3599501.
  • Yun, J., and E. Yang. 2023. “Riemannian SAM: Sharpness-Aware Minimization on Riemannian Manifolds.” NIPS ’23: Proceedings of the 37th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023: 65784–65800. San Diego, CA: NeurIPS. doi:10.5555/3666122.3668993.
  • Zhang, C., S. Bengio, M. Hardt, B. Recht, and O. Vinyals. 2021. “Understanding Deep Learning (Still) Requires Rethinking Generalization.” Communications of the ACM 64 (3): 107–115. doi:10.1145/3446776.
  • Zhang, H., M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. 2018. “Mixup: Beyond Empirical Risk Minimization.” 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April 2018. Appleton, Wisconsin: ICLR.
  • Zhang, Z., R. Luo, Q. Su, and X. Sun. 2022. “GA-SAM: Gradient-Strength Based Adaptive Sharpness-Aware Minimization for Improved Generalization.” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, edited by Y. Goldberg, Z. Kozareva, and Y. Zhang, Abu Dhabi, United Arab Emirates, 7–11 December 2022: 3888–3903. Pennsylvania, United States: ACL.
  • Zheng, Y., R. Zhang, and Y. Mao. 2021. “Regularizing Neural Networks via Adversarial Model Perturbation.” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021: 8152–8161. Washington, DC: IEEE. doi:10.1109/CVPR46437.2021.00806.
  • Zhong, Q., L. Ding, L. Shen, P. Mi, J. Liu, B. Du, and D. Tao. 2022. “Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models.” Findings of the Association for Computational Linguistics: EMNLP 2022, edited by Y. Goldberg, Z. Kozareva, and Y. Zhang, Abu Dhabi, United Arab Emirates, 7–11 December 2022: 4064–4085. Pennsylvania, United States: ACL.
  • Zhuang, J., B. Gong, L. Yuan, Y. Cui, H. Adam, N. Dvornek, S. Tatikonda, J. Duncan, and T. Liu. 2022. “Surrogate Gap Minimization Improves Sharpness-Aware Training.” The Tenth International Conference on Learning Representations (ICLR 2022), Virtual, 25–29 April 2022. Appleton, Wisconsin: ICLR.
