Research Article

Comprehensive survey on the effectiveness of sharpness aware minimization and its progressive variants

Received 16 Apr 2024, Accepted 10 Jun 2024, Published online: 04 Aug 2024

References

  • Anand, D., R. Patil, U. Agrawal, V. R. H. Ravishankar, and P. Sudhakar. 2022. “Towards Generalization of Medical Imaging AI Models: Sharpness-Aware Minimizers and Beyond.” 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022: 1–5. Washington, DC: IEEE. doi:10.1109/ISBI52829.2022.9761677.
  • Andriushchenko, M., and N. Flammarion. 2022. “Towards Understanding Sharpness-Aware Minimization.” Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 17–23 July 2022: Vol. 162, 639–668. Westminster, UK: PMLR.
  • Bahri, D., H. Mobahi, and Y. Tay. 2022. “Sharpness-Aware Minimization Improves Language Model Generalization.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022: Vol. 1, 7360–7371. Stroudsburg, Pennsylvania, US: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.508.
  • Behdin, K., and R. Mazumder. 2023. “On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees.” arXiv preprint arXiv:2302.11836. doi:10.48550/arXiv.2302.11836.
  • Behdin, K., Q. Song, A. Gupta, S. Keerthi, A. Acharya, B. Ocejo, G. Dexter, R. Khanna, D. Durfee, and R. Mazumder. 2023. “mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization.” arXiv preprint arXiv:2302.09693. doi:10.48550/arXiv.2302.09693.
  • Brock, A., S. De, S. L. Smith, and K. Simonyan. 2021. “High-Performance Large-Scale Image Recognition without Normalization.” In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021: Vol. 139, 1059–1071. Westminster, UK: PMLR.
  • Chaudhari, P., A. Choromańska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina. 2019. “Entropy-SGD: Biasing Gradient Descent into Wide Valleys.” Journal of Statistical Mechanics: Theory and Experiment 2019 (12): 124018. doi:10.1088/1742-5468/ab39d9.
  • Chen, X., C. J. Hsieh, and B. Gong. 2022. “When Vision Transformers Outperform ResNets without Pre-Training or Strong Data Augmentations.” In Proceedings of the 10th International Conference on Learning Representations, Virtual, 25–29 April 2022: 1–20. Appleton, Wisconsin: ICLR.
  • Dinh, L., R. Pascanu, S. Bengio, and Y. Bengio. 2017. “Sharp Minima Can Generalize for Deep Nets.” In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017: Vol. 70, 1019–1028. Westminster, UK: PMLR.
  • Du, J., H. Yan, J. Feng, J. T. Zhou, L. Zhen, R. S. M. Goh, and V. Tan. 2022. “Efficient Sharpness-Aware Minimization for Improved Training of Neural Networks.” In Proceedings of the 10th International Conference on Learning Representations, Virtual, 25–29 April 2022: 1–18. Appleton, Wisconsin: ICLR.
  • Du, J., D. Zhou, J. Feng, V. Y. F. Tan, and J. T. Zhou. 2022. “Sharpness-Aware Training for Free.” In Proceedings of the 36th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 28 November – 9 December 2022: 23439–23451. NY, US: Curran Associates Inc.
  • Dziugaite, G. K., and D. M. Roy. 2017. “Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters Than Training Data.” Conference on Uncertainty in Artificial Intelligence, Sydney, Australia, 11–15 August 2017. Corvallis, Oregon: AUAI Press.
  • Foret, P., A. Kleiner, H. Mobahi, and B. Neyshabur. 2021. “Sharpness-Aware Minimization for Efficiently Improving Generalization.” In Proceedings of the 9th International Conference on Learning Representations, Virtual, 3–7 May 2021: 1–19. Appleton, Wisconsin: ICLR.
  • Garipov, T., P. Izmailov, D. Podoprikhin, D. Vetrov, and A. G. Wilson. 2018. “Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, edited by S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, and N. Cesa-Bianchi, Montréal, Canada, 2–8 December 2018: 8803–8812. Curran Associates Inc.
  • Gulati, A., J. Qin, C. C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, et al. 2020. “Conformer: Convolution-Augmented Transformer for Speech Recognition.” Interspeech 2020, Shanghai, China, 25–29 October 2020: 5036–5040. Grenoble, France: ISCA. doi:10.21437/Interspeech.2020-3015.
  • Hochreiter, S., and J. Schmidhuber. 1994. “Simplifying Neural Nets by Discovering Flat Minima.” Advances in Neural Information Processing Systems 7 (NIPS 1994), edited by T. K. Leen, G. Tesauro, and D. S. Touretzky, Denver, Colorado, USA, 28 November 1994: Vol. 7, 529–536. Cambridge, Massachusetts: The MIT Press.
  • Ioffe, S., and C. Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” Proceedings of the 32nd International Conference on Machine Learning, edited by F. Bach and D. Blei, Lille, France, 6–11 July 2015: Vol. 37, 448–456. NY, US: JMLR.
  • Izmailov, P., D. Podoprikhin, T. Garipov, D. P. Vetrov, and A. G. Wilson. 2018. “Averaging Weights Leads to Wider Optima and Better Generalization.” Conference on Uncertainty in Artificial Intelligence, Monterey, California, USA, 6–10 August 2018: 876–885. Corvallis, Oregon: AUAI Press.
  • Jiang, L., Z. Zhou, T. Leung, L. J. Li, and L. Fei-Fei. 2018. “MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels.” Proceedings of the 35th International Conference on Machine Learning, edited by J. Dy and A. Krause, Stockholm, Sweden, 10–15 July 2018: Vol. 80, 2304–2313. Westminster, UK: PMLR.
  • Jiang, W., H. Yang, Y. Zhang, and J. Kwok. 2023. “An Adaptive Policy to Employ Sharpness-Aware Minimization.” Paper presented at the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. Appleton, Wisconsin: ICLR.
  • Jiang, Y., B. Neyshabur, H. Mobahi, D. Krishnan, and S. Bengio. 2019. “Fantastic Generalization Measures and Where to Find Them.” International Conference on Learning Representations 2020, Addis Ababa, Ethiopia, 30 April 2020: 1–26. Appleton, Wisconsin: ICLR.
  • Keskar, N. S., D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. 2017. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” The Fifth International Conference on Learning Representations, Toulon, France, 24–26 April 2017: 1–16. Appleton, Wisconsin: ICLR.
  • Kim, M., D. Li, S. X. Hu, and T. M. Hospedales. 2022. “Fisher SAM: Information Geometry and Sharpness Aware Minimisation.” Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 17–23 July 2022: Vol. 162, 11148–11161. Westminster, UK: PMLR.
  • Kingma, D. P., and J. Ba. 2015. “Adam: A Method for Stochastic Optimization.” The Third International Conference on Learning Representations, San Diego, California, USA, 7–9 May 2015. Appleton, Wisconsin: ICLR.
  • Kwon, J., J. Kim, H. Park, and I. K. Choi. 2021. “ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks.” Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021: Vol. 139, 5905–5914. Westminster, UK: PMLR.
  • Li, B., and G. B. Giannakis. 2023. “Enhancing Sharpness-Aware Optimization Through Variance Suppression.” 37th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 10–16 December 2023. San Diego, CA: NeurIPS.
  • Liao, D., T. Jiang, F. Wang, L. Li, and Q. Hong. 2023. “Towards a Unified Conformer Structure: From ASR to ASV Task.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023: 1–5. Washington, DC: IEEE. doi:10.1109/ICASSP49357.2023.10095433.
  • Liu, Y., S. Mai, X. Chen, C. J. Hsieh, and Y. You. 2022. “Towards Efficient and Scalable Sharpness-Aware Minimization.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, 18–24 June 2022: 12350–12360. Washington, DC: IEEE. doi:10.1109/CVPR52688.2022.01204.
  • Liu, Y., S. Mai, M. Cheng, X. Chen, C. J. Hsieh, and Y. You. 2022. “Random Sharpness-Aware Minimization.” The 36th Conference on Neural Information Processing Systems, New Orleans, Louisiana, USA, 28 November 2022. San Diego, CA: NeurIPS.
  • Loshchilov, I., and F. Hutter. 2019. “Decoupled Weight Decay Regularization.” The Seventh International Conference on Learning Representations (ICLR 2019), New Orleans, Louisiana, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Madry, A., A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. 2019. “Towards Deep Learning Models Resistant to Adversarial Attacks.” The Seventh International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Mi, P., L. Shen, T. Ren, Y. Zhou, X. Sun, R. Ji, and D. Tao. 2022. “Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.” 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, Louisiana, USA, 28 November 2022. San Diego, CA: NeurIPS.
  • Mobahi, H. 2016. “Training Recurrent Neural Networks by Diffusion.” arXiv preprint arXiv:1601.04114v2. doi:10.48550/arXiv.1601.04114.
  • Mueller, M., and M. Hein. 2022. “Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization.” Has it Trained Yet? NeurIPS 2022 Workshop, New Orleans, Louisiana, USA, 2 December 2022. San Diego, CA: NeurIPS.
  • Nakkiran, P., G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003. doi:10.1088/1742-5468/ac3a74.
  • Nesterov, Y. 1983. “A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k^2).” Proceedings of the USSR Academy of Sciences 269:543–547.
  • Neyshabur, B., S. Bhojanapalli, D. McAllester, and N. Srebro. 2017. “Exploring Generalization in Deep Learning.” NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA, 4–9 December 2017: 5949–5958. New York: ACM Digital Library.
  • Neyshabur, B., R. Tomioka, and N. Srebro. 2015. “In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning.” The Third International Conference on Learning Representations, San Diego, California, USA, 7–9 May 2015. Appleton, Wisconsin: ICLR.
  • Ni, R., P. Y. Chiang, J. Geiping, M. Goldblum, A. G. Wilson, and T. Goldstein. 2022. “K-SAM: Sharpness-Aware Minimization at the Speed of SGD.” The Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023. Appleton, Wisconsin: ICLR.
  • Norton, M. D., and J. O. Royset. 2021. “Diametrical Risk Minimization: Theory and Computations.” Machine Learning 112 (8): 2933–2951. doi:10.1007/s10994-021-06036-0.
  • Rice, L., E. Wong, and J. Z. Kolter. 2020. “Overfitting in Adversarially Robust Deep Learning.” ICML’20: Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020: 8093–8104. Westminster, UK: PMLR.
  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15:1929–1958. doi:10.5555/2627435.2670313.
  • Tan, M., and Q. Le. 2019. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA, 9–15 June 2019: 6105–6114. Westminster, UK: PMLR.
  • Tang, C., B. Li, J. Sun, S. H. Wang, and Y. D. Zhang. 2023. “GAM-Spcanet: Gradient Awareness Minimization-Based Spinal Convolution Attention Network for Brain Tumor Classification.” Journal of King Saud University - Computer and Information Sciences 35 (2): 560–575. doi:10.1016/j.jksuci.2023.01.002.
  • Tsipras, D., S. Santurkar, L. Engstrom, A. Turner, and A. Madry. 2019. “Robustness May be at Odds with Accuracy.” The Seventh International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. Appleton, Wisconsin: ICLR.
  • Wang, P., Z. Zhang, Z. Lei, and L. Zhang. 2023. “Sharpness-Aware Gradient Matching for Domain Generalization.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023: 3769–3778. Washington, DC: IEEE. doi:10.1109/CVPR52729.2023.00367.
  • Wei, Z., J. Zhu, and Y. Zhang. 2023. “Sharpness-Aware Minimization Alone Can Improve Adversarial Robustness.” The Second Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers 2023), Honolulu, Hawaii, USA, 28 July 2023. Westminster, UK: PMLR.
  • Wu, D., S. Xia, and Y. Wang. 2020. “Adversarial Weight Perturbation Helps Robust Generalization.” NIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020: 2958–2969. San Diego, CA: NeurIPS.
  • Wu, T., T. Luo, and D. C. Wunsch. 2024. “CR-SAM: Curvature Regularized Sharpness-Aware Minimization.” Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 20–27 February 2024: Vol. 38, 6144–6152. Washington, DC, USA: AAAI Press.
  • Xu, Z., A. M. Dai, J. Kemp, and L. Metz. 2019. “Learning an Adaptive Learning Rate Schedule.” arXiv preprint arXiv:1909.09712. doi:10.48550/arXiv.1909.09712.
  • Yue, Y., J. Jiang, Z. Ye, N. Gao, Y. Liu, and K. Zhang. 2023. “Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term.” Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023: 3185–3194. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3580305.3599501.
  • Yun, J., and E. Yang. 2023. “Riemannian SAM: Sharpness-Aware Minimization on Riemannian Manifolds.” NIPS ’23: Proceedings of the 37th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023: 65784–65800. San Diego, CA: NeurIPS. doi:10.5555/3666122.3668993.
  • Zhang, C., S. Bengio, M. Hardt, B. Recht, and O. Vinyals. 2021. “Understanding Deep Learning (Still) Requires Rethinking Generalization.” Communications of the ACM 64 (3): 107–115. doi:10.1145/3446776.
  • Zhang, H., M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. 2018. “Mixup: Beyond Empirical Risk Minimization.” 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April 2018. Appleton, Wisconsin: ICLR.
  • Zhang, Z., R. Luo, Q. Su, and X. Sun. 2022. “GA-SAM: Gradient-Strength Based Adaptive Sharpness-Aware Minimization for Improved Generalization.” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, edited by Y. Goldberg, Z. Kozareva, and Y. Zhang, Abu Dhabi, United Arab Emirates, 7–11 December 2022: 3888–3903. Pennsylvania, United States: ACL.
  • Zheng, Y., R. Zhang, and Y. Mao. 2021. “Regularizing Neural Networks via Adversarial Model Perturbation.” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021: 8152–8161. Washington, DC: IEEE. doi:10.1109/CVPR46437.2021.00806.
  • Zhong, Q., L. Ding, L. Shen, P. Mi, J. Liu, B. Du, and D. Tao. 2022. “Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models.” Findings of the Association for Computational Linguistics: EMNLP 2022, edited by Y. Goldberg, Z. Kozareva, and Y. Zhang, Abu Dhabi, United Arab Emirates, 7–11 December 2022: 4064–4085. Pennsylvania, United States: ACL.
  • Zhuang, J., B. Gong, L. Yuan, Y. Cui, H. Adam, N. Dvornek, S. Tatikonda, J. Duncan, and T. Liu. 2022. “Surrogate Gap Minimization Improves Sharpness-Aware Training.” The Tenth International Conference on Learning Representations (ICLR 2022), Virtual, 25–29 April 2022. Appleton, Wisconsin: ICLR.
