Statistics
A Journal of Theoretical and Applied Statistics
Volume 57, 2023 - Issue 3
Research Article

Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient

Pages 637-668 | Received 31 May 2022, Accepted 08 May 2023, Published online: 17 May 2023
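For context, the Ruppert-Polyak averaging scheme studied in this line of work runs plain stochastic gradient descent with a slowly decaying step size and reports the running average of the iterates, which converges at the optimal rate under weaker assumptions than the last iterate. A minimal generic sketch on a toy streaming least-squares problem (the function names `averaged_sgd` and `noisy_grad` are illustrative, not taken from the article):

```python
import numpy as np

def averaged_sgd(grad, theta0, n_steps, c=1.0, alpha=0.75):
    # SGD with step size gamma_n = c * n^{-alpha}, together with the
    # Ruppert-Polyak running average of the iterates.
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()
    for n in range(1, n_steps + 1):
        theta = theta - c * n ** (-alpha) * grad(theta)
        theta_bar += (theta - theta_bar) / (n + 1)  # online mean of iterates
    return theta, theta_bar

# Toy streaming least-squares problem: minimize E[(x^T theta - y)^2] / 2
# from one noisy sample gradient per step.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0])

def noisy_grad(theta):
    x = rng.normal(size=2)
    y = x @ theta_star + 0.1 * rng.normal()
    return (x @ theta - y) * x

last_iterate, avg_iterate = averaged_sgd(noisy_grad, np.zeros(2), 20_000)
```

With the step-size exponent alpha in (1/2, 1), the averaged iterate `avg_iterate` typically tracks the minimizer `theta_star` more tightly than the raw last iterate, which is the behavior the convergence-in-quadratic-mean results quantify.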

References

  • Bach F. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. J Mach Learn Res. 2014;15(1):595–627.
  • Cohen K, Nedić A, Srikant R. On projected stochastic gradient descent algorithm with weighted averaging for least squares regression. IEEE Trans Automat Contr. 2017;62(11):5974–5981.
  • Cardot H, Cénac P, Godichon-Baggioni A. Online estimation of the geometric median in Hilbert spaces: nonasymptotic confidence balls. Ann Stat. 2017;45(2):591–614.
  • Cardot H, Cénac P, Zitt P-A. Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli. 2013;19(1):18–43.
  • Godichon-Baggioni A. Estimating the geometric median in Hilbert spaces with stochastic gradient algorithms: Lp and almost sure rates of convergence. J Multivar Anal. 2016;146:209–222.
  • Bercu B, Costa M, Gadat S. Stochastic approximation algorithms for superquantiles estimation; 2020. arXiv preprint arXiv:2007.14659.
  • Costa M, Gadat S. Non-asymptotic controls on a recursive superquantile approximation. Electron J Stat. 2021;15(2):4718–4769.
  • Alfarra M, Hanzely S, Albasyoni A, et al. Adaptive learning of the optimal mini-batch size of SGD; 2020. arXiv preprint arXiv:2005.01097.
  • Konečný J, Liu J, Richtárik P, et al. Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J Sel Top Signal Process. 2015;10(2):242–255.
  • Robbins H, Monro S. A stochastic approximation method. Ann Math Stat. 1951;22(3):400–407.
  • Pelletier M. On the almost sure asymptotic behaviour of stochastic algorithms. Stoch Process Their Appl. 1998;78(2):217–244.
  • Ruppert D. Efficient estimations from a slowly convergent Robbins-Monro process. Technical report. Cornell University Operations Research and Industrial Engineering; 1988.
  • Polyak B, Juditsky A. Acceleration of stochastic approximation. SIAM J Control Optim. 1992;30(4):838–855.
  • Pelletier M. Asymptotic almost sure efficiency of averaged stochastic algorithms. SIAM J Control Optim. 2000;39(1):49–72.
  • Moulines E, Bach F. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv Neural Inf Process Syst. 2011;24.
  • Gadat S, Panloup F. Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity; 2017. arXiv preprint arXiv:1709.03342.
  • Godichon-Baggioni A. Online estimation of the asymptotic variance for averaged stochastic gradient algorithms. J Stat Plan Inference. 2019;203:1–19.
  • Godichon-Baggioni A. Lp and almost sure rates of convergence of averaged stochastic gradient algorithms: locally strongly convex objective. ESAIM – Probab Stat. 2019;23:841–873.
  • Défossez A, Bottou L, Bach F, et al. A simple convergence proof of Adam and Adagrad; 2020. arXiv preprint arXiv:2003.02395.
  • Wang H, Gurbuzbalaban M, Zhu L, et al. Convergence rates of stochastic gradient descent under infinite noise variance. Adv Neural Inf Process Syst. 2021;34:18866–18877.
  • Li CJ, Mou W, Wainwright M, et al. ROOT-SGD: sharp nonasymptotics and asymptotic efficiency in a single algorithm. In: Conference on Learning Theory. PMLR; 2022. p. 909–981.
  • Chaudhuri P. Multivariate location estimation using extension of R-estimates through U-statistics type approach. Ann Stat. 1992;20(2):897–916.
  • Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12(7):2121–2159.
  • Boyer C, Godichon-Baggioni A. On the asymptotic rate of convergence of stochastic Newton algorithms and their weighted averaged versions; 2020. arXiv preprint arXiv:2011.09706.
  • Mokkadem A, Pelletier M. A generalization of the averaging procedure: the use of two-time-scale algorithms. SIAM J Control Optim. 2011;49(4):1523–1543.
