References
- Bach F. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. J Mach Learn Res. 2014;15(1):595–627.
- Cohen K, Nedić A, Srikant R. On projected stochastic gradient descent algorithm with weighted averaging for least squares regression. IEEE Trans Automat Contr. 2017;62(11):5974–5981.
- Cardot H, Cénac P, Godichon-Baggioni A. Online estimation of the geometric median in Hilbert spaces: nonasymptotic confidence balls. Ann Stat. 2017;45(2):591–614.
- Cardot H, Cénac P, Zitt P-A. Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli. 2013;19(1):18–43.
- Godichon-Baggioni A. Estimating the geometric median in Hilbert spaces with stochastic gradient algorithms: Lp and almost sure rates of convergence. J Multivar Anal. 2016;146:209–222.
- Bercu B, Costa M, Gadat S. Stochastic approximation algorithms for superquantiles estimation; 2020. arXiv preprint arXiv:2007.14659.
- Costa M, Gadat S. Non asymptotic controls on a recursive superquantile approximation. Electron J Stat. 2021;15(2):4718–4769.
- Alfarra M, Hanzely S, Albasyoni A, et al. Adaptive learning of the optimal mini-batch size of SGD; 2020. arXiv preprint arXiv:2005.01097.
- Konečný J, Liu J, Richtárik P, et al. Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J Sel Top Signal Process. 2016;10(2):242–255.
- Robbins H, Monro S. A stochastic approximation method. Ann Math Stat. 1951;22(3):400–407.
- Pelletier M. On the almost sure asymptotic behaviour of stochastic algorithms. Stoch Process Their Appl. 1998;78(2):217–244.
- Ruppert D. Efficient estimations from a slowly convergent Robbins-Monro process. Technical report. Cornell University Operations Research and Industrial Engineering; 1988.
- Polyak B, Juditsky A. Acceleration of stochastic approximation. SIAM J Control Optim. 1992;30(4):838–855.
- Pelletier M. Asymptotic almost sure efficiency of averaged stochastic algorithms. SIAM J Control Optim. 2000;39(1):49–72.
- Moulines E, Bach F. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv Neural Inf Process Syst. 2011;24.
- Gadat S, Panloup F. Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity; 2017. arXiv preprint arXiv:1709.03342.
- Godichon-Baggioni A. Online estimation of the asymptotic variance for averaged stochastic gradient algorithms. J Stat Plan Inference. 2019;203:1–19.
- Godichon-Baggioni A. Lp and almost sure rates of convergence of averaged stochastic gradient algorithms: locally strongly convex objective. ESAIM – Probab Stat. 2019;23:841–873.
- Défossez A, Bottou L, Bach F, et al. A simple convergence proof of Adam and Adagrad; 2020. arXiv preprint arXiv:2003.02395.
- Wang H, Gurbuzbalaban M, Zhu L, et al. Convergence rates of stochastic gradient descent under infinite noise variance. Adv Neural Inf Process Syst. 2021;34:18866–18877.
- Li CJ, Mou W, Wainwright M, et al. ROOT-SGD: sharp nonasymptotics and asymptotic efficiency in a single algorithm. In: Conference on Learning Theory. PMLR; 2022. p. 909–981.
- Chaudhuri P. Multivariate location estimation using extension of R-estimates through U-statistics type approach. Ann Stat. 1992;20(2):897–916.
- Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12(7):2121–2159.
- Boyer C, Godichon-Baggioni A. On the asymptotic rate of convergence of stochastic Newton algorithms and their weighted averaged versions; 2020. arXiv preprint arXiv:2011.09706.
- Mokkadem A, Pelletier M. A generalization of the averaging procedure: the use of two-time-scale algorithms. SIAM J Control Optim. 2011;49(4):1523–1543.