References
- R. Bollapragada, R.H. Byrd, and J. Nocedal, Adaptive sampling strategies for stochastic optimization, SIAM J. Optim. 28(4) (2018), pp. 3312–3343.
- D. Brownstone, D.S. Bunch, and K. Train, Joint mixed logit models of stated and revealed preferences for alternative-fuel vehicles, Transp. Res. Part B 34 (2000), pp. 315–338.
- F. Bastin, C. Cirillo, and P.L. Toint, Convergence theory for nonconvex stochastic programming with an application to mixed logit, Math. Program. 108 (2006), pp. 207–234.
- J. Blanchet, D. Goldfarb, G. Iyengar, F. Li, and C. Zhou, Unbiased Simulation for Optimizing Stochastic Function Compositions, preprint, https://arxiv.org/abs/1711.07564, 2017.
- R.H. Byrd, S.L. Hansen, J. Nocedal, and Y. Singer, A stochastic quasi-Newton method for large-scale optimization, SIAM J. Optim. 26 (2016), pp. 1008–1031.
- B. Dai, N. He, Y. Pan, B. Boots, and L. Song, Learning from conditional distributions via dual embeddings, Artif. Intell. Stat. 2 (2017), pp. 1458–1467.
- E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2002), pp. 201–213.
- D. Dentcheva, S. Penev, and A. Ruszczyński, Statistical estimation of composite risk functionals and risk optimization problems, Ann. Inst. Stat. Math. 69 (2017), pp. 737–760.
- Y. M. Ermoliev, Methods of Stochastic Programming, Nauka, Moscow, 1976.
- S. Ghadimi, A. Ruszczyński, and M. Wang, A single timescale stochastic approximation method for nested stochastic optimization, SIAM J. Optim. 30 (2020), pp. 960–979.
- R.M. Gower, D. Goldfarb, and P. Richtárik, Stochastic block BFGS: Squeezing more curvature out of data, in Proceedings of ICML, 2016.
- D.A. Hensher and W.H. Greene, The mixed logit model: the state of practice, Transportation 30 (2003), pp. 133–176.
- Y. LeCun, C. Cortes, and C.J. Burges, The MNIST Database of Handwritten Digits, http://yann.lecun.com/exdb/mnist, 2010.
- M. Lichman, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, 2013.
- A. Lucchi, B. McWilliams, and T. Hofmann, A Variance Reduced Stochastic Newton Method, preprint, arXiv:1503.08316, 2015.
- J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, in Proceedings of ICML, 2009.
- S. Asmussen and P.W. Glynn, Stochastic Simulation: Algorithms and Analysis, Stoch. Model. Appl. Probab. 57, Springer, New York, 2007.
- A. Mokhtari and A. Ribeiro, RES: Regularized stochastic BFGS algorithm, IEEE Trans. Signal Process. 62 (2014), pp. 6089–6104.
- A. Mokhtari and A. Ribeiro, Global convergence of online limited memory BFGS, J. Mach. Learn. Res. 16 (2015), pp. 3151–3181.
- P. Moritz, R. Nishihara, and M.I. Jordan, A linearly-convergent stochastic L-BFGS algorithm, in Proceedings of AISTATS, 2016, pp. 249–258.
- I. Mukherjee, K. Canini, R. Frongillo, and Y. Singer, Parallel boosting with momentum, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Lecture Notes Comput. Sci. 8190, Springer, Berlin, 2013, pp. 17–32.
- A. Ruszczyński, A linearization method for nonsmooth stochastic programming problems, Math. Oper. Res. 12 (1987), pp. 32–49.
- S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, Cambridge, UK, 2014.
- M. Wang, E.X. Fang, and B. Liu, Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions, Math. Program. 161 (2017), pp. 419–449.
- M. Wang, J. Liu, and E.X. Fang, Accelerating stochastic composition optimization, J. Mach. Learn. Res. 18 (2017), pp. 1–23.
- X. Wang, S. Ma, D. Goldfarb, and W. Liu, Stochastic quasi-Newton methods for nonconvex stochastic optimization, SIAM J. Optim. 27 (2017), pp. 927–956.
- S. Yang, M. Wang, and E.X. Fang, Multilevel stochastic gradient methods for nested composition optimization, SIAM J. Optim. 29 (2019), pp. 616–659.