References
- N. Agarwal, B. Bullins, and E. Hazan, Second-order stochastic optimization for machine learning in linear time, J. Mach. Learn. Res. 18(116) (2017), pp. 1–40.
- A.S. Berahas, R. Bollapragada, and J. Nocedal, An investigation of Newton-Sketch and subsampled Newton methods: Supplementary materials, 2020.
- R. Bollapragada, R.H. Byrd, and J. Nocedal, Exact and inexact subsampled Newton methods for optimization, IMA J. Numer. Anal. 39(2) (2018), pp. 545–578.
- L. Bottou, F.E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev. 60(2) (2018), pp. 223–311.
- P. Drineas, M.W. Mahoney, S. Muthukrishnan, and T. Sarlós, Faster least squares approximation, Numer. Math. 117(2) (2011), pp. 219–249.
- M.A. Erdogdu and A. Montanari, Convergence rates of sub-sampled Newton methods. Advances in Neural Information Processing Systems, Montreal, Canada, Vol. 28, 2015, pp. 3034–3042.
- G.H. Golub and C.F. Van Loan, Matrix Computations, 2nd ed., Johns Hopkins University Press, Baltimore, MD, 1989.
- R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems, Lake Tahoe, NV, Vol. 26, 2013, pp. 315–323.
- F. Krahmer and R. Ward, New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property, SIAM J. Math. Anal. 43(3) (2011), pp. 1269–1281.
- D.G. Luenberger, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, MA, 1984.
- H. Luo, A. Agarwal, N. Cesa-Bianchi, and J. Langford, Efficient second order online learning by sketching. Advances in Neural Information Processing Systems, Barcelona, Spain, Vol. 29, 2016, pp. 902–910.
- Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Vol. 87, Springer Science & Business Media, Boston, MA, 2013.
- J. Nocedal and S. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006.
- M. Pilanci and M.J. Wainwright, Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence, SIAM. J. Optim. 27(1) (2017), pp. 205–245.
- H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Stat. 22 (1951), pp. 400–407.
- F. Roosta-Khorasani and M.W. Mahoney, Sub-sampled Newton methods, Math. Program. 174(1–2) (2019), pp. 293–326.
- S. Sra, S. Nowozin, and S.J. Wright, Optimization for Machine Learning, MIT Press, Cambridge, MA, 2012.
- J. Wang, J.D. Lee, M. Mahdavi, M. Kolar, and N. Srebro, Sketching meets random projection in the dual: A provable recovery algorithm for big and high-dimensional data, Electron. J. Stat. 11(2) (2017), pp. 4896–4944.
- S. Wang, A. Gittens, and M.W. Mahoney, Sketched ridge regression: Optimization perspective, statistical perspective, and model averaging, J. Mach. Learn. Res. 18(218) (2018), pp. 1–50.
- D.P. Woodruff, Sketching as a tool for numerical linear algebra, Found. Trends Theor. Comput. Sci. 10(1–2) (2014), pp. 1–157.
- S.J. Wright, Coordinate descent algorithms, Math. Program. 151(1) (2015), pp. 3–34.
- P. Xu, J. Yang, F. Roosta-Khorasani, C. Ré, and M.W. Mahoney, Sub-sampled Newton methods with non-uniform sampling. Advances in Neural Information Processing Systems, Barcelona, Spain, Vol. 29, 2016, pp. 3000–3008.
- P. Xu, F. Roosta-Khorasani, and M.W. Mahoney, Newton-type methods for non-convex optimization under inexact Hessian information, preprint (2017). Available at arXiv:1708.07164.
- P. Xu, F. Roosta-Khorasani, and M.W. Mahoney, Second-order optimization for non-convex machine learning: An empirical study, preprint (2017). Available at arXiv:1708.07827.
- Z. Yao, P. Xu, F. Roosta-Khorasani, and M.W. Mahoney, Inexact non-convex Newton-type methods, preprint (2018). Available at arXiv:1802.06925.