Theory and Methods

Online Covariance Matrix Estimation in Stochastic Gradient Descent

Pages 393-404 | Received 12 Jun 2020, Accepted 17 May 2021, Published online: 21 Jul 2021

References

  • Bach, F. R., and Moulines, E. (2013), “Non-Strongly-Convex Smooth Stochastic Approximation With Convergence Rate O(1/n),” in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 773–781.
  • Blum, J. R. (1954), “Approximation Methods Which Converge With Probability One,” Annals of Mathematical Statistics, 25, 382–386. DOI: 10.1214/aoms/1177728794.
  • Bottou, L. (1998), “Online Learning and Stochastic Approximations,” in On-Line Learning in Neural Networks, ed. David Saad, Cambridge: Cambridge University Press, pp. 9–42.
  • Chao, S.-K., and Cheng, G. (2019), “A Generalization of Regularized Dual Averaging and Its Dynamics,” arXiv no. 1909.10072.
  • Chen, X., Lee, J. D., Tong, X. T., and Zhang, Y. (2020), “Statistical Inference for Model Parameters in Stochastic Gradient Descent,” Annals of Statistics, 48, 251–273.
  • Duchi, J., Hazan, E., and Singer, Y. (2011), “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” Journal of Machine Learning Research, 12, 2121–2159.
  • Dvoretzky, A. (1956), “On Stochastic Approximation,” in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability.
  • Fabian, V. (1968), “On Asymptotic Normality in Stochastic Approximation,” Annals of Mathematical Statistics, 39, 1327–1332. DOI: 10.1214/aoms/1177698258.
  • Fang, Y. (2019), “Scalable Statistical Inference for Averaged Implicit Stochastic Gradient Descent,” Scandinavian Journal of Statistics, 46, 987–1002. DOI: 10.1111/sjos.12378.
  • Fang, Y., Xu, J., and Yang, L. (2018), “Online Bootstrap Confidence Intervals for the Stochastic Gradient Descent Estimator,” Journal of Machine Learning Research, 19, 3053–3073.
  • Flegal, J. M., and Jones, G. L. (2010), “Batch Means and Spectral Variance Estimators in Markov Chain Monte Carlo,” Annals of Statistics, 38, 1034–1070.
  • Glynn, P. W., and Whitt, W. (1991), “Estimating the Asymptotic Variance With Batch Means,” Operations Research Letters, 10, 431–435. DOI: 10.1016/0167-6377(91)90019-L.
  • Hazan, E., and Kale, S. (2014), “Beyond the Regret Minimization Barrier: Optimal Algorithms for Stochastic Strongly-Convex Optimization,” Journal of Machine Learning Research, 15, 2489–2512.
  • Hoffman, M., Bach, F. R., and Blei, D. M. (2010), “Online Learning for Latent Dirichlet Allocation,” in Proceedings of the 24th International Conference on Neural Information Processing Systems, pp. 451–459.
  • Jones, G. L., Haran, M., Caffo, B. S., and Neath, R. (2006), “Fixed-Width Output Analysis for Markov Chain Monte Carlo,” Journal of the American Statistical Association, 101, 1537–1547. DOI: 10.1198/016214506000000492.
  • Kiefer, J., and Wolfowitz, J. (1952), “Stochastic Estimation of the Maximum of a Regression Function,” Annals of Mathematical Statistics, 23, 462–466. DOI: 10.1214/aoms/1177729392.
  • Kim, Y. M., Lahiri, S. N., and Nordman, D. J. (2013), “A Progressive Block Empirical Likelihood Method for Time Series,” Journal of the American Statistical Association, 108, 1506–1516. DOI: 10.1080/01621459.2013.847374.
  • Kingma, D. P., and Ba, J. (2015), “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations.
  • Kitamura, Y. (1997), “Empirical Likelihood Methods With Weakly Dependent Processes,” Annals of Statistics, 25, 2084–2102.
  • Lahiri, S. N. (1999), “Theoretical Comparisons of Block Bootstrap Methods,” Annals of Statistics, 27, 386–404.
  • Lahiri, S. N. (2003), Resampling Methods for Dependent Data, Springer Series in Statistics, New York: Springer-Verlag.
  • Lai, T. L. (2003), “Stochastic Approximation,” Annals of Statistics, 31, 391–406.
  • Liang, T., and Su, W. (2019), “Statistical Inference for the Population Landscape Via Moment-Adjusted Stochastic Gradients,” Journal of the Royal Statistical Society, Series B, 81, 431–456. DOI: 10.1111/rssb.12313.
  • Ljung, L. (1977), “Analysis of Recursive Stochastic Algorithms,” IEEE Transactions on Automatic Control, 22, 551–575. DOI: 10.1109/TAC.1977.1101561.
  • Mairal, J., Bach, F. R., Ponce, J., and Sapiro, G. (2010), “Online Learning for Matrix Factorization and Sparse Coding,” Journal of Machine Learning Research, 11, 19–60.
  • McElroy, T., and Politis, D. N. (2007), “Computer-Intensive Rate Estimation, Diverging Statistics and Scanning,” Annals of Statistics, 35, 1827–1848.
  • Mou, W., Li, C. J., Wainwright, M. J., Bartlett, P. L., and Jordan, M. I. (2020), “On Linear Stochastic Approximation: Fine-Grained Polyak-Ruppert and Non-Asymptotic Concentration,” in Proceedings of the Thirty-Third Conference on Learning Theory, pp. 2947–2997.
  • Moulines, E., and Bach, F. R. (2011), “Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning,” in Proceedings of the 23rd International Conference on Neural Information Processing Systems, pp. 856–864.
  • Nordman, D. J., Bunzel, H., and Lahiri, S. N. (2013), “A Nonstandard Empirical Likelihood for Time Series,” Annals of Statistics, 41, 3050–3073.
  • Politis, D. N., Romano, J. P., and Wolf, M. (1999), Subsampling, Springer Series in Statistics, New York: Springer-Verlag.
  • Polyak, B. T., and Juditsky, A. B. (1992), “Acceleration of Stochastic Approximation by Averaging,” SIAM Journal on Control and Optimization, 30, 838–855. DOI: 10.1137/0330046.
  • Rakhlin, A., Shamir, O., and Sridharan, K. (2012), “Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization,” in Proceedings of the 29th International Conference on Machine Learning, pp. 1571–1578.
  • Richardson, M., Dominowska, E., and Ragno, R. (2007), “Predicting Clicks: Estimating the Click-Through Rate for New Ads,” in Proceedings of the 16th International Conference on World Wide Web, pp. 521–530.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
  • Robbins, H., and Siegmund, D. (1971), “A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications,” in Optimizing Methods in Statistics, ed. Jagdish S. Rustagi, New York: Academic Press, pp. 233–257.
  • Ruppert, D. (1988), “Efficient Estimations From a Slowly Convergent Robbins-Monro Process,” Technical report, Cornell University Operations Research and Industrial Engineering.
  • Sacks, J. (1958), “Asymptotic Distribution of Stochastic Approximation Procedures,” Annals of Mathematical Statistics, 29, 373–405. DOI: 10.1214/aoms/1177706619.
  • Shamir, O., and Zhang, T. (2013), “Stochastic Gradient Descent for Non-Smooth Optimization: Convergence Results and Optimal Averaging Schemes,” in Proceedings of the 30th International Conference on Machine Learning, pp. 71–79.
  • Su, W., and Zhu, Y. (2018), “Uncertainty Quantification for Online Learning and Stochastic Approximation Via Hierarchical Incremental Gradient Descent,” arXiv no. 1802.04876.
  • Toulis, P., and Airoldi, E. M. (2017), “Asymptotic and Finite-Sample Properties of Estimators Based on Stochastic Gradients,” Annals of Statistics, 45, 1694–1727.
  • Vats, D., Flegal, J. M., and Jones, G. L. (2019), “Multivariate Output Analysis for Markov Chain Monte Carlo,” Biometrika, 106, 321–337. DOI: 10.1093/biomet/asz002.
  • Wu, W. B. (2009), “Recursive Estimation of Time-Average Variance Constants,” Annals of Applied Probability, 19, 1529–1552.
  • Zhang, W., Zhou, T., Wang, J., and Xu, J. (2016), “Bid-Aware Gradient Descent for Unbiased Learning With Censored Data in Display Advertising,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 665–674. DOI: 10.1145/2939672.2939713.
