Theory and Methods

Benign Overfitting and Noisy Features

Pages 2876-2888 | Received 08 Apr 2021, Accepted 14 Jun 2022, Published online: 07 Sep 2022
