Theory and Methods

Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

Pages 1324-1337 | Received 01 Jun 2020, Accepted 15 Nov 2020, Published online: 27 Jan 2021

