Theory and Methods

Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

Pages 1324-1337 | Received 01 Jun 2020, Accepted 15 Nov 2020, Published online: 27 Jan 2021

