Theory and Methods

Consistent Sparse Deep Learning: Theory and Computation

Pages 1981-1995 | Received 20 Oct 2019, Accepted 20 Feb 2021, Published online: 20 Apr 2021

