Theory and Methods

Consistent Sparse Deep Learning: Theory and Computation

Yan Sun, Qifan Song & Faming Liang
Pages 1981-1995 | Received 20 Oct 2019, Accepted 20 Feb 2021, Published online: 20 Apr 2021
 

Abstract

Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), as the basic model of deep learning, is often excessively over-parameterized, causing many difficulties in training, prediction and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method could learn a sparse DNN with at most O(n/log(n)) connections and nice theoretical guarantees such as posterior consistency, variable selection consistency and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method such as stochastic gradient descent in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for large-scale sparse DNNs. The numerical results indicate that the proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.
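To make the structure-selection step concrete, below is a minimal Python sketch, not the authors' implementation: it assumes a two-component mixture Gaussian (spike-and-slab) prior and evaluates each trained weight's marginal posterior inclusion probability at the SGD/MAP solution, in the spirit of the Laplace approximation-based approach described in the abstract. The function names (`inclusion_probabilities`, `sparsify`) and the hyperparameter values (`lam`, `sigma0`, `sigma1`, the 0.5 threshold) are illustrative placeholders, not the paper's settings.

```python
import numpy as np
from scipy.stats import norm

def inclusion_probabilities(w, lam=1e-4, sigma0=1e-4, sigma1=0.1):
    """Marginal posterior inclusion probability of each connection,
    evaluated at the trained (MAP-like) weights under the mixture
    Gaussian prior pi(w) = lam*N(0, sigma1^2) + (1-lam)*N(0, sigma0^2).
    Computed in log-space for numerical stability."""
    log_slab = np.log(lam) + norm.logpdf(w, 0.0, sigma1)     # "connection present"
    log_spike = np.log1p(-lam) + norm.logpdf(w, 0.0, sigma0)  # "connection absent"
    return np.exp(log_slab - np.logaddexp(log_slab, log_spike))

def sparsify(w, threshold=0.5, **prior_kwargs):
    """Prune connections whose inclusion probability falls below the
    threshold; the surviving mask defines the selected sparse structure."""
    keep = inclusion_probabilities(w, **prior_kwargs) > threshold
    return np.where(keep, w, 0.0), keep

# Example: clearly nonzero weights survive, near-zero weights are pruned.
w = np.array([0.8, 1e-5, -0.03, 2e-4])
sparse_w, mask = sparsify(w)
```

With a spike variance much smaller than the slab variance, the inclusion probability is close to 1 for weights of appreciable magnitude and close to 0 near the origin, so thresholding at 0.5 acts as a principled pruning rule.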

Supplementary Materials

Supplement description: (i) proofs of Theorems 2.1–2.7 and Lemma 2.1; (ii) verification of the bounded gradient in Theorem 2.3; (iii) approximation of Bayesian evidence; and (iv) some mathematical facts of the sparse DNN.
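As a companion to item (iii), the evidence-based selection across SGD runs mentioned in the abstract can be sketched with a BIC-style Laplace approximation to the log evidence. This is a simplified stand-in for the supplement's derivation; the function names and the per-run (log-likelihood, nonzero-count) bookkeeping are assumptions for illustration.

```python
import numpy as np

def approx_log_evidence(log_lik, d, n):
    """BIC-style Laplace approximation to the log Bayesian evidence of a
    fitted sparse DNN: log-likelihood at the trained weights minus a
    complexity penalty of (d/2)*log(n), with d the number of retained
    connections and n the sample size."""
    return log_lik - 0.5 * d * np.log(n)

def select_run(runs, n):
    """Given per-run (log_likelihood, num_nonzero_connections) pairs from
    SGD runs with different initializations, return the index of the run
    whose sparse model has the highest approximate evidence."""
    return int(np.argmax([approx_log_evidence(ll, d, n) for ll, d in runs]))

# Example: run 1 fits slightly worse but is far sparser, so it is chosen.
runs = [(-1200.0, 5000), (-1250.0, 800)]
best = select_run(runs, n=10_000)
```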

Acknowledgments

The authors thank the editor, associate editors and two referees for their constructive comments which have led to significant improvement of this article.

Additional information

Funding

Liang’s research was supported in part by the grants DMS-2015498, NIH R01-GM117597 and NIH R01-GM126089. Song’s research was supported in part by the grant DMS-1811812.
