Theory and Methods

First-Order Newton-Type Estimator for Distributed Estimation and Inference

Pages 1858-1874 | Received 01 Nov 2019, Accepted 12 Feb 2021, Published online: 12 Apr 2021
 

Abstract

This article studies distributed estimation and inference for a general statistical problem with a convex loss that could be nondifferentiable. For the purpose of efficient computation, we restrict ourselves to stochastic first-order optimization, which enjoys low per-iteration complexity. To motivate the proposed method, we first investigate the theoretical properties of a straightforward divide-and-conquer stochastic gradient descent approach. Our theory shows that there is a restriction on the number of machines, and this restriction becomes more stringent when the dimension p is large. To overcome this limitation, this article proposes a new multi-round distributed estimation procedure that approximates the Newton step using only stochastic subgradients. The key component of our method is a computationally efficient estimator of Σ^{-1}w, where Σ is the population Hessian matrix and w is any given vector. Instead of estimating Σ (or Σ^{-1}), which usually requires the second-order differentiability of the loss, the proposed first-order Newton-type estimator (FONE) directly estimates the vector of interest Σ^{-1}w as a whole and is applicable to nondifferentiable losses. Our estimator also facilitates inference for the empirical risk minimizer: the key term in the limiting covariance has the form Σ^{-1}w, which can be estimated by FONE.
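The abstract describes FONE only at a high level. The sketch below illustrates one plausible reading of the idea: treat Σ^{-1}w as the minimizer of f(z) = ½ z'Σz − w'z and run SGD on f, replacing the unknown product Σz with a finite difference of stochastic (sub)gradients evaluated around a pilot estimate. The function name `fone_sketch`, the step-size schedule, and the averaging are illustrative assumptions, not the authors' exact procedure; consult the paper for the actual algorithm and its theory.

```python
import numpy as np

def fone_sketch(subgrad, data, theta_hat, w, eta=1.0, delta=1e-4, n_steps=5000, seed=0):
    """Sketch of a first-order Newton-type estimate of Sigma^{-1} w.

    Sigma^{-1} w minimizes f(z) = 0.5 z'Sigma z - w'z, and Sigma z can be
    approximated by a finite difference of stochastic (sub)gradients around a
    pilot estimate theta_hat, so a plain SGD loop on f needs only
    first-order information. Step sizes and averaging here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    z = np.zeros_like(w, dtype=float)
    z_bar = np.zeros_like(z)
    for t in range(n_steps):
        obs = data[rng.integers(n)]                    # one randomly drawn observation
        g_plus = subgrad(theta_hat + delta * z, obs)   # stochastic subgradient at perturbed point
        g_zero = subgrad(theta_hat, obs)               # stochastic subgradient at the pilot estimate
        sigma_z = (g_plus - g_zero) / delta            # finite-difference estimate of Sigma @ z
        z = z - eta / (t + 1) * (sigma_z - w)          # SGD step on 0.5 z'Sigma z - w'z
        z_bar += (z - z_bar) / (t + 1)                 # running (Polyak-Ruppert) average
    return z_bar

# Toy check with squared loss, where Sigma = E[x x'] = I and the gradient is exact.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p, n = 5, 20000
    X = rng.standard_normal((n, p))
    y = X @ np.ones(p) + rng.standard_normal(n)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # pilot estimator
    data = list(zip(X, y))
    subgrad = lambda theta, xy: xy[0] * (xy[0] @ theta - xy[1])
    w = np.eye(p)[0]
    print(fone_sketch(subgrad, data, theta_hat, w))    # should be close to Sigma^{-1} e_1 = e_1
```

The toy check uses the squared loss, for which the finite difference recovers x x'z exactly; for a nondifferentiable loss such as quantile regression, the same recursion would plug in stochastic subgradients instead.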

Supplementary Materials

The supplementary material provides the verification of conditions, the theory of mini-batch SGD with diverging dimension, the proofs of all technical results in the main paper, and additional numerical experiments.

Acknowledgments

The authors are very grateful to anonymous referees and the associate editor for their detailed and constructive comments that considerably improved the quality of this article.

Notes

1 With a slight abuse of notation, we use n to denote either the sample size in nondistributed settings or the local sample size of a single machine in distributed settings.

2 Note that although we present the evenly distributed setting for DC-SGD for ease of illustration, one can easily see from the proof that the convergence rate is actually determined by the smallest subsample size.

Additional information

Funding

Weidong Liu is supported by the National Program on Key Basic Research Project (973 Program, 2018AAA0100704), NSFC grant nos. 11825104 and 11690013, the Youth Talent Support Program, and a grant from the Australian Research Council.

