Abstract
When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and investigates two communication-efficient accurate statistical estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has large enough sample size. Moreover, they do not require good initialization and enjoy linear converge guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multistep statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
Supplementary Material
Supplementary material: The file “supplementary.pdf” contains more details and proofs of the results in this article.
Acknowledgments
We gratefully acknowledge NSF grants DMS-1662139, DMS-1712591, DMS-2053832, DMS-2052926, NIH grant 2R01-GM072611-15, and ONR grant N00014-19-1-2120. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20-RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010.
Notes
1 According to Nocedal and Wright (Citation2006), a sequence in
is said to converge Q-linearly to
if there exists
such that
for n sufficiently large.