Abstract
Distributed estimation based on different sources of observations has drawn attention in modern statistical learning. When the distributed data are missing at random, we propose a two-stage $\ell_1$-penalized communication-efficient surrogate likelihood (CSL) algorithm based on inverse probability weighting (IPW) that eliminates the estimation bias caused by the missing data and simultaneously constructs a sparse distributed M-estimator. In the first stage, we consider a parametric propensity model and directly apply the $\ell_1$-penalized CSL method to obtain an efficient and sparse distributed estimator of the propensity parameter. In the second stage, we construct an IPW-based $\ell_1$-penalized CSL loss function to eliminate the bias and obtain the sparse M-estimator. The finite-sample performance of the estimators is studied through simulation, and an application to a house sale prices data set is also presented.
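The two-stage idea can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it assumes a logistic propensity model, a squared-error loss, an ℓ1 penalty solved by plain proximal gradient descent, and it omits the distributed CSL surrogate step entirely. All function names (`fit_propensity`, `fit_ipw_lasso`, `prox_grad`, `soft_threshold`) and tuning values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad(grad, dim, lam, step, n_iter=500):
    # Proximal gradient descent for: smooth loss + lam * ||theta||_1.
    theta = np.zeros(dim)
    for _ in range(n_iter):
        theta = soft_threshold(theta - step * grad(theta), step * lam)
    return theta

def fit_propensity(X, delta, lam):
    # Stage 1 (assumed logistic model): l1-penalized fit of P(delta = 1 | X).
    # Note: every coefficient is penalized, including any intercept column.
    n = X.shape[0]
    lipschitz = np.linalg.eigvalsh(X.T @ X / n)[-1] / 4.0

    def grad(gamma):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        return X.T @ (p - delta) / n

    return prox_grad(grad, X.shape[1], lam, 1.0 / lipschitz)

def fit_ipw_lasso(X, y, delta, pi_hat, lam):
    # Stage 2: l1-penalized least squares with inverse probability weights
    # delta / pi_hat, so that only observed responses contribute.
    n = X.shape[0]
    w = delta / np.clip(pi_hat, 1e-3, 1.0)   # clip to avoid huge weights
    y0 = np.where(delta == 1, y, 0.0)        # missing y never enters the loss
    lipschitz = np.linalg.eigvalsh(X.T @ (w[:, None] * X) / n)[-1]

    def grad(beta):
        return X.T @ (w * (X @ beta - y0)) / n

    return prox_grad(grad, X.shape[1], lam, 1.0 / lipschitz)
```

In use, one would call `fit_propensity` on the fully observed covariates and the response indicator, convert the fitted parameter into estimated propensities, and pass those to `fit_ipw_lasso`. In the distributed setting described above, each stage would instead minimize a surrogate loss built from local data plus a communicated gradient correction, which this sketch does not attempt.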
Acknowledgments
The authors would like to thank the Editor, the Associate Editor, and two anonymous referees for their helpful comments and suggestions.
Disclosure statement
No potential conflict of interest was reported by the author(s).