Abstract
The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, while also giving researchers an opportunity to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to deal with large-scale statistical optimization problems. This paper aims to provide a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models. Their key ideas and theoretical properties are summarized, and the trade-off between communication cost and estimation precision, together with other concerns, is discussed.
Disclosure statement
No potential conflict of interest was reported by the authors.
Additional information
Funding
Notes on contributors
Yuan Gao
Mr. Yuan Gao is a Ph.D. candidate in the School of Statistics at East China Normal University.
Weidong Liu
Dr. Weidong Liu is a Distinguished Professor in the School of Mathematical Sciences at Shanghai Jiao Tong University.
Hansheng Wang
Dr. Hansheng Wang is a professor in the Guanghua School of Management at Peking University.
Xiaozhou Wang
Dr. Xiaozhou Wang is an assistant professor in the School of Statistics at East China Normal University.
Yibo Yan
Mr. Yibo Yan is a Ph.D. candidate in the School of Statistics at East China Normal University.
Riquan Zhang
Dr. Riquan Zhang is a professor in the School of Statistics at East China Normal University.