Abstract
New technological advancements, combined with powerful computer hardware and high-speed networks, have made big data available. The massive sample size of big data poses unique computational challenges for the scalability and storage of statistical methods. In this paper, we focus on the lack-of-fit test of parametric regression models in the big data setting. We develop a computationally feasible testing approach by integrating the divide-and-conquer algorithm into a powerful nonparametric test statistic. Our theoretical results show that, under mild conditions, the asymptotic null distribution of the proposed test is standard normal. Furthermore, the proposed test benefits from the use of a data-driven bandwidth procedure and thus possesses a certain adaptive property. Simulation studies show that the proposed method performs satisfactorily, and it is illustrated with an analysis of airline data.
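As a rough illustration of how a divide-and-conquer lack-of-fit test can be organized, the sketch below splits the sample into K blocks, fits the parametric null model, computes a standardized kernel statistic on each block's residuals, and pools the block statistics as the sum divided by sqrt(K). This is not the authors' statistic. Several assumptions are swapped in: a simple linear null model y = a + b*x, a self-normalized kernel U-statistic on residuals of the kind used in classical smoothing-based lack-of-fit tests, a Gaussian kernel with a fixed bandwidth h in place of the data-driven bandwidth procedure, and block-averaged least-squares estimates. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x on one block."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def block_statistic(xs, resid, h):
    """Standardized kernel U-statistic on one block (illustrative choice):
    T = sum_{i!=j} K_ij e_i e_j / sqrt(2 * sum_{i!=j} K_ij^2 e_i^2 e_j^2).
    Kernel normalizing constants cancel in this ratio."""
    num = 0.0
    den = 0.0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            k = math.exp(-0.5 * ((xs[i] - xs[j]) / h) ** 2)  # Gaussian kernel, fixed h
            w = k * resid[i] * resid[j]
            num += 2.0 * w        # counts both ordered pairs (i, j) and (j, i)
            den += 2.0 * w * w
    return num / math.sqrt(2.0 * den)


def dc_lack_of_fit(xs, ys, n_blocks, h):
    """Hypothetical divide-and-conquer test of H0: E[y | x] = a + b*x.
    Assumes the data are in random order, so contiguous blocks are
    exchangeable subsamples; leftover observations (n % n_blocks) are dropped."""
    size = len(xs) // n_blocks
    blocks = [(xs[k * size:(k + 1) * size], ys[k * size:(k + 1) * size])
              for k in range(n_blocks)]
    # Divide-and-conquer estimate: average the block-wise least-squares fits.
    fits = [fit_line(bx, by) for bx, by in blocks]
    a = sum(f[0] for f in fits) / n_blocks
    b = sum(f[1] for f in fits) / n_blocks
    stats = []
    for bx, by in blocks:
        resid = [y - a - b * x for x, y in zip(bx, by)]
        stats.append(block_statistic(bx, resid, h))
    # Under H0 each block statistic is roughly N(0, 1) and the blocks are
    # approximately independent, so the pooled value is roughly N(0, 1).
    return sum(stats) / math.sqrt(n_blocks)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1, 1) for _ in range(2000)]
    y_null = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]
    y_alt = [1 + 2 * x + 1.5 * x * x + rng.gauss(0, 0.5) for x in xs]
    print("linear truth:   ", dc_lack_of_fit(xs, y_null, 10, 0.2))
    print("quadratic truth:", dc_lack_of_fit(xs, y_alt, 10, 0.2))
```

Each block only requires O(m^2) work for block size m, instead of O(n^2) for the full sample, which is the kind of saving that motivates divide-and-conquer here. Large positive values indicate lack of fit, so a one-sided comparison with standard normal quantiles would be the natural rejection rule under these assumptions.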
Acknowledgments
The authors are grateful to the editor and two anonymous referees for their comments that have greatly improved this paper.
Disclosure statement
No potential conflict of interest was reported by the authors.
Additional information
Notes on contributors
Yanyan Zhao
Yanyan Zhao is a Ph.D. candidate at the Institute of Statistics, Nankai University, Tianjin, China.
Changliang Zou
Changliang Zou is a professor at the Institute of Statistics, Nankai University, Tianjin, China.
Zhaojun Wang
Zhaojun Wang is the corresponding author and a professor at the Institute of Statistics, Nankai University, Tianjin, China.