Abstract
Many branches of contemporary science generate large amounts of data. Because of limits on computing time and cost, traditional statistical methods are no longer practical for large data sets. For a very large data set containing N points, an effective approach is to extract n (n ≪ N) points for analysis, chosen so that the subsample represents the full data as well as possible and little of the information in the full data is lost. This requires an algorithm for selecting the sample points. Orthogonal subsampling based on two-level orthogonal arrays is a popular approach for big data. Based on the projection properties of orthogonal arrays, this paper defines a new discrepancy function to evaluate the quality of the selected subdata and proposes three algorithms that select subdata according to different situations. Simulation studies show that the new algorithms have higher A-efficiency and D-efficiency and perform well in minimizing the mean squared errors of the estimated parameters.
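To make the A- and D-criteria mentioned above concrete, the following is a minimal sketch, not the discrepancy function or any of the three algorithms proposed in this paper. It assumes a first-order linear model with intercept, and the function names (`information_matrix`, `d_criterion`, `a_criterion`) are hypothetical. It compares a subsample placed at the corners of a two-level design with one placed closer to the centre.

```python
def information_matrix(points):
    """Return Z^T Z for the assumed first-order model, Z = [1, x_1, ..., x_p]."""
    rows = [[1.0] + [float(v) for v in pt] for pt in points]
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def inverse_and_det(m):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    k = len(m)
    # Augment m with the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a], det


def d_criterion(points):
    """Determinant of the information matrix (larger is better)."""
    return inverse_and_det(information_matrix(points))[1]


def a_criterion(points):
    """Trace of the inverse information matrix (smaller is better)."""
    inv, _ = inverse_and_det(information_matrix(points))
    return sum(inv[i][i] for i in range(len(inv)))


# Two candidate subsamples of n = 4 points in p = 2 covariates (illustrative data).
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]            # two-level orthogonal design
inner = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]

print(d_criterion(corners), a_criterion(corners))  # 64.0 0.75
print(d_criterion(inner), a_criterion(inner))      # 4.0 2.25
```

Under these assumptions the corner subsample has the larger determinant and the smaller trace of the inverse, which is the intuition behind selecting subdata that resemble a two-level orthogonal array.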
Acknowledgments
The authors would like to thank the Associate Editor and the referee for their constructive suggestions, which led to a significant improvement of the article.