Abstract
The generalized information criterion (GIC) is an important tool for model selection in statistical inference. In the big data setting, the traditional GIC cannot be computed when the data size exceeds the computer's memory. We propose an online updating approach that computes the GIC and performs model selection for such datasets. Specifically, we define online updating versions of the GIC for streaming data under the normal linear regression model and generalized linear models. Under reasonable regularity conditions, we show that the resulting information criterion selection procedures are asymptotically valid. The performance of the proposed criteria is assessed through an extensive simulation study. The use of the proposed model selection procedure is further illustrated with the analysis of two large datasets, the covertype data and the earthquake data. For both datasets, the online updating procedure selected the same model as, or a model similar to, the one selected by the procedure based on the entire data.
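To make the idea of online updating concrete, the following is a minimal sketch for the normal linear regression case only. It is not necessarily the definition used in this paper: it assumes the GIC takes the common form n log(RSS/n) + kappa * p, where kappa = log n gives BIC and kappa = 2 gives AIC, and it relies on the standard fact that the residual sum of squares of any submodel can be recovered from the running totals of X'X, X'y, and y'y. The class name `OnlineLinearGIC` and its methods are hypothetical names introduced for illustration.

```python
import numpy as np


class OnlineLinearGIC:
    """Hypothetical helper: accumulates sufficient statistics block by block.

    Assumes a Gaussian linear model and a GIC of the form
    n * log(RSS / n) + kappa * p (an assumption, not the paper's definition).
    """

    def __init__(self, n_features):
        self.n = 0
        self.xtx = np.zeros((n_features, n_features))  # running X'X
        self.xty = np.zeros(n_features)                # running X'y
        self.yty = 0.0                                 # running y'y

    def update(self, X, y):
        """Absorb one data block; the block can be discarded afterwards."""
        self.n += X.shape[0]
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.yty += float(y @ y)

    def gic(self, columns, kappa=None):
        """GIC of the submodel using the given column indices."""
        idx = np.asarray(columns)
        A = self.xtx[np.ix_(idx, idx)]
        b = self.xty[idx]
        beta = np.linalg.solve(A, b)       # least-squares fit of the submodel
        rss = self.yty - b @ beta          # RSS = y'y - (X'y)' beta_hat
        if kappa is None:
            kappa = np.log(self.n)         # default penalty corresponds to BIC
        return self.n * np.log(rss / self.n) + kappa * len(idx)
```

Under these assumptions, only a p-by-p matrix, a p-vector, and two scalars are kept in memory, so candidate submodels can be compared after the stream ends without revisiting the raw data. Generalized linear models have no such exact finite-dimensional summary, so this sketch does not extend to them directly.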