Research Article

Online updating of information based model selection in the big data setting

Pages 3516-3529 | Received 25 Nov 2018, Accepted 24 May 2019, Published online: 10 Jun 2019

Abstract

The generalized information criterion (GIC) is an important tool for model selection in statistical inference. In the big data setting, the traditional GIC cannot be computed when the data exceed the computer's memory. We propose an online updating approach to calculating the GIC and performing model selection for huge datasets. Specifically, we define online updating versions of the GIC for streaming data in normal linear regression models and generalized linear models. Under reasonable regularity conditions, we show that the resulting information criterion selection procedures are asymptotically valid. The performance of the proposed criteria is assessed in an extensive simulation study. The use of the proposed model selection procedure is further illustrated by the analysis of two large datasets, the covertype data and the earthquake data. For both datasets, the online updating procedure selected the same model as, or a model similar to, the one chosen by model selection on the entire dataset.
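For the normal linear model, one way to see why online updating is possible is that the summaries X^T X, X^T y, y^T y and n are additive across data chunks. The residual sum of squares (RSS) of any sub-model, and therefore a GIC of the common form n log(RSS/n) + λ_n · d (with d the number of coefficients; λ_n = 2 gives AIC and λ_n = log n gives BIC), can then be recovered exactly while only one chunk is held in memory. The following is a minimal pure-Python sketch under that assumption. The class `OnlineLinearGIC`, the helpers `solve` and `make_data`, and the simulated data are hypothetical illustrations, not necessarily the authors' procedure, and the sketch does not attempt the generalized linear model case.

```python
import math
import random
from typing import Sequence


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution on the upper-triangular system
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class OnlineLinearGIC:
    """Hypothetical helper: accumulates X'X, X'y, y'y and n chunk by chunk."""

    def __init__(self, p: int) -> None:
        self.p = p
        self.n = 0
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p
        self.yty = 0.0

    def update(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        # Only the current chunk is needed; its rows can be discarded afterwards.
        for row, yi in zip(X, y):
            for j in range(self.p):
                self.xty[j] += row[j] * yi
                for k in range(self.p):
                    self.xtx[j][k] += row[j] * row[k]
            self.yty += yi * yi
            self.n += 1

    def rss(self, cols: Sequence[int]) -> float:
        # Sub-model on a subset of columns: take the matching blocks of X'X and X'y.
        a = [[self.xtx[j][k] for k in cols] for j in cols]
        b = [self.xty[j] for j in cols]
        beta = solve(a, b)
        # At the least-squares solution, RSS = y'y - beta' X'y.
        return self.yty - sum(bj * cj for bj, cj in zip(beta, b))

    def gic(self, cols: Sequence[int], penalty: float) -> float:
        # n log(RSS/n) + penalty * (number of coefficients), dropping constants.
        return self.n * math.log(self.rss(cols) / self.n) + penalty * len(cols)


def make_data(seed: int, n: int):
    """Hypothetical toy data: y = 1 + 2*x1 + noise; x2 is an irrelevant covariate."""
    rng = random.Random(seed)
    rows, ys = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append([1.0, x1, x2])  # column 0 is the intercept
        ys.append(1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0))
    return rows, ys


if __name__ == "__main__":
    rows, ys = make_data(seed=0, n=1000)
    acc = OnlineLinearGIC(p=3)
    for start in range(0, 1000, 100):  # stream ten chunks of 100 rows
        acc.update(rows[start:start + 100], ys[start:start + 100])
    bic_penalty = math.log(acc.n)
    for cols in ([0], [0, 1], [0, 2], [0, 1, 2]):
        print(cols, round(acc.gic(cols, bic_penalty), 2))
```

Because the accumulated summaries do not depend on how the rows are split into chunks, the streamed and single-pass accumulators give the same RSS and GIC for every candidate model, and memory use stays O(p^2) regardless of n.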
