Research Article

Machine learning applications in household-level demand prediction

Pages 5-11 | Published online: 07 Sep 2022
 

ABSTRACT

Machine learning (ML) is becoming one of the most anticipated methods for predicting consumer demand. However, it remains uncertain how ML methods perform relative to traditional econometric methods at different dataset scales. This study estimates and compares the out-of-sample predictive accuracy of household budget shares for organic fresh produce using two parametric models and six ML methods under regular and large sample sizes. Results show that ML methods, particularly Logistic Elastic Net, perform better than econometric models under the regular sample size. In contrast, with big data, econometric models reach the same accuracy level as ML methods, whereas random forest shows a possible overfitting problem. This study illustrates the competence of ML methods in demand prediction, but choosing the optimal method requires considering product specifics, sample sizes, and observable features.
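The abstract singles out Logistic Elastic Net as the strongest method at the regular sample size. The exact specification is not given here, so the following is only a minimal sketch of the elastic net part, assuming the common glmnet-style objective: squared error plus a penalty `lam * (alpha * L1 + (1 - alpha)/2 * L2)` with an unpenalized intercept, fitted by cyclic coordinate descent and scored out of sample by mean squared error. The names `elastic_net`, `predict`, `mse`, and the toy data are hypothetical and are not the authors' code.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: closed-form solution of the L1 step."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                lam: float, alpha: float, n_iter: int = 500) -> Tuple[float, List[float]]:
    """Fit an elastic net by cyclic coordinate descent (hypothetical helper).

    Minimizes (1/2n) * ||y - b0 - X beta||^2
              + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2),
    leaving the intercept b0 unpenalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    # Mean squared value of each column, reused in every coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Residuals under the current intercept and coefficients.
        resid = [y[i] - b0 - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
        for j in range(p):
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            denom = col_sq[j] + lam * (1 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            # Keep residuals in sync with the change in beta_j.
            delta = new_bj - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new_bj
        # Unpenalized intercept: mean of y minus the fitted linear part.
        b0 = sum(y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)) / n
    return b0, beta


def predict(b0: float, beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Linear predictions from a fitted intercept and coefficient vector."""
    return [b0 + sum(x[k] * beta[k] for k in range(len(beta))) for x in X]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Out-of-sample mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data (centered features): y = 1 + 2*x1 + 0.5*x2.
    X_train = [[-1, 1], [0, -1], [1, 1], [-1, -1], [0, 1], [1, -1]]
    y_train = [1 + 2 * a + 0.5 * b for a, b in X_train]
    X_test = [[2, 1], [-2, -1]]
    y_test = [1 + 2 * a + 0.5 * b for a, b in X_test]
    b0, beta = elastic_net(X_train, y_train, lam=0.1, alpha=0.5)
    print(b0, beta, mse(y_test, predict(b0, beta, X_test)))
```

In practice features are usually standardized before fitting so that one `lam` penalizes all coefficients comparably, and `lam` and `alpha` are tuned on a validation set; both choices are assumptions of this sketch rather than details taken from the study.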

JEL CLASSIFICATION:

Disclosure statement

No potential conflict of interest was reported by the author(s).

Nielsen disclaimer

Researcher(s)' own analyses calculated (or derived) based in part on data from Nielsen Consumer LLC and marketing databases provided through the NielsenIQ Datasets at the Kilts Center for Marketing Data Center at The University of Chicago Booth School of Business. The conclusions drawn from the NielsenIQ data are those of the researcher(s) and do not reflect the views of NielsenIQ. NielsenIQ is not responsible for, had no role in, and was not involved in analysing and preparing the results reported herein.

Notes

1 Varian (2014) provided an overview of popular ML methods. For empirical applications, Athey (2019) summarized representative uses of ML in the economics literature, from solving causal inference problems to predicting policy effects.

2 The predictive accuracy rankings are identical in the validation set and the test set. For the validation set, unlike Bajari et al. (2015), we do not calculate percentage weights from the coefficients of a regression model that combines the predictions of all methods, because coefficient-based weights would suffer from high collinearity, which leads to overweighting methods with small deviance and underweighting methods with large deviance.
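Note 2 rests on two checks: whether methods rank in the same order on the validation and test sets, and why combining methods through regression coefficients is fragile. A minimal sketch of both, assuming mean squared error as the loss (the note speaks of deviance; the exact measure is an assumption here) and using hypothetical helper names: `rank_methods` orders methods by out-of-sample error, `rankings_agree` compares the two orderings, and `correlation` shows how strongly two methods' predictions move together. Correlations near 1 are what make a regression of the outcome on all methods' predictions collinear, so its coefficients are unstable as weights.

```python
import math
from typing import Dict, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rank_methods(y_true: Sequence[float],
                 predictions: Dict[str, Sequence[float]]) -> List[str]:
    """Method names ordered from lowest to highest error (hypothetical helper)."""
    errors = {name: mean_squared_error(y_true, pred) for name, pred in predictions.items()}
    return sorted(errors, key=errors.__getitem__)


def rankings_agree(y_val: Sequence[float], val_preds: Dict[str, Sequence[float]],
                   y_test: Sequence[float], test_preds: Dict[str, Sequence[float]]) -> bool:
    """True when validation-set and test-set rankings coincide."""
    return rank_methods(y_val, val_preds) == rank_methods(y_test, test_preds)


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two methods' predictions."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

The method labels in any call (for example `"ols"`, `"enet"`, `"rf"`) are placeholders, not the study's eight models.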

3 $\ln\left(\frac{y_i}{1-y_i}\right)$ can take any real value, which allows us to apply the different methods to such an outcome.
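Note 3 concerns transforming the household budget share $y_i \in (0, 1)$ into an outcome that ranges over all real numbers. A common way to do this, and the one assumed in this sketch, is the log-odds (logit) transform, with predictions mapped back to shares through the inverse logit. How the study treats shares of exactly 0 or 1 is not stated here; the clipping constant `eps` below is a hypothetical choice, as are the function names.

```python
import math


def logit_share(y: float, eps: float = 1e-6) -> float:
    """Map a budget share in (0, 1) to the real line via log-odds."""
    # Assumption: clip shares of exactly 0 or 1 so the log stays finite.
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))


def inverse_logit(z: float) -> float:
    """Map a real-valued prediction back to a share in [0, 1]."""
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Any regression or ML method can then be fitted to `logit_share(y)` and its predictions passed through `inverse_logit` before computing accuracy on the share scale.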
