ABSTRACT
In this note, we compare the prediction performance of nine econometric and machine learning methods, including a new hybrid method that combines model averaging with machine learning, using data from the film industry and social media. The results suggest that machine learning methods have an advantage in handling short-run noise, whereas traditional econometric methods are better at capturing long-run trends. In addition, once sample heterogeneity is controlled for, the new hybrid method tends to strike the right balance between noise and trend, leading to superior prediction efficiency.
Acknowledgments
We wish to thank seminar participants at the 2017 Young Econometricians around the Pacific (YEAP) conference, Chinese Academy of Sciences, Renmin University, and Xiamen University for comments and suggestions. Any errors are our own.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 Box office prediction is an important research topic and poses unique challenges for achieving high prediction accuracy. See Liu (2006), Chintagunta, Gopinath, and Venkataraman (2010), and Moretti (2011) for recent references.
2 There are 12 genres in total, and a movie can be assigned at most three genres.
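Because a movie can carry up to three of the 12 genres, the genre indicators are not mutually exclusive, so a single categorical code cannot represent them. A multi-hot vector is one natural encoding. The sketch below assumes genres are indexed 0 to 11; the index scheme and the function name `genre_dummies` are hypothetical and not taken from the note.

```python
NUM_GENRES = 12            # total number of genres (from the note)
MAX_GENRES_PER_MOVIE = 3   # a movie can have at most three genres (from the note)


def genre_dummies(genre_ids):
    """Return a 0/1 list of length 12 marking the genres a movie belongs to.

    Assumes (hypothetically) that genres are identified by integers 0..11.
    """
    ids = set(genre_ids)
    if len(ids) > MAX_GENRES_PER_MOVIE:
        raise ValueError("a movie can have at most three genres")
    if any(not 0 <= g < NUM_GENRES for g in ids):
        raise ValueError("genre index out of range")
    return [1 if g in ids else 0 for g in range(NUM_GENRES)]


row = genre_dummies([0, 4, 7])
print(row)  # [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Each movie then contributes one such row of 12 dummies to the regressor matrix, with up to three entries equal to one.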
3 Following the machine learning literature, we apply 10-fold cross-validation to determine the level of splitting in the regression tree. The same criterion is applied in the bootstrap aggregation (bagging) method later.
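The note does not state which software or tuning parameter was used. The following standard-library Python sketch illustrates one simple reading of the procedure. It assumes that "level of splitting" means the maximum tree depth, uses a single regressor with squared-error splits, and reuses the depth selected by 10-fold cross-validation for every bootstrap tree in bagging. All function names and the synthetic data are hypothetical.

```python
import random


def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def build_tree(xs, ys, depth, min_leaf=2):
    """Grow a one-regressor tree with at most `depth` levels of splits.

    A leaf is a float (the mean of y); an internal node is (threshold, left, right).
    """
    if depth == 0 or len(ys) < 2 * min_leaf:
        return sum(ys) / len(ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(min_leaf, len(order) - min_leaf + 1):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between tied x values
        cost = sse([ys[i] for i in order[:k]]) + sse([ys[i] for i in order[k:]])
        if best is None or cost < best[0]:
            best = (cost, (xs[order[k - 1]] + xs[order[k]]) / 2)
    if best is None:
        return sum(ys) / len(ys)
    t = best[1]
    li = [i for i in range(len(xs)) if xs[i] <= t]
    ri = [i for i in range(len(xs)) if xs[i] > t]
    return (t,
            build_tree([xs[i] for i in li], [ys[i] for i in li], depth - 1, min_leaf),
            build_tree([xs[i] for i in ri], [ys[i] for i in ri], depth - 1, min_leaf))


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def cv_mse(xs, ys, depth, k=10, seed=0):
    """k-fold cross-validated mean squared prediction error for a given depth."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        tree = build_tree([xs[i] for i in train], [ys[i] for i in train], depth)
        errors.extend((ys[i] - predict(tree, xs[i])) ** 2 for i in fold)
    return sum(errors) / len(errors)


def choose_depth(xs, ys, depths=range(0, 6), k=10):
    """Pick the depth with the lowest k-fold CV error (ties go to the shallower tree)."""
    return min(depths, key=lambda d: cv_mse(xs, ys, d, k))


def bagged_predict(xs, ys, depth, x_new, n_boot=50, seed=1):
    """Bagging: average predictions of trees grown on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([xs[i] for i in draw], [ys[i] for i in draw], depth)
        preds.append(predict(tree, x_new))
    return sum(preds) / len(preds)


# Synthetic example: a step function plus noise (hypothetical data).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(120)]
ys = [(3.0 if x < 5 else 8.0) + rng.gauss(0, 0.5) for x in xs]
d = choose_depth(xs, ys)
print(d, bagged_predict(xs, ys, d, 2.0), bagged_predict(xs, ys, d, 8.0))
```

A variant closer to "the same criterion" would cross-validate the bagged ensemble itself rather than reuse the single-tree depth. The sketch keeps the simpler choice to stay short.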
4 The sample sizes are 40, 32, and 35, respectively; these are the only three genres with more than 30 observations. To save space, we omit the results for GUM, MTV, GETS, and AIC; none of these methods gives the highest prediction accuracy.