Research Article

Pre- and within-season attendance forecasting in Major League Baseball: a random forest approach

Pages 4512-4528 | Published online: 10 Mar 2020
 

ABSTRACT

This study explores the forecasting of Major League Baseball game ticket sales and identifies important attendance predictors by means of random forests that are grown from classification and regression trees (CART) and conditional inference trees. Unlike previous studies that predict sports demand, I consider different forecasting horizons and only use information that is publicly accessible in advance of a game or season. The models are trained using data from 2013 to 2014 to make predictions for the 2015 regular season. The static within-season approach is complemented by a dynamic month-ahead forecasting strategy. Out-of-sample performance is evaluated for individual teams and tested against different least-squares dummy variable regression models and a naïve lagged attendance forecast. My empirical results show high variation in team-specific prediction accuracy with respect to both models and forecasting horizons. Linear and tree-ensemble models, on average, do not vary substantially in predictive accuracy; however, least-squares regression fails to account for various team-specific peculiarities, despite accounting for team fixed effects and censoring attendance predictions to fit to stadium capacities.

JEL CLASSIFICATION:

Acknowledgements

I thank seminar and conference participants in Munich (Econometrics in the Castle: Machine Learning in Economics and Econometrics), Hamburg (University), and Kiel (IfW, University) for helpful comments and suggestions and, in particular, Martin Spindler, Wolfgang Maennig, and an anonymous referee.

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Disclosure statement

No potential conflict of interest was reported by the author.

Supplementary material

Supplemental data for this article can be accessed here.

Notes

1 It is common practice in the sports demand literature to use attendance and ticket sales as proxies for sports demand (Borland and Macdonald 2003). Furthermore, the officially reported attendance figures are the total number of sold and free tickets per game, not the number of fans that were present at a game. In this paper, the terms sports demand, ticket sales, and attendance are used interchangeably, unless explicitly stated otherwise. For an emerging literature on spectator no-show behaviour that addresses differences in fans’ attendance and ticket purchase behaviours, see, e.g., Schreyer (2019).

3 Double-header events usually stem from rescheduled games: in my data set, 105 out of 166 double-header games are rescheduled games.

4 The cleaned data sample includes 7011 games from the 2013, 2014, and 2015 regular MLB seasons; 351 (5%) of the games show ticket sales that exceed stadium capacity, and 19 games have ticket sales that equal stadium capacity. In comparison, Denaux, Denaux, and Yalcin (2011) analyse factors affecting MLB attendance using 22,940 game observations from 1979 to 2004 and find that 4.2% of the games show attendance above stadium capacity. Similarly, Lemke, Leonard, and Tlhokwane (2010) predict MLB ticket sales for the 2007 season and find 7.5% of their analysed sample of games to be sold out (i.e., attendance at capacity or exceeding capacity), while Meehan, Nelson, and Richardson (2007) analyse MLB attendance data from 2002 to 2003 and report that 4.8% of the games are sold out.

5 MCC is a balanced score measure that takes into account all four dimensions of the confusion matrix and, in contrast to the F1 score, is invariant to the definition of positive and negative classes (Chicco and Jurman 2020; Matthews 1975).
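
The invariance property described in note 5 can be illustrated with a minimal Python sketch. The labels below are invented for illustration only (reading 1 as a sold-out game and 0 as any other game is an assumption, not data from the article), and the helper names `confusion_counts`, `mcc`, and `f1` are hypothetical. The sketch computes both scores twice, once with each class treated as the positive class.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Return (tp, tn, fp, fn) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, tn, fp, fn):
    """F1 score; ignores true negatives by construction."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical labels (assumption: 1 = sold out, 0 = not sold out).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for positive in (1, 0):
    counts = confusion_counts(y_true, y_pred, positive)
    print(f"positive={positive}: MCC={mcc(*counts):.4f}, F1={f1(*counts):.4f}")
```

Swapping the positive class exchanges tp with tn and fp with fn. Both the numerator and the denominator of MCC are unchanged by that exchange, whereas F1 depends on tp but not on tn, so its value shifts with the choice of positive class.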
